Saturday, August 26, 2006

TopBraid Composer 1.2: Sesame and Multi-User Support

The new 1.2.0 release of the ontology development platform TopBraid Composer introduces multi-user support as well as full editing and querying support for Sesame.

Sesame is a popular and highly scalable open-source RDF store from Aduna. As far as I can tell, TopQuadrant is the first vendor to provide a complete ontology editor for the Sesame platform.

Sesame support in Composer comes in two flavors: Sesame native databases and Sesame remote repositories. The Sesame remote option is particularly attractive because it lets users connect directly to any Sesame repository via HTTP over the Web. This way, multiple users can access and edit a shared repository, committing their changes much as they would with CVS.

I did some testing, connecting to servers in Holland from my home in California. I was actually surprised by the performance of the stream of triples crossing the ocean: changing resources in the editor takes only about one second. Brilliant work by the Sesame developers! Of course we need more experience to fine-tune the system, and I hope to get some user feedback for real-world test cases, but the start is more than promising.

On another level, I am once more left speechless by the power of RDF. No matter what some people argue - RDF is just great. The simplicity of reducing everything to triples opens many, many doors, for example from a software developer's point of view. Once you are in the triples world, it does not really matter whether the triples come from a local file, a database, a remote server or a virtual triple store that in reality wraps a legacy database. All these triples can be queried, cached, converted, mapped, filtered and edited consistently.
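
To make the "it does not matter where the triples come from" point concrete, here is a minimal Python sketch. It is not how Composer, Jena or Sesame work internally; the function names, the ex: identifiers and the data are all made up for illustration. Triples are plain tuples, and one pattern matcher serves a list parsed from a file and a generator wrapping legacy rows alike.

```python
from itertools import chain

def match(triples, s=None, p=None, o=None):
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for t in triples:
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2]):
            yield t

# Source 1: triples as they might be parsed from a local file (made-up data).
file_triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
]

# Source 2: a "virtual" store that wraps legacy rows and emits triples lazily.
def legacy_triples():
    rows = [(7, "Bob")]  # stands in for rows of a legacy database table
    for row_id, name in rows:
        yield (f"ex:person/{row_id}", "ex:name", name)

# The consumer neither knows nor cares where each triple came from.
merged = chain(file_triples, legacy_triples())
names = [o for _, _, o in match(merged, p="ex:name")]
print(names)
```

The same match function could sit on top of a cache, a filter or a remote connection, because all of them only need to produce (subject, predicate, object) tuples.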

And in TopBraid Composer, it's triples all the way up to the top!

Thursday, August 17, 2006

Lifting Excel into the Semantic Web

While the overall Semantic Web vision is fairly, well, visionary, we shouldn't forget that a lot of people out there use plain old office documents like Excel spreadsheets to manage their data, knowledge and processes. Especially in engineering domains, Excel is used not only to enter tabular data: enriched with Visual Basic scripts and macros, it is becoming almost a general-purpose platform.

A lot of this Excel-based data could be mined into a Semantic Web compliant format, for further processing and mapping. In addition, technologies such as OWL can also help to improve and simplify the document-driven work processes.

For a customer project, we therefore implemented a bridge to import Excel files into an ontology. The ontology itself is simple, and essentially contains concepts for workbooks, sheets and cells:



Generating instances of these spreadsheet classes is simple but just the first step. Once every cell is accessible with a unique URI and location, it can be further processed. For example, it can be automatically transformed into something else using a SPARQL query. Or, it can be mapped into another ontology. Using some mapping rules, the cell values can be inserted into other OWL instance documents to serve as input for other tools, including web services. Finally, these other tools can return values, which a reverse generator can then serialize back into the Excel file. With some clever semantic mediation service in the middle, this approach can serve as a way to get a handle on complex tool interoperability.
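
As a rough illustration of the first step, the Python sketch below turns a sheet into triples in which every cell gets its own URI. The ss: class and property names, the URI layout and the sample data are assumptions made up for this example and not the ontology from the customer project; the sheet is a plain list of rows rather than a real Excel file.

```python
def cell_ref(row, col):
    """Convert 0-based row/column indexes into an A1-style reference."""
    letters = ""
    n = col + 1
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"

def sheet_to_triples(base, workbook, sheet, grid):
    """Yield triples for one sheet; class and property names are hypothetical."""
    wb = f"{base}{workbook}"
    sh = f"{wb}/{sheet}"
    yield (wb, "rdf:type", "ss:Workbook")
    yield (sh, "rdf:type", "ss:Sheet")
    yield (wb, "ss:hasSheet", sh)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None:
                continue  # empty cells produce no instance
            cell = f"{sh}/{cell_ref(r, c)}"
            yield (cell, "rdf:type", "ss:Cell")
            yield (sh, "ss:hasCell", cell)
            yield (cell, "ss:row", r + 1)
            yield (cell, "ss:column", c + 1)
            yield (cell, "ss:value", value)

# Made-up sample data standing in for a parsed spreadsheet.
grid = [["Item", "Cost"], ["Pump", 120]]
triples = list(sheet_to_triples("http://example.org/", "budget", "Sheet1", grid))
```

With the cells addressable like this, a query or mapping rule only has to refer to a URI such as http://example.org/budget/Sheet1/B2 to pick up the value 120.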

The base building blocks for this, including a mapping engine, will be available in the coming 1.2 version of TopBraid Composer. Stay tuned.

Friday, August 11, 2006

Update: Automated Database import into RDF/OWL

A few days ago, I reported on the integration of Chris Bizer's D2RQ library into the ontology development toolkit TopBraid Composer. Encouraged by a lot of positive feedback, we have significantly improved this integration for version 1.1.6. The round-tripping between relational databases and OWL ontologies is now better automated and the database browsing performance vastly improved.

Here is how to import any relational database into your OWL/RDF model: Install and start TBC, and open the D2RQ import wizard:




Specify your DB connection settings, enter a file name and namespaces, and press Finish. In the background, this launches the new mapping generator, a Java engine that analyzes the database metadata to find suitable mappings between the database tables and OWL classes, properties and individuals. When done, the wizard reports:




You can now open the test file to browse your database, as if it were an OWL model - all conversions of database rows into individuals are done on the fly.

You can then also look at the class structures that have been generated from the database. In my first design, there was exactly one OWL class for each database table:




In a sense this means that TopBraid can be used as a database browser. In the current version, more intelligence is applied to convert link tables into a pair of inverse object properties:
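
The idea of reading database metadata and treating link tables specially can be sketched in a few lines of Python. This is not the Java mapping generator shipped with Composer and it does not emit D2RQ mapping files; it uses SQLite from the standard library, and the heuristic (a table with exactly two columns, both foreign keys, is a link table), the naming scheme and the sample schema are assumptions for illustration only.

```python
import sqlite3

def tables(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    return [r[0] for r in rows]

def columns(conn, table):
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]

def foreign_keys(conn, table):
    # Each foreign_key_list row is (id, seq, table, from, to, ...);
    # map the local column to the table it references.
    return {r[3]: r[2]
            for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')}

def generate_mapping(conn):
    """Return (classes, properties); properties are (name, domain, range)."""
    classes, properties = [], []
    for t in tables(conn):
        cols = columns(conn, t)
        fks = foreign_keys(conn, t)
        if len(cols) == 2 and len(fks) == 2:
            # Assumed heuristic: a pure link table becomes two inverse properties.
            a, b = (fks[c] for c in cols)
            properties.append((f"{a}_has_{b}", a, b))
            properties.append((f"{b}_has_{a}", b, a))
        else:
            classes.append(t)  # ordinary table: one class per table
    return classes, properties

# Made-up sample schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE person_project (
        person_id INTEGER REFERENCES person(id),
        project_id INTEGER REFERENCES project(id)
    );
""")
classes, properties = generate_mapping(conn)
```

Here person and project come out as classes, while person_project disappears as a class and is replaced by the pair person_has_project and project_has_person.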




You can now do whatever you want with this new OWL/RDF model, except for changing the database contents. You can however modify the automatically generated files if you don't like the long stupid default names, or want to create the person's URI by combining first name and last name. Based on TopBraid's global refactoring support, you can edit both the mapping file and the schema at the same time. You can also add description logics semantics to your classes or add rules to the ontology to perform more interesting tasks on your database than you could do with conventional database technology.

Finally, if you are happy with your new ontology, you can convert the whole thing into an OWL file or RDF Schema, using the export/merge/convert wizard:




Here you can select to stream the virtual instances from the database together with the classes from the schema into a single file, etc.

We have tried this new feature on a couple of customer databases and it appears to work very smoothly and with high runtime performance. One of the D2RQ developers, Richard Cyganiak, even reports that the database performance often exceeds the speed of a Jena-based triple store! I am eager to receive more benchmarks from our customers, so that I can annoy our friends in Berlin with additional feature requests.

Since a lot of our users have some really good use cases for this feature, I decided to publish the new version only a few days after the previous build. The ability to visually link, query and perform reasoning over existing databases from within Eclipse will hopefully make it easier to develop semantic applications, and to perform Semantic Data Mining.

Tuesday, August 08, 2006

Importing Relational Databases into RDF/OWL using D2RQ

A common request from our customers is the ability to reuse existing (legacy) databases in the context of an RDF/OWL project. The mapping of relational databases into RDF is non-trivial and since I am not at all an expert on databases, I turned to the Web to find help. From some other entry on Planet RDF, I dimly remembered that Chris Bizer had been working on this, and linked from his pages I discovered D2RQ.

D2RQ is a mapping language formalized as an RDF/OWL model, i.e. you can use an OWL editor of your choice to edit which database table is mapped into which class, whether columns are treated as datatype properties or object properties etc. I very much like approaches that come with a clean RDF schema, so that much of the program logic can be expressed in terms of dynamic declarations without having to recompile any code. The perfect thing about D2RQ is that it also comes with an implementation of the mapping engine. This engine is written in Java for Jena and Sesame, and allows programmers to treat relational databases as virtual triple stores.

Well, since TopBraid Composer is based on Jena, this approach provides us with a perfect example of component reuse: Version 1.1.5 (out today) now comes with a feature to import relational databases using D2RQ. The manual describes details, but here is a screenshot of how databases can be displayed:



The picture above displays some instances that are directly taken from a MySQL database (not a triple store)! The project consists of an umbrella ontology that imports a schema ontology (in OWL) and the virtual D2RQ graph. The D2RQ graph is instructed by a mapping ontology, and the mapping model itself can be edited with Composer as well. Since Composer treats the D2RQ graph just like any other triple store, it is possible to run SPARQL queries or perform reasoning on it. To keep things simple, D2RQ does not support write access to the database.
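
What "virtual" means here can be shown with a small Python sketch. It is not D2RQ and does not use its mapping language: the VirtualGraph class, the URI patterns and the sample table are hypothetical, and SQLite stands in for MySQL. Nothing is copied into a triple store; rows are turned into triples each time the graph is asked, and the class offers no way to write back.

```python
import sqlite3

class VirtualGraph:
    """Read-only view that turns the rows of one table into triples on demand."""

    def __init__(self, conn, table, key, base):
        self.conn, self.table, self.key, self.base = conn, table, key, base

    def triples(self, predicate=None):
        cursor = self.conn.execute(f'SELECT * FROM "{self.table}"')
        names = [d[0] for d in cursor.description]
        for row in cursor:
            record = dict(zip(names, row))
            # Assumed URI pattern: one individual per row, named by its key.
            subject = f"{self.base}{self.table}/{record[self.key]}"
            for column, value in record.items():
                if column == self.key or value is None:
                    continue
                prop = f"{self.base}{self.table}#{column}"
                if predicate in (None, prop):
                    yield (subject, prop, value)

# Made-up sample database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (name) VALUES ('Alice')")

graph = VirtualGraph(conn, "person", "id", "http://example.org/")
before = list(graph.triples())

# A row added directly in the database shows up on the next query.
conn.execute("INSERT INTO person (name) VALUES ('Bob')")
after = list(graph.triples())
```

Because the triples are computed per request, the second call sees the new row without any re-import step.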

It is simply amazing how easy it was to integrate this component into the ontology editor. Once more the decision to use Jena as the foundation paid off.

We sincerely acknowledge Chris Bizer and his colleagues for this work and hope to collaborate on the future evolution of this neat library. Chris kindly allowed us to distribute their GPL library together with Composer even though Composer is a commercial product.

Monday, August 07, 2006

Editing Reified Statements

Reified statements are very useful to attach information about a statement, e.g. to annotate who has created the statement, or to attach other metadata for n-ary relationships. RDF Schema defines a standard class rdf:Statement and corresponding properties to enable people to talk about statements. However, tool support for editing these statements has been poor and therefore rdf:Statement does not seem to be used very much - at least not in hand-crafted ontologies.
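
For readers who have not seen reification written out, the following Python sketch shows which triples are involved. The rdf:Statement class and the rdf:subject, rdf:predicate and rdf:object properties are the standard vocabulary; the reify function, the blank node naming, the ex: identifiers and the choice of dc:creator and dc:date as annotations are assumptions for this example.

```python
from datetime import date
from itertools import count

_ids = count(1)  # source of fresh blank node labels

def reify(triple, creator, on=None):
    """Return the triples that describe `triple` as an rdf:Statement."""
    s, p, o = triple
    node = f"_:stmt{next(_ids)}"
    on = on or date.today()
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
        # Annotations about the statement itself.
        (node, "dc:creator", creator),
        (node, "dc:date", on.isoformat()),
    ]

# Made-up statement and creator.
statement = ("ex:alice", "ex:knows", "ex:bob")
annotations = reify(statement, "ex:editor1", date(2006, 8, 7))
```

Note that the reified description does not assert the original statement; the triple ex:alice ex:knows ex:bob still has to be in the model in its own right.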

This afternoon I experimented with support for editing reified relationships in TopBraid Composer. The outcome will be available in version 1.1.5 soon.

Whenever you want to make a statement about a statement, you can now reify it using the statement's drop-down menu:





In the following dialog you can assign template annotations to the new rdf:Statement. In the example below, the system will automatically insert the given dc:creator and insert the current date as a value of dc:date.





Later, if a statement is reified, the tool tip text of the statement's icon will show the properties of the rdf:Statement:




It goes without saying that you can now use all the other Composer features such as SPARQL to find statements with certain characteristics, attach TODO tags to them etc.

There is more work on this in the queue, for example to support the same approach with other types of reified relationships, and to automatically manage such reifications as discussed in a previous posting. In the meantime I'd appreciate feedback on the usefulness of this approach :)

Saturday, August 05, 2006

Ontology Editor for Oracle 10g

As of version 1.1.4, TopBraid Composer is the first ontology editor to support Oracle 10g RDF databases. While previous versions already supported Oracle via the generic Jena database back-end, we now also have a back-end using Oracle's native RDF support. This means that users can directly edit an Oracle RDF database and use Oracle-specific features in their applications.

It is great to see that a big name like Oracle has entered the RDF world. While their RDF support still needs some more work (especially with the treatment of blank nodes - see my corresponding thread in their forum), I hope that having an ontology editor for it will spark some more interest in this platform.