Monday, August 31, 2009

Units ontology with SPIN support published

My co-workers at TopQuadrant have just published a new OWL ontology about Quantities, Units, Dimensions and Datatypes (QUDT). It is the result of a long-term, ongoing project with NASA Ames, and our friends at NASA have permitted us to publish these ontologies to encourage wider use outside of NASA.

The QUDT ontology is very carefully designed and provides comprehensive coverage of almost every unit of measurement that is known to humankind. For example, it defines the unit Centimeter as follows:

Each unit has a stable URI, making it possible to link to it from your own domain models in a reliable way. For each unit, the ontology defines some useful metadata including abbreviation, a link to DBpedia and a categorization of units into groups, such as length units.
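The sketch below models the per-unit metadata mentioned above as a plain data structure. It also shows how a grouping such as "length units" lets a tool list the units that are comparable with one another. The group names (`LengthUnit`, `TimeUnit`), the `unit:Second` entry and the DBpedia links are illustrative assumptions. They are not copied from the QUDT files.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    """Illustrative record of the metadata attached to each unit."""
    uri: str           # stable identifier, written here as a prefixed name
    abbreviation: str  # e.g. "cm"
    dbpedia: str       # link to the matching DBpedia resource (assumed values)
    group: str         # categorization; group names are hypothetical


UNITS = [
    Unit("unit:Centimeter", "cm", "http://dbpedia.org/resource/Centimetre", "LengthUnit"),
    Unit("unit:Foot", "ft", "http://dbpedia.org/resource/Foot_(unit)", "LengthUnit"),
    Unit("unit:Second", "s", "http://dbpedia.org/resource/Second", "TimeUnit"),
]


def units_in_group(group: str) -> List[str]:
    """Return the URIs of all units in one group, in declaration order."""
    return [u.uri for u in UNITS if u.group == group]
```

Because each unit is identified by a stable URI, your own data only needs to store that URI. Everything else can be looked up.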

I think this units ontology can fill an important gap in the current Semantic Web and Linked Data efforts. Numeric data without any formalized units is pretty useless for machines, and sometimes even for humans. Currently, a unit may be mentioned somewhere in a comment, or not at all, but the QUDT ontology allows ontology designers to state these implicit assumptions explicitly. With explicitly modeled units, linked data can be processed and transformed in much more useful ways. For example, if a height is specified in centimeters, then a smart linked data browser can automatically translate it into feet for American readers.

There are two main ways of using the units ontology. First, you can use the unit resources to "annotate" your properties with a dedicated property such as qud:units. The values of your property would then use built-in datatypes such as xsd:double. Alternatively, you can embed the unit directly into the literals. For this use case, all units have also been declared to be rdfs:Datatypes. This makes it possible to assign units as the rdfs:range of a property, as shown below:
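The following Python sketch contrasts the two encodings. In the first, the unit hangs off the property. In the second, the unit is the literal's datatype. Both are resolved to the same (value, unit) pair. The `Literal` tuple, the `:heightCm` property and the `PROPERTY_UNITS` table are hypothetical stand-ins, not part of QUDT.

```python
from typing import NamedTuple, Optional, Tuple


class Literal(NamedTuple):
    """Minimal stand-in for an RDF literal: lexical form plus datatype."""
    lexical: str
    datatype: str


# Style 1: the value is a plain xsd:double and the unit is attached to the
# property through an annotation (the post mentions qud:units for this).
PROPERTY_UNITS = {":heightCm": "unit:Centimeter"}  # hypothetical annotation table
annotated = (":example", ":heightCm", Literal("8380", "xsd:double"))

# Style 2: the unit itself is the literal's datatype.
typed = (":example", ":height", Literal("8380", "unit:Centimeter"))


def value_and_unit(triple) -> Tuple[float, Optional[str]]:
    """Return (numeric value, unit URI) for either encoding style."""
    _subject, prop, lit = triple
    if lit.datatype.startswith("unit:"):
        return float(lit.lexical), lit.datatype            # unit travels with the value
    return float(lit.lexical), PROPERTY_UNITS.get(prop)    # unit comes from the property
```

The first style keeps standard datatypes that any RDF tool understands. The second style lets every value carry its own unit, at the price of custom datatypes.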

Here, the property height has the range unit:Centimeter. An example instance would then show up like this:


The specific height will then be stored in RDF literals such as "8380"^^unit:Centimeter. (The upcoming version 3.2 of TopBraid Composer will show the unit in parentheses after the property name, but I didn't want to play tricks here.)
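To make the literal syntax concrete, here is a small parser for the Turtle-style form "8380"^^unit:Centimeter. It splits such a literal into its number and its unit datatype. This is a deliberate simplification, assuming a prefixed datatype name with no escape sequences or full IRIs. A real application would use an RDF parser instead.

```python
import re
from typing import Tuple

# Quoted lexical form, then ^^, then a prefixed datatype name such as unit:Centimeter.
# Simplified on purpose: no escaped quotes, no <full IRIs>, no language tags.
TYPED_LITERAL = re.compile(r'^"([^"]*)"\^\^([A-Za-z][\w-]*):([A-Za-z_][\w-]*)$')


def parse_typed_literal(text: str) -> Tuple[float, str]:
    """Split a unit-typed literal into (numeric value, unit name)."""
    m = TYPED_LITERAL.match(text.strip())
    if m is None:
        raise ValueError(f"not a prefixed typed literal: {text!r}")
    lexical, prefix, local = m.groups()
    return float(lexical), f"{prefix}:{local}"
```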

Now that the units have been formalized in an ontology, new ways of working with numeric data become possible. As described earlier, the SPIN framework can be used to define new SPARQL functions which can then be used to do things like unit conversion. We have published a SPIN Library which contains some generic unit conversion functions. For example, the qudspin:convert function can be used to convert any numeric value from a source unit (here: unit:Centimeter) to a target unit (unit:Foot):
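The arithmetic behind such a conversion can be sketched by routing through the SI base unit. Multiply by the source unit's factor to get metres, then divide by the target unit's factor. The factors below are the standard definitions (1 cm = 0.01 m, 1 ft = 0.3048 m, 1 in = 0.0254 m). The `unit:Meter` and `unit:Inch` names, and whether QUDT stores exactly these values, are assumptions of this sketch. This is not the qudspin:convert implementation.

```python
# Assumed conversion multipliers to the SI base unit (metres, for length).
TO_SI = {
    "unit:Meter": 1.0,
    "unit:Centimeter": 0.01,
    "unit:Foot": 0.3048,
    "unit:Inch": 0.0254,
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert via the SI base unit: value * m_from / m_to."""
    return value * TO_SI[from_unit] / TO_SI[to_unit]


print(round(convert(8380, "unit:Centimeter", "unit:Foot"), 2))  # 274.93
```

Units that need an additive offset as well as a factor, such as temperatures, do not fit this multiplier-only sketch.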


If the units are used as datatypes, then the function qudspin:convertLiteral can be used, saving one argument in the function call:
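The "one argument fewer" idea is easy to see in a sketch. When the unit is the literal's datatype, the source unit can be read from the literal itself, so only the target unit has to be passed. The `convert_literal` name and the multiplier table are hypothetical and only mirror the behaviour described for qudspin:convertLiteral.

```python
import re

TO_SI = {"unit:Centimeter": 0.01, "unit:Foot": 0.3048}  # assumed multipliers (metres)
LITERAL = re.compile(r'^"([^"]*)"\^\^(\S+)$')           # "lexical"^^datatype


def convert_literal(literal: str, to_unit: str) -> float:
    """Like convert(value, from, to), but the source unit is the literal's datatype."""
    m = LITERAL.match(literal)
    if m is None:
        raise ValueError(f"not a typed literal: {literal!r}")
    lexical, from_unit = m.groups()
    return float(lexical) * TO_SI[from_unit] / TO_SI[to_unit]
```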


In the following example, we iterate over all instances with a :height property and display the height (in cm) as well as the converted height (in feet) using the SPIN function:
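Outside of SPARQL, the same iteration might look like the Python sketch below. It matches every triple with the :height property and produces one row per match, with the original value and the converted value. This roughly corresponds to a SELECT query that projects the converted height. The instance data and the multipliers are hypothetical.

```python
TO_SI = {"unit:Centimeter": 0.01, "unit:Foot": 0.3048}  # assumed multipliers (metres)

# Hypothetical data: (subject, property, (value, datatype)) triples.
TRIPLES = [
    (":tree1", ":height", (8380.0, "unit:Centimeter")),
    (":tree2", ":height", (1500.0, "unit:Centimeter")),
    (":tree1", ":name", ("Tall tree", "xsd:string")),
]


def heights_with_feet(triples):
    """One row per :height value: (subject, original value, value in feet)."""
    rows = []
    for subject, prop, (value, unit) in triples:
        if prop != ":height":
            continue  # only the ?s :height ?h pattern is relevant here
        feet = value * TO_SI[unit] / TO_SI["unit:Foot"]
        rows.append((subject, value, feet))
    return rows


for subject, cm, ft in heights_with_feet(TRIPLES):
    print(f"{subject}\t{cm:g} cm\t{ft:.2f} ft")
```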


Such unit conversion tasks have been made possible by adding conversion multipliers to the QUDT ontology. SPIN functions can use this extra metadata to drive mathematical computations. The function qudspin:convert is backed by a SPARQL query as shown below:
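The point that the metadata drives the maths can be sketched by storing the factors as triples. The conversion then looks them up by pattern, much as a query body would. The property names `qudt:conversionMultiplier` and `qudt:conversionOffset` and the formula SI = (value + offset) * multiplier are assumptions for illustration, since the original query is not reproduced here. The temperature units are hypothetical examples of why an offset can matter.

```python
# Hypothetical unit metadata, stored as (subject, predicate, object) triples.
UNIT_TRIPLES = [
    ("unit:Centimeter", "qudt:conversionMultiplier", 0.01),
    ("unit:Centimeter", "qudt:conversionOffset", 0.0),
    ("unit:Foot", "qudt:conversionMultiplier", 0.3048),
    ("unit:Foot", "qudt:conversionOffset", 0.0),
    ("unit:DegreeCelsius", "qudt:conversionMultiplier", 1.0),
    ("unit:DegreeCelsius", "qudt:conversionOffset", 273.15),
    ("unit:Kelvin", "qudt:conversionMultiplier", 1.0),
    ("unit:Kelvin", "qudt:conversionOffset", 0.0),
]


def lookup(subject: str, predicate: str, triples=UNIT_TRIPLES) -> float:
    """Return the single ?o matching (subject, predicate, ?o), like a basic graph pattern."""
    matches = [o for s, p, o in triples if s == subject and p == predicate]
    if len(matches) != 1:
        raise LookupError(f"expected one value for {subject} {predicate}, got {matches}")
    return matches[0]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Go to the SI base unit with the source factors, then back with the target factors."""
    si = (value + lookup(from_unit, "qudt:conversionOffset")) * lookup(
        from_unit, "qudt:conversionMultiplier")
    return si / lookup(to_unit, "qudt:conversionMultiplier") - lookup(
        to_unit, "qudt:conversionOffset")
```

Adding a new unit here means adding triples, not code. That is the same property the declarative SPIN function relies on.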

The SPIN framework makes it possible to define such SPARQL functions (and rules and constraints) in a completely declarative way. Nothing needs to be hard-coded. Any SPIN-aware SPARQL engine can simply look up the definition of the qudspin:convert function on the web and learn about the underlying mathematics. Likewise, humans do not need to worry about the calculations themselves; they can treat the SPIN function as a black box.

Updated April 2012: The ontologies have been updated since this blog entry was originally published. The easiest way to play with them (for example, using TopBraid Composer Free Edition) is now to add an owl:imports statement for http://qudt.org/spin/unitconversion.

Friday, August 21, 2009

Ontology Mapping with SPIN Templates

The question of how to transform data from one ontology to another comes up again and again, most recently in a question on the W3C Semantic Web list. The requirement is very real: for example, assume you have a class Person (with firstName/lastName) and a class Member (with fullName), and you want to construct one Member for each Person, so that the fullName is derived by concatenating firstName + " " + lastName. So basically you want to transform some (legacy) data into a format that some other application can understand.

Ideally, there should be a reusable standard mapping ontology for this purpose, which is also executable and user-friendly in visual editing tools. I am not aware of such a standard ontology, but I know how it could be built. Clearly, the typical complexity of such mapping tasks goes beyond what is provided by modeling languages like OWL. A graph matching language like SPARQL with rich built-in functions will be better suited. SPARQL CONSTRUCT queries can be used to define such mappings, as described on this blog three years ago.

The SPARQL Inferencing Notation (SPIN) provides a framework for organizing such SPARQL CONSTRUCT queries in a way that is easy to maintain and efficient to execute. In the following example I will walk through the steps needed to create a generic mapping ontology for tasks such as the one above, using SPIN Templates. The example is intentionally kept very simple. The resulting file can be downloaded here, and you can use TopBraid Composer (even the Free Edition) to execute it.

Let's assume we have two ontologies, person and member, with the following classes:

An example instance of the source ontology may look like the following, with values for firstName and lastName filled in:

SPARQL can be used to create a mapping so that all instances of Person become Members, with a fullName derived from firstName and lastName. We would need two CONSTRUCT queries: one that adds the rdf:type triple to make the Persons also Members, and one that concatenates the firstName and lastName values into the fullName. You could attach those CONSTRUCT queries as SPIN rules to the classes as shown below. Note that the variable ?this means "for every instance of the class Person".
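To make the semantics of the two rules concrete, here is a Python sketch of what they compute. Each rule binds ?this to every instance of person:Person in turn. A small loop re-applies the rules until no new triples appear, which is the general idea behind a rule engine, though not TopBraid's actual implementation. The triples are plain strings (IRIs and literals alike), subclass instances are ignored, and the instance `ex:p1` is hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def instances_of(graph: Set[Triple], cls: str) -> List[str]:
    """All ?this with (?this rdf:type cls) in the graph (no subclass reasoning)."""
    return [s for s, p, o in graph if p == "rdf:type" and o == cls]


def value_of(graph: Set[Triple], subject: str, prop: str) -> Optional[str]:
    return next((o for s, p, o in graph if s == subject and p == prop), None)


def person_to_member(graph: Set[Triple]) -> Set[Triple]:
    # Rule 1: CONSTRUCT { ?this a member:Member } for every ?this a person:Person
    return {(this, "rdf:type", "member:Member") for this in instances_of(graph, "person:Person")}


def full_name(graph: Set[Triple]) -> Set[Triple]:
    # Rule 2: fullName = firstName + " " + lastName for every ?this a person:Person
    out = set()
    for this in instances_of(graph, "person:Person"):
        first = value_of(graph, this, "person:firstName")
        last = value_of(graph, this, "person:lastName")
        if first is not None and last is not None:
            out.add((this, "member:fullName", f"{first} {last}"))
    return out


def run_rules(graph: Set[Triple], rules) -> Set[Triple]:
    """Apply all rules repeatedly until no new triples are inferred (a fixpoint)."""
    result = set(graph)
    while True:
        new = set().union(*(rule(result) for rule in rules)) - result
        if not new:
            return result
        result |= new
```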

This mechanism works fine: we can press the inferences button to run the SPIN rule engine, and it will create the new RDF triples:

We can see that the Person is now also a Member with a fullName:

However, the solution above requires that the person creating the mapping is familiar with SPARQL. Additionally, the transformations cannot easily be reused, and similar SPARQL queries would need to be written again the next time a string concatenation is required.

SPIN Templates can be used to encapsulate SPARQL queries so that they can be reused and edited easily. In the screenshot below I have replaced the hard-coded SPARQL queries with two SPIN template calls, which actually do the same but in a much nicer way:

Another way of visualizing these is using TopBraid's Diagram facility:


Let's look behind the scenes. The two entries under the spin:rule property are now SPIN Template Calls. A Template Call is an instance of a SPIN Template, but with arguments filled in. Here is the definition of the first SPIN Template, the concatenation rule:

The SPIN Template above wraps a SPARQL CONSTRUCT query (under spin:body). Templates can take arguments (under spin:constraint), which define how the template can be invoked. The values of the arguments are "inserted" as variable bindings into the SPARQL query. In the example above, there are three arguments (sourceProperty1, sourceProperty2 and targetProperty), which are referenced in the body query as variables ?sourceProperty1 etc. To use such a template, the user simply needs to select the source class, go to "Create from SPIN Template..." under spin:rule, and fill in the arguments, as shown below.
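The template/template-call split can be mimicked in Python, as a rough analogy rather than SPIN's actual machinery. A `Template` declares named arguments and a body. A `TemplateCall` fills those arguments in, and their values are passed into the body the way SPIN pre-binds ?sourceProperty1 and friends. The class names and the template name `ConcatenationRule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Template:
    """Stand-in for a SPIN Template: declared arguments plus a rule body."""
    name: str
    arguments: Tuple[str, ...]
    body: Callable[..., Set[Triple]]


@dataclass
class TemplateCall:
    """A template with its arguments filled in, attachable to a class as a rule."""
    template: Template
    bindings: Dict[str, str]

    def execute(self, graph: Set[Triple], this: str) -> Set[Triple]:
        missing = set(self.template.arguments) - set(self.bindings)
        if missing:
            raise ValueError(f"unbound template arguments: {sorted(missing)}")
        # Argument values play the role of pre-bound variables in the body query.
        return self.template.body(graph, this, **self.bindings)


def _concat_body(graph, this, sourceProperty1, sourceProperty2, targetProperty):
    def obj(prop):
        return next((o for s, p, o in graph if s == this and p == prop), None)
    first, second = obj(sourceProperty1), obj(sourceProperty2)
    if first is None or second is None:
        return set()
    return {(this, targetProperty, f"{first} {second}")}


CONCAT = Template(
    name="ConcatenationRule",  # hypothetical name
    arguments=("sourceProperty1", "sourceProperty2", "targetProperty"),
    body=_concat_body,
)

call = TemplateCall(CONCAT, {
    "sourceProperty1": "person:firstName",
    "sourceProperty2": "person:lastName",
    "targetProperty": "member:fullName",
})
```

The concatenation logic now lives in one place. A different ontology only needs a new `TemplateCall` with other property names.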

The resulting Template Call will be associated with the class Person as a spin:rule, so that the SPIN (mapping) engine will infer the same new triples. The main achievement, though, is that the string concatenation module has now been generalized and could be reused in other ontologies. Since SPIN Templates are represented entirely in RDF, they can be shared on the web. Creating a library of such mapping modules would be a great topic for a Master's Thesis...
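As a rough illustration of such a shareable library, this sketch keeps template bodies in a registry keyed by name. The mapping itself (which class gets which template calls, with which arguments) is pure data. JSON stands in here for the RDF that SPIN would use. The template names `Concat` and `MapClass` and the example data are hypothetical. Each call runs once per instance of its class, in a single pass.

```python
import json


def concat(graph, this, sourceProperty1, sourceProperty2, targetProperty):
    """Hypothetical concatenation template body."""
    def obj(prop):
        return next((o for s, p, o in graph if s == this and p == prop), None)
    a, b = obj(sourceProperty1), obj(sourceProperty2)
    return set() if a is None or b is None else {(this, targetProperty, f"{a} {b}")}


def map_class(graph, this, targetClass):
    """Hypothetical class-mapping template body: ?this also becomes a targetClass."""
    return {(this, "rdf:type", targetClass)}


TEMPLATES = {"Concat": concat, "MapClass": map_class}  # the shareable "library"

# Template calls as plain data, attached to source classes.
MAPPING = json.loads("""
{"person:Person": [
   {"template": "MapClass", "args": {"targetClass": "member:Member"}},
   {"template": "Concat", "args": {"sourceProperty1": "person:firstName",
                                   "sourceProperty2": "person:lastName",
                                   "targetProperty": "member:fullName"}}
 ]}
""")


def run_mapping(graph, mapping):
    """Apply every template call to every instance of its class (one pass)."""
    inferred = set()
    for cls, calls in mapping.items():
        instances = [s for s, p, o in graph if p == "rdf:type" and o == cls]
        for this in instances:
            for call in calls:
                inferred |= TEMPLATES[call["template"]](graph, this, **call["args"])
    return graph | inferred
```

Reusing the concatenation module for another ontology only requires another data entry, for example one mapping a hypothetical `ex:Employee` with `ex:givenName`/`ex:familyName` to a display name.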