Monday, December 13, 2010

The Teamwork Ontology

The newly released TopBraid Suite 3.4 introduces the Enterprise Vocabulary Net (EVN), an out-of-the-box solution for web-based development and management of interconnected controlled vocabularies. The focus of this product for now is on collaborative editing of SKOS models, and the EVN link above will lead you to screenshots showing this in action. I should write a couple of blog entries about this, but here is one about one particular aspect of the EVN system - its support for keeping track of changes (on the SKOS models) in a multi-user environment.

The requirements of EVN include the ability of teams to collaborate on edits to an enterprise vocabulary. Imagine a media company that wants to build a semantic web model of the concepts that are relevant in its domain. This media company would have domain experts on the "news" channel, and this channel would need a controlled vocabulary to categorize incoming news items so that they can be processed more easily down the road. A simple SKOS hierarchy would define standard identifiers (URIs) for things like Sports, and under that there would be sub-concepts such as RacquetSports, with further specializations including Tennis and Badminton.
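To make the hierarchy a bit more concrete, here is a minimal sketch of those concepts as plain subject/predicate/object tuples in Python. The `http://example.org/news#` namespace and the helper `broader_chain` are made up for illustration; only the concept names and the use of SKOS come from the scenario above.

```python
# Hypothetical namespace for the media company's vocabulary (an assumption).
EX = "http://example.org/news#"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# Each triple is (subject, predicate, object).
# skos:broader points from the narrower concept to its parent.
TRIPLES = [
    (EX + "RacquetSports", SKOS_BROADER, EX + "Sports"),
    (EX + "Tennis", SKOS_BROADER, EX + "RacquetSports"),
    (EX + "Badminton", SKOS_BROADER, EX + "RacquetSports"),
]


def broader_chain(concept, triples):
    """Return the ancestors of a concept, nearest first.

    Simplification: assumes each concept has at most one broader concept.
    """
    parents = {s: o for s, p, o in triples if p == SKOS_BROADER}
    chain = []
    seen = set()  # guards against accidental cycles
    while concept in parents and concept not in seen:
        seen.add(concept)
        concept = parents[concept]
        chain.append(concept)
    return chain
```

Calling `broader_chain(EX + "Tennis", TRIPLES)` walks from Tennis up through RacquetSports to Sports, which is the kind of lookup a news categorization step would rely on.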

In order to build such a controlled vocabulary, the media company would create a team of domain experts, and each group of experts would collaborate independently to fill in the various sub-trees of the overall enterprise vocabulary. In the TopBraid EVN system, this is implemented through an editing process in which different people can play different roles. If the sports experts want to add another bunch of categories to classify the various kinds of football, they would open a so-called "working copy". The working copy is a logical extension of the master model, but also includes local changes that are not yet visible to the rest of the team. The sports team can experiment with arbitrary edits in their working copy sandbox. When done, they can change the status of their working copy to trigger a review process. At this stage, no further edits can be made until a reviewer (who "owns" the master copy) has had a chance to OK or reject those edits. If OK, the working copy will be retired, and all changes will be applied to the master copy and thus published to the rest of the enterprise. If rejected, the sports people may want to make additional edits.
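The review process can be read as a small state machine. The following Python sketch is only an illustration of that reading, not the EVN implementation: the class `WorkingCopy`, the `COMMITTED` state and the rule that a rejected working copy must be reopened before further edits are all assumptions on my part.

```python
from enum import Enum


class Status(Enum):
    UNCOMMITTED = "Uncommitted"            # edits are allowed
    FROZEN_FOR_REVIEW = "FrozenForReview"  # waiting for the reviewer
    REJECTED = "Rejected"                  # reviewer said no
    COMMITTED = "Committed"                # assumed name for the approved state


# Allowed transitions. Reopening a rejected copy is an assumption.
ALLOWED = {
    Status.UNCOMMITTED: {Status.FROZEN_FOR_REVIEW},
    Status.FROZEN_FOR_REVIEW: {Status.COMMITTED, Status.REJECTED},
    Status.REJECTED: {Status.UNCOMMITTED},
    Status.COMMITTED: set(),
}


class WorkingCopy:
    """Hypothetical stand-in for a working copy and its pending edits."""

    def __init__(self, label):
        self.label = label
        self.status = Status.UNCOMMITTED
        self.edits = []

    def add_edit(self, edit):
        # Edits are only accepted while the copy is not under review.
        if self.status is not Status.UNCOMMITTED:
            raise PermissionError(f"{self.label} is {self.status.value}")
        self.edits.append(edit)

    def transition(self, new_status):
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        self.status = new_status
```

A sports editor would call `add_edit` a few times and then `transition(Status.FROZEN_FOR_REVIEW)`; from that point on `add_edit` raises until the reviewer has decided.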

In order to support such workflows, we have designed an RDF-based framework for tracking and managing changes on RDF models. This framework, internally called "teamwork support", is based on the teamwork ontology (http://topbraid.org/teamwork), an excerpt of which is outlined in the class diagram below, together with some technical details.

The teamwork ontology represents changes (teamwork:Change) made by users (sioc:UserAccount) on governed resources. The governed resources are either a whole model (the master copy, an instance of owl:Ontology), or working copies (teamwork:Tag). If a group of users wants to add a new category of sports concepts, then it would create a teamwork:Tag and give it a label such as "Add football sports". This will become the container of any number of smaller changes, each represented as an instance of teamwork:Change and recorded together with a time stamp and creator. Each Change points to one or more added or deleted RDF triples, where the triples are stored as reified statements of the class teamwork:Statement. A Change can be associated with a working copy (teamwork:Tag) via the property teamwork:tag. The collection of Change objects associated with a Tag forms a group of edits that can be tracked in the workflow using the property teamwork:status on the teamwork:Tag. Example status values are teamwork:Uncommitted, teamwork:FrozenForReview and teamwork:Rejected.
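As a rough mental model, the classes above can be mirrored with a few Python dataclasses. This is a sketch under assumptions: the field names (`creator`, `created`, `added`, `deleted`) are illustrative rather than the ontology's actual property names, and `reify` assumes that teamwork:Statement follows the standard RDF reification pattern with rdf:subject, rdf:predicate and rdf:object.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@dataclass(frozen=True)
class Statement:
    """One added or deleted triple."""
    subject: str
    predicate: str
    object: str

    def reify(self, node):
        # Standard RDF reification: four triples describe one triple.
        # That teamwork:Statement is stored exactly this way is an assumption.
        return [
            (node, RDF + "type", "teamwork:Statement"),
            (node, RDF + "subject", self.subject),
            (node, RDF + "predicate", self.predicate),
            (node, RDF + "object", self.object),
        ]


@dataclass
class Change:
    """One edit by one user; field names are illustrative only."""
    creator: str                  # stands in for a sioc:UserAccount
    created: datetime             # the time stamp of the change
    tag: Optional[str] = None     # working copy label; None means master copy
    added: list = field(default_factory=list)
    deleted: list = field(default_factory=list)


change = Change(
    creator="alice",
    created=datetime(2010, 12, 13, tzinfo=timezone.utc),
    tag="Add football sports",
    added=[Statement("ex:Football", "skos:broader", "ex:Sports")],
)
```

Each `Statement` expands to four triples when reified, so the change log grows faster than the vocabulary it describes.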

The teamwork user model manages user accounts as well as the roles that each user can play within a working copy and the master vocabulary. This is done through the sub-properties of teamwork:role: viewer (read-only), editor (write access) and manager (write access and workflow control).
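One way to picture the three roles is as a permission table. The sketch below is hypothetical: the action names (`read`, `write`, `workflow`), the assignment dictionary and the function `can` are my own, and I assume that editors and managers can also read.

```python
# Actions granted by each role. Read access for editor and manager is assumed.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "manager": {"read", "write", "workflow"},
}


def can(assignments, user, resource, action):
    """Check whether a user may perform an action on a governed resource.

    `assignments` maps (user, resource) pairs to a set of role names, where
    the resource is either the master vocabulary or a working copy.
    """
    roles = assignments.get((user, resource), set())
    return any(action in ROLE_PERMISSIONS[role] for role in roles)


# Hypothetical assignments: roles are granted per resource, not globally.
assignments = {
    ("alice", "Add football sports"): {"editor"},
    ("alice", "master"): {"viewer"},
    ("bob", "master"): {"manager"},
}
```

Here alice can write to her working copy but only read the master vocabulary, while bob controls the workflow on the master copy.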

The teamwork triples above are stored in separate graphs, called the teamwork graphs, that are linked to the edited vocabularies via a file ending with .tch.*. For example, the graph example.tdb may have a companion graph stored in example.tch.tdb. As soon as TopBraid finds a .tch file with a matching name, it will put the ontology under teamwork control, which means that changes will be automatically tracked whenever someone writes to the graph. Furthermore, all graphs under teamwork control will show up on the login page of the EVN application. The following figure illustrates this setup:

The teamwork graph will contain any metadata about the changes, i.e. any teamwork:Change objects, the reified triples, information about the working copies etc. It may easily become larger than the actual main model, because it can keep track of the whole audit trail, and may use four triples for each changed triple. When a user logs into a working copy, the system creates a logical view - a graph that does not necessarily have a physical representation, but may be populated on demand. When created, this logical view will check the teamwork database for any uncommitted changes associated with the given working copy. Those changes will then be visible to the editing user, without having to be materialized in the actual master copy database. When the user makes some edits, the changes will only be logged into the teamwork database. Changes only make it into the master database if the manager approves them. From that moment on, they will become automatically visible to any existing logical view.
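The logical view can be sketched as an overlay that is computed on demand from the master triples plus the pending changes of one working copy. This is a simplified illustration, not how TopBraid does it internally: the dictionary layout of the log and the `committed` flag are assumptions.

```python
def view_triples(master, log, tag):
    """Compute the triples visible in one working copy, on demand.

    `master` is a set of triples; `log` is a chronological list of change
    records shaped like
    {"tag": str, "added": set, "deleted": set, "committed": bool}.
    """
    visible = set(master)  # copy, so the master is never touched
    for change in log:
        if change["tag"] == tag and not change["committed"]:
            visible -= change["deleted"]
            visible |= change["added"]
    return visible


def commit(master, log, tag):
    """Apply the pending changes of a working copy to the master, in place."""
    for change in log:
        if change["tag"] == tag and not change["committed"]:
            master.difference_update(change["deleted"])
            master.update(change["added"])
            change["committed"] = True  # the record stays as audit trail


tennis = ("ex:Tennis", "skos:broader", "ex:RacquetSports")
football = ("ex:Football", "skos:broader", "ex:Sports")
master = {tennis}
log = [{"tag": "Add football sports", "added": {football},
        "deleted": set(), "committed": False}]
```

Before `commit`, the football triple shows up only in the view for "Add football sports". Afterwards it is in `master`, so every other view picks it up the next time it is computed.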

There is much more to say about this whole architecture, but I'll leave these details for later. It suffices to say that the user interfaces of EVN and TopBraid Composer will shield the user from all those details. But if you ever want to get low-level access to those teamwork repositories yourself, you can easily do that: just open the .tch files in TopBraid Composer and browse the content. If you have made some edits from EVN, you can see which triples are impacted in the teamwork graph. You can use SPARQL to query the changes, e.g. to find all changes that mention a given resource, or all changes within a certain time interval. You can also use scripting languages like SPARQLMotion to perform batch operations on the teamwork repositories.
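To give an idea of what such queries might look like, here is a small Python sketch that builds the two SPARQL queries as strings. Please treat the property names as guesses: `teamwork:added`, `teamwork:deleted`, `dcterms:created`, the use of rdf:subject and rdf:object on the reified statements, and the `#` at the end of the namespace are assumptions, so check them against the actual .tch graph before relying on them.

```python
from datetime import datetime

# Namespace URIs; the teamwork one is assumed to end with '#'.
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
    "PREFIX teamwork: <http://topbraid.org/teamwork#>\n"
)


def changes_mentioning(resource_uri):
    """Query for changes whose added or deleted triples mention a resource.

    The URI is pasted in without escaping, so pass trusted input only.
    """
    return PREFIXES + f"""SELECT DISTINCT ?change WHERE {{
  ?change a teamwork:Change .
  ?change teamwork:added|teamwork:deleted ?stmt .  # assumed properties
  ?stmt rdf:subject|rdf:object <{resource_uri}> .
}}
"""


def changes_between(start, end):
    """Query for changes with a time stamp in the half-open interval [start, end)."""
    return PREFIXES + f"""SELECT ?change ?when WHERE {{
  ?change a teamwork:Change .
  ?change dcterms:created ?when .  # assumed time stamp property
  FILTER (?when >= "{start.isoformat()}"^^xsd:dateTime &&
          ?when < "{end.isoformat()}"^^xsd:dateTime)
}}
ORDER BY ?when
"""


print(changes_mentioning("http://example.org/news#Tennis"))
```

The resulting strings can be pasted into the SPARQL view of TopBraid Composer while the .tch graph is open, after adjusting the property names to whatever the graph actually uses.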

The fact that we are using RDF all the way down makes the TopBraid teamwork support a very transparent and consistent architecture for change management.