The Joint International Semantic Technology Conference

Nara, Japan, 2 - 4 December 2012.

Event website.

By Heather Packer, Agent and Research fellow, the University of Southampton.

Highlights

The main highlights of the conference were the advances in cultural heritage work that uses the Semantic Web, in particular the Finnish cultural heritage work presented in the opening keynote. I met several EU researchers from Leipzig in Germany (specifically, Sebastian Hellmann and Jörg Unbehauen), who may be able to collaborate on future EU bids. The SparqlMap software that was presented is of interest to me because it may enable us to convert some of our larger data sets to RDF more simply.

Conference report

The SSI funded me to attend the second Joint International Semantic Technology Conference [1], which was held in Nara, Japan, in a beautiful setting: the new public hall in Nara's historical park. This year I was honoured to be asked to chair the Knowledge Building session. This meant I had to keep everything to schedule, and it gave me the opportunity to read in detail about the work that was presented. The conference was extremely interesting because it balanced research from the West and the East. It was really nice to hear about the research happening at the Microsoft Research Asia (MSRA) lab in China, presented by Junichi Tsujii, who talked about the integration of natural language processing, information retrieval, databases and the Semantic Web. I was particularly interested in the following papers, which presented tools and services that I am considering using in some of my work.

Sebastian Hellmann from the University of Leipzig presented work on Navigation-Induced Knowledge Engineering, which aims to retrieve a set of similar items from the web by identifying relationships between those entities [4]. The approach uses DBpedia and reinforcement from the user to expose the relationships the user is likely to be interested in. For example, to generate a list of US presidents, the user can first search for "bush" and then accept the entries for "George W Bush" and "George H W Bush" but reject those for "Bush (Band)" and "Shrub". The system then attempts to select further examples automatically, based on the properties of the accepted entries. Additional results are shown to the user, and the system keeps iterating until the complete list has been generated. The user can then export the semantics behind the selected entries. The approach is interesting because it uses rich Semantic Web data (from DBpedia) and therefore produces relationships that are grounded in this data. I would like to see the relationships between different sets of entities, because there may be unexpected relationships that strengthen, weaken or do not affect the outcome of the approach. I would also like to evaluate whether the approach can discover the most important relationships and, if so, how many iterations it takes on average to succeed.
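
To make the accept/reject loop more concrete, here is a minimal Python sketch of a single iteration. It is only my own illustration, not the authors' algorithm: the toy knowledge base, its property names and the functions shared_properties and suggest are all hypothetical, and the real system works over DBpedia rather than an in-memory dictionary. The sketch assumes that a useful candidate is one holding every property shared by the accepted entries and by none of the rejected ones.

```python
# Toy knowledge base: entity -> set of (property, value) pairs.
# Hypothetical data for illustration only; this is not DBpedia content.
TOY_KB = {
    "George W Bush": {("type", "Person"), ("office", "US President")},
    "George H W Bush": {("type", "Person"), ("office", "US President")},
    "Bush (Band)": {("type", "Band"), ("genre", "Rock")},
    "Shrub": {("type", "Plant")},
    "Bill Clinton": {("type", "Person"), ("office", "US President")},
    "Tony Blair": {("type", "Person"), ("office", "UK Prime Minister")},
}


def shared_properties(kb, accepted, rejected):
    """Return properties held by every accepted entity and no rejected one."""
    common = set.intersection(*(kb[name] for name in accepted))
    excluded = set().union(*(kb[name] for name in rejected))
    return common - excluded


def suggest(kb, accepted, rejected):
    """Propose unseen entities that hold all of the shared properties."""
    properties = shared_properties(kb, accepted, rejected)
    if not properties:
        return []  # nothing distinguishes the accepted entries yet
    seen = set(accepted) | set(rejected)
    return sorted(
        name
        for name, entity_properties in kb.items()
        if name not in seen and properties <= entity_properties
    )


if __name__ == "__main__":
    accepted = ["George W Bush", "George H W Bush"]
    rejected = ["Bush (Band)", "Shrub"]
    print(suggest(TOY_KB, accepted, rejected))
```

In a full loop, the user would accept or reject each suggestion and the function would be called again with the updated lists until no new candidates appear.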

Jörg Unbehauen from the University of Leipzig presented his work on mapping relational databases to RDF with SparqlMap, which uses triple maps, expressed in the R2RML standard, as translation rules between database tables and RDF [2]. Once the database's entities and relationships have been mapped to RDF, SparqlMap uses the mapping to select candidate triple maps for any given SPARQL query. This process is optimised by extracting patterns of shared variables and references to entities in the SPARQL query. One of the benefits of SparqlMap is the ability to access relational data on the web and to share access to that data. This in turn reduces deployment costs and gives a simpler way to index data from relational databases. With this approach the original database is queried rather than a mirror, unlike approaches that load a database dump into a triplestore, where the data can quickly become out of date. Conveniently, a database that has been mapped with SparqlMap can also be used to generate an RDF dump of the relational data periodically.

They evaluated SparqlMap against triplestores and against D2R, another mapping approach that was created before the R2RML language was conceived. The evaluation used the Berlin SPARQL Benchmark (BSBM) queries to compare query execution times, and SparqlMap was shown to return results several times faster than the alternative approaches. This makes SparqlMap a really interesting choice for deploying large-scale data that is already hosted in a relational database. It was noted that one reason SparqlMap is more efficient than D2R is that D2R performs some types of joins in its own memory, whereas SparqlMap always creates a single SQL query and simply executes it on the host database. This approach would be exceptionally useful where large amounts of relational data are readily available online, because exposing them as SPARQL endpoints would enable richer relationships with existing Semantic Web resources.
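
The general idea of answering a query by generating SQL and running it on the host database can be sketched in a few lines of Python. This is a toy under stated assumptions rather than how SparqlMap is implemented: the MAPPING dictionary is a hypothetical, heavily simplified stand-in for an R2RML triples map, the example.org URIs and the person table are invented, and only a single triple pattern of the form "?s <predicate> ?o" is handled.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for an R2RML triples map:
# one predicate URI is backed by one table with a subject and object column.
MAPPING = {
    "http://example.org/vocab/name": {
        "table": "person",
        "subject_prefix": "http://example.org/person/",
        "subject_column": "id",
        "object_column": "name",
    },
}


def pattern_to_sql(predicate):
    """Translate the pattern '?s <predicate> ?o' into one SQL query."""
    m = MAPPING[predicate]
    return (
        f"SELECT '{m['subject_prefix']}' || {m['subject_column']} AS s, "
        f"{m['object_column']} AS o "
        f"FROM {m['table']} ORDER BY {m['subject_column']}"
    )


def run_pattern(conn, predicate):
    """Execute the generated SQL on the host database, not on a copy."""
    return conn.execute(pattern_to_sql(predicate)).fetchall()


def demo():
    """Build a small in-memory database and query it through the mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)", [(1, "Ada"), (2, "Alan")]
    )
    return run_pattern(conn, "http://example.org/vocab/name")


if __name__ == "__main__":
    for subject, obj in demo():
        print(subject, obj)
```

Because the rows are read from the live table at query time, any update to the table is visible in the next query, which is the property that distinguishes this style from loading a dump into a triplestore.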

Jonas Brekle from the University of Leipzig presented work on representing an online dictionary as semantic linked data [3]. Wiktionary is a wiki-style dictionary where definitions, synonyms, homonyms, antonyms and other language features are marked up in a similar manner to Wikipedia. Using a similar idea to DBpedia, their work turns Wiktionary into linked data and links it to DBpedia. They have focused on a small number of languages but, even so, their approach has yielded a greater amount of linguistic data than existing state-of-the-art approaches. Their focus makes sense, and they intend to integrate their approach with the Wikipedia Live stream so that the data is constantly up to date. In addition, they will publish the definitions as Linked Data so that they can be used by everyone.

Sabrina Kirrane from DERI and Storm Technologies in Galway presented a paper called "Protect Your RDF Data!" on access control, which detailed access rules based on hierarchies of user groups. They use a "bottom-up" approach in which individual RDF triples and RDF classes are annotated with access control information, using Annotated RDF. This allows user access to be controlled for each individual triple. To limit the scope of results, the approach uses an enforcement framework that rewrites SPARQL queries based on the authenticated user. This means that the underlying store must support SPARQL with annotations; they demonstrated a store based on SWI-Prolog. They compared their novel "bottom-up" approach with existing "top-down" approaches, where access control has typically been granted at the graph level. Their approach improves on these by enabling finer-grained access control at the level of individual triples. They also suggested using existing mechanisms to authenticate the user. This approach would allow companies to get the access control they expect with less time spent on software development than with "top-down" approaches.
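
As a rough illustration of triple-level access control, the Python sketch below attaches a group label to each triple and returns only the triples a given user may see. Note that it swaps in a simpler technique: the paper rewrites SPARQL queries over Annotated RDF, whereas this sketch just filters an in-memory list. The group names, the example triples and the rule that a group inherits access from its parent groups are all my own assumptions.

```python
# Hypothetical group hierarchy: each group maps to its more general parent.
# Assumption: a member of a group may also see what its ancestors may see.
PARENT = {"hr": "staff", "staff": "everyone"}

# Each triple is annotated with the most general group allowed to read it.
# The data is invented for illustration.
ANNOTATED_TRIPLES = [
    (("ex:alice", "ex:name", "Alice"), "everyone"),
    (("ex:alice", "ex:office", "B12"), "staff"),
    (("ex:alice", "ex:salary", "50000"), "hr"),
]


def groups_of(group):
    """Return the group together with all of its ancestor groups."""
    result = set()
    while group is not None:
        result.add(group)
        group = PARENT.get(group)
    return result


def visible_triples(annotated, user_group):
    """Keep only the triples whose annotation the user's groups satisfy."""
    allowed = groups_of(user_group)
    return [triple for triple, group in annotated if group in allowed]


if __name__ == "__main__":
    for group in ("everyone", "staff", "hr"):
        print(group, len(visible_triples(ANNOTATED_TRIPLES, group)))
```

A graph-level ("top-down") scheme would have to place the salary triple in a separate graph to hide it, whereas here the decision is made per triple.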

Daniel Smith from the University of Southampton presented FacetOntology, an ontology that defines how to use data for faceted browsing, along with tools to support the creation of a faceted browser [5]. FacetOntology is used to specify which parts of a dataset are exposed as facets in a faceted browser. In particular, it supports the transformation of data before it is used in a faceted browser. For example, you can extract just the year from timestamps to create a "Year" facet. A user only needs to define their data facets once, and the required data fields can be extracted from the data source on the fly. They also presented a tool called the Data Picker, which provides a mockup of a user interface using real data so that the user can easily customise their faceted browser before they use either mSpace Maker or FacetOntology Exhibit to create it. They then went on to show examples of generated faceted browsers using MusicBrainz and the BBC's iPlayer.
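
The "Year" example can be sketched in Python as a transformation applied to a field before its values are counted for a facet. This is only an illustration of the idea and does not use FacetOntology itself: the records, the field name "broadcast" and the functions year_of and build_facet are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Invented records for illustration; timestamps are ISO 8601 strings.
RECORDS = [
    {"title": "Programme A", "broadcast": "2011-03-14T20:00:00"},
    {"title": "Programme B", "broadcast": "2012-07-01T18:30:00"},
    {"title": "Programme C", "broadcast": "2012-11-23T21:00:00"},
]


def year_of(timestamp):
    """Transformation step: reduce a full timestamp to its year."""
    return datetime.fromisoformat(timestamp).year


def build_facet(records, field, transform):
    """Count how many records fall under each transformed facet value."""
    return Counter(transform(record[field]) for record in records)


if __name__ == "__main__":
    year_facet = build_facet(RECORDS, "broadcast", year_of)
    for year, count in sorted(year_facet.items()):
        print(year, count)
```

Defining the transformation once and passing it in as a parameter mirrors the idea that facets are declared once and the values are derived from the data source when needed.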

The JIST conference brought together a large number of researchers whose work is increasingly focused on getting data into the hands of users. The topics ranged from the challenges of opening up a relational database as RDF and understanding how to protect your data once it is open, through to choosing a faceted browsing framework that allows users to explore it. The work represents the state of the art in these fields, and it will be interesting to follow its development in the near future.

Links

[1] JIST2012 Conference: http://www.ei.sanken.osaka-u.ac.jp/jist2012/
[2] SparqlMap: http://aksw.org/Projects/SparqlMap.html
[3] Wiktionary RDF extraction: http://dbpedia.org/Wiktionary
[4] Navigation-induced Knowledge Engineering by Example: http://aksw.org/Projects/NKE.html
[5] FacetOntology: Expressive Descriptions of Facets in the Semantic Web: http://eprints.soton.ac.uk/345363/