By Mike Jackson, Software Architect.
Provenance is a well-established concept in arts and archaeology. It is a record of ownership of a work of art or an antique, used as a guide to authenticity or quality. In the digital world, data too can have provenance: information about the people, activities, processes and components, for example software or sensors, that produced the data. This information can be used to assess the quality, reliability and trustworthiness of the data.
Trung Dong Huynh, Luc Moreau and Danius Michaelides of Electronics and Computer Science at the University of Southampton research all aspects of the “provenance cycle”: capture, management, storage, analytics, and representations for end users. As part of their research, they have developed the Southampton Provenance Tool Suite, a suite of software, libraries and services to capture, store and visualise provenance compliant with the World Wide Web Consortium (W3C) PROV standards.
The W3C PROV standards define both how provenance information can be represented and how it can be exchanged. The specifications include: the PROV data model; an OWL2 ontology that allows the data model to be mapped to RDF, TriG and Turtle; a human-readable provenance notation; and a set of constraints applying to the data model. These specifications are W3C recommendations, meaning that, after extensive testing and review, the W3C believes they are suitable for widespread adoption as web standards and promote W3C's mission. An XML Schema and a JSON representation of the data model are also available.
The suite's software packages are used to provide the Provenance Tool Suite services. These include: ProvStore, a free repository for PROV provenance documents that allows these to be stored, browsed and managed, and which currently hosts over 59,000 documents; ProvTranslator, a service to translate PROV documents from one PROV representation to another; and ProvValidator, a service that validates PROV documents. Each of the services can be used via a browser-based interface or a REST API.
The Provenance Tool Suite is a unique toolkit, providing the core functionality required to work with PROV-compliant provenance. Each software package and service within the Provenance Tool Suite undergoes extensive testing of its conversion of provenance between the various PROV representations. Because of this, it is a reference implementation that demonstrates interoperability of the W3C recommendations. The Provenance Tool Suite continues to be exploited by the University of Southampton within their projects and it is gaining traction and visibility in the provenance community.
Offering top-quality toolkits and services that are robust, embody software development best practice, and integrate with state-of-the-art libraries goes well beyond the goal of the projects that initially funded the development of the Provenance Tool Suite. What the Provenance Tool Suite currently lacks is interoperability testing across the suite’s packages and services when they are used collectively. For example, if one were to serialise a PROV Java object into RDF, deserialise it into ProvPy, serialise it back into RDF and deserialise it back into Java, would one end up with a PROV Java object that is equivalent to the original?
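The round-trip question above can be made concrete with a small sketch. It does not use the real Java or ProvPy libraries: the document is a toy set of statements, and `serialise_a`, `serialise_b` and `deserialise` are hypothetical stand-ins for two packages that write the same content in different orders. The point it illustrates is that the final check is one of equivalence of content, not of identical serialised text.

```python
import json

# A toy stand-in for a PROV document: a set of (relation, subject, object)
# statements. Real PROV documents are far richer; this only shows the check.
original = frozenset({
    ("wasGeneratedBy", "ex:chart", "ex:plotting"),
    ("used", "ex:plotting", "ex:dataset"),
    ("wasAssociatedWith", "ex:plotting", "ex:alice"),
})


def serialise_a(doc):
    """Hypothetical package A: writes statements in sorted order."""
    return json.dumps(sorted(doc))


def serialise_b(doc):
    """Hypothetical package B: writes statements in reverse sorted order."""
    return json.dumps(sorted(doc, reverse=True))


def deserialise(text):
    """Shared reader: rebuilds the set of statements from the exchanged text."""
    return frozenset(tuple(statement) for statement in json.loads(text))


def round_trip(doc):
    """A -> text -> B -> text -> A, mirroring the Java -> ProvPy -> Java case."""
    in_b = deserialise(serialise_a(doc))
    back_in_a = deserialise(serialise_b(in_b))
    return back_in_a


result = round_trip(original)
# Equivalence is judged on content, not on the order statements were written.
is_equivalent = result == original
print(is_equivalent)
```

The two serialised forms differ as text, yet the round trip preserves the document, which is the property an interoperability test would need to assert.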
Dong successfully applied to our Research Software Group for help as part of our Open Call. We will work with Dong, Luc and Danius to develop a test infrastructure that systematically checks convertibility and round-trip conversions across combinations of Provenance Tool Suite packages and services operating collectively. Improving round-trip interoperability testing across the Provenance Tool Suite’s packages and services will help ensure that the suite continues to provide a valid reference implementation of the PROV recommendations, and will give confidence that its packages and services preserve the validity of provenance when interoperating. A suite of round-trip interoperability tests will also provide reassurance that future refactoring of individual packages does not introduce bugs that are only discovered when packages are used together, and that, if bugs are introduced, they are rapidly identified.
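As a rough illustration of what checking round trips "across combinations" might involve, the sketch below runs a toy document through every ordered pair of converters and reports the pairs that alter it. It is not the planned test infrastructure: the converter names (`package_a`, `package_b`, `lossy_c`), the toy document format and the `check_round_trips` helper are all assumptions made for the example, and `lossy_c` is deliberately faulty so that there is something to detect.

```python
import json
from itertools import permutations


def _read(text):
    """Rebuild a toy document (a set of statements) from serialised text."""
    return frozenset(tuple(statement) for statement in json.loads(text))


# Hypothetical converters standing in for the suite's packages and services.
# Each name maps to a (serialise, deserialise) pair over the toy format.
CONVERTERS = {
    "package_a": (lambda doc: json.dumps(sorted(doc)), _read),
    "package_b": (lambda doc: json.dumps(sorted(doc, reverse=True)), _read),
    # Deliberately faulty: silently drops one statement when writing.
    "lossy_c": (lambda doc: json.dumps(sorted(doc)[1:]), _read),
}


def check_round_trips(doc, converters):
    """Return the ordered pairs (first, second) whose round trip changes doc."""
    failures = []
    for first, second in permutations(converters, 2):
        serialise_1, deserialise_1 = converters[first]
        serialise_2, deserialise_2 = converters[second]
        # first writes, second reads and writes back, first reads again.
        in_second = deserialise_2(serialise_1(doc))
        back_in_first = deserialise_1(serialise_2(in_second))
        if back_in_first != doc:
            failures.append((first, second))
    return failures


document = frozenset({
    ("wasGeneratedBy", "ex:chart", "ex:plotting"),
    ("used", "ex:plotting", "ex:dataset"),
    ("wasAssociatedWith", "ex:plotting", "ex:alice"),
})

failures = check_round_trips(document, CONVERTERS)
for first, second in failures:
    print(f"round trip {first} -> {second} -> {first} changed the document")
```

Enumerating ordered pairs rather than unordered ones matters here, because a fault may only show up when a given package is the writer rather than the reader. Every failing pair in this run involves the faulty converter, which is the kind of rapid localisation the tests are meant to provide.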
We look forward to reporting on our collaboration. For more details, please see our “Who do we work with” page.