Round-trip testing for Provenance Tool Suite

Provenance is a well-established concept in arts and archaeology. It is a record of ownership of a work of art or an antique, used as a guide to authenticity or quality. In the digital world, data too can have provenance: information about the people, activities, processes and components, for example software or sensors, that produced the data. This information can be used to assess the quality, reliability and trustworthiness of the data.

Trung Dong Huynh, Luc Moreau and Danius Michaelides of Electronics and Computer Science at the University of Southampton research all aspects of the “provenance cycle”: capture, management, storage, analytics, and representations for end users. As part of their research, they have developed the Southampton Provenance Tool Suite, a suite of software, libraries and services to capture, store and visualise provenance compliant with the World Wide Web Consortium (W3C) PROV standards.

W3C PROV standards

The W3C PROV standards define both how provenance information can be represented and how it can be exchanged. The specifications include: PROV-DM, the PROV data model; PROV-O, an OWL2 ontology that allows the data model to be mapped to RDF, TriG and Turtle; PROV-N, a human-readable provenance notation; and PROV-CONSTRAINTS, a set of constraints applying to the data model. These specifications are W3C Recommendations, meaning that, after extensive testing and review, the W3C believes they are suitable for widespread adoption as web standards and promote W3C's mission. Two other documents of note are PROV-XML, which defines an XML Schema for the data model, and PROV-JSON, a JSON representation of the data model.
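To give a feel for how one piece of provenance looks in two of these representations, here is a minimal sketch in Python. It holds a tiny PROV-JSON document as a dictionary and renders it as PROV-N. The `ex:` identifiers and the `to_prov_n` helper are invented for illustration; the helper covers only entities, activities and generation without attributes, and is not how any Provenance Tool Suite package performs conversion.

```python
# A tiny PROV-JSON document: one entity generated by one activity.
# The ex: names are made up for illustration.
prov_json = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {"ex:report": {}},
    "activity": {"ex:compile": {}},
    "wasGeneratedBy": {
        "_:gen1": {"prov:entity": "ex:report", "prov:activity": "ex:compile"}
    },
}


def to_prov_n(doc):
    """Render the small subset used above as PROV-N text (hypothetical helper)."""
    lines = ["document"]
    for prefix, uri in doc.get("prefix", {}).items():
        lines.append(f"  prefix {prefix} <{uri}>")
    for entity_id in doc.get("entity", {}):
        lines.append(f"  entity({entity_id})")
    for activity_id in doc.get("activity", {}):
        lines.append(f"  activity({activity_id})")
    for gen in doc.get("wasGeneratedBy", {}).values():
        # "-" marks the generation time as unspecified.
        lines.append(
            f"  wasGeneratedBy({gen['prov:entity']}, {gen['prov:activity']}, -)"
        )
    lines.append("endDocument")
    return "\n".join(lines)


prov_n = to_prov_n(prov_json)
print(prov_n)
```

Both forms describe the same statement in the data model: the entity `ex:report` was generated by the activity `ex:compile`.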

Provenance Tool Suite

The Provenance Tool Suite includes a number of software packages. These include: ProvPy, a Python library supporting import and export of PROV-DM data as PROV-JSON and PROV-XML; ProvToolbox, a Java library to create Java representations of PROV-DM and convert them to PROV-O, PROV-XML, PROV-N, and PROV-JSON; and, ProvJS, a JavaScript utility for indexing and searching PROV-JSON objects within JavaScript objects.

These packages are used to provide Provenance Tool Suite services. These include: ProvStore, a free repository for PROV provenance documents that allows these to be stored, browsed and managed, and which currently hosts over 59,000 documents; ProvTranslator, a service to translate PROV documents from one PROV representation to another; and, ProvValidator, a service that validates PROV documents. Each of the services can be used via a browser-based interface or a REST API.

The Provenance Tool Suite is a unique toolkit, providing the core functionality required to work with PROV-compliant provenance. Each software package and service within the Provenance Tool Suite undergoes extensive testing of its conversion of provenance in various PROV representations. Because of this, it is a reference implementation which demonstrates interoperability of the W3C recommendations. Provenance Tool Suite continues to be exploited by the University of Southampton within their projects and it is gaining traction and visibility in the provenance community.

A round-trip test infrastructure

Offering top-quality toolkits and services that are robust, embody software development best practice, and integrate with state-of-the-art libraries goes well beyond the goal of the projects that initially funded the development of Provenance Tool Suite. Efforts to further develop and continuously maintain the suite are sporadic, and an important aspect of testing Provenance Tool Suite packages and services has yet to be addressed: interoperability testing across the suite’s packages and services when they are used collectively. For example, if one were to serialise a PROV Java object into RDF, deserialise it into ProvPy, serialise it back into RDF and deserialise it back into Java, would one end up with a PROV Java object that is equivalent to the original?
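The shape of such a round-trip check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four `tool_a_*` and `tool_b_*` functions are hypothetical stand-ins built on the standard `json` module, where a real test would call ProvToolbox and ProvPy, and plain dictionary equality stands in for a proper notion of PROV equivalence.

```python
import json


def round_trip(document, hops):
    """Send a document through each (serialise, deserialise) hop in turn."""
    current = document
    for serialise, deserialise in hops:
        current = deserialise(serialise(current))
    return current


# Hypothetical stand-ins for two independent implementations.
# They deliberately produce differently formatted text.
def tool_a_serialise(doc):
    return json.dumps(doc, sort_keys=True)


def tool_a_deserialise(text):
    return json.loads(text)


def tool_b_serialise(doc):
    return json.dumps(doc, indent=2)


def tool_b_deserialise(text):
    return json.loads(text)


original = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {"ex:report": {}},
    "activity": {"ex:compile": {}},
    "wasGeneratedBy": {
        "_:gen1": {"prov:entity": "ex:report", "prov:activity": "ex:compile"}
    },
}

# Tool A writes, tool B reads; then tool B writes, tool A reads.
hops = [
    (tool_a_serialise, tool_b_deserialise),
    (tool_b_serialise, tool_a_deserialise),
]
result = round_trip(original, hops)

# Dictionary equality ignores key order, so formatting differences
# between the two tools do not count as a mismatch.
is_equivalent = result == original
print("round trip preserved the document:", is_equivalent)
```

In practice the comparison is the hard part: two documents can differ in ordering or in generated identifiers and still be equivalent, so a real test would need an equivalence check that understands the data model.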

Dong successfully applied to our Research Software Group for help as part of our Open Call. We will work with Dong, Luc and Danius to develop a test infrastructure that systematically checks convertibility and round-trip conversions across combinations of Provenance Tool Suite packages and services operating collectively. This will include testing of: round-trip interoperability between ProvPy and ProvToolbox; interoperability between these packages and the ProvStore, ProvTranslator and ProvValidator services, whether these are deployed locally on a developer's own machine or remotely; ProvJS-related operations; and command-line utilities that are provided within ProvToolbox. The infrastructure will be runnable both locally and on third-party test infrastructures (e.g. Travis CI, which is used to test the suite’s software packages at present).
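One way to think about "combinations" here is as a test matrix. The sketch below, in Python, enumerates round-trip cases between pairs of packages for every representation both support. The format lists are taken from the package descriptions earlier in this article and may not match the packages' current capabilities; the `round_trip_cases` function is a hypothetical helper, not part of the planned infrastructure.

```python
from itertools import permutations

# Representations each package handles, as described earlier in this article.
SUPPORTED_FORMATS = {
    "ProvPy": {"PROV-JSON", "PROV-XML"},
    "ProvToolbox": {"PROV-O", "PROV-XML", "PROV-N", "PROV-JSON"},
}


def round_trip_cases(supported):
    """List (origin, partner, format) cases for formats both packages share."""
    cases = []
    # Ordered pairs, because starting from A differs from starting from B.
    for origin, partner in permutations(sorted(supported), 2):
        shared = supported[origin] & supported[partner]
        for fmt in sorted(shared):
            cases.append((origin, partner, fmt))
    return cases


cases = round_trip_cases(SUPPORTED_FORMATS)
for origin, partner, fmt in cases:
    print(f"{origin} -> {fmt} -> {partner} -> {fmt} -> {origin}")
```

With the two packages listed, this yields four cases. Adding a service or a package to the dictionary extends the matrix without changing the enumeration logic.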

Improving round-trip interoperability testing across Provenance Tool Suite’s packages and services will help ensure that the suite continues to provide a valid reference implementation of PROV recommendations and give confidence that its packages and services preserve the validity of provenance when interoperating. A suite of round-trip interoperability tests will also provide reassurance that future refactoring of individual packages does not introduce bugs that are only discovered when packages are used together, and that, if bugs are introduced, then these are rapidly identified.

We look forward to reporting on our collaboration.