Provenance Tool Suite: Tracking data to its origins

By Selina Aragon, Communications Officer, in conversation with Trung Dong Huynh, University of Southampton

This article is part of our series: Breaking Software Barriers, in which we investigate how our Research Software Group has helped projects improve their research software. If you would like help with your software, get in touch.

From concept to software

Provenance is traditionally the record of ownership of a work of art or an antique, used as a guide to authenticity or quality. Although mostly used to track the origins of a work of art, the term is now used in a range of fields, from palaeontology to the sciences more broadly. In science, it refers to knowing all the steps involved in producing a result, such as a figure, from experiment design through acquisition of raw data to the subsequent steps of data selection, analysis and visualisation. Such information is necessary to reproduce a given result, and can serve to establish precedence. The concept also applies to the digital world: data, too, originates from a particular point, and provenance provides evidence of that point of origin or discovery by establishing the data's ownership, custody and transformations.

Trung Dong Huynh, from the Electronics and Computer Science department at the University of Southampton and part of the Provenance Tool Suite team, commented on one of the most fundamental outcomes of his collaboration with the Software Sustainability Institute:

“We needed help to manage the bug reports we get from Provenance Tool Suite more effectively. Mike [Jackson] was able to improve the roundtrip interoperability across different libraries, which had a key impact on the way we manage bug reports. We used to manage them one by one, but, thanks to the Institute and Mike, this is now an automated process.”

Luc Moreau and Daniel Michaelides, also from the University of Southampton, and Trung Dong Huynh developed the Provenance Tool Suite—a suite of software, libraries and services to capture, store and visualise provenance. The software is compliant with the World Wide Web Consortium (W3C) PROV standards, which define how provenance information can be represented and how it can be exchanged.

The PROV standards make it possible to link data back to evidence of when and how it originated, so that its trustworthiness can be evaluated through appropriate processes. Several organisations already use them, such as NASA and the UK National Archives. In particular, Provenance Tool Suite lets users around the world check the consistency of their data and expose where it comes from, while also making it accessible to the public.

Interoperability and better documentation

The goal of this collaboration was to develop an infrastructure that systematically checks convertibility and round-trip conversions across combinations of Provenance Tool Suite packages and services operating collectively. Mike Jackson, Research Software Engineer at the Software Sustainability Institute, went through the Provenance Tool Suite libraries and documentation and provided the Southampton team with concrete advice on how to make their software open source and improve its documentation. Dong stated that no one in his team is a professional software developer, but rather they “happen to develop software” for research:

“Getting help from the Institute has definitely saved us a significant amount of time and efforts by enabling us to identify issues early, allowing us to focus on development work that matters, while still providing us with the confidence in the quality of our products.”

Dong successfully applied to the Software Sustainability Institute's Research Software Group for help as part of our Open Call. According to Mike Jackson, the Institute's work included:

“Testing of: round-trip interoperability between ProvPy and ProvToolbox; between these packages and ProvStore, ProvTranslator and ProvValidator services whether these be deployed locally, on a developer's own machine, or remotely; ProvJS-related operations; and, command-line utilities that are provided within ProvToolbox.”
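To give a feel for what a round-trip interoperability check involves, here is a minimal sketch in Python. It is not the Institute's test infrastructure and does not use ProvPy or ProvToolbox: the two formats ("json" and "lines"), the toy document model and every function name are assumptions made purely for illustration. The idea is to serialise a document in one format, convert it to a second format and back, and confirm that nothing was lost along the way.

```python
import json
from itertools import permutations

# Toy document model (an assumption for this sketch): a dict that maps a
# record kind, such as "entity" or "activity", to a list of identifiers.

def to_json(doc):
    return json.dumps(doc, sort_keys=True)

def from_json(text):
    return json.loads(text)

def to_lines(doc):
    # One "kind identifier" record per line. A kind with no identifiers
    # produces no lines, so it cannot survive a trip through this format.
    return "\n".join(f"{kind} {ident}" for kind in sorted(doc) for ident in doc[kind])

def from_lines(text):
    doc = {}
    for line in text.splitlines():
        kind, ident = line.split(" ", 1)
        doc.setdefault(kind, []).append(ident)
    return doc

# Hypothetical converter registry: format name -> (serialise, parse).
FORMATS = {
    "json": (to_json, from_json),
    "lines": (to_lines, from_lines),
}

def convert(text, source, target):
    """Parse text in the source format and re-serialise it in the target format."""
    parse = FORMATS[source][1]
    serialise = FORMATS[target][0]
    return serialise(parse(text))

def round_trip_failures(doc):
    """Return the (first, second) format pairs whose round trip alters doc."""
    failures = []
    for first, second in permutations(FORMATS, 2):
        original = FORMATS[first][0](doc)
        there = convert(original, first, second)
        back = convert(there, second, first)
        if FORMATS[first][1](back) != doc:
            failures.append((first, second))
    return failures

example = {"entity": ["ex:dataset", "ex:chart"], "activity": ["ex:plot"]}
print(round_trip_failures(example))
```

Running such a check automatically over every pair of formats is one way that conversion bugs could be caught systematically instead of being reported and handled one at a time.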

Dong also reports that more users are engaging with the tool and sending bug reports, which the team now monitors and fixes more efficiently, thus improving communication between Provenance Tool Suite users and developers.

If you'd like free help to assess or improve your software, submit an application to the Institute's Open Call.

Posted by s.aragon on 11 October 2016 - 10:47am