Early impact from the open call - testing Provenance Tool Suite

Posted by m.jackson on 11 November 2015 - 11:35am


By Mike Jackson, Software Architect.

In August I completed an open call project with Trung Dong Huynh, Luc Moreau and Danius Michaelides of Electronics and Computer Science at the University of Southampton. As part of their research into provenance, they have developed the Southampton Provenance Tool Suite, a suite of software, libraries and services to capture, store and visualise provenance compliant with the World Wide Web Consortium (W3C) PROV standards. I developed a test framework in Python, which tests Provenance Tool Suite toolkits and services operating collectively. Dong, Luc and Danius contacted me with their experiences of using the test framework to date...

The Provenance Research Team had populated the small repository of five test cases (sets of semantically-equivalent PROV documents) that I had created with over 470 more test cases. From these, the test framework now creates and runs over 7,500 individual tests. The team had used the framework's extensibility points, and supporting documentation, to add support for using their ProvToolbox toolkit as a PROV document converter. Dong commented that the documentation and the small set of initial test cases provided had been "very useful, helping me up-to-speed quickly." While not always a developer's favourite task, investment in user and developer documentation is always worthwhile!
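The multiplication from hundreds of test cases to thousands of individual tests can be sketched as follows. This is a minimal, hypothetical Python example, assuming each test case holds one semantically-equivalent document per serialisation format, and that one conversion test is generated per converter and per ordered pair of formats; the format and converter names are illustrative, not the framework's actual configuration:

```python
from itertools import permutations

# Illustrative serialisation formats and converters; the real test
# framework reads these from its configuration.
formats = ["provn", "ttl", "trig", "provx", "json"]
converters = ["ProvPy", "ProvToolbox"]

def tests_for_case(formats, converters):
    """Yield one (converter, source format, target format) test tuple
    per converter and ordered pair of distinct formats."""
    for converter in converters:
        for src, dst in permutations(formats, 2):
            yield (converter, src, dst)

per_case = len(list(tests_for_case(formats, converters)))
print(per_case)  # 2 converters x (5 x 4) ordered pairs = 40 tests per case
```

With tens of tests generated per test case, a few hundred test cases readily yield several thousand individual tests.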

The team had run interoperability tests for their W3C PROV toolkits, ProvPy and ProvToolbox, and services, ProvStore and ProvTranslator. Dong commented that, thanks to the framework, he had "already found quite a few interop issues we haven't thought of."

Naturally, as this was the first time they'd used the test framework, there were many suggestions for improvements. Three significant requests were as follows.

HTML test report generation

Dong asked whether it was possible to create test reports in HTML. With over 7,500 tests, being able to browse test results easily, to quickly identify failed tests and understand why they failed, is essential.

I looked at both nose-htmloutput and nose-html-reporting for creating HTML reports from tests run via Python's nose test runner. I felt the latter, nose-html-reporting, was preferable as it colours failed and skipped tests differently and allows the browsing of test outputs for all tests, not just failed ones. Its only downside is that it runs only under Python 2.7, whereas nose-htmloutput runs under both Python 2 and 3.

The hosted continuous integration server used by the team, Travis CI, does not allow direct access to build artefacts, such as HTML test reports. However, it does provide support for Uploading Artifacts on Travis CI to Amazon S3 (Simple Storage Service). Once uploaded, the test reports are browsable online, as described, for example, in the StackOverflow question Access files stored on Amazon S3 through web browser.
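As a sketch, the upload could be configured in `.travis.yml` using Travis CI's artifacts addon along these lines; the report filename is illustrative, and the S3 credentials are assumed to be supplied separately as encrypted environment variables:

```yaml
addons:
  artifacts:
    # S3 credentials (ARTIFACTS_KEY, ARTIFACTS_SECRET, ARTIFACTS_BUCKET)
    # are assumed to be set as encrypted environment variables.
    paths:
      - report.html
```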

Run tests in parallel

The 7,500 tests also motivated the suggestion to run tests in parallel to speed up the overall run time. There were two options, depending on whether the tests are run locally or on Travis CI.

For a local test environment, nose supports parallel test runs, via its multiprocess plugin (for example, nosetests --processes=4), when run within a multi-core or multi-processor environment.

For Travis CI, jobs run on their virtual machines. To parallelise builds, Travis CI recommends using a build matrix, part of a job configuration, which specifies a set of build environments. The job is run under each environment, in parallel, on separate virtual machines. A build environment can be as simple as a folder. Hierarchically structuring the PROV test cases into groups of folders would allow the test framework to be run on each group of folders in parallel.
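A build matrix of this kind might be sketched in `.travis.yml` as follows, assuming the test cases have been grouped into folders; TEST_FOLDER and run-tests.sh are illustrative names, not part of the framework:

```yaml
language: python
# Each "env" entry defines one build environment; Travis CI runs the
# job once per entry, on separate virtual machines, in parallel.
env:
  - TEST_FOLDER=test-cases/group-1
  - TEST_FOLDER=test-cases/group-2
  - TEST_FOLDER=test-cases/group-3
script:
  - ./run-tests.sh "$TEST_FOLDER"
```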

Either solution can be adopted with minimal changes to the test framework.

Modularise the test framework

The original test framework was a single component (consisting of packages, modules and classes) hosted within a single Git repository. Dong requested that the framework be modularised into a repository containing common framework code and repositories containing the framework code for testing each of ProvPy, ProvToolbox, ProvStore, and ProvTranslator. This would encourage other researchers to develop their own test framework components, without needing to modify the common framework code itself.

I split up the test framework into separate repositories, as requested, updated how the framework is configured, and refactored the code to allow the code in each repository to be built and installed as Python packages using Python's distribution utilities. I also updated the code to allow the test framework to be configured from tests, in-code, rather than solely via configuration files.
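The in-code configuration might look something like the following minimal sketch. The class name and settings are purely illustrative, not the framework's actual API; the point is that a component can be configured either from a file or directly from within a test:

```python
import json

# Hypothetical sketch: a test-framework component that can be
# configured from a configuration file or directly in code.
class ComponentConfig:
    def __init__(self, **settings):
        self.settings = settings

    @classmethod
    def from_file(cls, path):
        """Load settings from a JSON configuration file."""
        with open(path) as f:
            return cls(**json.load(f))

# In-code configuration, e.g. from within a test module, with no
# configuration file needed:
config = ComponentConfig(executable="prov-convert",
                         formats=["provn", "ttl", "json"])
print(sorted(config.settings))
```

Supporting both routes keeps configuration files convenient for routine runs while letting tests set up components programmatically.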

The next six months

The Provenance Research Team commented that:

With the help of the test infrastructure, we have already identified various compatibility issues that we had not been aware of. It is such an invaluable tool, particularly so for the development and maintenance of our tool suite, which requires compatibility across three different programming languages and various software services. In short, the outcome of the project fulfils our original goals set out in our application [to the open call].

The team's future plans are to: deploy a private test server for running the test framework (in part to avoid a restriction of Travis CI, which terminates jobs that take longer than 2 hours); provide guidance to their team in the use of this test server; address compatibility issues between the Provenance Tool Suite tools identified by the test framework with the aim of ensuring that the toolkits and services are fully compatible; and, demonstrate how their test suites can be used by the community to test their software for compatibility with the W3C's PROV standards.

They comment that the test framework:

will definitely save us a significant amount of time and efforts by enabling us to identify issues early, allowing us to focus on development work that matters, while still providing us with the confidence in the quality of our products.

They also commented that it "may offer potential benefits to other teams who might want to develop PROV-compatible software." As with Software Carpentry, helping researchers achieve more, in less time, and with less pain, is one of the Institute's goals, and it is rewarding to see the impact this work has already made. I look forward to hearing about the Provenance Research Team's experiences six months from now!