How do we know research software is correct?

Posted by s.hettrick on 27 September 2013 - 2:26pm

By Ian Cottam, IT Services Research Lead at The University of Manchester.

We all know about testing code, but decades ago Dijkstra pointed out that "Program testing can be used to show the presence of bugs, but never to show their absence". So what can we do to gain confidence that research software is correct?

As an ex-formalist, I could say "write a formal (mathematical) specification of what your software is supposed to do; transform it into executable and efficient code in a number of steps; prove that each transformation preserves correctness as you go." I am also reminded of a quote from Knuth: "Beware of bugs in the above code; I have only proved it correct, not tried it."

Of course, expensive formal verification is never going to happen for anything but life-critical code. A good compromise might be statistical testing, which can be described as:

"The black box sampling of a system with sufficient and representative
pseudo-randomly generated data to assure correctness to some desired level
of confidence."

This approach requires an oracle: something you can compare results against. It could be an older, trusted version of the code, or a version written in a very (very) high-level language, where correctness is all and efficiency is nothing.
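
As a rough illustration of testing against an oracle, the sketch below compares an efficient routine with a deliberately naive version of the same calculation on pseudo-randomly generated inputs. Everything in it is made up for illustration and has nothing to do with Dust-Expert: the maximum-subarray problem, the function names, and the input generator are all my assumptions. In real use the generator should mimic the data your software will actually see.

```python
import random
from typing import Callable, List, Sequence


def max_subarray_fast(xs: Sequence[int]) -> int:
    """Code under test (hypothetical): Kadane's algorithm, linear time.
    The empty subarray is allowed, so the result is never below 0."""
    best = current = 0
    for x in xs:
        current = max(0, current + x)
        best = max(best, current)
    return best


def max_subarray_oracle(xs: Sequence[int]) -> int:
    """Oracle (hypothetical): try every contiguous slice. Slow but obvious."""
    sums = [sum(xs[i:j]) for i in range(len(xs)) for j in range(i + 1, len(xs) + 1)]
    return max([0] + sums)


def random_input(rng: random.Random) -> List[int]:
    """Illustrative generator: short lists of small integers."""
    return [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]


def statistical_test(
    under_test: Callable[[List[int]], int],
    oracle: Callable[[List[int]], int],
    make_input: Callable[[random.Random], List[int]],
    n_tests: int,
    seed: int = 0,
) -> List[List[int]]:
    """Run n_tests random comparisons and return the inputs that disagreed."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    failures = []
    for _ in range(n_tests):
        data = make_input(rng)
        if under_test(data) != oracle(data):
            failures.append(data)
    return failures


if __name__ == "__main__":
    bad = statistical_test(max_subarray_fast, max_subarray_oracle, random_input, 1000)
    print(f"{len(bad)} disagreements in 1000 tests")
```

Note that the harness treats the code under test as a black box: it only looks at inputs and outputs. Any disagreement is worth investigating even if the oracle turns out to be the one at fault.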

Rather than going into detail, I refer you to a paper that I contributed to some years ago, when I worked for Adelard and helped to develop a piece of software called Dust-Expert. (It could advise you on how to prevent industrial accidents due to dust explosions.) Section 7 of the paper discusses statistical (aka probabilistic) testing in general and how we applied it to the problem of convincing ourselves and others that Dust-Expert did the right thing.

I'd appreciate your feedback on this approach.