By Neil Chue Hong.
One of the biggest challenges for researchers is understanding which software they should choose to reuse or modify for their own work. The diversity of software used in a research environment means that personal recommendations and demonstrations at conferences are often the only way to get a sense of how reusable a piece of software is.
Software reusability is important not just for correctness: it enables improved efficiency and productivity, the ability to link related outputs, and a more sustainable research software ecosystem. What is required is a simple way of understanding and assessing the reusability and maintainability of a piece of research software.
The topic of software reuse has been considered from a software engineering perspective for many decades. Since the first cost model for software reuse, developed at CMU's Software Engineering Institute by Holibaugh et al. in 1989, there have been successive iterations of research (Koltun and Hudson's Reuse Maturity Model, the reuse model for DARPA's STARS program, the CMMI, the SSMM, QSOS) into improving the reusability of software code and developing maturity models to describe the code.
Reusability can be applied at many layers: software, platforms, libraries, components, APIs, code, formats, models. Most models concentrate on defining a process for assessing the reusability of the code itself. One of the best-known examples is the NASA Reuse Readiness Levels. These allow others to easily assess the reuse potential of software (from limited reusability to having demonstrated extensive reusability) across a number of topics (including documentation, packaging, licensing and portability). The result is a comprehensive and comparative framework for assessing the reusability of scientific codes.
However, applying such a framework is not a small undertaking, and it is still aimed at the software developer. What many researchers would like is something that provides an answer to C. Titus Brown's idea of the Ladder of Academic Software Reusability + Sustainability, perhaps along the lines of the easy-to-understand Five Stars of Linked Data or Five Stars of Online Journals.
Five Stars of Research Software
After consultation with various researchers, I have defined a draft Five Stars of Research Software. These are intended to allow a quick assessment of key aspects as they apply to the researcher.
- Community: There is a community infrastructure with a common investment (required for sustainability)
- Open: The software has a permissive license (required for modification)
- Defined: Accurate metadata defines the software and its functionality, dependencies and constraints (required for preservation)
- Extensible: The software is usable and modifiable for different data, pipelines and purposes (required for reproducibility)
- Runnable: The software is available and provides the information to operate it (required for publication)
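As an illustration only, the five stars above could be recorded as a simple checklist. The sketch below is a hypothetical Python representation, not part of the draft itself: the class name `FiveStarAssessment` and its methods are invented for this example, and it assumes each star is a plain yes/no judgement, with no ordering or weighting between stars.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FiveStarAssessment:
    """Hypothetical yes/no record of the five draft stars for one piece of software."""

    community: bool = False   # community infrastructure with a common investment
    open: bool = False        # permissive license
    defined: bool = False     # accurate metadata on functionality, dependencies, constraints
    extensible: bool = False  # usable and modifiable for different data, pipelines, purposes
    runnable: bool = False    # available, with the information needed to operate it

    def earned(self) -> list[str]:
        """Names of the stars judged to be met, in the order listed above."""
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def stars(self) -> int:
        """Number of stars met (assumes all stars count equally)."""
        return len(self.earned())


# Example: software that is openly licensed and runnable, but nothing more.
example = FiveStarAssessment(open=True, runnable=True)
print(example.stars(), example.earned())
```

Treating every star as an equal, independent checkbox is the simplest possible reading; an objective assessment framework might well need finer-grained criteria for each star.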
We will be working next to refine these, and create a simple framework for assessing each objectively, before applying it to some historical case studies. We're very interested in input from others. Do you think this categorisation is useful? Is anything missing? Please comment below or email firstname.lastname@example.org
Many thanks to Caitlin Bentley, Adam Crymble, Barry Rowlingson, Robin Wilson, and Mark Woodbridge for the discussions that have helped shape this blog post, and to Kevin Ashley, Martin Fenner, Ross Gardler, David Shotton, Kenji Takeda, and people at the EPSRC Software Strategy Town Meeting who inspired the original idea behind the Five Stars of Research Software.