The process of cataloguing software

Posted by n.chuehong on 20 February 2013 - 9:10am

By Neil Chue Hong and Mario Antonioletti.

How can we tell what software has been produced from projects funded by a particular organisation? Is there a way of maximising reuse of good code whilst recognising that not everything has been produced with reusability in mind?

Buried deep within a recent JISC news announcement is a note about a project called Software Hub - a platform for easy access to open source software for education and research. What isn't in that announcement is that this is a project involving the Software Sustainability Institute and OSS-Watch, two organisations which care greatly about the reusability of software.

Cataloguing software isn't easy. At OMII, a catalogue of e-Science software showed that it can take a lot of effort to survey a field comprehensively. What is clear is that there must be incentives for software developers to provide the information themselves. De facto standards like DOAP (Description Of A Project), an RDF and XML Schema vocabulary for describing projects, make it easier to publish basic metadata for federated catalogues to harvest. However, whilst extensible, on their own they don't provide all the information you might seek. Some projects (particularly in Europe) have looked at ADMS.SW, a metadata vocabulary for describing software assets that builds on ADMS and reuses existing specifications such as DOAP, SPDX, ISO 19770-2, and the Trove software map from SourceForge.
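To give a feel for how little a developer would need to supply, here is a minimal sketch of generating a DOAP description with Python's standard library. The project name, URLs and the `build_doap` helper are made up for illustration; only the RDF and DOAP namespaces and the property names (`name`, `homepage`, `shortdesc`, `programming-language`, `license`) come from the DOAP vocabulary itself. A real catalogue would probably want more fields than this.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"


def build_doap(name, homepage, shortdesc, language, license_uri):
    """Return a minimal DOAP description as an RDF/XML string.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("doap", DOAP_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    project = ET.SubElement(root, f"{{{DOAP_NS}}}Project")

    # Literal-valued properties.
    ET.SubElement(project, f"{{{DOAP_NS}}}name").text = name
    ET.SubElement(project, f"{{{DOAP_NS}}}shortdesc").text = shortdesc
    ET.SubElement(project, f"{{{DOAP_NS}}}programming-language").text = language

    # Resource-valued properties point at a URI via rdf:resource.
    ET.SubElement(project, f"{{{DOAP_NS}}}homepage",
                  {f"{{{RDF_NS}}}resource": homepage})
    ET.SubElement(project, f"{{{DOAP_NS}}}license",
                  {f"{{{RDF_NS}}}resource": license_uri})

    return ET.tostring(root, encoding="unicode")


def read_project_name(doap_xml):
    """What a harvesting catalogue might do: pull the name back out."""
    root = ET.fromstring(doap_xml)
    return root.find(f"{{{DOAP_NS}}}Project/{{{DOAP_NS}}}name").text


# Placeholder values; assumptions for the example only.
doap_xml = build_doap(
    name="Example Tool",
    homepage="http://example.org/example-tool",
    shortdesc="A made-up project used to illustrate DOAP.",
    language="Python",
    license_uri="http://www.apache.org/licenses/LICENSE-2.0",
)
print(doap_xml)
```

A file like this, published alongside the code, is the sort of thing a federated catalogue could harvest without the developer filling in yet another web form.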

What we aim to do in this project is establish the metadata that might be required to differentiate software produced by JISC projects, and understand the processes by which that information can be most efficiently collected. As part of that, we need to understand the incentives for developers to provide information (for instance, to get better publicity or to attract new users) and for funders to showcase certain software (for instance, to tie in with a particular theme or to promote reuse). Our role in this process is to interview a number of the stakeholders and suggest different levels of metadata that trade off ease of collection and update against completeness and reuse potential.
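As a rough illustration of what "levels of metadata" could mean in practice, the sketch below defines three cumulative tiers and reports the highest one a record satisfies. The tier names and the fields in each tier are purely hypothetical; the project has not yet decided what the levels should contain, and working that out is the point of the interviews.

```python
# Hypothetical tiers: each level adds fields on top of the previous one.
# Lower tiers are cheap to collect; higher tiers say more about reuse.
LEVELS = [
    ("minimal", {"name", "homepage"}),
    ("standard", {"description", "license", "maintainer"}),
    ("full", {"programming_language", "repository", "funding_project"}),
]


def metadata_level(record):
    """Return the highest tier whose cumulative fields are all filled in.

    Returns None if the record does not even meet the lowest tier.
    """
    achieved = None
    required = set()
    for level, fields in LEVELS:
        required |= fields
        if all(record.get(field) for field in required):
            achieved = level
        else:
            break
    return achieved


# Made-up record for illustration.
record = {
    "name": "Example Tool",
    "homepage": "http://example.org/example-tool",
    "description": "A made-up project.",
    "license": "Apache-2.0",
    "maintainer": "A. Developer",
}
print(metadata_level(record))
```

The idea is that a developer could start at the lowest tier with a couple of fields and add more later, whilst a funder could decide which tier is needed before software is showcased.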

We'll be providing more updates from this project as it progresses, but if you have any comments please share them below.