From hidden to visible: how can we recognise research software?

Posted by d.barclay on 1 August 2023 - 11:00am

By Stephan Druskat, Hugo Gruson, Carlos Martinez, João Morado, and Deborah Leem.

This blog post is part of our Collaborations Workshop 2023 speed blog series.

Introduction

At the SSI’s Collaborations Workshop 2023, we examined the challenges of assessing research software outputs for academic evaluation in a fair and useful manner. Our discussions focused on the implications of recent updates to guidelines and policies, particularly the potential of new quality-based indicators designed to align with the FAIR for Research Software principles. We explored how these indicators can be used to measure the impact of software work in research and mitigate challenges associated with evaluating software.

The emergence of the Research Software Engineer (RSE) role has contributed to the growing recognition of research software. RSEs have played a critical role in enhancing efficiency, driving innovation, and enabling reproducibility in scientific research. Their day-to-day experience working with research software projects can also contribute to better academic evaluation of software outputs.

To acknowledge the significance of research software, funding agencies are incorporating software in their guidelines [1, 2]. Likewise, institutions are developing policies to support its use (see also the ReSA task force “Research institution policies to support research software”). The Helmholtz Association of German Research Centers, for example, has introduced a new Open Science Policy [3], which includes a basic indicator for evaluating research software through citable publications. This is expected to be replaced by a quality-based indicator aligned with the FAIR for Research Software principles by 2024 [4].

If these indicators are to fulfil their potential, they need to take into account contributions that have hitherto gone unnoticed. While indicators can and should build on emerging strategies for software publication and citation, they need to avoid adopting existing or emerging metrics without questioning their impact and resilience against abuse.

Software citation

Citations emerged early as a way to evaluate research software. Indeed, it makes sense to re-purpose common ways of evaluating academic performance. This can cause friction, however, as publications and their use in performance evaluation were not originally designed with software in mind. The ever-evolving nature of software makes it difficult to fit the underlying work into this static, rigid framework. This has led to the creation of alternative publication routes aimed at easing this process.

Nevertheless, citations can be a good way to measure impact that goes beyond the simplest metrics, such as the number of downloads. Someone may download a software tool and then decide that they do not like it or that it does not meet their needs. But if they cite it, they are explicitly recognising the usefulness of the work. However, citations come with downsides and caveats as well:

Regarding citations via software publication, the feedback loop is often slow. Because the lifecycle of research is generally quite long, it might take years to get the first citations.
Software citations share a limitation of citations of research publications: some fields or types of tools (e.g., data visualisation libraries) are more likely to be cited than deeper infrastructure tools that operate as indirect dependencies or in the background.

Recently, there has been work and progress to overcome a system where only the user-facing tools are cited. Since many packaging frameworks require developers to list their dependencies, tools could be created to automatically cite all the transitive dependencies alongside a given user-facing tool [5].
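As a minimal sketch of what such a tool might do, the Python snippet below walks a dependency map and collects every direct and indirect dependency of a user-facing tool. The package names and the DIRECT_DEPENDENCIES map are hypothetical and only serve as an illustration; a real tool would read this information from packaging metadata and would still need to resolve each dependency to a citable record.

```python
from collections import deque

# Hypothetical dependency map: package name -> list of direct dependencies.
# In practice this would be read from packaging metadata.
DIRECT_DEPENDENCIES = {
    "plot-tool": ["array-lib", "colour-maps"],
    "array-lib": ["linear-algebra-core"],
    "colour-maps": ["array-lib"],
    "linear-algebra-core": [],
}


def transitive_dependencies(package, direct):
    """Return all direct and indirect dependencies of `package`, sorted by name."""
    seen = set()
    queue = deque(direct.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited; also protects against dependency cycles
        seen.add(dep)
        queue.extend(direct.get(dep, []))
    seen.discard(package)  # a cyclic graph must not list the package as its own dependency
    return sorted(seen)


def citation_list(package, direct):
    """List the user-facing package first, followed by everything it builds on."""
    return [package] + transitive_dependencies(package, direct)


if __name__ == "__main__":
    for name in citation_list("plot-tool", DIRECT_DEPENDENCIES):
        print(name)
```

In this toy example, citing plot-tool would also surface linear-algebra-core, an indirect dependency that a user of plot-tool may never have heard of.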

Software publications

A prerequisite to making software citable is to link it to a unique permanent identifier, such as a DOI, by publishing it. Current software publishing practices rely mainly on two resources: general-purpose open repositories and academic journals. Open repositories, such as Zenodo or The Dataverse Project, function as public archives where developers can publish their software in a way that makes it citable and identifiable. These repositories have the advantage of allowing multiple, individually citable versions of software to co-exist under a single citation. However, the main problem with this practice is the lack of quality control filters, which opens the door to less reliable software.

Alternatively, in recent years peer-reviewed academic journals suitable for software publishing (e.g., The Journal of Open Source Software and SoftwareX) have begun to emerge. This publication route is still not ideal since it focuses on the software side and lacks domain-specific review. Furthermore, journal peer review is often unable to keep up with software version updates, since the reviewers' workload would increase substantially if review were continuously integrated into the publication process. There is a clear need to implement common standardised frameworks for code review and software publication that share minimum quality criteria and recognise all phases of software development. Checklists and static code analysis tools can support specific steps of this process, but they are only a part of a broader and more complex solution. Automated publication of research software may be part of that solution [9].

Metrics

An alternative to traditional academic evaluation metrics, such as citations, is to look at metrics inspired by the software engineering field. As previously discussed, software citations try to capture software contributions in a traditional sense. Other metrics for measuring the impact of software, such as the number of downloads or GitHub stars, come with their own issues. These metrics are at risk of being “gamed”. Additionally, they may fail to capture the impact of software: one particular piece of software may have few downloads but lead to major scientific breakthroughs (the software behind the Event Horizon Telescope is an example). Other software may have a larger number of downloads but limited impact.

This issue is not exclusive to software metrics. Traditional metrics used in academia, such as the number of publications, can also be difficult to compare across different fields. Although metrics should be understood within the context in which they are used, their potential issues do not imply that they should not be used.

Invisible work

As software work goes unnoticed and current metrics do not fully recognise its impact, it is worth noting the similarities between this phenomenon and the idea of “invisible work”. The concept of “invisible work” is based on the work of Star and Strauss (1999) [7] and refers to routine background work that is often overlooked or absent. Terras and Nyhan’s (2016) work [8] on Father Busa’s female punch card operators also highlights the idea of invisible work and forgotten voices in the early period of humanities computing.

One way to address the problem of “invisible work” is to make technical documentation about tools and their integration publicly available. It is also important to have a “tacit knowledge audit” included in projects to make sure that practical knowledge and experience are not forgotten but shared explicitly [6]. Furthermore, software citation could help by giving credit to all contributing roles, citing dependencies, and referencing previous theoretical work.

To acknowledge the significance of research software, it is important to identify the core tenets and principles of this work and its outputs and find ways to incorporate invisible work into software documentation, contexts, and authorship. Only then will all contributors be recognised and their contributions properly documented and acknowledged.

Conclusion

Software work must be accurately represented in academic evaluations. However, the roadmap to creating an evaluation system that reflects the unique aspects of software remains unclear. Bootstrapping metrics and methods from existing evaluation frameworks may not provide an optimal solution.

At the heart of the solution lies an evaluation process for research software that mirrors the real-world practices of software development. This includes recognising the “invisible work”: community contributions, conceptual inputs, consultations, and more. RSEs have a significant role to play here. Their hands-on experience and unique insights can bridge gaps in our understanding, drive innovative evaluation methodologies, and contribute to the development of relevant guidelines and standards.

The upside is that over the last few years, progress has been made in software citation and recognition of the importance of research software. There is a clear benefit in building on this progress for academic evaluation of software work. We need to be careful, however, about how we balance well-understood metrics adopted from other outputs, the specifics of current research software practices, and the need to acknowledge work hitherto unnoticed.

Our first steps should involve deepening our understanding of what constitutes a credible contribution to software, i.e., the criteria for software authorship. Next, we need to work towards meaningful quality criteria for research software that can be used in academic evaluation. Lastly, we need to develop novel methods to assess the quality and impact of research software and refine good practices for their application, where dynamic development makes traditional peer review impossible.

Only with these pieces in place can we confidently implement academic evaluation of research software.

References

[1] Deutsche Forschungsgemeinschaft. (2022). Guidelines for Safeguarding Good Research Practice. Code of Conduct. https://doi.org/10.5281/zenodo.6472827
[2] M. Barker, N. P. C. Hong, J. van Eijnatten, and D. S. Katz, “Amsterdam Declaration on Funding Research Software Sustainability,” Mar. 2023, doi: 10.5281/zenodo.7740084.
[3] Helmholtz-Gemeinschaft (Ed.) (2022): Helmholtz Open Science Policy. Version 1.0. Approved in the 119th General Assembly of the Helmholtz Association on 20-21 September 2022, Helmholtz-Gemeinschaft, 9 p. https://doi.org/10.48440/os.helmholtz.056
[4] Chue Hong, N. P., Katz, D. S., Barker, M., Lamprecht, A-L, Martinez, C., Psomopoulos, F. E., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., Honeyman, T., et al. (2022). FAIR Principles for Research Software version 1.0. (FAIR4RS Principles v1.0). Research Data Alliance. DOI: https://doi.org/10.15497/RDA00068
[5] Katz, D. S. (2014). Transitive Credit as a Means to Address Social and Technological Concerns Stemming from Citation and Attribution of Digital Products. Journal of Open Research Software. DOI: https://doi.org/10.5334/jors.be
[6] Edmond, J., Morselli, F., 2020. Sustainability of digital humanities projects as a publication and documentation challenge. J. Doc. 76, 1019–1031. https://doi.org/10.1108/JD-12-2019-0232
[7] Star, S.L., Strauss, A., 1999. Layers of Silence, Arenas of Voice: The Ecology of Visible and Invisible Work. Comput. Support. Coop. Work CSCW 8, 9–30. https://doi.org/10.1023/A:1008651105359
[8] Terras, M., Nyhan, J., 2016. Father Busa’s Female Punch Card Operatives, in: Gold, M.K., Klein, L.F. (Eds.), Debates in the Digital Humanities. University of Minnesota Press, Minneapolis; London, pp. 60–65.
[9] Druskat, S., Bertuch, O., Juckeland, G., Knodel, O., & Schlauch, T. (2022). Software publications with rich metadata: state of the art, automated workflows and HERMES concept. ArXiv, https://doi.org/10.48550/arXiv.2201.09015