In order to enable attribution and credit for Research Software Engineers, and other developers of and contributors to research software, software must be made citable, and must be cited. One of the obstacles to correct and comprehensive software citation is the lack, or poor discoverability, of relevant metadata. While papers, for instance, provide their metadata quite obviously (e.g., title, authors, the publication they appear in, publication date), software hardly ever does.
What is needed is a standard format for software citation metadata, and the Citation File Format (CFF) has been developed to provide one. CFF metadata is written in simply structured YAML files called CITATION.cff (see the example below). The format retains a high degree of human readability and writability, so that humans can still recover the metadata from a CITATION.cff file, while reliably providing correct metadata to other actors in the software citation workflow, such as repositories, indexers, converters, and publication platforms.
```yaml
message: If you use this software, please cite it as below.
authors:
  - family-names: Druskat
title: My Research Tool
```
The Citation File Format has a number of required fields to ensure that it is self-documenting (via the message field) and that it implements the Software Citation Principles (https://doi.org/10.7717/peerj-cs.86, principles 1, 2, 3, 5, and 6). Additionally, it allows the definition of arbitrary secondary references for software, such as software papers, context papers, or algorithm papers. It is also compatible with other metadata formats, such as CodeMeta, a multi-purpose software metadata model whose JSON-LD implementation is used for machine exchange.
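To make the notion of required fields concrete, here is a minimal Python sketch that reports which required top-level keys are missing from CITATION.cff metadata that has already been parsed into a dictionary (for instance with PyYAML's yaml.safe_load, which the sketch itself does not depend on). The key list in REQUIRED_KEYS is an assumption based on the 1.0.x series of the specification, and the function name is hypothetical; check both against the specification version you target.

```python
from typing import Any, Mapping

# ASSUMPTION: required top-level keys as in the CFF 1.0.x specification.
# Other specification versions may require a different set.
REQUIRED_KEYS = ("cff-version", "message", "authors", "title", "version", "date-released")


def missing_required_keys(metadata: Mapping[str, Any],
                          required: tuple = REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty in parsed CFF metadata."""
    return [key for key in required if not metadata.get(key)]


# The example fragment above, as a dict after YAML parsing.
example = {
    "message": "If you use this software, please cite it as below.",
    "authors": [{"family-names": "Druskat"}],
    "title": "My Research Tool",
}

print(missing_required_keys(example))  # → ['cff-version', 'version', 'date-released']
```

Treating empty values as missing (via `not metadata.get(key)`) also catches an `authors:` key with an empty list.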
After a round of brief introductions from everyone, participants presented their hack ideas (some original, some based on issues reported to the main Citation File Format repository), formed groups, and got hacking. The resulting hacks were presented to the plenary at the end of the workshop. Together, we produced very diverse hacks at different levels: tooling implementation, policy, licensing, documentation, software citation workflow issues, and more. This is the magic behind hack events, and this one was no different: a diverse group of people came together, brought their different skills to the table, and collaborated on advancing a shared cause, getting work done and creating all sorts of output, be it code, text, knowledge, policy, or something else.
Report more formally on experiences with CFF at the Netherlands eScience Center
Jurriaan H. Spaaks and Jason Maassen have started working on a blog post detailing their adoption of CFF to provide and re-use citation metadata for the software listed in the Netherlands eScience Center’s Research Software Directory. Once the post is finalised, it will be submitted to the Software Sustainability Institute’s blog.
Clarify the relation and interfaces between CFF and CodeMeta
CodeMeta is a crosswalk table and format that “improve[s] how [different] resources [for research software] can talk to each other.” As a general software metadata exchange format, CodeMeta can also record the metadata needed for software citation. The relation between CFF and CodeMeta should therefore be clarified, and potentially useful interfaces for exchange between the two identified. In the context of the FORCE11 Software Citation Implementation Working Group, Daniel S. Katz (as Group Leader) and Stephan Druskat (as a member) started a discussion, which led to internal communication within the working group about the differences between the two formats and an assessment of their potential uses.
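To illustrate what an interface between the two formats could look like, the following Python sketch maps a handful of CFF keys onto CodeMeta JSON-LD terms. The mapping in KEY_MAP is our own illustrative subset, not the official CodeMeta crosswalk, and cff_to_codemeta is a hypothetical name.

```python
import datetime
import json
from typing import Any, Mapping

# Illustrative subset of a CFF -> CodeMeta mapping; NOT the official crosswalk.
KEY_MAP = {
    "title": "name",
    "version": "version",
    "doi": "identifier",
    "date-released": "datePublished",
    "license": "license",
    "repository-code": "codeRepository",
    "abstract": "description",
    "keywords": "keywords",
}

# Keys inside each CFF "authors" entry, mapped to schema.org Person terms.
PERSON_MAP = {"family-names": "familyName", "given-names": "givenName"}


def cff_to_codemeta(cff: Mapping[str, Any]) -> dict:
    """Map parsed CITATION.cff metadata onto a minimal CodeMeta JSON-LD document."""
    doc: dict = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
    }
    for cff_key, codemeta_key in KEY_MAP.items():
        if cff_key in cff:
            value = cff[cff_key]
            # YAML parsers may return dates as datetime.date; JSON needs a string.
            if isinstance(value, datetime.date):
                value = value.isoformat()
            doc[codemeta_key] = value
    authors = [
        {"@type": "Person",
         **{cm: person[c] for c, cm in PERSON_MAP.items() if c in person}}
        for person in cff.get("authors", [])
    ]
    if authors:
        doc["author"] = authors
    return doc


example = {"title": "My Research Tool", "authors": [{"family-names": "Druskat"}]}
print(json.dumps(cff_to_codemeta(example), indent=2))
```

A one-directional mapping like this is the easy half; going back from CodeMeta to CFF is harder because CodeMeta can express things CFF does not, which is part of what the discussion above is meant to assess.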
Solve the chicken/egg dilemma for DOIs
This has been a very important hack, not only for CFF but for any effort affected by the following dilemma. The Software Citation Principles state that a software version should be uniquely identifiable via a unique, persistent, and machine-actionable identifier such as a DOI. DOIs are usually assigned to releases of a software version, and the assignment can be automated, e.g., via the Zenodo integration for GitHub releases. However, because the DOI is only minted once the release has been made, the CITATION.cff file included in that release cannot contain it. The dilemma can be solved by manually reserving a DOI on Zenodo, adding it to the CITATION.cff file, and then manually creating the release with the up-to-date metadata. Because these manual steps are error-prone, Toby Hodges, Patricia Herterich, Cerys Lewis, David Perez-Suarez, and Robin Dasler started investigating whether the process could be automated. The group tested whether they could reserve DOIs on Zenodo through its API and add the pre-reserved DOI to a .zenodo.json file before pushing the release from GitHub to Zenodo. In this workflow, the DOI would also be added to the CITATION.cff file. However, it turned out that they could not get Zenodo to use the pre-reserved DOI; instead, it always created a new DOI for each release.
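The manual step of writing a reserved DOI into CITATION.cff is easy to automate on its own. The following Python sketch uses a hypothetical function, set_doi, and assumes the DOI lives in an unindented, top-level doi key. It replaces an existing doi line or appends one.

```python
import re

# Matches an unindented (top-level) "doi:" line; DOIs of nested
# reference entries are indented and therefore left alone.
TOP_LEVEL_DOI = re.compile(r"^doi:.*$", flags=re.MULTILINE)


def set_doi(cff_text: str, doi: str) -> str:
    """Return CITATION.cff text whose top-level doi key is set to `doi`."""
    new_line = f"doi: {doi}"
    if TOP_LEVEL_DOI.search(cff_text):
        # A function replacement stops re from interpreting backslashes in the DOI.
        return TOP_LEVEL_DOI.sub(lambda _match: new_line, cff_text, count=1)
    if cff_text and not cff_text.endswith("\n"):
        cff_text += "\n"
    return cff_text + new_line + "\n"


cff = "title: My Research Tool\n"
print(set_doi(cff, "10.5281/zenodo.1234"), end="")  # placeholder DOI, not a real record
```

This only edits text; it does not check that the result is valid CFF, so a validator run afterwards is still a good idea.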
Until the GitHub-Zenodo integration implements such a feature, one way forward would be a service that pre-reserves a DOI, creates the .zenodo.json and CITATION.cff files for the GitHub repository, pushes all of this to Zenodo through the API, and uploads the release tarball as a “standard” file upload instead of using the GitHub integration.
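The first step of such a service could look like the Python sketch below. It is based on our reading of Zenodo's legacy deposit REST API, in which setting prereserve_doi in a new deposition's metadata makes the response include the reserved DOI. The endpoint URL, the Bearer-token header, and the response layout are assumptions to verify against Zenodo's current documentation; all function names are hypothetical. The sketch targets the Zenodo sandbox so that experiments do not reserve real DOIs.

```python
import json
import urllib.request
from typing import Any, Mapping

# ASSUMPTION: Zenodo's legacy deposit endpoint, on the sandbox instance.
ZENODO_API = "https://sandbox.zenodo.org/api/deposit/depositions"


def build_reservation_request(token: str, api_url: str = ZENODO_API) -> urllib.request.Request:
    """Prepare a POST that creates an empty deposition with a pre-reserved DOI."""
    body = json.dumps({"metadata": {"prereserve_doi": True}}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


def reserved_doi(deposition: Mapping[str, Any]) -> str:
    """Extract the pre-reserved DOI from the deposition JSON (assumed layout)."""
    return deposition["metadata"]["prereserve_doi"]["doi"]


def reserve_doi(token: str) -> str:
    """Send the request and return the reserved DOI (needs network and a valid token)."""
    with urllib.request.urlopen(build_reservation_request(token)) as response:
        return reserved_doi(json.load(response))
```

The remaining steps (writing .zenodo.json and CITATION.cff with the reserved DOI, uploading the release tarball, and publishing the deposition) would be further API calls and are not shown here.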
Jan Philipp Dietrich implemented an R package that offers read and write support for CITATION.cff files. It also provides tools for extracting citation information from R packages, thereby extending the citation() function from the utils package. The package is available from the CRAN-like package repository of the Potsdam Institute for Climate Impact Research.
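The R package itself is not reproduced here. As an illustration of what extracting citation information from an R package involves, the Python sketch below reads the DCF-formatted DESCRIPTION file that every R package ships with and maps three of its fields to CFF keys. The function names and the field mapping are our own assumptions, not the package's API. Author information, which R packages usually store as R code in the Authors@R field, is deliberately left out.

```python
def parse_dcf(text: str) -> dict:
    """Parse a single-record DCF text (such as an R DESCRIPTION file) into a dict."""
    fields: dict = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate records; DESCRIPTION has only one
        if line[0] in " \t" and current is not None:
            # Continuation line: append it to the previous field's value.
            fields[current] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


# ASSUMPTION: an illustrative mapping from DESCRIPTION fields to CFF keys.
DESCRIPTION_TO_CFF = {"Title": "title", "Version": "version", "Date": "date-released"}


def description_to_cff(text: str) -> dict:
    """Return the CFF keys that can be filled directly from DESCRIPTION fields."""
    fields = parse_dcf(text)
    return {cff_key: fields[r_key]
            for r_key, cff_key in DESCRIPTION_TO_CFF.items() if r_key in fields}


# A hypothetical DESCRIPTION file.
description = """Package: myResearchTool
Title: My Research Tool
Version: 1.0.0
Description: A hypothetical package used to illustrate
    how continuation lines are handled.
"""
print(description_to_cff(description))  # → {'title': 'My Research Tool', 'version': '1.0.0'}
```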
Oliver Strickson not only improved the documentation for CFF and the repositories in the Citation File Format GitHub organisation, but also checked and updated LICENSE files where necessary, and led a discussion about licensing the format itself, which in turn gave rise to a separate hack on giving the format standard a licence.
Create a generic CFF reader in Python
Peter Hill and Jennifer Radtke have created a Python library for reading CITATION.cff files. A particular strength of the module is that it also provides a class (i.e., a data model) for the citation metadata, which allows the library to serve as the back end of Python applications that read, create, manipulate, and export CFF files.
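The idea of pairing a reader with a data model can be sketched with Python dataclasses. The classes below (Person, Citation) are hypothetical, cover only a few CFF keys, and are not the API of the library described above; they show how a parsed CITATION.cff dictionary can be turned into objects, modified, and exported again.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass
class Person:
    """An author entry, covering only family and given names."""
    family_names: str
    given_names: str = ""

    @classmethod
    def from_cff(cls, data: Mapping[str, Any]) -> "Person":
        return cls(data["family-names"], data.get("given-names", ""))

    def to_cff(self) -> dict:
        out = {"family-names": self.family_names}
        if self.given_names:
            out["given-names"] = self.given_names
        return out


@dataclass
class Citation:
    """A small in-memory model of CITATION.cff metadata."""
    message: str
    title: str
    authors: list = field(default_factory=list)  # list of Person
    version: str = ""

    @classmethod
    def from_cff(cls, data: Mapping[str, Any]) -> "Citation":
        return cls(
            message=data["message"],
            title=data["title"],
            authors=[Person.from_cff(p) for p in data.get("authors", [])],
            version=str(data.get("version", "")),
        )

    def to_cff(self) -> dict:
        out = {
            "message": self.message,
            "title": self.title,
            "authors": [a.to_cff() for a in self.authors],
        }
        if self.version:
            out["version"] = self.version
        return out


# Read, manipulate, and export metadata like the example at the top of this post.
citation = Citation.from_cff({
    "message": "If you use this software, please cite it as below.",
    "authors": [{"family-names": "Druskat"}],
    "title": "My Research Tool",
})
citation.version = "1.0.0"  # placeholder version
print(citation.to_cff())
```

Keeping the YAML parsing outside the model means the same classes could sit behind a web front end, a command-line tool, or a converter.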
While working on improving the developer documentation and licence information for the different CFF projects, Oliver Strickson flagged the question of whether we should license the format itself, as opposed to the specification document, which is licensed under CC BY-SA 4.0. After some discussion, we concluded that if the format can be licensed at all, it should be licensed under a maximally liberal licence, which should largely avoid the creation of new projects under the same name. The Apache License, Version 2.0 was a candidate, but we dismissed it because it requires redistribution of the licence text and is mainly used for software. Instead, we tentatively opted for CC BY 4.0 and added it to the main CFF repository’s README, but we would welcome more informed input from the community on whether this is a viable solution.
Submit CFF as a standard to fairsharing.org
Alexander Struck has submitted the Citation File Format as a standard to fairsharing.org, a resource on data and metadata standards, interrelated with databases and policies. The entry will make the format more visible and discoverable.
We are still awaiting DOI assignment. We ask interested parties to join the CFF team for future work.
The discussion will be documented, and progress will be reported, in this GitHub issue.
Drag-drop web front end for a CFF editor
Matt Walker and Ana Costa Conrado developed a prototype of a drag’n’drop-enabled front end for a web application that works with CFF files. It could tie in nicely with, e.g., the Python-based back end from the CFF reader hack described above. The drag’n’drop interface already works with single files, but could be extended to whole directories, so that, for example, the citation metadata of an entire software database could be read in one go.
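On the back-end side, supporting whole directories mostly means locating every CITATION.cff file in a tree. A minimal Python sketch using pathlib follows; find_cff_files and read_cff_files are hypothetical names, and parsing the returned text is left to a YAML library.

```python
import tempfile
from pathlib import Path
from typing import Union


def find_cff_files(root: Union[str, Path]) -> list:
    """Return every CITATION.cff file below root, sorted for stable output."""
    return sorted(Path(root).rglob("CITATION.cff"))


def read_cff_files(root: Union[str, Path]) -> dict:
    """Map each CITATION.cff path below root to its raw text."""
    return {path: path.read_text(encoding="utf-8") for path in find_cff_files(root)}


# Usage: build a throwaway directory tree and collect its CITATION.cff files.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "tool-a").mkdir()
    (base / "tool-a" / "CITATION.cff").write_text("title: Tool A\n", encoding="utf-8")
    (base / "group" / "tool-b").mkdir(parents=True)
    (base / "group" / "tool-b" / "CITATION.cff").write_text("title: Tool B\n", encoding="utf-8")
    (base / "README.md").write_text("not a citation file\n", encoding="utf-8")
    found = [p.relative_to(base).as_posix() for p in find_cff_files(base)]
    texts = sorted(read_cff_files(base).values())

print(found)  # → ['group/tool-b/CITATION.cff', 'tool-a/CITATION.cff']
```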
Learning, exploration, and discussions around the Citation File Format and software citation in general were also valid hacks. Our impression, supported by participant feedback, is that those who did not work on a “product-oriented” hack used the day for exactly these activities.
The hack day saw overwhelming interest and enthusiasm from all participants. Software citation really is one of the pressing issues of the moment: it must be implemented properly to give Research Software Engineers and other creators of research software their due credit, and it is a prerequisite for the linkage, discovery, reproducibility, and provenance analysis of research software. The hack day suggests that the Citation File Format is a suitable starting point for the software citation workflow, and we hope to run similar events in the future.