Making code citable with Zenodo and GitHub

Posted by s.hettrick on 28 July 2015 - 10:48am

By Megan Potter & Tim Smith, CERN.

For Open Science, it is important to cite the software you use in your research, as has been mentioned in previous articles on this blog. In particular, you should cite any software that made a significant or unique impact on your work. Modern research relies heavily on computerised data analysis, and we should elevate its standing to a core research activity, with data and software as prime research artefacts. Steps must be taken to preserve and cite software in a sustainable, identifiable and simple way. This is where digital repositories like Zenodo can help.

Best practice for citing a digital resource like code is to refer to a digital object identifier (DOI) for it whenever possible. This is because DOIs are persistent identifiers that can be obtained only by an agency that commits to maintaining a reliable level of consistency in, and preservation of, the resource. As a digital repository, Zenodo registers DOIs for all submissions through DataCite and preserves these submissions using the safe and trusted foundation of CERN’s data centre, alongside the biggest scientific dataset in the world, the LHC’s 100 PB Big Data store. This means that the code preserved in Zenodo will be accessible for years to come, and the DOIs will function as perpetual links to the resources. DOI-based citations remain valuable because they are future-proofed against URL or even protocol changes: a DOI is dereferenced through a resolver, which currently directs to a URI. DOIs also help discoverability tools, like search engines and indexing services, to track software usage through different citations, which in turn elevates the reputation of the programmer.
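To make the resolver idea concrete, here is a minimal Python sketch that turns a DOI written in any of several common forms into a resolvable link. It assumes the public doi.org resolver, and the function names and the example DOI are hypothetical, chosen only for illustration. Only the prefix 10.5281 is real: it is the one under which Zenodo registers DOIs.

```python
from urllib.parse import quote

# Assumption: the public DOI resolver at doi.org is used to dereference DOIs.
RESOLVER = "https://doi.org/"

# Prefixes people commonly put in front of a bare DOI when citing it.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Return the bare DOI, e.g. '10.5281/zenodo.12345' (hypothetical DOI)."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Every DOI starts with '10.' and separates prefix from suffix with '/'.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build a link that a resolver can redirect to the resource's current location."""
    return RESOLVER + quote(normalise_doi(raw), safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.5281/zenodo.12345"))
```

The point of the indirection is that the citation stores only the identifier. If the resource moves, the registration is updated and the same link keeps working.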

Long-term digital stewardship requires many auxiliary functions, such as tiered storage, caches and distributed access, as well as processes such as bit preservation, media migration, and data exercising. This is why most code hosting sites are not equipped to commit to long-term storage and preservation. To provide flexible and focused services for collaborative coding, these sites are necessarily more short-term in focus. Although it is possible to identify software uploaded to places like GitHub when citing it in a paper, these organisations do not issue DOIs and there is no perpetual guarantee of access to older software. Citing the URL where the code is currently hosted can also work in theory, but again you are faced with preservation and versioning issues: how long will the programmer or organisation maintain the host website, which version of the code is being referenced, and is that version still available on the hosting website? These problems can be handled by publishing your code in a digital repository like Zenodo.

Submitting your code to Zenodo and receiving a DOI has never been easier thanks to the Zenodo and GitHub integration. Additionally, preservation is based on releases, so as the software changes each release can be cited with its own DOI as appropriate, giving precise traceability of the exact code used in a published analysis. Since releases are both archived and public, they are considered published, and can be described with rich metadata, an explanatory abstract and a meaningful author list. This means that you can skip the journal publication step, if you would prefer a more streamlined option for publication.
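As an illustration of per-release traceability, the following Python sketch keeps one DOI per release and formats a citation for the exact version used in an analysis. Everything in it is hypothetical: the project title, authors, version tags and DOIs are made-up placeholders, and the citation layout is only one plausible style, not a format prescribed by Zenodo or GitHub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: str  # release tag, e.g. "v1.0.0"
    doi: str      # DOI minted for this specific release (placeholder values below)
    year: int     # year the release was archived


# Hypothetical release history: each release carries its own DOI.
RELEASES = {
    r.version: r
    for r in (
        Release("v1.0.0", "10.5281/zenodo.0000001", 2014),
        Release("v1.1.0", "10.5281/zenodo.0000002", 2015),
    )
}


def cite(authors: list[str], title: str, version: str) -> str:
    """Format a citation that pins the exact release used."""
    release = RELEASES[version]  # KeyError if the version was never archived
    names = ", ".join(authors)
    return (
        f"{names} ({release.year}). {title} ({release.version}). "
        f"Zenodo. https://doi.org/{release.doi}"
    )


if __name__ == "__main__":
    print(cite(["A. Author", "B. Author"], "example-analysis", "v1.1.0"))
```

Because the version and its DOI travel together, a reader can retrieve the same code that produced the published result, even after later releases have changed it.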

Zenodo allows you to sign up using your GitHub account to avoid creating yet another account and to facilitate the immediate interlinking of services. If you already have a Zenodo account, linking your GitHub account is as easy as clicking 'Connect' on the GitHub page in your Zenodo account. Once linked, you simply flip the switch on the repository you would like to preserve. From then on, whenever you make a release of that repository in GitHub, Zenodo will archive it as well. This process gives you a new DOI for each release. GitHub has a helpful and succinct walk-through that covers this in more detail.

Pro tip: if your research is funded by an EU grant, you can even directly connect your code to your grant by updating the grant section of the metadata on the repository’s Zenodo record – discoverability!

A little extra about Zenodo

Conceived within the OpenAIRE project to fill a need for a domain-agnostic, free, open-access research repository, Zenodo targets the needs of the 'long tail' of research results. Launched at the CERN Data Centre in May 2013 with a grant from the European Commission, Zenodo has a special commitment to sharing, citing and preserving data and code. Based on the Invenio open-source software, Zenodo profits from and contributes to the foundation of code used to provide Open Data services to CERN and other initiatives around the world.