Encouraging citation of software – introducing CITATION files

By Robin Wilson, Fellow and postgraduate at the University of Southampton.

Put a plaintext file named CITATION in the root directory of your code, and put information in it about how to cite your software. Go on, do it now: it’ll only take two minutes!

Software is very important in science – but good software takes time and effort that could be used to do other work instead. I believe that it is important to do this work, but to make it worthwhile, people need to get credit for it, and in academia that means citations. However, it is often very difficult to find out how to cite a piece of software – sometimes the information is hidden away somewhere in the manual or on the web page, but often it requires emailing the author to ask how they want it cited. The effort that this requires means that many people don’t bother to cite the software they use, and thus the authors don’t get the credit that they need. We need to change this, so that software – which underlies a huge amount of important scientific work – gets the recognition it deserves.

As with many things relating to software sustainability in science, the R project does this very well: if you want to find out how to cite the R software itself you simply run the command:

citation()

If you want to find out how to cite a package you simply run:

citation('PACKAGENAME')

For example:

> citation('ggplot2')

To cite ggplot2 in publications, please use:

  H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York,
  2009.

A BibTeX entry for LaTeX users is

  @Book{,
    author = {Hadley Wickham},
    title = {ggplot2: elegant graphics for data analysis},
    publisher = {Springer New York},
    year = {2009},
    isbn = {978-0-387-98140-6},
    url = {http://had.co.nz/ggplot2/book},
  }

In this case the citation was given by the author of the package, in R code, in a file called (surprise, surprise) CITATION inside the package directory. R can even intelligently make up a citation if the author hasn’t provided one. Note also that the function provides a nice handy BibTeX entry for those who use LaTeX – making it even easier to use the citation, and thus reducing the effort involved in citing software properly.

I think the R approach is wonderful, but the exact methods are rather specific to R (it is all based around a citEntry object and the CITATION file contains actual R code). I’d like to suggest a simpler, slightly more flexible approach for use broadly across scientific software:

Create a CITATION file in the root directory of your project and put something in there that tells people how to cite it.

In most cases this will probably be some plain text which gives the citation, and possibly a BibTeX entry for it, but it could be some sort of code (in the language your project uses) which will print out an appropriate citation when run (and, of course, R users should stick to the standard way of writing CITATION files for R packages).
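For projects that go the "code" route, here is a minimal sketch of what such a script might look like in Python. Everything in it is a placeholder invented for illustration (the project name, author, year, URL and the BibTeX key), and the function names `as_text` and `as_bibtex` are just my choice, not any standard:

```python
"""Print how to cite a (hypothetical) project as plain text and BibTeX."""

# All metadata below is a placeholder for illustration; replace with real details.
CITATION = {
    "key": "example2013",
    "author": "A. N. Author",
    "title": "ExampleTool: a placeholder scientific package",
    "year": "2013",
    "url": "https://example.org/exampletool",
}


def as_text(c):
    """Return a one-line plain-text citation."""
    return f"{c['author']} ({c['year']}). {c['title']}. {c['url']}"


def as_bibtex(c):
    """Return a BibTeX @misc entry built from the same metadata."""
    fields = ["author", "title", "year", "url"]
    body = ",\n".join(f"  {name} = {{{c[name]}}}" for name in fields)
    return f"@misc{{{c['key']},\n{body}\n}}"


if __name__ == "__main__":
    print("To cite this software in publications, please use:\n")
    print(as_text(CITATION))
    print("\nA BibTeX entry for LaTeX users is:\n")
    print(as_bibtex(CITATION))
```

Keeping the metadata in one place and generating both formats from it means the plain-text and BibTeX versions can't drift apart, which is roughly the convenience R's citation() gives you.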

I know this approach isn’t perfect (machine-readability of citations is a problem using this method, but then again machine readability of citations is a big problem generally…) but I think it is a start and hopefully it’ll reduce the effort required to cite software, and thus encourage software citation.

So, go on – go and write a few CITATION files now!

You can read Robin's original post on his blog.


The Debian project is working on a machine-readable method for citations and other "upstream" metadata: http://dep.debian.net/deps/dep12/


In CP2K (www.cp2k.org), we have a similar idea, but because there are so many features, all of which make use of different published algorithms, theory, etc., a single CITATION file is not enough: the user would have to figure out which references need to be cited depending on what they did with the code. In our model, a developer adds an entry to a bibliography file in the source code and adds a subroutine call in their code to cite something in the bibliography; then, at runtime, a list of references which should be cited by the user is printed to stdout. For small codes, I think the CITATION file idea is probably a good one, but it is too simplistic for the large, complex codes around.
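To make the run-time model described above concrete, here is a rough sketch of the general pattern in Python. It is not CP2K's actual implementation (which lives in CP2K's own source code); the bibliography keys, the reference strings and the names `cite`, `references_used` and `run_solver` are all hypothetical:

```python
"""Sketch of run-time citation tracking: cite only what a run actually used."""

# Hypothetical bibliography; keys and reference strings are placeholders.
BIBLIOGRAPHY = {
    "Solver2010": "Placeholder reference for the solver algorithm",
    "Thermostat2005": "Placeholder reference for the thermostat",
    "Core2001": "Placeholder reference for the core code",
}

_cited = []  # keys in the order they were first cited during this run


def cite(key):
    """Record that the feature associated with `key` was used."""
    if key not in BIBLIOGRAPHY:
        raise KeyError(f"unknown reference: {key}")
    if key not in _cited:
        _cited.append(key)


def references_used():
    """Return the references that should be cited for this run."""
    return [BIBLIOGRAPHY[key] for key in _cited]


def run_solver():
    """Stand-in for a feature; it registers the references it relies on."""
    cite("Core2001")
    cite("Solver2010")
    # ... the actual computation would go here ...


if __name__ == "__main__":
    run_solver()
    run_solver()  # calling a feature twice does not duplicate its references
    print("Please cite the following references for this run:")
    for ref in references_used():
        print(" -", ref)
```

In this sketch the thermostat reference is never printed, because nothing in the run called `cite("Thermostat2005")`; that is the point of the approach for large codes with many optional features.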


The molecular dynamics code LAMMPS has recently included a mechanism that actually pumps out relevant citations at run time (to a log file). LAMMPS has a great deal of functionality, and exactly what is used is determined by the user input at compile/run time. So it gives you citations relevant to particular options that have been employed, e.g., "This LAMMPS simulation made specific use of work described in the following references. See http://lammps.sandia.gov/cite.html for details. neighbor multi command: @Article{Intveld08, ..." This is only for those who actually run the executable.


"Good advice. An additional approach is listing the papers that have cited your software. Researchers want their work to get as much visibility as possible - their paper being listed here can help that (and thus provide further encouragement to cite your software). Now if only there were an easy way to auto-generate the list of papers citing my software.... [1]" Yannick Wurm. [1]: http://www.biostars.org/p/4251


Thanks! For the computational algebra system GAP, we had a "How to cite GAP" section on the GAP homepage (http://www.gap-system.org/Contacts/cite.html) and also suggested how to cite GAP in the manual (http://www.gap-system.org/Manuals/doc/ref/chap1.html). Inspired by this post, we've added a CITATION file to the GAP distribution and introduced a function "Cite" which suggests how to cite the version that is being used in several formats: plain text, HTML, BibXML and BibTeX.
