Oh research software, how shalt I cite thee?

By Mike Jackson, Software Architect

The Institute are firm believers in software citation. Citing software, directly or via its associated publications, provides deserved credit for those who develop this vital research infrastructure. In this blog post I look at some ways in which research software developers are helping to promote the citation of software, by making it easier for researchers to do this. That's another thing we are firm believers in, automating the grunt work of using and developing software to free up time for research...

As part of recent open call collaborations with both BoneJ and QuBIc I was taken aback by how involved citing software could get. For example, BoneJ request that their journal paper is cited, but, depending upon the plug-ins and additional features used, there are other papers that also need to be cited. Likewise, the FSL software library request citation of one of their three overview papers. Again, depending upon the specific tools used, there are additional papers to be cited. For example, using QuBIc's FABBER tool, bundled in FSL, requires citation of one paper, though citing three is recommended.

As our fellow Robin Wilson has noted, in his blog post on "Encouraging citation of software - introducing CITATION files", it can be difficult for researchers to know exactly what they need to cite, or to find this information. In the time-squeezed world of research, it's understandable that some citations may be overlooked, and, unfortunately, the software's authors are then denied the credit they deserve. 

Robin proposed CITATION files as one way to address this issue. However, in a comment on that blog post, Iain Bethune noted that, for complex codes "a single CITATION file is not enough as the user would have to figure out which references need to be cited depending on what they did with the code." There are now a number of research software developers who have implemented ways to make citation of their software easier.

R's citation function

As mentioned by Robin, R's citation function outputs the required citation for R:

> citation()

To cite R in publications use:

  R Development Core Team (2008). R: A language and environment for
  statistical computing. R Foundation for Statistical Computing,
  Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Development Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2008},
    note = {{ISBN} 3-900051-07-0},
    url = {http://www.R-project.org},
  }

Given a package name, the function displays the citation for that package. For example:

> citation('ggplot2')

To cite ggplot2 in publications, please use:

  H. Wickham. ggplot2: elegant graphics for data analysis. 
  Springer New York, 2009.

A BibTeX entry for LaTeX users is

  @Book{,
    author = {Hadley Wickham},
    title = {ggplot2: elegant graphics for data analysis},
    publisher = {Springer New York},
    year = {2009},
    isbn = {978-0-387-98140-6},
    url = {http://had.co.nz/ggplot2/book},
    }

Carl Boettiger provides an example of how the citations that correspond to the packages that have been used can be automatically collected and saved to a file.

Behind the scenes of R

R's functions create both plain-text and BibTeX citations for packages used. R package developers place a CITATION file, consisting of R function calls to build the citation, in their package. This CITATION file is then read when the citation function is invoked. An example of a CITATION file is:

citHeader("To cite ggplot2 in publications, please use:")

citEntry(entry = "book",
  author = "Hadley Wickham",
  title = "ggplot2: elegant graphics for data analysis",
  publisher = "Springer New York",
  year = "2009",
  isbn = "978-0-387-98140-6",
  url = "http://had.co.nz/ggplot2/book",
  textVersion = "H. Wickham. ggplot2: elegant graphics for data analysis. 
    Springer New York, 2009.")

If there is no such file, then a default citation for the package is auto-generated from other package meta-data.
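The fallback described above can be sketched in a few lines. This is a Python illustration of the idea, not R's implementation: the PackageMeta fields, the citation_for function and both example packages are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageMeta:
    """Hypothetical package meta-data; the field names are illustrative only."""
    name: str
    title: str
    authors: str
    year: int
    version: str
    citation: Optional[str] = None  # author-supplied citation text, if any


def citation_for(pkg: PackageMeta) -> str:
    """Prefer the author-supplied citation, else build one from meta-data."""
    if pkg.citation is not None:
        return pkg.citation
    return (f"{pkg.authors} ({pkg.year}). {pkg.name}: {pkg.title}. "
            f"Package version {pkg.version}.")


# Two made-up packages: one ships its own citation, the other does not.
with_file = PackageMeta("pkgone", "An Example Package", "A. Author", 2013,
                        "1.2", citation="A. Author. The pkgone book. 2013.")
without_file = PackageMeta("pkgtwo", "Another Example Package", "B. Author",
                           2014, "0.3")

print(citation_for(with_file))
print(citation_for(without_file))
```

The point of the design is that package authors only need to write a citation when the default is not what they want cited, for example when a book or paper should be cited instead of the package itself.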

GAP

GAP (Groups, Algorithms, Programming) is an open source computational algebra system. Inspired by Robin's post, they not only added a web page on "How to cite GAP", but also implemented a Cite function similar to that of R. This provides citations in a number of formats including plain-text, HTML, BibXML and BibTeX. The function provides the citation for GAP itself, or, if given a package name, the function displays the citation for that package. For example:

gap> Cite("modisom");
Please use one of the following samples
to cite modisom version from this installation

Text:

[EK13]  Eick,  B.  and  Konovalov,  A., ModIsom, Computing automorphisms and
checking isomorphisms for modular group algebras of finite p-groups, Version
2.1 (2013), (GAP package), http://www.icm.tu-bs.de/~beick/soft/modisom/.

HTML:

<p class='BibEntry'>
[<span class='BibKey'>EK13</span>] <b class='BibAuthor'>Eick, B. and 
  Konoval\ov, A.</b>,
 <i class='BibTitle'>ModIsom, Computing automorphisms and checking 
  isomorphism\s for modular group algebras of finite p-groups,
  Version 2.1</i>
 (<span class='BibYear'>2013</span>)
(<span class='BibNote'>GAP package</span>),
<span class='BibHowpublished'>
  <a href="http://www.icm.tu-bs.de/~beick/soft/mod\isom/">
  http://www.icm.tu-bs.de/~beick/soft/modisom/</a></span>.
</p>

BibXML:

<entry id="ModIsom2.1"><misc>
  <author>
    <name><first>Bettina</first><last>Eick</last></name>
    <name><first>Alexander</first><last>Konovalov</last></name>
  </author>
  <title><C>ModIsom</C>, Computing automorphisms and checking 
    isomorphisms for modular group algebras of finite p-groups,
         <C>V</C>ersion 2.1</title>
  <howpublished><URL>http://www.icm.tu-bs.de/~beick/soft/modisom/</URL></howpublished>
  <month>Nov</month>
  <year>2013</year>
  <note>GAP package</note>
  <keywords>modular isomorphism problem; automorphism group; isomorphism testing; 
  nilpotent algebras; nilpotent quotient; Kurosh algebras</keywords>
</misc></entry>

BibTeX:

@misc{ ModIsom2.1,
  author =       {Eick, B. and Konovalov, A.},
  title =        {{ModIsom}, Computing automorphisms and checking
                  isomorphisms for modular group algebras of finite
                  p-groups, {V}ersion 2.1},
  month =        {Nov},
  year =         {2013},
  note =         {GAP package},
  howpublished = {\href   {http://www.icm.tu-bs.de/~beick/soft/modisom/}
                 {\texttt{http://www.icm.tu-bs.de/}\discretionary
                 {}{}{}\texttt{\texttt{\symbol{126}}beick/}\discretionary
                 {}{}{}\texttt{soft/}\discretionary
                 {}{}{}\texttt{modisom/}}},
  keywords =     {modular isomorphism problem; automorphism group;
                  isomorphism testing; nilpotent algebras; nilpotent
                  quotient; Kurosh algebras},
  printedkey =   {EK13}
}

Behind the scenes of GAP

Citations are derived from package-specific information provided by GAP package developers in PackageInfo.g files, which contain the meta-data about their packages. An example is pkg/modisom/PackageInfo.g:

#############################################################################
##  
##  PackageInfo.g for the package `modisom'                      Bettina Eick
##  
SetPackageInfo( rec(
PackageName := "ModIsom",
Subtitle := "Computing automorphisms and checking isomorphisms for modular 
 group algebras of finite p-groups",
Version := "2.1",
Date := "29/11/2013",

Persons := [
  rec( 
    LastName      := "Eick",
    FirstNames    := "Bettina",
    ...
  ),
  rec(
    LastName      := "Konovalov",
    FirstNames    := "Alexander",
    ...
  ) ],
...
PackageWWWHome := "http://www.icm.tu-bs.de/~beick/soft/modisom/",
...
AbstractHTML := 
  "The <span class=\"pkgname\">ModIsom</span> package contains various 
method for computing with nilpotent associative algebras....",
...
Keywords := ["modular isomorphism problem",
             "automorphism group", 
             "isomorphism testing",
             "nilpotent algebras",
             "nilpotent quotient",
             "Kurosh algebras"]
));
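To illustrate how a citation can be derived from such meta-data, here is a rough Python sketch. It is not GAP's code. The dictionary copies the fields shown in the PackageInfo.g excerpt above, while the function names and the rule for building the printed key (initials of the authors' last names plus a two-digit year) are assumptions inferred from the sample output, and the BibTeX entry is simplified.

```python
# Meta-data copied from the PackageInfo.g excerpt above, as a Python dict.
info = {
    "PackageName": "ModIsom",
    "Subtitle": ("Computing automorphisms and checking isomorphisms for "
                 "modular group algebras of finite p-groups"),
    "Version": "2.1",
    "Date": "29/11/2013",
    "Persons": [
        {"LastName": "Eick", "FirstNames": "Bettina"},
        {"LastName": "Konovalov", "FirstNames": "Alexander"},
    ],
    "PackageWWWHome": "http://www.icm.tu-bs.de/~beick/soft/modisom/",
}


def year_of(info):
    """Extract the four-digit year from a dd/mm/yyyy date string."""
    return info["Date"].split("/")[-1]


def print_key(info):
    """Assumed rule: initials of the authors' last names plus 2-digit year."""
    initials = "".join(p["LastName"][0] for p in info["Persons"])
    return initials + year_of(info)[-2:]


def authors(info):
    """Format authors as 'Last, F.' joined by ' and '."""
    return " and ".join(f"{p['LastName']}, {p['FirstNames'][0]}."
                        for p in info["Persons"])


def to_bibtex(info):
    """Render a simplified @misc entry from the meta-data."""
    return "\n".join([
        f"@misc{{ {info['PackageName']}{info['Version']},",
        f"  author =       {{{authors(info)}}},",
        f"  title =        {{{info['PackageName']}, {info['Subtitle']}, "
        f"Version {info['Version']}}},",
        f"  year =         {{{year_of(info)}}},",
        "  note =         {GAP package},",
        f"  howpublished = {{{info['PackageWWWHome']}}},",
        f"  printedkey =   {{{print_key(info)}}}",
        "}",
    ])


print(to_bibtex(info))
```

Because every output format is generated from the same record, package developers maintain their meta-data in one place and the citation never drifts out of step with the installed version.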

CP2K

CP2K is an open source molecular dynamics code. When CP2K completes, it prints a bibliography listing the citations required for the components that were executed.

-------------------------------------------------------------------------------
-                                                                             -
-                           R E F E R E N C E S                               -
-                                                                             -
-------------------------------------------------------------------------------
 
CP2K version 2.4 (Development Version), the CP2K developers group (2013).
CP2K is freely available from http://www.cp2k.org/ .

VandeVondele, J; Hutter, J. 
JOURNAL OF CHEMICAL PHYSICS, 127 (11), 114105 (2007). 
Gaussian basis sets for accurate calculations on molecular systems in
gas and condensed phases.
http://dx.doi.org/10.1063/1.2770708

Krack, M. 
THEORETICAL CHEMISTRY ACCOUNTS, 114 (1-3), 145-152 (2005). 
Pseudopotentials for H to Kr optimized for gradient-corrected
exchange-correlation functionals.
http://dx.doi.org/10.1007/s00214-005-0655-y

VandeVondele, J; Krack, M; Mohamed, F; Parrinello, M; Chassaing, T;
Hutter, J. COMPUTER PHYSICS COMMUNICATIONS, 167 (2), 103-128 (2005). 
QUICKSTEP: Fast and accurate density functional calculations using a
mixed Gaussian and plane waves approach.
http://dx.doi.org/10.1016/j.cpc.2004.12.014

Frigo, M; Johnson, SG. 
PROCEEDINGS OF THE IEEE, 93 (2), 216-231 (2005). 
The design and implementation of FFTW3.
http://dx.doi.org/10.1109/JPROC.2004.840301
...
-------------------------------------------------------------------------------

Behind the scenes of CP2K

A single FORTRAN file, src/common/bibliography.F, holds the complete citation list for CP2K. The citations are represented using the Web of Knowledge (formerly ISI) citation format. Each citation is given a unique key, e.g. Dudarev1997. An example entry is:

CALL add_reference(key=Dudarev1997,ISI_record=s2a(&
  "PT J",&
  "AU Dudarev, SL",&
  "   Manh, DN",&
  "   Sutton, AP",&
  "TI Effect of Mott-Hubbard correlations on the electronic",&
  "   structure and structural stability of uranium dioxide",&
  "SO PHILOSOPHICAL MAGAZINE B",&
  "SN 0141-8637",&
  "PD MAY",&
  "PY 1997",&
  "VL 75",&
  "IS 5",&
  "BP 613",&
  "EP 628",&
  "UT ISI:A1997WX94300001",&
  "ER"),&
  DOI="10.1080/13642819708202343")

CP2K developers make FORTRAN subroutine calls to register any citations relevant to their code. For example, in src/dft_plus_u.F:

USE bibliography, ONLY: Dudarev1997,Dudarev1998,cite_reference

CALL cite_reference(Dudarev1997)
CALL cite_reference(Dudarev1998)

The citations registered during CP2K execution are then displayed when CP2K completes.
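The pattern of a central bibliography plus per-component "cite" calls is easy to reproduce in other languages. The following is a minimal Python sketch of the idea, not a translation of CP2K's code: the Bibliography class, the dft_plus_u_step function and the key Krack2005 are hypothetical, and the two reference texts are taken from the listings above.

```python
class Bibliography:
    """Central list of known references plus the subset cited in this run."""

    def __init__(self):
        self._references = {}   # key -> formatted citation text
        self._cited = []        # keys, in order of first use

    def add_reference(self, key, text):
        """Register a reference under a unique key (done once, centrally)."""
        if key in self._references:
            raise ValueError(f"duplicate reference key: {key}")
        self._references[key] = text

    def cite_reference(self, key):
        """Mark a reference as used; repeated calls are recorded once."""
        if key not in self._references:
            raise KeyError(f"unknown reference key: {key}")
        if key not in self._cited:
            self._cited.append(key)

    def report(self):
        """Return the citations for the code paths that actually ran."""
        return "\n\n".join(self._references[k] for k in self._cited)


# Central registration, similar in spirit to a single bibliography file.
bibliography = Bibliography()
bibliography.add_reference(
    "Dudarev1997",
    "Dudarev, SL; Manh, DN; Sutton, AP.\n"
    "PHILOSOPHICAL MAGAZINE B, 75 (5), 613-628 (1997).\n"
    "Effect of Mott-Hubbard correlations on the electronic structure and\n"
    "structural stability of uranium dioxide.")
bibliography.add_reference(
    "Krack2005",
    "Krack, M.\n"
    "THEORETICAL CHEMISTRY ACCOUNTS, 114 (1-3), 145-152 (2005).\n"
    "Pseudopotentials for H to Kr optimized for gradient-corrected\n"
    "exchange-correlation functionals.")


def dft_plus_u_step():
    """Hypothetical component: cites the method it implements when it runs."""
    bibliography.cite_reference("Dudarev1997")


dft_plus_u_step()
dft_plus_u_step()  # a second call does not duplicate the citation
print(bibliography.report())
```

Only Dudarev1997 appears in the report, because the component that would cite Krack2005 never ran. That is what addresses Iain Bethune's concern: users see the references for what they actually did with the code.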

LAMMPS

LAMMPS (Large-Scale Atomic/Molecular Massively Parallel Simulator) is another open source molecular dynamics code. When LAMMPS completes, it outputs a log.cite file with a list of references corresponding to the specific features used:

LAMMPS (30 Sep 2013)

Please see the log.cite file for references relevant to this simulation

Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (67.1838 67.1838 67.1838)
  1 by 1 by 1 MPI processor grid
Created 256000 atoms

The log.cite file contains citations in BibTeX format. For example:

This LAMMPS simulation made specific use of work described in the following 
references. See http://lammps.sandia.gov/cite.html for details. 
neighbor multi command:

@Article{Intveld08,
 author =  {P.{\,}J.~in{\,}'t~Veld and S.{\,}J.~Plimpton and G.{\,}S.~Grest},
 title =   {Accurate and Efficient Methods for Modeling Colloidal
            Mixtures in an Explicit Solvent using Molecular Dynamics},
 journal = {Comp.~Phys.~Comm.},
 year =    2008,
 volume =  179,
 pages =   {320--329}
}

Behind the scenes of LAMMPS

LAMMPS developers embed citations in C++. For example, in src/neighbor.cpp:

static const char cite_neigh_multi[] =
  "neighbor multi command:\n\n"
  "@Article{Intveld08,\n"
  " author =  {P.{\\,}J.~in{\\,}'t~Veld and S.{\\,}J.~Plimpton"
  " and G.{\\,}S.~Grest},\n"
  " title =   {Accurate and Efficient Methods for Modeling Colloidal\n"
  "            Mixtures in an Explicit Solvent using Molecular Dynamics},\n"
  " journal = {Comp.~Phys.~Comm.},\n"
  " year =    2008,\n"
  " volume =  179,\n"
  " pages =   {320--329}\n"
  "}\n\n";

These are registered during execution via an API, in a similar way to CP2K:

if (style == MULTI && lmp->citeme) lmp->citeme->add(cite_neigh_multi);

PETSc

PETSc (Portable, Extensible Toolkit for Scientific Computation) is a set of data structures and routines for solving partial differential equations. PETSc is used by TPLS, who I have been working with on an open call project. It was PETSc that reminded me of Robin's blog post and, in turn, prompted this one.

PETSc have introduced, in version 3.5.0, an automated citations feature. Running an application compiled to use PETSc and providing a "-citations optionalfilename" flag will save, in the file, BibTeX entries corresponding to the PETSc features used. For example:

$ ./ex19 -ksp_monitor -citations -pc_type hypre -pc_hypre_type boomeramg
...
If you publish results based on this computation please cite the following:
===========================================================================
@TechReport{petsc-user-ref,
Author = {Satish Balay and Jed Brown and Kris Buschelman and Victor Eijkhout and 
William D. Gropp and Dinesh Kaushik and Matthew G. Knepley and Lois Curfman
McInnes and Barry F. Smith and Hong Zhang},
Title = {{PETS}c Users Manual},
Number = {ANL-95/11 - Revision 3.4},
Institution = {Argonne National Laboratory},
Year = {2013}
}
@InProceedings{petsc-efficient,
Author = {Satish Balay and William D. Gropp and Lois Curfman McInnes and 
Barry F. Smith},
Title = {Efficient Management of Parallelism in Object Oriented Numerical 
Software Libraries},
Booktitle = {Modern Software Tools in Scientific Computing},
Editor = {E. Arge and A. M. Bruaset and H. P. Langtangen},
Pages = {163--202},
Publisher = {Birkh{\"{a}}user Press},
Year = {1997}
}
@manual{hypre-web-page,
title = {{\sl hypre}: High Performance Preconditioners},
organization = {Lawrence Livermore National Laboratory},
note = {\url{http://www.llnl.gov/CASC/hypre/}}
}
===========================================================================

$ ./ex19 -ksp_monitor -citations -pc_type lu \
-pc_factor_mat_solver_package superlu
lid velocity = 0.0625, prandtl # = 1, grashof # = 1
0 KSP Residual norm 2.358581702743e-01
1 KSP Residual norm 7.147839725241e-17
0 KSP Residual norm 2.309061316849e-05
1 KSP Residual norm 2.989519344266e-21
Number of SNES iterations = 2
If you publish results based on this computation please cite the following:
===========================================================================
...
@article{superlu99,
author = {James W. Demmel and Stanley C. Eisenstat and John R. Gilbert and 
Xiaoye S. Li and Joseph W. H. Liu},
title = {A supernodal approach to sparse partial pivoting},
journal = {SIAM J. Matrix Analysis and Applications},
year = {1999},
volume = {20},
number = {3},
pages = {720-755}
}
===========================================================================

Unlike LAMMPS and CP2K, the user has to explicitly provide the command-line flag, requesting that the citations be output.

Behind the scenes of PETSc

PETSc developers embed citations in C, for example, in src/dm/dt/interface/dt.c:

const char GaussCitation[] = "@article{GolubWelsch1969,\n"
  "  author  = {Golub and Welsch},\n"
  "  title   = {Calculation of Quadrature Rules},\n"
  "  journal = {Math. Comp.},\n"
  "  volume  = {23},\n"
  "  number  = {106},\n"
  "  pages   = {221--230},\n"
  "  year    = {1969}\n}\n";

These are registered during execution via an API, in a similar way to CP2K and LAMMPS:

ierr = PetscCitationsRegister(GaussCitation, &GaussCite);

Alternatively, in src/mat/impls/aij/seq/superlu/superlu.c:

ierr = PetscCitationsRegister("@article{superlu99,\n"
  "  author  = {James W. Demmel and Stanley C. Eisenstat and\n"
  "             John R. Gilbert and Xiaoye S. Li and Joseph W. H. Liu},\n"
  "  title = {A supernodal approach to sparse partial pivoting},\n"
  "  journal = {SIAM J. Matrix Analysis and Applications},\n"
  "  year = {1999},\n  volume  = {20},\n  number = {3},\n"
  "  pages = {720-755}\n}\n",&cite);CHKERRQ(ierr);

PetscCitationsRegister is not just available to those developing components for use within PETSc. It can also be called by anyone developing applications that use PETSc. This allows researchers developing applications that use PETSc, e.g. TPLS, to register their own citations so they are output as part of the citations list when PETSc completes.
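The two distinguishing features here, opt-in output and a registration call open to applications as well as the library, can be sketched as follows. This is a Python illustration only and does not reproduce PETSc's API: CitationLog, library_solve, run and both BibTeX entries are hypothetical.

```python
import io


class CitationLog:
    """Collects BibTeX entries, but only when the user asked for them."""

    HEADER = ("If you publish results based on this computation "
              "please cite the following:")

    def __init__(self, enabled):
        self.enabled = enabled
        self._entries = []

    def register(self, bibtex):
        """Record an entry once; a no-op when citation output is disabled."""
        if self.enabled and bibtex not in self._entries:
            self._entries.append(bibtex)

    def write(self, stream):
        """Write the collected entries, if any, to a file-like object."""
        if not self._entries:
            return
        rule = "=" * 75
        stream.write(f"{self.HEADER}\n{rule}\n")
        for entry in self._entries:
            stream.write(entry + "\n")
        stream.write(rule + "\n")


LIBRARY_CITATION = "@misc{example-library,\n  title = {A Hypothetical Library}\n}"
APP_CITATION = "@misc{example-app,\n  title = {A Hypothetical Application}\n}"


def library_solve(log):
    """Stand-in for a library routine that registers its own citation."""
    log.register(LIBRARY_CITATION)


def run(enabled):
    """Run the 'application' and return whatever citation text it produced."""
    log = CitationLog(enabled)
    log.register(APP_CITATION)   # the application adds its own citation
    library_solve(log)
    library_solve(log)           # repeated use is recorded once
    out = io.StringIO()
    log.write(out)
    return out.getvalue()


print(run(enabled=True))
```

Calling run(enabled=False) produces no output at all, which mirrors the behaviour of a run without the command-line flag.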

Debian

Debian Pure Blends projects develop Debian flavours to support specific target communities. For example, DebianScience develops package suites targeting specific science communities e.g. astronomy, chemistry, biology or high-energy physics. A Debian Enhancement Proposal is underway to collect meta-data from these packages and ensure it is readily available. One motivation is to collect and display bibliographic information about which papers to cite when using these packages.

A service called Umegaya - Umegaya is a MEtadata GAtherer using YAml - gathers meta-data held in a special debian/upstream/metadata file within the Subversion or Git repositories holding the packages. This information is aggregated and output as a table or YAML that can be loaded into central information hubs like the Ultimate Debian Database.

Oh citable research software, how shalt I write thee?

In their paper on "Accurately Citing Software and Algorithms used in Publications", the PETSc developers Knepley, Brown, Curfman McInnes and Smith comment that:

We believe approaches such as [PETSc's citations feature] should be adopted by the entire open source scientific software community to ensure that full and accurate citations are made for libraries used in scientific applications.

The Institute seconds this! So, if you're developing research software, you might want to consider implementing support for citations within it.

If developing extensible, pluggable frameworks, you could adopt solutions similar to those of CP2K, LAMMPS or PETSc. You could provide, as part of your pluggable APIs, support for specifying meta-data that includes citation information. The framework can then collect this citation information and present it to the user.
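As a rough sketch of what this could look like, the following Python example has each plug-in declare its references as meta-data, and the framework collects them as plug-ins are run. All class names and citation strings are hypothetical and chosen only for illustration.

```python
class Plugin:
    """Base class: each plug-in lists the references for its method."""
    citations = ()

    def run(self, data):
        raise NotImplementedError


class Doubler(Plugin):
    citations = ("A. Author (2013). A hypothetical doubling method.",)

    def run(self, data):
        return [2 * x for x in data]


class Negater(Plugin):
    citations = ("B. Author (2014). A hypothetical negation method.",)

    def run(self, data):
        return [-x for x in data]


class Framework:
    """Runs plug-ins and remembers which citations their use requires."""
    citation = "The Framework developers (2014). A hypothetical framework."

    def __init__(self):
        # The framework's own citation is always required.
        self._citations = [self.citation]

    def run(self, plugin, data):
        """Record the plug-in's references, then delegate to the plug-in."""
        for reference in plugin.citations:
            if reference not in self._citations:
                self._citations.append(reference)
        return plugin.run(data)

    def citations(self):
        """Citations for the framework and every plug-in used so far."""
        return list(self._citations)


framework = Framework()
result = framework.run(Doubler(), [1, 2, 3])
print(result)
print("\n".join(framework.citations()))
```

Plug-in authors only fill in a class attribute, so supporting citation costs them almost nothing, and the Negater reference is not listed because that plug-in was never used.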

If you are using one of these pluggable frameworks, then make your users aware of their citation-related features, and use them yourself, if, like PETSc, they allow you to. If the use of such frameworks is hidden from your users, then ensure that the citations are still exposed to them - for example, via wrappers developed by yourself to expose these features. Remember to add any citations of your own!

If you are developing your own stand-alone software you could support a command-line flag that prints out your own citations in plain-text or BibTeX formats.
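A minimal Python sketch of such a flag is shown below, using the standard argparse module. The tool name mytool, the --citation flag and the citation itself are all made up for the example.

```python
import argparse

# Hypothetical citation for a hypothetical tool, in two formats.
FORMATS = {
    "text": "A. Author (2014). mytool: a hypothetical research tool. "
            "Version 1.0.",
    "bibtex": "@misc{mytool,\n"
              "  author = {A. Author},\n"
              "  title  = {mytool: a hypothetical research tool},\n"
              "  year   = {2014},\n"
              "  note   = {Version 1.0}\n"
              "}",
}


def build_parser():
    """Create a parser with an optional-argument --citation flag."""
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument(
        "--citation", nargs="?", const="text", default=None,
        choices=sorted(FORMATS),
        help="print how to cite this tool (default format: text) and exit")
    return parser


def main(argv):
    """Print the citation if requested, otherwise do the tool's real work."""
    args = build_parser().parse_args(argv)
    if args.citation is not None:
        print(FORMATS[args.citation])
        return 0
    print("mytool: running analysis (placeholder)")
    return 0


main(["--citation"])            # plain-text citation
main(["--citation", "bibtex"])  # BibTeX entry
```

In a real script, main would be called with sys.argv[1:] so that users can run, for example, mytool --citation bibtex and paste the result straight into their bibliography.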

And, if very pressed for time, just print a message to the command-line, or pop-up a dialog, listing the papers your users need to cite.

Finally, as Robin proposed, there is always the plain-text CITATION file.

If you are aware of any other research software that provides automated support for its citation, or any ideas as to how software can be written to do this, please comment below.

Acknowledgements

This blog post reworks parts of Robin Wilson's post and is based in part on the comments on that post by Iain Bethune (CP2K), Alexander Konovalov (GAP) and Michael Crusoe (Debian).

Posted by m.jackson on 30 July 2014 - 3:00pm
