By Stephen Eglen and Laurent Gatto, Software Sustainability Institute Fellows.
R is a well-established environment for statistical computing. It is often seen as an alternative to computing environments such as MATLAB or Python. In this post, we give our five reasons for choosing R for research.
1. Graphics
R generates beautiful graphics with minimal effort. Publication-quality plots can be rendered in a wide range of vector- and raster-based formats. Recent extensions to the plotting system allow complex visualisations to be expressed succinctly. See the R Graphics Gallery for example plots along with the code that generated them.
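As a rough illustration of rendering one plot to both a vector and a raster format, here is a minimal sketch using only base R graphics. The data, the file names and the helper `draw_plot()` are invented for this example, and the PNG step is guarded because not every R build has PNG support.

```r
# Hypothetical data: a noisy sine wave.
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- sin(x) + rnorm(50, sd = 0.2)

# Illustrative helper so the same drawing code feeds every device.
draw_plot <- function() {
  plot(x, y, pch = 19, xlab = "x", ylab = "y", main = "Noisy sine wave")
  lines(x, sin(x), lwd = 2)
}

# Vector output: PDF (size given in inches).
pdf_file <- file.path(tempdir(), "example.pdf")
pdf(pdf_file, width = 6, height = 4)
draw_plot()
invisible(dev.off())

# Raster output: PNG (size given in pixels), only if this build supports it.
png_file <- file.path(tempdir(), "example.png")
if (capabilities("png")) {
  png(png_file, width = 600, height = 400)
  draw_plot()
  invisible(dev.off())
}
```

Keeping the drawing code in one function means the figure is defined once and the choice of output format is a one-line change.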
2. Packages
R comes with a robust packaging system that allows developers and domain experts to distribute their code easily. Packages come complete with documentation, vignettes (see point 3 below), and data files. Windows and Mac users can download packages in binary form, where C and Fortran code is pre-compiled. As of January 2013, the Comprehensive R Archive Network (CRAN) contains 5088 packages. The package build system is rigorous, to ensure that packages will work for other users. Within the field of Computational Biology, the Bioconductor project has 749 packages. Many packages accompany scientific papers within Computational Biology, so that it typically takes under a minute to go from reading about a method in a paper to using it.
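To give a feel for what a package contains, the sketch below inspects packages that ship with R itself, so it runs without a network connection. The choice of `stats`, `grid` and `datasets` is ours, purely for illustration, and the installation call is left commented out because it downloads from CRAN.

```r
# Installing from CRAN needs a network connection, so it is commented out.
# The package name is only an example.
# install.packages("knitr")

# Metadata from the DESCRIPTION file of a package bundled with R.
desc <- packageDescription("stats")
desc$Version

# Vignettes shipped with the 'grid' package (may be empty on some installs).
grid_vignettes <- vignette(package = "grid")$results
grid_vignettes[, c("Item", "Title"), drop = FALSE]

# Data sets shipped with the 'datasets' package.
dataset_index <- data(package = "datasets")$results
head(dataset_index[, c("Item", "Title"), drop = FALSE])
```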
3. Reproducible research
Reproducible research is a key strength of R. R has several ways of supporting literate programming, where source files contain both code and documentation. Figures and tables for papers can be generated dynamically when, for example, compiling a LaTeX document. We currently favour the knitr system for efficiently generating reproducible research documents such as these two papers: "A data repository and analysis framework for spontaneous neural activity recordings in developing retina" and "R for Proteomics".
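The literate programming idea can be sketched in a few lines, assuming the knitr package is installed. For brevity, the source here is a tiny Markdown-style document held in a character vector rather than a LaTeX file, and the numbers are made up. `knitr::knit()` runs the embedded chunk and weaves its result into the output text.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # A miniature source document: prose plus one R code chunk.
  source_lines <- c(
    "Mean firing rate (hypothetical data):",
    "",
    "```{r}",
    "rates <- c(1.2, 0.8, 2.5)",
    "mean(rates)",
    "```"
  )

  # Run the chunk and return the woven document as a character string.
  rendered <- knitr::knit(text = source_lines, quiet = TRUE)
  cat(rendered)
}
```

Because the result is computed each time the document is compiled, the reported number cannot drift out of step with the code that produced it.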
4. A strong open-source community
R has been publicly available for over 20 years and is driven by a stable core team of about 20 developers. Many of these developers are senior academics with expertise in robust statistical computing. The software has a yearly release cycle, with point releases during the year to handle any serious issues.
5. Programming with data
Although R can be treated as just a regular programming language, a key strength is working with it interactively at the command line to analyse data. Many features of the language are suited to rapidly processing data. For example, the data frame concept provides a convenient way of working with tabular data of mixed types. The system is also robust to missing data: missing values are represented explicitly as NA, and most statistical functions can be told how to handle them.
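A small sketch of these ideas in base R follows. The column names and values are invented for illustration. The data frame mixes character, integer and numeric columns, and one measurement is missing.

```r
# Hypothetical recordings: mixed column types, one missing value (NA).
recordings <- data.frame(
  cell        = c("a", "b", "c", "d"),
  age_days    = c(4L, 7L, 7L, 11L),
  firing_rate = c(1.2, NA, 0.8, 2.5),
  stringsAsFactors = FALSE
)

# Summaries can skip missing values explicitly.
mean_rate <- mean(recordings$firing_rate, na.rm = TRUE)

# Keep only the rows with no missing values.
complete <- recordings[complete.cases(recordings), ]

# Mean firing rate per age; the formula interface drops NA rows by default.
by_age <- aggregate(firing_rate ~ age_days, data = recordings, FUN = mean)

mean_rate
complete
by_age
```

Typed line by line at the R prompt, each step can be inspected before moving on, which is how we tend to explore a new data set.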