By Laurent Gatto, Software Sustainability Institute Fellow.
This past week saw the yearly Bioconductor conference take place at the Dana-Farber Cancer Institute, Boston, MA. It started with a Developer Day on July 30th and continued with scientific talks and workshops until August 1st.
Bioconductor is an R-based, open-source, open-development software project that provides tools for the analysis and comprehension of high-throughput genomics data. It was set up in 2001 by Robert Gentleman, who co-founded R alongside Ross Ihaka. The project is overseen by a core team based primarily at the Fred Hutchinson Cancer Research Center in Seattle, WA, together with members from a range of other US-based and international institutions.
From a programming point of view, Bioconductor benefits from the features of the R language: a high-level, expressive language in which new computational methods can be prototyped quickly and easily. In addition, R offers a well-established system for packaging together software, data and annotation with documentation, and state-of-the-art support for statistical computing, data mining and visualisation.
Since the very beginning of the project, special emphasis has been put on documentation and reproducible research. Each Bioconductor package must include a vignette, a dynamically generated document that provides a task-oriented description of package functionality. The project also promotes good practices in software development. For instance, unit tests are highly recommended for inclusion in packages. All the software released through the Bioconductor project is open source and distributed through a public Subversion server in addition to standard R packages. Developers and users are invited to contribute to the project by submitting their own packages, which are individually reviewed before acceptance.
One of the major strengths of the Bioconductor project is that it brings together people with a wide range of skills, including statisticians, computational biologists, computer scientists and biologists. The collaborative nature of the project is also reflected in the interoperability of its components, which is promoted by encouraging the reuse of existing packages and classes. Finally, the project is also dedicated to training researchers in computational and statistical methods for the analysis of biological data.
The Bioconductor project is extremely well regarded in the field of computational biology and has produced some of the most respected software used in biology. It provides a friendly and constructive environment for both beginners and experienced computational biologists and bioinformaticians.