Docker helps biofuels research

Posted by s.hettrick on 30 July 2015 - 9:31am

By Scott Edmunds, Executive Editor at GigaScience.

With growing awareness of how difficult it is to make scientific research reproducible, numerous technical fixes are being suggested to move publishing away from static and often irreproducible papers, which have changed little since the 17th century, towards reproducible digital objects that better fit 21st century technology. New research in the Open Access journal GigaScience demonstrates one potential approach: publishing open data and code in containerised form using Docker. It also helps scientists tackle another scourge of the 21st century, climate change, through a better understanding of the production of biofuels.

One of the most promising areas in biofuel development is biogas, which has huge potential as a renewable and clean source of energy. Biogas production is the generation of methane gas through the anaerobic digestion (fermentation) of organic matter such as agricultural or food waste. Detailed knowledge of how the fermentation process works is key to optimising it. However, the vast majority of the microbes involved remain unknown and cannot be cultivated in laboratories.

In new research just published in GigaScience, researchers from Bielefeld University in Germany have now characterised the complex communities of micro-organisms in a biogas plant that generates heat and power from maize silage and pig manure. The authors made their research more reproducible by disseminating their data and tools in a Docker container.

For their study, the researchers carried out metagenomic and metatranscriptomic analyses, which generated DNA and RNA sequences from the thousands of microbial species present. From these they were able to create a catalogue of 250,000 genes that enabled them to begin defining the underlying biology of methane production. While this work only scratches the surface of the vast amount of information gathered, the authors furthered the usefulness of this resource by releasing all of the data and computational methods as a shareable Docker container, enabling others to execute the same analyses in the cloud. This not only makes the research reproducible, but also allows researchers around the world to build on these resources to more rapidly delineate the important processes involved in biogas generation and to better explore its use for biofuel.

As experiments become more data-intensive, reviewing and publishing the methods and results of scientific studies become increasingly challenging. To get around this, the authors used Docker, which wraps software in a container that includes everything needed to rerun it. This removes the need for other researchers to install and maintain the many complex bioinformatics tools and software libraries, something that can be very technically challenging for researchers without the necessary computational resources and skills.
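As a rough illustration of what running such a container involves, the sketch below assembles a `docker run` command in Python. The image name `example/biogas-workflow` and the directory paths are hypothetical placeholders, not the names used by the authors; only `docker run`, `--rm` and `-v` are standard Docker command-line options.

```python
import shlex
import subprocess
from pathlib import Path


def build_docker_command(image, host_output_dir, container_output_dir="/output"):
    """Return the argument list for running a containerised workflow.

    The image name and directories are placeholders chosen for this sketch.
    """
    host_dir = Path(host_output_dir).resolve()
    return [
        "docker", "run",
        "--rm",  # remove the container once the run has finished
        "-v", f"{host_dir}:{container_output_dir}",  # share results with the host
        image,
    ]


def run_workflow(image, host_output_dir):
    """Create the output directory and run the container (requires Docker)."""
    Path(host_output_dir).mkdir(parents=True, exist_ok=True)
    command = build_docker_command(image, host_output_dir)
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)
```

Calling `run_workflow("example/biogas-workflow", "results")` on a machine with Docker installed would pull the (hypothetical) image if needed and run it, with no bioinformatics tools installed on the host itself.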

Bioboxes for Biogas

Reviewing and publishing the methods and results of scientific studies become increasingly challenging as experiments become more data-intensive. To ensure reproducibility, journals are increasingly asking authors to make their code and data publicly available. Nevertheless, complex analysis workflows that depend on particular versions of bioinformatics tools and software libraries are not trivial to install and maintain. "We decided to use virtualisation techniques to encapsulate our analysis workflow and make it basically independent from the host it is executed on", says Andreas Bremges, first author of the study. "We containerised our analysis workflow in Docker, which can be executed virtually anywhere".

Reproducibility is an important aspect of science, and one that GigaScience is trying hard to tackle, highlights Peter Li, Lead Data Manager at GigaScience, who took on the task of trying to recreate exactly the results in the paper. "Andreas and his colleagues provided a Docker container that encapsulated the method used to process the data from their biogas study. This made my job of checking the reproducibility of their results much easier, as their Docker container took care of installing the bioinformatics tools and their dependencies on my cloud server".
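The article does not describe how the recreated results were compared with the published ones. One simple approach, assumed here purely for illustration, is to compare checksums of the files produced by a rerun against checksums of the published files. The file names and the `published_checksums` mapping in this sketch are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_published(result_dir, published_checksums):
    """Map each expected file name to True (match) or False (missing or different)."""
    report = {}
    for name, expected in published_checksums.items():
        candidate = Path(result_dir) / name
        report[name] = candidate.is_file() and sha256_of_file(candidate) == expected
    return report
```

A byte-for-byte match is a strict test: tools that write timestamps or use random seeds can produce different files from an otherwise faithful rerun, so a mismatch is a prompt to look closer rather than proof of irreproducibility.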

"We like the idea of Docker to ensure reproducibility of analysis workflows so much that we adopted the approach in our CAMI challenge", says Alexander Sczyrba, senior author of the study and one of the organisers of CAMI, the Critical Assessment of Metagenome Interpretation. "In collaboration with nucleotid.es we started the bioboxes project to standardise interchangeable bioinformatics software containers". Peter Belmann, core team member of bioboxes, helped in building the Docker container for the biogas study. "The container for this study is not yet bioboxes-conforming, but the next step will be to define a bioboxes standard for this kind of workflow". The bioboxes community is currently gathering feedback on the standards they are developing, and contributions are welcomed via their GitHub RFC page.

References

1. Bremges, A., Maus, I., Belmann, P., Eikmeyer, F., Winkler, A., Albersmeier, A., Pühler, A., Schlüter, A., Sczyrba, A. (2015). Deeply sequenced metagenome and metatranscriptome of a biogas-producing microbial community from an agricultural production-scale biogas plant. GigaScience 4:33. doi:10.1186/s13742-015-0073-6

2. Bremges, A., Maus, I., Belmann, P., Eikmeyer, F., Winkler, A., Albersmeier, A., Pühler, A., Schlüter, A., Sczyrba, A. (2015). Supporting data and materials for "Deeply sequenced metagenome and metatranscriptome of a biogas-producing microbial community from an agricultural production-scale biogas plant". GigaScience Database.