Hameau de l'Etoile, 18-22 June, 2012.
By Elisa Loza, Agent and scientific statistician, Rothamsted Research.
Evolutionary Biology is the study of the processes that gave rise to the diversity of life on Earth.
Formal mathematical, statistical and computational methods are required to analyse the increasing amount of molecular sequence data that arise in evolutionary studies.
MCEB2012 was an opportunity to network with other scientists, to learn about current progress towards developing the tools required in studies of evolution, and to understand the challenges ahead.
The Mathematical and Computational Evolutionary Biology 2012 (MCEB2012) meeting was held 30 km north of Montpellier, France. Its focus was on mathematical and computational methods for studying evolution. The setting was ideal for lovers of nature and science: an exceptionally beautiful 12th-century hamlet surrounded by French countryside, equipped with function rooms for seminars and conferences, and renowned for its cuisine. The organisers limited the number of attendees to encourage interaction between participants and allowed plenty of time for discussion after each session. Not much more can be asked of a scientific meeting!
The event was attended by around 60 scientists, mostly from across Europe but also from as far afield as Canada, New Zealand and the USA. Only seven people, including myself, attended from the UK, and they represented the following institutions: the University of East Anglia, the University of Manchester, the European Bioinformatics Institute and the University of Edinburgh. Participants were at a variety of career stages, from PhD students and postdocs to senior scientists. As the subject of the meeting was highly specialised, it was not surprising to see many of the same scientists (and some of the same research interests) as at past meetings. However, I was pleasantly surprised to find several new faces and to hear about fresh approaches to existing challenges in the field.
The meeting was held over five days. Each day started with two 90-minute lectures introducing a field of research, including approximate Bayesian computation, phylogenetic networks, the gene tree/species tree dichotomy and coalescent theory. Afternoons comprised short presentations and posters outlining recent scientific progress.
MCEB2012's opening lecture was delivered by Vincent Moulton, who introduced phylogenetic networks and presented some of his latest results. One of the leading software tools in this area is SplitsTree4, a user-friendly, visually appealing application for computing unrooted phylogenetic networks from molecular sequence data. SplitsTree4 was co-written in Java by Daniel Huson, from the University of Tübingen in Germany. Although not directly related to phylogenetic networks, Vincent and his colleagues at the University of East Anglia, UK, recently released the UEA sRNA workbench, a suite of tools for analysing and visualising microRNA and small RNA datasets from next-generation sequencing.
Arnaud Estoup, from INRA France, presented his DIYABC software. DIYABC is a computer program with a graphical user interface and a fully point-and-click environment that allows population biologists to make inferences based on Approximate Bayesian Computation (ABC). ABC is a computational technique that bypasses exact likelihood calculations: it simulates data under the model for candidate parameter values and compares summary statistics of the simulated and observed data, such as the population mean or variance. This allows broad inferences to be made at a fraction of the computational cost of analysing all available data in full detail. ABC is especially useful when computing the likelihood function exactly is prohibitively expensive, or when no suitable likelihood is available. DIYABC can be used to fit many complex evolutionary scenarios, including admixture events and changes in population size.
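The rejection form of ABC can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DIYABC's implementation: the toy model (observations drawn from a normal distribution with unknown mean mu and unit variance), the flat prior on mu, the tolerance epsilon and the function names summarise, simulate and abc_rejection are all hypothetical choices. The point is that the likelihood is never evaluated; a parameter value is kept only when data simulated under it have summary statistics (here the mean and variance) close to those of the observed data.

```python
import math
import random


def summarise(data):
    """Summary statistics: sample mean and (unbiased) sample variance."""
    n = len(data)
    m = sum(data) / n
    v = sum((x - m) ** 2 for x in data) / (n - 1)
    return (m, v)


def simulate(mu, n, rng):
    """Toy model (an assumption): n independent draws from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws=20000, epsilon=0.25, seed=1):
    """Rejection ABC: draw mu from the prior, simulate a dataset of the same
    size, and keep mu if the simulated summaries lie within epsilon
    (Euclidean distance) of the observed summaries."""
    rng = random.Random(seed)
    s_obs = summarise(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-10.0, 10.0)  # hypothetical flat prior on mu
        s_sim = summarise(simulate(mu, len(observed), rng))
        if math.dist(s_obs, s_sim) < epsilon:  # no likelihood is computed
            accepted.append(mu)
    return accepted


if __name__ == "__main__":
    data_rng = random.Random(42)
    observed = simulate(3.0, 50, data_rng)  # stand-in for real observations
    posterior = abc_rejection(observed)
    if posterior:
        print(len(posterior), sum(posterior) / len(posterior))
```

The accepted values of mu approximate the posterior distribution; shrinking epsilon makes the approximation closer at the cost of accepting fewer draws, which is the usual accuracy/cost trade-off in ABC.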
Simulation experiments are used throughout computational evolutionary biology to compare models, validate new methods and argue for particular hypotheses. Tanja Stadler, from ETH Zurich, has developed an R package to simulate phylogenetic trees under a constant-rate birth-death process. She subsequently used this simulation package to study the rate of mammalian evolution from over 33 million years ago to the present day. A second R package that resulted from this study is TreePar, which estimates speciation and extinction rates by maximum likelihood.
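To make the constant-rate birth-death process concrete, the sketch below simulates the number of lineages through time using the Gillespie algorithm. It is an illustrative sketch, not Tanja Stadler's R package: the function name simulate_birth_death, the parameter values and the choice to track only lineage counts (rather than full tree topologies) are hypothetical. Under the model, each living lineage independently speciates at rate birth and goes extinct at rate death, so with n lineages the waiting time to the next event is exponential with rate n(birth + death), and that event is a speciation with probability birth / (birth + death).

```python
import math
import random


def simulate_birth_death(birth, death, t_max, rng, n0=1):
    """Gillespie simulation of a constant-rate birth-death process.

    Assumes birth + death > 0. Returns a list of (time, lineages) pairs,
    a lineages-through-time record; the last entry gives the number of
    lineages alive at t_max, or 0 if the clade went extinct earlier.
    """
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0:
        # Waiting time to the next event across all n lineages.
        t += rng.expovariate(n * (birth + death))
        if t >= t_max:
            break
        if rng.random() < birth / (birth + death):
            n += 1  # speciation: one lineage splits into two
        else:
            n -= 1  # extinction: one lineage dies
        history.append((t, n))
    return history


if __name__ == "__main__":
    rng = random.Random(7)
    finals = [simulate_birth_death(1.0, 0.5, 4.0, rng)[-1][1] for _ in range(2000)]
    # Compare the simulated mean with the theoretical expectation.
    print(sum(finals) / len(finals), math.exp((1.0 - 0.5) * 4.0))
```

Starting from a single lineage, the expected number of lineages at time t is exp((birth - death) t), which gives a quick sanity check on any simulator of this kind.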
Alexandros Stamatakis, from HITS Heidelberg, presented his impressive work on high-performance phylogenetics. Phylogenetics is the study of the evolutionary relationships between organisms using graphical, computational and statistical methods. Alexandros's techniques for adapting phylogenetic computations to modern hardware architectures and for reducing memory consumption are implemented in the popular software package RAxML. A blog post that I wrote about RAxML can be found on the Institute's website. Another piece of phylogenetic software presented at the meeting was BEAST. From a statistical perspective, RAxML and BEAST take different approaches to the same problem. RAxML draws inferences about the parameters of interest (e.g. the evolutionary relationships between organisms, or the rate of evolution) based solely on the observed data (e.g. the DNA sequences of a set of organisms), whereas BEAST combines the observed data with any prior information the user may have about the 'true' values of those parameters. In statistics, the former approach is called frequentist and the latter Bayesian.
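The contrast between the two approaches can be illustrated with a deliberately simple toy problem, which is not how RAxML or BEAST actually work: estimating the proportion p of sites that differ between two aligned DNA sequences. Treating the k differing sites among n as a binomial outcome (an assumption that ignores multiple substitutions and rate variation among sites), the maximum-likelihood estimate uses the data alone, while a Bayesian estimate with a Beta(a, b) prior combines the data with prior belief. The sequences, the prior and the function names are hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mle_proportion(k, n):
    """Frequentist (maximum-likelihood) estimate of p: the data only."""
    return k / n


def posterior_mean_proportion(k, n, prior_a, prior_b):
    """Bayesian estimate of p under a Beta(prior_a, prior_b) prior.

    The Beta prior is conjugate to the binomial likelihood, so the
    posterior is Beta(prior_a + k, prior_b + n - k); we return its mean.
    """
    return (prior_a + k) / (prior_a + prior_b + n)


if __name__ == "__main__":
    seq_a = "ACGTACGTAC"  # hypothetical aligned sequences
    seq_b = "ACGTTCGAAC"
    k, n = count_differences(seq_a, seq_b), len(seq_a)
    print(k)                                      # 2
    print(mle_proportion(k, n))                   # 0.2
    print(posterior_mean_proportion(k, n, 1, 9))  # 0.15
```

With only ten sites, the Beta(1, 9) prior (mean 0.1) pulls the estimate from 0.2 down to 0.15; as more sites are added the data dominate and the two estimates converge.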
Eric Tannier, from INRIA France, talked about an algorithm he has co-developed to study how relationships between genes evolve through speciation, gene duplication, gene loss and rearrangement. His algorithm is implemented in the software DeCo.
The MCEB2012 poster session presented several interesting computer programs for evolutionary studies, including:
RevBayes, an R-like environment for Bayesian phylogenetic inference that allows you not only to analyse molecular sequence data but also to simulate sequence data under hypothetical scenarios;
Armadillo, a workflow platform for designing and conducting phylogenetic analyses and simulations;
Dawg 2.0, an application to simulate molecular sequence data.
Overall, I found this meeting extremely useful and enjoyable. The key message I took from it was that molecular sequence data are no longer in short supply: sequencing technologies now produce vast amounts of data at very reasonable cost. What is increasingly needed instead are formal mathematical and statistical methods, and efficient computational tools, to analyse these data. MCEB2012 was an opportunity to understand where the gaps in this field of research lie, and to hear about the community's recent efforts to fill them.