By Dr. Róisín Moriarty, Tyndall Centre for Climate Change Research.
I recently collaborated on a massive data collection exercise. Why? The pursuit of knowledge and run-of-the-mill scientific endeavor, that’s why! In truth, we needed data: the datasets of interest had yet to be compiled. On a pretty limited timescale and with almost no budget, we had no choice but to bite the bullet and compile the data ourselves. What happens when early career researchers pick up an idea from within their community and run with it? We might not have set the world of scientific research alight, but we achieved more than we set out to, and new data is now in the public domain. That is in the wider scientific interest and good for everyone. Hey, we made a contribution!
I work as an ocean biogeochemical/ecosystem modeler, and my main interest is the biological component of the ocean carbon cycle. I want to know the amount, type and quality of carbon that macrozooplankton (large zooplankton like krill, amphipods, jellyfish and a myriad of other beasties) ingest and assimilate (turn into biomass by incorporating into their bodies), and what happens to that carbon once the macrozooplankton are finished with it.
During my undergraduate biogeochemistry lectures, biota (the total collection of organisms) in the oceans were treated as a biogeochemical black box. I was intrigued. I found it hard to believe that the importance of the biology could be so easily dismissed. After some more in-depth discussion with the lecturer, it became clear that people were not dismissing marine biota because they thought it had little or no effect. Rather, it was set aside because biota make up such a small part of the total ocean inventory of carbon. This was compounded by the fact that biological systems are very complex, even at the level of an individual plant or animal, not to mention the huge complexity of an entire ecosystem. (If you want to learn more about phytoplankton and zooplankton, watch this short video, which gives a good description of the role they play in the ocean carbon cycle.)
Modelling the macrozooplankton requires a few things. First, you need a model that captures enough zooplankton processes and ecosystem structure to give a realistic interpretation of the processes involved. Second, you need data to describe the processes in the model: what, how much and how fast macrozooplankton eat, respire, grow, excrete, egest and die. Once you get the model to work and start generating results, you need a third dataset to figure out whether your model is making realistic predictions. This comparison to the real world, or model validation, cannot use the same data that you used to parameterise your model, because a model will naturally agree with the data it was tuned to; only independent data can tell you whether it holds up.
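To make the parameterisation/validation split concrete, here is a toy sketch in Python. It is not the PFT model and it does not use MAREDAT data: the straight-line relationship "ingestion = k × prey concentration" and every number below are hypothetical, chosen only to show one parameter being fitted on one dataset and then scored against a separate dataset that the fit never saw.

```python
from statistics import mean

# Hypothetical, made-up records: (prey concentration, observed ingestion rate).
# Illustrative only; these are NOT real zooplankton or MAREDAT measurements.
param_data = [(1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2)]  # used to tune k
valid_data = [(1.5, 1.4), (2.5, 2.6), (3.5, 3.4)]              # held back for validation


def fit_slope(data):
    """Least-squares slope through the origin for y = k * x: k = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def rmse(k, data):
    """Root-mean-square error of the toy model y = k * x against observations."""
    return mean((y - k * x) ** 2 for x, y in data) ** 0.5


# The two datasets must not share any records, otherwise validation is not independent.
assert not set(param_data) & set(valid_data)

# Parameterise on one dataset only...
k = fit_slope(param_data)
# ...then validate against the independent dataset.
validation_error = rmse(k, valid_data)
print(f"k = {k:.3f}, validation RMSE = {validation_error:.3f}")
```

Scoring the fitted line against the same four points used to fit it would mostly tell you how well it reproduces its own inputs; the error on the held-back points is the more honest test of whether the model generalises.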
The model I was working with, a Plankton Functional Type (PFT) model, was a pretty new development at the time. There were no global datasets of the information I required to carry out my investigation into the role of macrozooplankton in the global carbon cycle. It took a long time, but eventually I had a parameterisation dataset and I could begin the modeling part of my project. I began to use the model and to understand how it worked, and more specifically how our model ecosystem worked. Once I started to get realistic results, the race was on to validate the predicted macrozooplankton biomass concentrations and distributions against real-world data. I was not the only one in this predicament: several other people in my group, and in a wider network, the Dynamic Green Ocean Project, were also interested in gathering and using datasets that would allow the validation of PFTs in global ocean biogeochemical models.
In October 2008 we gathered a small bunch of people who had a vested interest in the gathering and synthesis of data for biogeochemical model validation. The meeting was held over two days at the British Antarctic Survey, Cambridge, UK (the outcomes of which were published in EOS Transactions American Geophysical Union). The meeting successfully brought key people together to discuss the possibilities and limits that helped to frame the path ahead. It also brought the people who do the hard work (collection, identification and analysis of plankton specimens) into contact with ocean modelers, who work with data that can look rather different from the data that was originally collected. There were small hang-ups on both sides, but we had a plan, and the wider plankton community had helped shape how best to proceed.
It is much more difficult to publish a dataset or a data paper than a discussion paper. In order to submit a dataset to a data repository, it is necessary to give details of the scientific paper associated with the data. So even once we had the datasets, we still had a bit further to go. We knew the value of our data, as did PANGAEA (a data publisher for earth and environmental science), who were interested in archiving our datasets. We also knew that there was a new, up-and-coming data journal, Earth System Science Data (ESSD), that would help us to make the data available. Late in 2010, PANGAEA and ESSD were approached with the idea of a special issue, MAREDAT (Towards a World Ocean Atlas of MARine Ecosystem DATa). Stephane Pesant, who had been involved in the project since its implementation by the Dynamic Green Ocean Project and co-authored the EOS paper, was asked to act as guest editor.
It was clear from the start that the data, at PANGAEA, and the data description papers, at ESSD, which together would make up the special issue, would go hand-in-hand. It meant that we could make each dataset available through PANGAEA, with each one receiving a DOI (digital object identifier), just like a publication. The metadata, methodology and caveats associated with each dataset would be documented in the data paper, which would be a useful reference for anyone who wanted to use the data.
Meike Vogt, with the help of Erik Buitenhuis and the lead authors, drew up a list of requirements for manuscripts that might be considered for inclusion in the special issue. Having a common core to all the manuscripts made them much more informative and ensured that very different types of data could be assessed and then compared. Different papers discussed very different types of organisms, at different taxonomic levels, and incorporated a myriad of collection and analysis techniques. The common goal (to provide PFT biomass coverage for the global ocean) and the common core of these papers helped to drive the special issue forward. It also enabled a meta-analysis across PFT groups in the introductory paper.
After endless battling with referees, and with great support from co-authors, the editorial staff at ESSD and the invited editors for the MAREDAT special issue, the special issue was finally, finally completed. The MAREDAT special issue covered 11 PFTs and one dataset of phytoplankton pigment data. Our overall goal is to provide, in due course, global gridded data products with coverage of all planktic components of the global ocean ecosystem. The special issue was the first step towards achieving this.
Once all the hard work was done, we had to push again to get the wonderful MAREDAT datasets out there and talked about – the work never ends, but a change is as good as a break, they say! There is sustained interest, and the papers and data are being cited in the scientific literature. Of course, the hunt for big data is in vogue now, and most research councils are pushing individuals and institutes to make the datasets they have collected freely available.
The ideas behind the published research came about, as quite a lot of new developments in science do, through collaboration between the old(er) guard (i.e. PhD supervisors) and more junior members across the community. Biological oceanographers have collected datasets for more than 200 years, but it often takes a new perspective (and young researchers) to break down barriers. We did not do it all alone: we had strong and determined leadership and input from the wider community through the Dynamic Green Ocean Project. I think that early career researchers have a huge amount to offer. Time is limited throughout academia, but early career researchers can push the boundaries, self-organise and collaborate effectively to help develop disciplines and lines of enquiry in their fields that are not much older than their PhD projects.