22-24 May 2014, LGGE, Grenoble, France
By Leanne Wake, SSI Fellow and Anniversary Research Fellow, Department of Geography, University of Northumbria
- Data (e.g. sea level index points, marine limits, ice extent) generally exist in isolated silos held by individual researchers. Collection efforts are often duplicated, and the data are not necessarily up to date or complete.
- Conference delegates recognised the need for, and benefits of, a ‘living, breathing’ database, rather than data sitting in static, fragmented, duplicated, isolated ‘archives’.
- Funding opportunities (e.g. from Quaternary Research Association (QRA), ca. £1000 and Scientific Committee on Antarctic Research (SCAR)) could be accessed to employ a software engineer to set up a basic resource that may later be developed by a call to the SSI.
- Exemplars suggested for a database include: PANGAEA (Alfred Wegener Institute), NSIDC (National Snow and Ice Data Center, USA) and MaNIDA (Alfred Wegener Institute).
The event was designed as an open exchange between data scientists and modellers working on the evolution of the Greenland and Antarctic Ice Sheets during the Pleistocene. I attended both out of scientific interest and in my capacity as an SSI Fellow. The main driver behind the workshop was to give data and model specialists the opportunity to come together, iron out issues and misunderstandings, and highlight questions for exploration.
I was drawn to a session entitled “Consider what can be done to make paleo records and model output more accessible to various users”, in which I delivered a presentation suggesting how we might kick off such a project and what assistance was available through the Software Sustainability Institute. International leaders in the fields of Quaternary climate, ice sheet and sea level modelling attended the event. I presented background on data-sharing issues in the wider scientific sector. In general, most researchers are willing to share data, and actively do so in order to support and evaluate modelling output. Speaking privately to participants, however, many identified concerns about ‘scooping’, data misuse and citation problems associated with hosting data in an open access resource. My presentation outlined a schematic workflow for a resource that could address these concerns. During the discussion we identified current practices within our ‘small’ community, and a number of points were discussed and action points tentatively agreed. Conference organisers Prof. Antony Long and Dr. Pippa Whitehouse (Durham University) produced a ‘What If…in Five Years’ presentation. Aside from desired scientific achievements, two points stood out in which the Software Sustainability Institute may be able to provide assistance:
- ‘Good documentation for the other side’: Many participants had their own database of proxy sea level and ice sheet constraints and used it to support modelling output. Data are commonly gathered by approaching individual authors and asking them to share their data. In this form, data exist as stagnant ‘archives’ rather than open, accessible, evolving databases. This is not acceptable for some types of climate data, such as exposure ages derived by measuring 10Be and 26Al concentrations in previously ice-covered terrain. These measurements must be periodically recalculated to reflect updated cosmogenic nuclide production rates and differences in calculation methods (standardisation), and therefore cannot live inside a static ‘archive’. One such resource is the online calculator ‘Cronus’, which can be used to update exposure dates. Those not familiar with the processing of exposure dates would benefit from a centralised, living database that used this type of calculator as a ‘plug-in’ to update the exposure dates contained therein.
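The ‘living database’ idea above can be sketched in a few lines of Python. This is an illustration only: the record fields and the age formula are my assumptions, and the calculation deliberately ignores radioactive decay, shielding and scaling corrections that a real calculator such as Cronus handles. The point is simply that storing the raw measurement, rather than a one-off age, lets the database recompute ages whenever the accepted production rate is revised.

```python
from dataclasses import dataclass

# Hypothetical record structure, for illustration only -- not an
# agreed schema from the workshop.
@dataclass
class ExposureSample:
    sample_id: str
    be10_concentration: float  # atoms per gram of quartz (raw measurement)
    latitude: float
    elevation_m: float

def apparent_age_years(sample: ExposureSample, production_rate: float) -> float:
    """Recompute an apparent exposure age from the stored raw
    concentration. Ignores decay and scaling, so it is only a
    simplified stand-in for a real calculator 'plug-in'."""
    return sample.be10_concentration / production_rate

sample = ExposureSample("GRL-01", be10_concentration=5.0e5,
                        latitude=70.0, elevation_m=300.0)

# The same stored measurement yields different ages as the accepted
# production rate (atoms/g/yr, values invented here) is revised --
# which is why the raw data, not a fixed age, belongs in the database.
old_age = apparent_age_years(sample, production_rate=5.0)
new_age = apparent_age_years(sample, production_rate=4.0)
print(round(old_age), round(new_age))  # prints "100000 125000"
```

A static ‘archive’ would have frozen the first of those two ages forever; the living database simply re-runs the calculation.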
- ‘Treasure Maps’: On the other side of the coin, access to model output is just as vital for ‘data’ scientists. One role of the data scientist is to produce observations that support or disprove model output. But how do data scientists know where to situate targets for data collection in order to provide such valuable constraints? Full, open access to model output (e.g. global sea level maps and modelled ice extent) can address this, yet no facility currently exists within this community that is user-friendly to those without programming experience, even though providing such access need be no more complicated than supplying a PDF graphic of model output. Access to these so-called ‘Treasure Maps’ is an important driver of ‘proof of concept’ research.
As part of the follow-up to this conference, it was tentatively agreed that a sub-group of participants would embark on a data collation exercise with the aim of creating a database geared to both computational and field scientists in glaciology (i.e. a resource sufficiently accessible, understandable and documented that users can grasp the strengths and weaknesses both of the model producing the output and of the constraining data).
How big should a database be?
- Parameter-specific (e.g. dated glacial moraines, modelled extent, or both)?
- Time-period specific (Last Glacial Maximum to present)?
- Geographically specific (Greenland/Antarctic/both)?
Delegates also prioritised criteria such as a ‘credit system’, ‘submission/download alerts’ and consistent data standardisation as conditions for submitting their data to such a site.
As the north east of England is home to a number of institutions and academics actively involved in research in this area, it was agreed that colleagues at the universities of Durham and Northumbria were well placed to lead the initiative. Delegate Dr. Bethan Davies (University of Reading) already hosts a successful online glaciology resource, partly funded by the Scientific Committee on Antarctic Research (SCAR) and the Quaternary Research Association (QRA) and built by Stefan Senk.
It was highlighted that PANGAEA, although an excellent resource, is overloaded with many forms of climate data. A delegate suggested developing a ‘finding’ tool to target and search such databases - intimating that although PANGAEA stores a vast quantity of relevant data and is sustainably updated, it falls short of being intuitively searchable.