By Rosa Filgueira, Research Assistant, School of Informatics, Data Intensive Research Group, University of Edinburgh.
In December 2015 I attended the American Geophysical Union (AGU) Fall Meeting in San Francisco for the first time. The main purpose of AGU is to promote discovery in Earth and space science for the benefit of humanity. At the meeting, you can find a wide range of scientific communities (e.g. Geology, Meteorology, Oceanography and Computer Science) presenting the latest work in their fields. And normally, the number of attendees is huge! This year there were 25,000 attendees, which means a lot of people to interact with and a lot of work and presentations to learn about. My view is necessarily a small fragment of the total.
My background is in Computer Science, more specifically in High Performance Computing. However, during the last four years I have been working on three challenging data-intensive projects, EFFORT, VERCE and ENVRIplus, all of which have Earth science research goals. These projects have given me the opportunity to address multi-disciplinary challenges with the Rock Physics and Geosciences communities by giving them access to HPC resources via advanced services and tools such as workflows and scientific gateways. Attending AGU therefore gave me the opportunity to present our latest work, “dispel4py: An Open Source Python Framework for Encoding, Mapping and Reusing Continuous Data Streams”, which we developed during the VERCE project. This talk was presented at the “Enabling Scientific Analysis, Data Reuse, and Open Science through Free and Open Source Software” session, where I met new people interested in dispel4py and learnt from different talks, which I have summarised at the following link. At that link, you will also find summaries of other sessions, including “Data ought not to be in the darkness: They should be open, accessible, transparent and reproducible” and “Big Data in Earth Science: From Hype to Reality”.
However, in this post I review three works that were presented at the same session as my talk, which reflect very well the attitude towards software among AGU members, and illustrate the current issues relating to software.
The first was “Unidata: 30 Years of FOSS for the Geosciences”, presented by Ethan Davis (University Corporation for Atmospheric Research). Unidata's core mission is to serve academic research and education communities by facilitating access to and use of real-time weather data. To this end, Unidata develops, distributes, and supports several Free and Open Source Software (FOSS) packages. These packages are largely focused on data management, access, analysis and visualization. From my point of view, this talk was extremely useful, since Ethan gave numerous recommendations for good practice in software development and community engagement. For example, he remarked that to sustain a software project, the software should be free and open source, it needs good software engineering practices (e.g. source-code version control, with regression and usability testing), and it should engage and activate its user community. On how to engage the community, Ethan went into more detail: software projects should try to foster high levels of community ownership and participation, build connections within the community, support community integration, and give user support. He remarked that all staff should engage with the community: “Be part of the community!”.
The second work is “Development practices and lessons learned in developing SimPEG”, presented by Rowan Cockett (University of British Columbia). SimPEG is an open source Python package for simulation and gradient-based parameter estimation in geophysical applications. Its main goal is to support a community of researchers with well-tested, extensible tools, and to encourage transparency and reproducibility. SimPEG provides a flexible framework for scientists to write and share their equations and models, which speeds up their research. Rowan stressed the importance of building a community to sustain and improve the open-source software and to facilitate the research of a community of scientists with common technical challenges.
The last is “OntoSoft: An Ontology for Capturing Scientific Software Metadata”, presented by Yolanda Gil (University of Southern California). Yolanda pointed out that scientists value their software much more when they can share it. For example, there are a large number of research resources for developing first versions of simulation software. However, there are very few geosciences repositories for data preparation and visualisation. OntoSoft is an ontology for describing the metadata of scientific software. The ontology has been designed considering how scientists would approach the reuse and sharing of software. Yolanda and her colleagues have used OntoSoft to create a software registry for geosciences, and to develop user interfaces to capture software metadata. Yolanda finished her talk by giving us two key messages: 1) there is a big need to keep data and software together, and 2) the best way to preserve scientists’ software is by creating a community and teaching its members (in training sessions) how to produce and reproduce software, e.g. by using provenance tools and persistent URLs. More information about this talk and others can be found at the link.
The three works share several points of view, and all highlight the importance of building a user community and engaging with it for long-term success in open source projects.
To finish this post, I highlight two other AGU events that I found very interesting and highly recommend to future AGU attendees.
The first is the series of presentations given at the Exhibition Hall. I especially liked the ones at the NASA and Google Earth booths. Both booths gave high-level introductions to their current projects in a very attractive format (quick presentations, nice pictures and educational talks for all kinds of backgrounds). For example, with the “Lunar Mapping and Modeling project”, NASA presented a new interactive web-based tool that incorporates observations from past and current lunar missions, creating the most comprehensive lunar research website to date. At the following link you can find pictures and links about this project and others, which I collected at the Exhibition Hall.
The second event that I really enjoyed is the poster sessions at the Poster Hall. I found that poster sessions are the easiest way to exchange ideas and opinions and to give and get feedback directly. During different poster sessions I learnt about tools and projects that are very relevant to my work. Some examples are: “ObsPy”, “EarthCube”, “Light-weight Parallel Python Tools for Earth System Modeling Workflows” and “Rosetta: Ensuring the Preservation and Usability of ASCII-based Data into the Future”. All of these are good examples of tools that have a big user community and, in my opinion, would be worth a follow-up. More information about these and other posters, collected over several days at the Poster Hall, can be found at the link.