The Wavelength 2013 Conference
Glasgow, Scotland, 11-13 March 2013
By Robin Wilson, SSI Fellow and a PhD Candidate at the University of Southampton
- Software is very important in the field - with a wide range of commercial and open-source tools being used.
- More of the delegates than I expected (80%) have programmed as part of their PhDs, but very few use good programming practices such as version control.
- Processed data from satellites may become admissible in international courts within the next decade (within the REDD+ framework), which requires very robust validation and reproducibility of the software tools used.
- Many people had not really considered reproducibility, but my presentation encouraged them to think about it - and people said they will do things differently in future because of this.
- A priority in the field is now the development of algorithms which can work completely automatically on large national or global datasets, and long time-series, to extract useful data.
The Wavelength 2013 conference is the Remote Sensing and Photogrammetry Society's student and young professional conference. The majority of attendees are PhD students, with some ECRs and MSc students as well. The views and experiences of these students are key as they are the future of the discipline.
My presentation at the conference was entitled "Software Sustainability and Reproducible Research in Remote Sensing" and is available (with extra resources) at my blog, and it was received very well by the delegates. I think I successfully managed to engage the whole range of people - not just those who write code. The discussion questions at the beginning went well and made people think about these issues: around 5 out of 40 people thought they could reproduce their research (the others got scared that they couldn't!) and around 30 out of 40 people thought their data would be usable in 20 years' time. Questions at the end focused on the legal aspects of releasing code (IP, patents etc.) and how to convince supervisors and PIs that this is the right way to go forward - I directed some of these people to the SSI for more information. A lot of people said they were going to do things differently from now on - or at least keep my advice in mind. Some said they were going to go home and try some of the practical ideas I gave at the end of the presentation - which showed the benefit of those slides. All SSI leaflets and pens were taken, and a number of people were keen to contact the SSI for advice on certain things (e.g. the legal issues), and would pass the information on to others (e.g. PIs working on specific bits of software).
I asked delegates to fill out a questionnaire and 35 people responded: 27 PhD students, 6 MSc students, and 2 commercial researchers/consultants. The results show that:
- Software usage is split fairly widely between various tools, both commercial and open-source
- More people are programming than I thought (80%), and over half of these have received formal training in programming: probably more than in many scientific fields. Usage of good programming practices is very poor though, which suggests that these aren't being taught at all in these courses.
- Comments in some of the longer answer questions (not reported in detail here) suggest that a number of people scripted software such as ArcGIS or ENVI as part of a course at university, and have never done it again since - this may reflect badly on the courses, as many people could do with using these techniques but may have been put off during the course.
- A number of people told me they were embarrassed about not being able to program, as they thought it would be very useful, but were very scared of it and had no idea how to begin (and no idea what language to use - as there are lots of very vocal language wars within the community).
- People were very interested in my prototype provenance tool, and thought it would be very useful (96% would like it!). I would love to develop this into a fully-featured tool for the most popular remote sensing packages (ArcGIS and ENVI, and possibly Erdas Imagine) but would need funding and/or time to do that.
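To give a flavour of what a provenance record might capture, here is a minimal sketch in Python. It is not the prototype shown in the presentation: the function names, the field names and the example step are all hypothetical, and a real tool would hook into the processing package (e.g. ArcGIS or ENVI) rather than being called by hand. The idea is simply to log, for each processing step, what was run, with which parameters, and on exactly which input data.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of_bytes(data: bytes) -> str:
    """Return a hex digest that identifies the exact input data."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(step, input_bytes, params, tool, tool_version):
    """Build one provenance entry for a single processing step.

    All field names here are illustrative assumptions, not a standard.
    """
    return {
        "step": step,
        "tool": tool,
        "tool_version": tool_version,
        "params": params,
        "input_sha256": sha256_of_bytes(input_bytes),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def append_record(log, record):
    """Append the record to a log as one JSON document per entry."""
    log.append(json.dumps(record, sort_keys=True))
    return log


# Example usage with placeholder data (hypothetical step and tool names).
log = []
record = provenance_record(
    step="atmospheric_correction",
    input_bytes=b"placeholder image bytes",
    params={"method": "dark_object_subtraction"},
    tool="example-tool",
    tool_version="0.1",
)
append_record(log, record)
```

Storing a hash of the input alongside the tool version and parameters means that anyone re-running the step later can check they are starting from the same data with the same settings.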
Generally the presentations at the conference - from both keynote presenters and delegates - showed that software is very important in the field, a wide variety of tools are used, and algorithm development and comparison are very important. Software use in the field - as demonstrated through the presentations - can be split into a number of categories:
- Interactive processing and visualisation using GUI-based tools such as ArcGIS/ENVI (sometimes with scripting of these tools)
- Use of algorithms built into a range of software packages (from ArcGIS/ENVI to more specialised packages such as BEAM), and then comparison of the results.
- Implementation of custom algorithms and automated processing tools (ranging from simple conversions to complex mathematical algorithms) in a variety of languages
Those uses for software are fairly standard across a range of scientific fields - the things that make this field unique are:
- The large data volumes - some people were processing airborne data, which can easily reach 60 GB for a single flightline, or large collections of satellite data (e.g. 200 Landsat scenes requiring around 40 GB).
- Use of simulation tools (either self-programmed or openly available) to produce large volumes of data (e.g. simulating airborne or satellite data, or simulating large time-series of data).
A number of presentations mentioned the use of open-source software such as Quantum GIS, GRASS, GDAL and CloudCompare, and one presenter made specific reference to the fact that he was deliberately using open-source languages (Python and R rather than packages such as Matlab) to make his science more widely available - and so that less-wealthy institutions in the developing world could work with him.
Two of the keynote speakers mentioned software a lot, and both focused on the need for remote sensing to produce national or global data products on an automatic basis. At this level you can't do any manual work on the data, so all algorithms must be robust and validated, and be able to work without human intervention on large datasets. The aspects of validation, verification and reproducible research were emphasised by Iain Woodhouse, who talked about the policy implications of remote sensing - particularly within the framework of Reducing Emissions from Deforestation and Forest Degradation (REDD+) - a UN initiative he has been involved in for many years. Over the next decade a legal framework may be developed for reducing the amount of deforestation, with countries being prosecuted in an international court if they deforest areas without permission. Remotely-sensed data would be the main way to monitor this, and thus processed data from satellites may be used in court, requiring robust reproducibility and validation. Colm Jordan echoed this point, and also emphasised that we need to be creating software that can be used by scientists: geologists like hammers and don't like computers, but we have lots we can provide to them if we do it in the right way!
Ways for the SSI to get involved in the future
- Many of this group could really do with a Software Carpentry course - particularly one tailored to their needs. I've no idea how we could run this (they're from all over the country and use a diverse range of languages), but it'd be very useful for them. I would be happy to be involved in helping with something like this, but I don't really know how - it's a lot harder than the course I am running at the ICSS in Southampton!
- There is a distinct lack of training materials for people who want to get started with programming for remote sensing, photogrammetry and GIS. However, a number of universities are teaching this at MSc level, and it seems that those who don't are putting their students at a disadvantage. Is there a place here for introductory programming courses - at a far more introductory level than SWC, but bringing in things like Version Control and Testing right from the beginning (so that they don't know there is any other way...)?
- A provenance tool would be a very useful aid to reproducible research, and lots of people want it. My version is a very early prototype at the moment, but could be extended. Does the SSI ever fund the development of pieces of software?
- There are no specific bits of software that it would be particularly relevant for the SSI to get involved with at the moment - I think that may come later as I follow up some of my discussions at the conference. However, a number of people have taken SSI literature to discuss with their PIs and research groups, and there may be some people who contact the SSI because of that. On a longer-term basis it may be useful to get in contact with some of the research groups who tend to have a lot of people programming in them: places like the Mullard Space Science Laboratory at UCL, but this will need more discussion.
Detailed questionnaire results
- There is a wide variety of software in use in the field, with the most popular tools being ArcGIS, ENVI, Erdas Imagine, Idrisi, GRASS and QGIS (in that order) with many other tools like PhotoModeller, Cyclone, BEAM, eCognition, SeaDAS and CloudCompare also being used. There could be issues with the sustainability of this software - both the commercial software and the open-source tools. For example, ArcGIS 10.0 broke compatibility with .mxd files saved with ArcGIS 9.3, and ENVI 5.0 will not run extensions that were written for ENVI 4.8.
- 80% have done some programming as part of their research, but only 57% of these had received some formal training in programming.
- The most popular languages were Matlab (64%), Python (42%), IDL (25%) and R (21%), with other languages including C, Fortran, Visual Basic, Mathematica, C++ and Batch Scripts also being used. This is as expected, as Matlab, Python, IDL and R have many tools for image processing, and IDL is the language which ENVI is written in, so is often used for extensions.
- 36% of the people who have programmed have released some of their code, but only 10% have released code outside of their department/research group.
- Use of good programming practices as taught by Software Carpentry is very limited: only 16% of the programmers use version control, 18% use automated testing, 11% use automated build and 36% use object-based programming (often because the language forces them to).
- Around 50% of the respondents would use Excel for running any statistical analysis they would need to do (e.g. a t-test or a regression), but 50% would use Matlab - a far better choice from the reproducibility (and accuracy) point of view. R came third (35%) with GUI tools like SPSS and Minitab next (25% and 5% respectively). A number of people stated that they would perform these statistics in C or Visual Studio - a slightly strange result!
- 40% of people have scripted a software package such as ArcGIS or ENVI, although many of these seemed to do it only as part of a course.
- 96% of the respondents thought that a 'Provenance Tool' as demonstrated in my presentation would be useful to them - if it were implemented within their preferred remote sensing software.
- 17 of the respondents (approx 50%) gave their email addresses for any follow-up.
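To illustrate the reproducibility point made above about running statistics in a scripting language rather than a spreadsheet, here is a minimal sketch of a two-sample (Welch) t-test written as a short Python script using only the standard library. The function name and the two samples are invented for illustration and are not data from the questionnaire. The script computes the t statistic and the approximate degrees of freedom; turning these into a p-value would need a t-distribution table or a statistics library.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Invented example values, purely for illustration.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Because the whole analysis lives in a text file, it can be kept under version control and re-run on the same data to give the same answer, which is much harder to guarantee with a sequence of manual spreadsheet steps.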