Current and future directions of Citizen Science 2013

London, UK, 13th September 2013

By Robin Wilson, SSI Fellow and PhD Candidate at the Institute for Complex Systems Simulation, University of Southampton

Highlights

Inspiring keynote speech by Rob Simpson.
Focus on usability of software - but this should translate to the whole of science anyway!

Event report

Overall this was an interesting conference/workshop day on Citizen Science. It was run by E-Infrastructure South - a consortium of southern England universities that are aiming to "link the research community and technologies".

Unlike a number of other events I've been to, this wasn't about software itself, but about developing projects where normal citizens can contribute to science. The three types of these projects described during the keynote were:

Resource contribution projects, where citizens contribute their computing resources through distributed computing efforts such as SETI@home and ClimatePrediction.net.
Citizen analysis of researcher-collected data, which includes various types of time-consuming analysis that can't be automated and thus require humans to carry them out. Examples include Galaxy Zoo, Snapshot Serengeti and the other Zooniverse projects.
Citizen collection of data - for example the Mappiness project, and other projects requiring special sensors that attach to phones etc. (like the project I am hoping to run).

The key thing connecting all of these types of project is that they are all heavily reliant on software, in various forms. In these software tools the interface is *really* important: people won't use Snapshot Serengeti if it isn't easy-to-use, fun and pretty! And, luckily, it is (have a go - the interface is wonderful). Rob Simpson, from Zooniverse, described their work as being at the boundary between science, web technology, data and people - all of which are very important. A key question he raised was why we create special tools only for Citizen Scientists - surely a professional scientist classifying images taken in the Serengeti would prefer to use an interface like Snapshot Serengeti? But instead, people normally end up typing animal names into an Excel spreadsheet while viewing images in Windows Picture Viewer! Taking this further - surely all science tools should be accessible, easy-to-use and fun?! He saw two directions for the future: increasing the flexibility of the tools for those users who want it (for example, a whole online statistics toolbox has been developed for the Galaxy Zoo project), and extending the tools to mobile (being able to classify a few Serengeti images while waiting for the bus would be awesome).

Other key issues raised by other speakers are below - and most of them are relevant outside of citizen science too:

Don't start with the software: start with the science question, and only then develop the software/hardware infrastructure you need.
Wonderful, easy-to-use and attractive-looking software really helps people get things done!
A graphical component is key - particularly for processor-intensive tasks such as the ClimatePrediction.net simulations, where volunteers otherwise have nothing to see.
Making things easy for users is important - simple installation, quick-start guide, easy annotated tutorial. A lot of open-source software tools could do with learning this!
Frameworks to build on are key: e.g. Zooniverse or BOINC.
Software is a key component in the concept of 'Social Machines', as espoused by David De Roure.
Getting the context (i.e. metadata) of data collection is key: this is a lot easier when using sensors on or connected to mobile devices.
We need to keep track of data provenance a lot more (this applies to the whole of science!) - for example, which user classified each item, and whether they tend to get lots wrong.
How should we balance time and quality? For example, we can get each snapshot classified by 10 people, but it'll take five times as long as doing it with 2 people - is it worth it?
When the public have contributed to datasets (and, I believe, in general with scientific datasets), we have a moral duty to keep the data safely. This is often hard.
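One way to think about the 10-versus-2 question is a simple worked calculation. The sketch below is not something presented at the event. It assumes each volunteer independently gives the right answer with the same probability `p`, that there are only two possible labels, and that the final label is a simple majority vote with ties broken by a coin flip. The function name `majority_accuracy` and the value `p = 0.8` are hypothetical, chosen purely for illustration.

```python
from math import comb


def majority_accuracy(n: int, p: float) -> float:
    """Probability that a simple majority vote of n volunteers is correct.

    Hypothetical model: each volunteer is independently right with
    probability p, there are only two possible labels, and ties (only
    possible when n is even) are broken by a fair coin flip.
    """
    total = 0.0
    for k in range(n + 1):  # k = number of volunteers who are correct
        prob_k = comb(n, k) * p**k * (1 - p) ** (n - k)  # binomial probability
        if 2 * k > n:
            total += prob_k  # a clear majority picked the right label
        elif 2 * k == n:
            total += 0.5 * prob_k  # a tie, resolved by a coin flip
    return total


if __name__ == "__main__":
    p = 0.8  # assumed (hypothetical) accuracy of a single volunteer
    for n in (1, 2, 3, 5, 10):
        # Cost grows linearly with n; accuracy grows much more slowly.
        print(f"{n:2d} volunteers: accuracy {majority_accuracy(n, p):.3f}, cost x{n}")
```

Under these assumptions two volunteers are no better than one (a tie is just a coin flip), while 10 volunteers cost five times as much as 2 for a gain in accuracy that shrinks as `n` grows. Real projects have many possible labels and volunteers of varying skill, which is exactly why the per-user provenance mentioned above matters.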