
Sharing data & code for reproducible neuroimaging


Author(s)
Cyril Pernet, SSI Fellow

Posted on 13 April 2018


By Cyril Pernet, University of Edinburgh.

This blog post was first published at neurostatscyrilpernet.blogspot.co.uk.

Feedback from reproducible science workshops

Only a minority of scientists think there is no 'reproducibility crisis' (Nature 533, 452–454), yet many are not engaging in reproducible practices. Results from a recent survey among psychology researchers suggest that discussion and education about the utility and feasibility of practices like data sharing are needed if we want the community to adopt those standards. In short, people don't know these practices and are reluctant to venture into them.

One of the things I did during my Software Sustainability Institute Fellowship was to run a series of small-group workshops for postgraduate students and principal investigators on data sharing, code sharing, and good practices around code. These took place at the end of September 2017, in Oxford on the 25th, Birmingham on the 27th and Glasgow on the 29th, in collaboration with Sanjay Manohar and Chris Gorgolewski, who came all the way from Stanford.

Teaching activities

The workshops all followed more or less the same structure. In the morning, we started with an introduction to reproducibility, followed by data sharing, talking about the Brain Imaging Data Structure (BIDS) and helping people prepare their data. In the afternoon, we talked about and did a few exercises on good practices to code and good coding practices (think about it, it's not the same thing...). We even had time for a Git tutorial in Birmingham (thanks to Chris).
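For readers who have not come across BIDS, the core idea is simply a standardised folder layout and file naming scheme plus a few metadata files. The sketch below is a minimal, hypothetical Python snippet (not part of the workshop material, and the dataset name, subject label and metadata values are placeholders) that creates the skeleton of a one-subject BIDS dataset to illustrate the convention.

```python
from pathlib import Path
import json

# Minimal, illustrative BIDS skeleton for a single subject.
# Files follow the BIDS convention: sub-<label>/<datatype>/sub-<label>_<suffix>.<ext>
root = Path("my_bids_dataset")  # placeholder dataset name
(root / "sub-01" / "anat").mkdir(parents=True, exist_ok=True)
(root / "sub-01" / "func").mkdir(parents=True, exist_ok=True)

# Required top-level metadata describing the dataset as a whole
(root / "dataset_description.json").write_text(
    json.dumps({"Name": "Example dataset", "BIDSVersion": "1.1.1"}, indent=2)
)

# A participants file listing every subject
(root / "participants.tsv").write_text("participant_id\tage\nsub-01\t25\n")

# Placeholder imaging files: an anatomical T1w scan and a resting-state run
(root / "sub-01" / "anat" / "sub-01_T1w.nii.gz").touch()
(root / "sub-01" / "func" / "sub-01_task-rest_bold.nii.gz").touch()
(root / "sub-01" / "func" / "sub-01_task-rest_bold.json").write_text(
    json.dumps({"TaskName": "rest", "RepetitionTime": 2.0}, indent=2)
)
```

In practice you would also run the BIDS validator over the resulting folder to check compliance, but the naming scheme above is the heart of it.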

Teacher's feedback

Discussions showed that despite the will to share (they came to the workshop to learn how to do it, after all!), there are still fears, in particular of misuse of the data and of others finding mistakes. The fear of misuse is the idea that someone takes the data you collected and comes up with results that you know cannot be accurate. To me, this makes sense: if I know my data do not support what people using them claim, as a scientist I just cannot let that happen. When it comes to code, our students did not seem that concerned about use, or misuse, but about mistakes. What if you make a mistake and people see it? It turns out there is nothing to fear. A survey of how we (scientists) perceive failures to replicate suggests that they have less consequence for scientific reputation than we think, and unless you persist in not acknowledging your mistake, no one will think you are a bad scientist.

Feedback

It has been over six months now, so I asked attendees to answer a quick survey, as I wanted to find out whether what we did had a real-life impact (17 answers out of 30-ish people).

  • people were satisfied with the workshop (4/5 on average)

  • it did help them to better understand reproducible research (4/5 on average)

  • we could have done better at addressing data sharing issues (2.5/3 on average)

  • the good news is that more than half (53%) are using BIDS, our standard for sharing neuroimaging data

  • coding practices have changed (30% report no change, so 70% now do things differently: 46% changed the way they code, 12% did not change the way they code but now use version control, and 12% both changed the way they code and now use version control)

Take-away message

Practical workshops work, and people who want to learn do change their practices. We still need, however, to address concerns about data and code misuse, as there is no structural way to deal with this at the moment.
