A quantitative biologist’s journey towards teaching data skills with The Carpentries

Posted by s.aragon on 2 August 2019 - 11:00am
Photo by Chris Lawton

By Edward Wallace, Sir Henry Dale Fellow and Group Leader in the School of Biological Sciences at The University of Edinburgh. 

Edward Wallace writes about his experiences getting involved with skills development for biologists through The Carpentries, for which the Software Sustainability Institute is the UK coordinator. 

I’m a biologist who started out as a mathematician. In my career I’ve repeatedly seen that good code is a powerful ingredient in the success of scientific projects. By contrast, scientists who don’t code can find themselves staring impotently at data, or investing in time-consuming workarounds that break easily and are hard to repeat. Working with The Carpentries, the Software Sustainability Institute, and colleagues in Edinburgh’s School of Biological Sciences, I’m now helping to deliver research computing skills workshops to biologists and other researchers in Edinburgh. This blog post is about how I got here, and why I think it’s important to devote some of my time to skills development.

I learned to code out of necessity during my PhD, mostly MATLAB, from my fellow grad student Marc Benayoun, who was far more capable (indeed, the co-author of a textbook). Our joint papers were driven by MATLAB simulations, whose code we made available, so this learning could be said to have kick-started my career in science. At the same time, I was teaching calculus and linear algebra to undergraduates, and realised that the theories and algorithms we were teaching didn’t really come alive until one learned how to implement them by programming. My experiences until then (Logo at school, DOS batch scripts at home, a terrible self-taught C course as an undergraduate) had not prepared me for the power of programming in pursuit of a scientific goal.

In the first week of my postdoc I was told to install Python and R, so I began learning these new languages with a lot of help from colleagues, borrowed undergraduate textbooks, and the Stack Overflow website. The structured thinking I’d learned in mathematical studies was hugely helpful, but there was still a lot to learn about the most basic principles of software engineering, such as functions and objects. It was very helpful that my postdoc advisor Allan Drummond had a previous career at a software company and links to the open science community. We tried to follow open science practices in our papers, having others in the lab test code before publishing it, and putting effort into making supplementary data as usable as possible. I read more about open science, including the wonderful “Best Practices for Scientific Computing” paper from Greg Wilson and The Carpentries team. Allan, Emily Davenport, John Blischak, and others organised a Software Carpentry workshop in Chicago, which was great: as a participant and helper I could see lights going on in people’s eyes. It gave me a sense of how, by working with well-developed teaching material within a bigger structure, I might achieve more than the informal mentoring and code teaching I was doing with a few colleagues.

By chance I moved to Edinburgh soon after the Institute started delivering Carpentries workshops there. Making scientific software sustainable, or making it work at all, depends entirely on the people who conceive, write, maintain, and use that software. This is core to the Institute’s mission, and I was lucky to rock up at the same institution as these fine people. Giacomo Peru, who’s leading the Edinburgh Carpentries initiative, let me help at a Software Carpentry workshop and then nominated me for instructor training. The training, in Manchester in October 2017, was thought-provoking and inspiring. As intended, the instructor training curriculum taught me a great deal about the theory and practice of teaching and learning, and about how to use the community to improve one’s communication skills. Then I won a major fellowship to start my own research group in Edinburgh, giving me a great deal of control over how to use my time for the most impact.

Why spend my precious hours as a research fellow on skills development for others? First, there is a huge need: my colleagues in experimental biology can now easily generate datasets whose size and complexity exceed their training in data analysis. Gene expression data covering thousands of genes effectively breaks popular spreadsheet programs, and point-and-click analysis is very hard to replicate or port to the next dataset. Then there is published data: every week other labs publish work relevant to mine, and being able to write code to extract results from their public data helps me write better papers. It also means I don’t waste effort redoing experiments unnecessarily. Graduate students and postdocs in particular recognise this need for themselves, and the Carpentries workshops we run typically fill up within hours of the email announcement.

Second, my participation in The Carpentries is good for me, my career and my research. I enjoy seeing those light bulbs go on, and hearing from a PhD student months later that they routinely use the skills they learned in a workshop. Teaching helps me learn best practices and communication skills, I learn what others are doing in the School, and I hope that my effort to meet an important need will help my tenure case. Importantly, students and staff in my group also need to develop their coding and data organisation skills; they take the workshops too, and they do better work as a result of my participation in The Carpentries. My research grants are strengthened by including Carpentries training and highlighting its wider impact, as in a recent BBSRC-funded collaboration with EPCC.

Third, The Carpentries supports an agenda important to me: everyone in science is included and can build the skills they need. I was always disgusted by the sexist attitudes that swirled round my education, for example “girls can’t code”. Through The Carpentries I can do something positive about this. The Carpentries organisation strives for equality, inclusion, and accessibility, and its code of conduct reflects this. Our workshops support better practices in inclusive science as well as in open science.

Now, Edinburgh Carpentries and the School of Biological Sciences are running an annual programme of Data Carpentry workshops aimed at research students and staff, allied to a wider programme in the city and internationally. In a future blog post, I’ll write about the survey we ran to ask our researchers about research computing skills: what do they do now, and what do they need next?