Software Development Essentials for Scientists
By Douglas Lowe, RSE at the University of Manchester.
This article was first posted on the University of Manchester website.
At the end of March, members of our Research Software Engineering (RSE) team ran a Carpentries-style training course for researchers in the environmental sciences. The training took place over five days and introduced the researchers to intermediate-level software development skills and practices.
Prof. David Schultz, of the Department of Earth and Environmental Sciences, approached our training team last year for help running a computational science training course on behalf of the Natural Environment Research Council (NERC).
Applying for Funding
NERC holds yearly open funding calls for the provision of training courses relevant to its remit, and our training team have previously worked with Prof. Schultz to provide an online Scripting for Environmental Scientists training course for NERC researchers. This time, however, we proposed a more ambitious training course, to address the need for wider experience with software development tools and practices. This need has been identified by the UK RSE community, and course material to address it is already available, as explained below. With this aim in mind, we applied to NERC in May last year for funding to run a training course on Intermediate Research Software Development Skills in Python. Our bid was selected for funding in June 2022, and we delivered the course itself over the last week in March 2023.
Training Material Development
The training course we delivered is based on one being developed by the Software Sustainability Institute (SSI) called Intermediate Research Software Development. Their course has been designed to teach researchers a core set of established, intermediate-level software development skills and best practices for working as part of a team within a research environment. Typically, researchers working in a computational science environment will pick up basic coding skills, enough to get by when working on a small individual project. However, this experience rarely prepares researchers for working in teams on larger development projects, and so the SSI designed this course to bridge that gap. The original course is available for use via the Carpentries Course Incubator service.
Our RSE team adapted the original material to make it more relatable for our target audience: environmental scientists. The original example dataset and software package were clinical-research focused; we replaced these with a sample of real measurement data from the Lowland Catchment Research (LOCAR) Programme. To accommodate this dataset, we switched from using NumPy arrays to pandas DataFrames, and we adapted the unit testing and object-oriented lessons to reflect these changes. We also added extra materials to cover the use of GeoPandas and to explain command-line interfaces. Where appropriate, this extra material has been shared with the SSI for inclusion in the original course, so that it supports the development of both the original course and our local one.
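To give a flavour of what this change looks like in practice, here is a minimal sketch of a pandas-based analysis function together with a unit test for it. It is not taken from the course material: the function name `daily_mean`, the column names `site_a` and `site_b`, and the measurement values are all hypothetical, chosen only to illustrate testing code that works on time-indexed DataFrames.

```python
import pandas as pd
import pandas.testing as pdt


def daily_mean(data: pd.DataFrame) -> pd.DataFrame:
    """Average the measurements for each calendar day, per column.

    Assumes `data` has a DatetimeIndex and one numeric column per
    measurement site (a hypothetical layout for this sketch).
    """
    return data.resample("D").mean()


def test_daily_mean():
    """Check daily means for two made-up sites over two days."""
    # Illustrative values only, not real LOCAR measurements.
    index = pd.to_datetime(
        ["2000-01-01 00:00", "2000-01-01 12:00", "2000-01-02 00:00"]
    )
    data = pd.DataFrame(
        {"site_a": [1.0, 3.0, 5.0], "site_b": [2.0, 4.0, 6.0]},
        index=index,
    )

    expected = pd.DataFrame(
        {"site_a": [2.0, 5.0], "site_b": [3.0, 6.0]},
        index=pd.to_datetime(["2000-01-01", "2000-01-02"]),
    )

    # Compare values and labels; ignore the index frequency attribute,
    # which resample() sets but to_datetime() does not.
    pdt.assert_frame_equal(daily_mean(data), expected, check_freq=False)
```

A test like this can be collected and run with pytest, or called directly as `test_daily_mean()`. Using `pandas.testing.assert_frame_equal` rather than a plain `==` comparison means the index and column labels are checked along with the values.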
Thirty participants attended the in-person course, run on the University campus over the last week of March 2023. Of the participants, 21 were PhD students and 9 were research staff; 14 were based at the University of Manchester, while the rest travelled from institutions across the UK. The training team was large; across the week we had the help of 6 Research IT staff and 4 external staff (2 based at the University of Manchester, 1 at Bradford University, and 1 at the Earlham Institute). This helped us keep a good participant-to-trainer ratio, so we could provide support for all learners as we went through the material.
To help prepare participants who were not so familiar with Python, an online, one-day, Introduction to Python course was run a fortnight before the week-long course. The week-long course itself introduced students to technical aspects such as unit testing, object-oriented programming, version control using git, and software packaging, as well as theoretical aspects such as software architecture and design, programming paradigms, code review processes, and basic project management.
The teaching style was based on the Carpentries didactic method, with students following the instructor's guidance and typing in the worked examples at the same time. Strong emphasis was also placed on group work: through the week we encouraged the students to work together to find solutions to the problems set for them, and introduced them to working on shared projects. We aimed to build an open and friendly atmosphere during the course, and as a consequence had strong engagement from the students throughout. They asked questions during the teaching, both for clarification and to prompt elaboration from the helpers on the topics being taught. After the course, we received good feedback from the students, which we will use to extend and improve it.
One participant, Dr Yongchao Huang (Senior Research Associate at the University of Cambridge), commented:
“The teaching was delivered in high quality - the instructors are very knowledgeable, patient, hands-on, both high-level and down to details; the helpers have great problem-solving skills and are always helpful; classmates and group members are collaborative and provide insights from different backgrounds.
I do think this kind of software training activity is extremely useful for early career academics, particularly for people working in the data science arena. It is valuable and unique in that, it equips attendees with end-to-end software skills which are essential and advantageous for accelerating research, knowledge sharing and collaboration (e.g. git version control, CI, package publishing, etc), as data-driven/quantitative approaches (e.g. spatio-temporal modeling with GeoPandas) are becoming more relevant in many domains nowadays. These skills (e.g. project management) help boost efficiency and are directly applicable across subjects. This workshop in particular helps students and researchers dive deep into the essentials via dedicated teaching (the teachers are very experienced and considerate), well-written materials (the web resources are really good for both entry-level and intermediate level learners), carefully designed syllabus and exercises.
After this 5-day journey, I am much more confident with my software development skills, and I am sure it will benefit my future research and beyond, and hopefully contribute to the wider community (e.g. environmental data science, probabilistic programming, etc) in some collaborative way. I would definitely like to re-join such an event and encourage others to apply for it in the future.”
Following the positive feedback we have received from students, we are applying for NERC funding to run this course again next year.
Get in Touch
If you would like to know more about our other training courses, please check out the Research IT Training Catalogue for further details. If you would like a bespoke course for your group, please get in touch!