By SSI Fellow Emma Karoune, @ekaroune.
This blog post is part of the series of resources on learning to code, hosted as part of the Research Software Camp: Beyond the Spreadsheet.
Learning to code can seem extremely daunting. Many researchers who are now postdoctoral researchers, or even higher up the academic ladder, received no formal teaching in computer skills during their university degrees, let alone in coding. I am one of these researchers.
I have picked up my computer skills on a needs basis, and although I am pretty adept at doing this for software packages that have a friendly user interface, coding is another level.
However, I recognise that to work in research now and produce good quality research, by which I mean reproducible research, I must learn to code! Part of my reproducible research journey is to make my practices as transparent and sustainable as possible, and I can only do this if I move away from button-pressing analysis to open source analysis software such as R or Python.
How I started to code
One thing I found really daunting was working out where to start learning to code. There are lots of different programming languages, so it’s difficult to work out which is the best one to learn.
I took inspiration from those in my discipline that had already started down this road and explored how they were working in a reproducible manner. This meant implementing version control, documenting my workflow extensively and learning a scripting language for analysing and visualising data.
I needed to step away from analysis in Excel and other packages such as SPSS that do not record the steps of analysis. I found out that R was used by others in my discipline and so it was the best language for me to learn to achieve transparent analysis.
As well as transparency, there are other benefits to learning these open forms of analysis, such as cost and even time. When you finish being a student, and therefore may be outside of a university, you have to pay for software packages. They can be very costly and many institutions only invest in certain software. Investing time in learning free and open source software, such as R, will in the end save you money and time. You can carry on your work seamlessly when moving workplaces and reuse code from one project to another.
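To illustrate what it means for a script to record the steps of an analysis, here is a minimal sketch. It is written in Python rather than R, and the data, column names and function names are all invented for the example. The point is only that each step (reading in data, transforming a table, doing a basic calculation) is written down in the file itself, so the file can be rerun or reused in another project.

```python
import csv
import io
import statistics

# Hypothetical data, inlined so the sketch runs on its own.
# In a real project this would be a CSV file kept next to the script.
RAW_CSV = """site,sample,count
A,1,12
A,2,18
B,1,7
B,2,9
"""


def read_rows(text):
    """Step 1: read the data in as a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def counts_by_site(rows):
    """Step 2: transform the table by grouping the counts under each site."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["site"], []).append(int(row["count"]))
    return grouped


def mean_by_site(grouped):
    """Step 3: a basic calculation, the mean count for each site."""
    return {site: statistics.mean(values) for site, values in grouped.items()}


if __name__ == "__main__":
    summary = mean_by_site(counts_by_site(read_rows(RAW_CSV)))
    for site, mean in summary.items():
        print(site, mean)
```

Anyone with the script and the data can repeat these steps exactly, which is what a point-and-click analysis in a spreadsheet does not give you by default.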
I actually started coding when I was introduced to the wonders of Git and GitHub through the Open Life Science Programme. Learning Markdown to use with GitHub is a great place to start if you have never done any coding before. Markdown is a very human-readable markup language that is used to format text for web documents, and it is the format used to write documents on GitHub.
You can also learn to make web pages through GitHub, and this was my next step into coding, which introduced me to HTML and CSS. I managed to produce a website with a bit of initial help and this did make me more confident in my computing skills.
After this, I decided to enrol in a Carpentries R course (R for Social Scientists). I had done a few online courses to get familiar with the setup of RStudio and thought that it was about time to take the plunge.
Where I got stuck
I found the Carpentries course great! I would recommend it to other beginners, as not only do you get the intensive course, but also the course materials to use afterwards. After doing this course, I felt more confident with using RStudio and understood the basics of reading in data, data transformation, basic calculations and plotting figures.
But moving from this course to doing my own analysis was where I got stuck. The reality of a different dataset, with slightly different requirements for transforming tables, was just a stretch too far. So I knew I needed help!
I do want to point out at this stage that there are several workarounds that you can use to achieve a reproducible workflow while you upgrade your own coding skills.
Here are my two suggestions:
- Get a team member who is good at coding to do the analysis for you - the benefit of this is you get the reproducible workflow you want and they might well teach you how to do it along the way.
- Fully document the analysis steps you take in Excel or other similar software. This is in fact the simplest form of coding - a set of instructions that describes how to do a task. And it would make your analysis reproducible. It does come with drawbacks, as it is time consuming to write everything down and you have to be very clear with your instructions. You need to keep in mind that someone else needs to fully understand what you did.
This documentation can be written in an analysis file or even the README file of your project. It is now good practice to accompany research articles with a research compendium containing the README file, open data, analysis and methods to create a fully reproducible publication. If you do not have analysis code, writing down the analysis steps instead is the low-tech way of being reproducible.
How having a mentor helped me
I was keen to have a mentor for two reasons: to ask questions and for accountability.
I had a lot of questions that I wanted to ask - these might have been deemed silly questions by some people, but for me there are no silly questions when you are trying to learn something. If you don’t understand, you need to ask.
The other reason was for accountability. I had been stuck at a certain point with R and I wanted to be able to move on. Part of this was that I was not making time for myself to explore and work out some of the issues I had. Learning to code in R is not immediately fundamental to my current job, so it has been hard for me to justify making time. But I do feel that improving my own practices is important. So having a mentor to check up on my progress and having a date in my calendar as a point when I needed to complete a task has really helped me to progress.
Where I will go in the future with coding
I can say that I have become more confident in R from having a mentor to help me, and I plan to go on and redo in R the whole of the analysis that I started. I’m also planning to go fully R with my next research project, and I have planned for my team members to have various levels of R skill so we can build in learning of this method as part of the project outcomes.
I plan to do more beginner Carpentries courses, such as on the Unix shell and Python, so that I can expand my knowledge and skills further.
Moving towards reproducible research takes time and requires upskilling, so you need to take those small steps to move along your own journey. If I can do it, so can you.
I want to thank my coding mentor, Jamie Quinn, who was very patient and had answers to all my questions. Thanks so much for all your help and support!
Want to discuss this post with us? Send us an email or contact us on Twitter @SoftwareSaved.