Aim for understandability if you want to write good research software

Author(s)

Luke Abraham

SSI fellow

Thibault Lestang

SSI fellow

David Wilby

SSI fellow

Flic Anderson

Sherman Lo

Posted on 4 July 2022

Posted by j.laird on 4 July 2022 - 10:00am

Image (question mark on a road sign) by Colin Kinner

By Felicity Anderson, Luke Abraham, Thibault Lestang, Sherman Lo and David Wilby.

This blog post is part of our Collaborations Workshop 2022 speed blog series.

What counts as good code is subjective. An end-user, for example, wants code that is correct, easy to learn and simple to use; as long as the code meets their objectives, they never need to touch the source. A developer, on the other hand, wants code that is readable and understandable so that it can be maintained and developed further. The book Code Complete [1] lists further qualities that users and developers may value, such as efficiency, robustness, reusability, testability and many more!

Different communities also value different things when using or developing a piece of code. In industry, a product is designed and produced to fulfil a business mission and make a profit. The product cycle is constrained by the economy and the market, for example by labour and material costs and by marketing strategy. Companies hire strong candidates to develop code that produces the best possible product within these constraints.

In academia, code is developed and used to answer research questions and to teach students, by people from a wide range of fields, backgrounds and levels of expertise. Some students embark on a research problem with little programming experience. Because universities are typically publicly funded, there is also an emphasis on making code open source and as accessible as possible. The difference between industrial and academic code is already quite apparent.

We believe that understandability is the most important value in academic code, because it keeps the code sustainable across generations of researchers. Understandable code can be studied, used and developed further by others, especially novices, and the more users the code reaches, the greater the impact of the research. Understandability also strengthens research findings, because it allows academics to explain the code they used to obtain them.

How to Write Understandable Code

Our understanding of code, including code we wrote ourselves, decays over time. Writing code that remains easy to understand is difficult, especially if the algorithms or numerical methods it implements are intrinsically complex. There are, however, several simple approaches that most programmers can use to write more readable and understandable code.

Style guides, for instance, specify how code should be formatted: the number of spaces used for indentation, the number of blank lines between function definitions, the maximum line length and much more. Style guides such as PEP 8 [2] (Python) or the tidyverse style guide [3] (R) are designed to make code more readable and to help programmers adopt a consistent layout across each other's code. In most cases, software tools can reformat your code automatically (see black [4] (Python), styler [5] (R) or clang-format [6] (C++)). This makes style a great entry point to better-quality code, accessible to beginners and to newcomers to a particular codebase. Style guides, however, go no further than your code's layout. In particular, they will not help with other aspects of understandability, such as naming variables or functions.
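As a minimal, hypothetical illustration, the two functions below behave identically; the second follows the PEP 8 layout (four-space indentation, one statement per line, spaces around operators), which is roughly what a formatter such as black would produce from the first. Only the layout changes: the terse names total and v are left alone, because no formatter will improve them.

```python
# Hypothetical example: the same function before and after applying PEP 8 layout.
# Only whitespace and line breaks differ; the behaviour is unchanged.

def mean_square_cramped(values):
  total=0
  for v in values: total+=v*v
  return total/len(values)


def mean_square_pep8(values):
    total = 0
    for v in values:
        total += v * v
    return total / len(values)


print(mean_square_cramped([1, 2, 3]))
print(mean_square_pep8([1, 2, 3]))
```

Both calls print the same value, which is the point: a style guide makes code easier to scan without touching what it computes.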

Vague or inaccurate names can make even the most straightforward algorithm cryptic, whereas explicit, descriptive names can make complex logic easier to visualize and understand. Choose variable names that describe the data they refer to, and function names that describe what the function does. Naming things, however, can derail your current train of thought, and it is perfectly normal to struggle to come up with good names straight away. Don't force it: carry on implementing the logic you have in mind. Once you reach a stable point, spend some time looking back at what you wrote, and specifically at the names you chose. Good names describe what the code does, but they cannot explain why it does it. This is where comments come in.
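A small hypothetical example of the difference names make: both functions below select the same items, but only the second tells the reader what the data are and what the comparison means. The temperature readings and function names are invented for illustration.

```python
# Hypothetical example: identical logic written with vague and with descriptive names.

def f(d, t):
    return [x for x in d if x[1] > t]


def select_readings_above_threshold(readings, threshold_celsius):
    return [
        (station, temperature)
        for station, temperature in readings
        if temperature > threshold_celsius
    ]


# Each reading is a (station name, temperature in degrees Celsius) pair.
readings = [("A", 12.5), ("B", 31.0), ("C", 28.4)]
print(f(readings, 25.0))
print(select_readings_above_threshold(readings, 25.0))
```

Unpacking each pair into station and temperature also removes the unexplained index x[1], so the reader no longer has to remember which position holds which value.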

As soon as you start writing more descriptive code, you'll find that many of your comments become redundant:

# remove weight column
weightless_data = raw_data.drop(columns="weights")

Instead of describing what is happening, use comments to explain why these particular instructions are needed:

# The weights column is removed to fit the sampling function
# interface (uniform sampling)
weightless_data = raw_data.drop(columns="weights")
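To make the idea runnable without extra dependencies, the sketch below uses a plain list of dictionaries instead of a pandas DataFrame, and a hypothetical sample_uniformly function standing in for the uniform sampling interface mentioned in the comment. Both the data and the function are assumptions for illustration. The comment explains why the field is removed; the code already says what happens.

```python
import random


# Hypothetical sampling function: it draws k records uniformly at random
# and has no notion of per-record weights.
def sample_uniformly(records, k, seed=None):
    rng = random.Random(seed)
    return rng.sample(records, k)


# Hypothetical raw data; each record carries a "weights" field.
raw_data = [
    {"site": "A", "value": 1.2, "weights": 0.5},
    {"site": "B", "value": 3.4, "weights": 0.3},
    {"site": "C", "value": 2.1, "weights": 0.2},
]

# The weights field is removed because sample_uniformly draws every record
# with equal probability; keeping it would wrongly suggest it is being used.
weightless_data = [
    {key: val for key, val in record.items() if key != "weights"}
    for record in raw_data
]

subset = sample_uniformly(weightless_data, k=2, seed=42)
print(subset)
```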

Style guides, good names and useful comments all help readers understand what your code is doing and why you wrote it that way. Unfortunately, there are no hard and fast rules for achieving crystal-clear code. It is easier to identify common patterns that make code difficult to read and modify. These patterns are commonly referred to as "code smells", an idea originally introduced by M. Fowler and K. Beck in their book Refactoring [7]. Familiarise yourself with the main code smells and learn how to avoid them. A good starting point is Jenny Bryan's talk Code Smells and Feels [8].
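Two commonly cited smells, duplicated code and "magic numbers" (unexplained literal constants), can be shown with a small hypothetical refactoring. The functions and the temperature-conversion example are invented for illustration; the behaviour is deliberately unchanged, since a refactoring should change structure, not results.

```python
# Hypothetical "before": the same conversion is written out twice and relies
# on an unexplained literal (273.15).
def report_before(min_c, max_c):
    min_k = min_c + 273.15
    max_k = max_c + 273.15
    return min_k, max_k


# "After": the constant has a name and the conversion lives in one function,
# so any future change or fix happens in exactly one place.
CELSIUS_TO_KELVIN_OFFSET = 273.15


def celsius_to_kelvin(temperature_c):
    return temperature_c + CELSIUS_TO_KELVIN_OFFSET


def report_after(min_c, max_c):
    return celsius_to_kelvin(min_c), celsius_to_kelvin(max_c)


print(report_before(-5.0, 20.0))
print(report_after(-5.0, 20.0))
```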

There is, however, only so much you can do on your own to make your code understandable to everyone. It is difficult to assess objectively how readable your code will be to others, which is why code review is common practice in the software industry and in open-source communities. Code review helps maintain high standards of code quality and spreads knowledge between developers. Ask around for colleagues willing to give feedback on your code, and offer to review theirs in return. When reviewing code, it’s a good idea to start with the points mentioned above: Does the code follow a style guide? Do the names make sense? Are the comments useful? Can I spot any common code smells? These questions apply to most source code, regardless of language or research field, and provide good entry points to a review before moving on to more field- or language-specific considerations.

“Is it just me?”: Code Review & the Social Side of Coding

Unlike the programming practices described above, code review is a social exercise as well as a technical one.

It is generally very difficult to disentangle the human element from software. Our code is written by us for a purpose, and so we naturally become emotionally invested in it. We may struggle for some time with a tricky problem, but are rewarded by a rush of exhilaration once the code works and gives us the desired result. We may use this code only for a short period of time, or make it part of a daily workflow, expanding and maintaining it in different environments or for different problems.

It is natural to be apprehensive when showing work to others, especially if you have spent a long period of time developing a particular piece of software, and we may be plagued by “what if” questions. What if they find a problem with it, or disagree with how the code has been implemented? What if they can’t understand it, or suggest a shorter or more efficient method?

Sharing code and asking for feedback as part of code review will make the code better, if we let it. More people looking at code means that it is more likely that any existing problems can be found, and improvements to implementation or documentation can be suggested. It also encourages us (however subconsciously) to think more about comments and code structure. However, we should also be aware of our emotional connection to our software - while the code reviewer may make suggestions, they are not judging us personally!

Similarly, if you review code, remember to do so with empathy. Discuss your findings clearly with the developer and explain your reasoning. Which parts of the code do you feel must be changed, and which come down to style or represent an equally valid approach? Remember the developer behind the software and frame your feedback accordingly: they will have spent time and effort producing this code before asking for feedback.

Facing Feedback To Improve Your Code

Receiving feedback, such as the critiques and suggestions from a code review, is a skill that improves with practice and allows us to engage with feedback and respond professionally. Researchers become ‘experts’ at navigating and interpreting feedback: we encounter it in discussions with colleagues, in grant applications, from reviewers or editors when publishing, and in comments and questions from peers during presentations.

Our first instinct when responding to feedback can be an emotional one, especially after encountering the equivalent of the dreaded ‘reviewer 2’ (or even 3 [9]) in the form of code reviewers who didn’t take our earlier advice about reviewing code with empathy. Managing your initial emotions can be demanding but, as the examples above show, you likely already have a lot of practice at doing this successfully.

By turning on our ‘researcher brain’ (perhaps after a short coffee break), we can take advantage of helpful suggestions and formative input from the feedback. You might be introduced to new methods, concepts or techniques that could improve your research, your efficiency and your productivity.

Preparing For Successful Code Reviews

Another academic comparison illustrates an important facet of feedback: “what you give is what you get”. Just as exam markers are likely to forgive small mistakes, or otherwise give the ‘benefit of the doubt’, when the work they are marking is neat and sensibly laid out, code reviewers will be more inclined to explain their points in detail, or to suggest alternatives and fixes, if your code is easy to read and uses a consistent style.

Conversely, if you write code that is hard to read or contains unfortunate ‘code smells’, reviewers may give less constructive feedback, or have less patience for ambiguities in your scripts. But, as with (the majority of) exam markers, code reviewers are likely to be doing this because they hope it will improve your skills, and will want to give you credit for the things you’ve done well. Writing understandable code gives reviewers the best opportunity to help you improve your code quality further.

Similarly, as researchers we know how to ask for the feedback we need. If you are fairly confident about your code style but wonder whether your code could be more modular, ask the reviewer to focus on this. Knowing what you want help with, and where to find it, saves them time.

Asking for feedback on selected parts of your code (e.g. a key function, or step in a workflow) is also polite - you’re likely to receive a quicker response to a schedule-friendly ‘drop’ of code, compared to an overwhelming flood of it!

If you aren’t certain what a code reviewer means, it’s also fine to go back and ask specific questions (e.g. “Did you mean that only the data-analysis code should be split into separate functions?”) before making any changes. Iterating on your changes and checking in with the original reviewer (if they have agreed to this and have time) is a great way to improve both your scripts and your skills.

Increasing Understandability Brings the Biggest Code Quality Gains

In our experience, researchers can significantly improve the quality of their code by focusing on increasing its understandability. This brings benefits to the research that relies on the software, while also making life considerably easier for any researchers who interact with it. In contrast to academic writing, code is a living and evolving resource. In addition to being easier to read, code you can understand is easier to fix, explain and maintain. On the other hand, code that is difficult to understand is unlikely to be reused by others, let alone be built upon. If you struggle to understand what the code tries to do, you (and others) will find it difficult to use or extend it.

While writing your code, we recommend concentrating initially on a small number of techniques which will improve your code’s understandability: paying attention to code style; using sensible naming throughout; and documenting the ‘whys’ with comments.

Once you have written your code, the ultimate test of its understandability is getting someone else to look at it in a code review and confirm that they can understand it too. Programmers of all levels can easily engage with these suggestions and will see immediate benefits. Those new to the concept may initially be intimidated, but overcoming these barriers is worth the reward.

We believe that if you have written understandable code that incorporates feedback from others, you are likely most of the way to ‘good enough’ code, and are ready and able to engage with resources and best practices such as those set out by Wilson et al. [10] which can help you get the rest of the way there.

References

[1] McConnell, S. (2004) “Code Complete”. Pearson Education.

[2] van Rossum, G., Warsaw, B. & Coghlan, N. (2001) “PEP 8 – Style Guide for Python Code”. Available at: https://peps.python.org/pep-0008/

[3] Wickham, H. (2017) “The Tidyverse Style Guide”. Available at: https://style.tidyverse.org/

[4] “Black - The Uncompromising Code Formatter” (2018). Available at: https://github.com/psf/black

[5] Müller, K. & Walthert, L. (2022) “styler: Non-Invasive Pretty Printing of R Code”. Available at: https://styler.r-lib.org

[6] “ClangFormat” (2007). Available at: https://clang.llvm.org/docs/ClangFormat.html

[7] Fowler, M. & Beck, K. (1999) “Refactoring: Improving the Design of Existing Code”. Reading, MA: Addison-Wesley.

[8] R Consortium (2018) “Code Smells and Feels” [Online video]. Available at: https://www.youtube.com/watch?v=7oyiPBjLAWY

[9] Peterson, D. A. M. (2020) “Dear Reviewer 2: Go F’ Yourself”. Social Science Quarterly, 101(4): 1648–1652. DOI: 10.1111/ssqu.12824

[10] Wilson, G., Aruliah, D. A., Brown, C. T., Chue Hong, N. P., Davis, M., Guy, R. T., et al. (2014) “Best Practices for Scientific Computing”. PLoS Biol, 12(1): e1001745. DOI: 10.1371/journal.pbio.1001745