Writing readable source code

By Mike Jackson.

Readable source code is vital

If our peers are to quickly and easily understand our source code, it must be readable. The Software Sustainability Institute can provide advice and guidance on producing readable source code. We can even review your source code to see whether it can be improved.

Why write this guide?

With increasing requirements for researchers to make their source code available, we thought this guide would be a useful resource for researchers developing software, to ensure that the code they produce is understandable by their peers.

Why is readable source code important?

Source code is written for people. It may end up being processed by a machine, but it evolves in our hands, and we need to understand what the code does and where changes need to be made. We may understand our code now, but what about six months or a year from now? Readable code helps us to reacquaint ourselves with what we wrote and why we wrote it.

Our code may embody some unique aspect of our research. Readable code can help our fellow researchers to understand what we've done and so to assess whether this aspect of our research is correct. Or, to put it another way, would we rather have a peer spot a problem now, or six months later, after we've published a paper based on flawed results produced using our software?

There's also our image to consider. If our code is badly laid out, messy and cryptic, others will assume that it is also buggy and sloppily written. They may assume that we undertake our research in a similarly slack manner.

If we're working in a team to develop some code, then readable source code can ensure that everyone can understand the code written by everyone else. This can help improve a team's "truck factor", which is defined as the number of developers who need to be put out of action before no one understands the code.

Writing readable code costs only a fraction more time than writing unreadable code, but the payback is immense.

Formatting

The formatting or appearance of code determines how quickly and easily the reader can understand what it does. A compiler will see no difference between this...

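// Example 1: unformatted code.
public class Functions
{
public static int fibonacci(int n)
{
if (n < 2)
{
return 1;
}
return fibonacci(n-2) + fibonacci(n-1);
}

public static int power(int base, int exponent)
{
return (int) Math.pow(base, exponent);
}

public static void print(String message)
{
System.out.println(message);
}

public static void main(String[] arguments)
{
for(int i=0;i<10;i++)
{
print("Input value:"+i+" Output value:"+(power(fibonacci(i), 2)+1));
}
}
}

...and this...

// Example 2: formatted code.

public class Functions
{
  public static int fibonacci(int n)
  {
    if (n < 2)
    {
      return 1;
    }
    return fibonacci(n - 2) + fibonacci(n - 1);
  }

  public static int power(int base, int exponent)
  {
    return (int) Math.pow(base, exponent);
  }

  public static void print(String message)
  {
    System.out.println(message);
  }

  public static void main(String[] arguments)
  {
    for (int i = 0; i < 10; i++)
    {
      // The parentheses make "+ 1" an arithmetic addition rather than
      // appending the character "1" to the string.
      print("Input value:" + i +
            " Output value:" +
            (power(fibonacci(i), 2) + 1));
    }
  }
}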

...but the second example will be more easily understood by the reader.

Indentation makes a clear connection between blocks of code and the classes, functions or loops to which they belong. If a statement is longer than a single line on screen, indentation helps the reader understand where the statement begins and ends. White-space makes the code appear less cluttered and allows the grouping together of logically-related elements like constants or local variable declarations.

In many languages, indentation is purely cosmetic (e.g. C/C++ or Java) and the number of spaces used to indent code is left to the developer to decide. However, in certain languages (e.g. Python or Occam) indentation is more restrictive because it has semantic significance: it defines a loop body or a function body.

Many programming environments (e.g. Eclipse, JBuilder, NetBeans and Microsoft Visual Studio) provide support for code formatting, and certain text editors (e.g. Emacs) can be extended with support for language-specific indentation. These tools may need careful configuration (for example, of tab widths and of whether tabs or spaces are used for indentation) to ensure that code formatted in one tool looks the same when it is opened in another editor.

Good formatting can also influence design. A function with seven arguments might not be very readable on-screen. To make it more readable, we could create a new data structure or class to hold some of the arguments. We could also break up a function that cannot be viewed on one screen into a number of smaller functions that can, if the function can be logically decomposed in this way (we should never break up functions into smaller chunks purely on aesthetic grounds!).
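As a minimal sketch of the first suggestion, assume a hypothetical function that calculates the distance between two points in 3D space. Passed as separate values, the six coordinates are easy to supply in the wrong order; grouping each point's coordinates into a small class (here called Point3D, a name chosen only for illustration) leaves a function with just two arguments.

```java
// Hypothetical example: grouping related arguments into a small class.
public class DistanceExample
{
  // A point in 3D space; its three coordinates always travel together.
  public static final class Point3D
  {
    final double x;
    final double y;
    final double z;

    public Point3D(double x, double y, double z)
    {
      this.x = x;
      this.y = y;
      this.z = z;
    }
  }

  // Before: six positional arguments, which are easy to pass in the wrong order.
  public static double distanceVerbose(double x1, double y1, double z1,
                                       double x2, double y2, double z2)
  {
    double dx = x2 - x1;
    double dy = y2 - y1;
    double dz = z2 - z1;
    return Math.sqrt(dx * dx + dy * dy + dz * dz);
  }

  // After: two arguments, each of which carries its meaning in its type.
  public static double distance(Point3D a, Point3D b)
  {
    return distanceVerbose(a.x, a.y, a.z, b.x, b.y, b.z);
  }

  public static void main(String[] arguments)
  {
    Point3D origin = new Point3D(0.0, 0.0, 0.0);
    Point3D target = new Point3D(1.0, 2.0, 2.0);
    System.out.println(distance(origin, target)); // prints 3.0
  }
}
```

The verbose version is kept only for comparison; in practice we would keep whichever form reads better and call it consistently.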

Naming

The careful selection of names is very important to understanding. Cryptic names of components, modules, classes, functions, arguments, exceptions and variables can lead to confusion about the role that these elements play. Good naming is fundamental to good design, because source code represents the most detailed version of our design. Compare and contrast the ease with which the following statements can be understood:

out(p(f(v), 2) + 1);

print(power(fibonacci(argument), 2) + 1);

There are common naming recommendations. Modules, components and classes are typically named with nouns (e.g. Molecule, BlackHole, DNASequence). Functions and methods are typically named with verbs (e.g. spliceGeneSequence, calculateOrbit). Boolean functions and methods are typically phrased as questions about properties (e.g. isStable, isRunning, containsAtom).

Naming also relates to the use of capitalisation and delimiters, which can help a reader to quickly determine whether something is a function, variable or class. Common guidelines for Java (also followed by many C++ projects) include:

  • Constants should be written in upper case, with words separated by underscores: PI, MAXIMUM_VALUE.
  • Class names should start with a capital letter, with the first letter of each subsequent word also capitalised (this is called camel case): Molecule, BlackHole, DNASequence.
  • Functions and methods should start with a lower-case letter, with the first letter of each subsequent word capitalised: spliceGeneSequence, calculateOrbit.
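The naming and capitalisation guidelines above might look like this in a small Java class. The class Molecule and its members are hypothetical, chosen to match the example names in the text rather than taken from any real library.

```java
import java.util.ArrayList;
import java.util.List;

// Class name: a noun, starting with a capital letter (camel case).
public class Molecule
{
  // Constant: upper case, with words separated by underscores.
  public static final int MAXIMUM_ATOMS = 100;

  // Field: starts with a lower-case letter; a noun describing what it holds.
  private final List<String> atoms = new ArrayList<>();

  // Method: starts with a lower-case letter; a verb describing what it does.
  public void addAtom(String element)
  {
    if (atoms.size() >= MAXIMUM_ATOMS)
    {
      throw new IllegalStateException("Molecule already has " + MAXIMUM_ATOMS + " atoms");
    }
    atoms.add(element);
  }

  // Boolean methods: phrased as questions about a property.
  public boolean containsAtom(String element)
  {
    return atoms.contains(element);
  }

  public boolean isEmpty()
  {
    return atoms.isEmpty();
  }
}
```

Even without reading the method bodies, a call such as water.containsAtom("O") reads as a question with a yes-or-no answer.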

Comments

Source code tells the reader what the code does. Comments allow us to provide the reader with additional information. The reader should be able to understand a single function or method from its code and its comments, and should not have to look elsewhere in the code for clarification. It can be easy to get lost in code, and others will not have the same knowledge of our project or code as we do.

The kinds of things that should be commented include:

  • Why certain design or implementation decisions were adopted, especially in cases where the decision may seem counter-intuitive.
  • The names of any algorithms or design patterns that have been implemented.
  • The expected format of input files or database schemas.
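The following sketch shows the three kinds of comment listed above in a single hypothetical Java method: the expected input format, the name of the algorithm used (Kahan, or compensated, summation), and why an implementation choice that may look odd was made. The class name and the input format are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical example showing the kinds of comments described above.
public class Measurements
{
  // Expected input format: one measurement per line, "<sampleId>,<value>",
  // e.g. "S001,12.5". Blank lines and lines starting with '#' are skipped.
  public static double total(List<String> lines)
  {
    // Algorithm: Kahan (compensated) summation.
    // Why: the extra "compensation" variable looks redundant, but summing
    // many doubles naively can accumulate noticeable rounding error, and
    // this variable carries the low-order bits lost at each step.
    double sum = 0.0;
    double compensation = 0.0;
    for (String line : lines)
    {
      if (line.isEmpty() || line.startsWith("#"))
      {
        continue;
      }
      double value = Double.parseDouble(line.split(",")[1].trim());
      double corrected = value - compensation;
      double newSum = sum + corrected;
      compensation = (newSum - sum) - corrected;
      sum = newSum;
    }
    return sum;
  }
}
```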

There are some restrictions. Comments that simply restate what the code does are redundant. Comments must be accurate, because an incorrect comment causes more confusion than no comment at all.
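To illustrate the difference between a redundant comment and a useful one, here is a hypothetical method. The first comment merely repeats the code; the second explains a decision that the code alone cannot convey. The instrument behaviour it describes is invented for the example.

```java
// Hypothetical example contrasting a redundant comment with an informative one.
public class Thresholds
{
  public static int countAbove(double[] values, double threshold)
  {
    int count = 0;
    for (double value : values)
    {
      // REDUNDANT: "If value is greater than threshold, add one to count."
      // This only repeats what the code below already says.

      // INFORMATIVE: readings exactly equal to the threshold are excluded
      // on purpose (a hypothetical instrument reports the threshold value
      // itself when it saturates), so we use > rather than >=.
      if (value > threshold)
      {
        count++;
      }
    }
    return count;
  }
}
```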

Coding conventions

As each language has its own syntax, semantics and set of built-in commands, what constitutes readable code differs from one programming language to another. What is readable is also affected by the opinions and preferences of the individual reader. Nevertheless, a number of language-specific coding conventions have evolved, reflecting both general and language-specific good practice.

It’s recommended that projects adopt a set of coding conventions. Not only does this promote readable code, it also helps ensure that the code looks consistent, even if the software consists of hundreds of source code files and is worked on by many developers. Projects as varied as Mozilla, Linux, Apache, GNU, and Eclipse all have their own project-specific conventions that their developers are expected to conform to.

Project-specific conventions can also embody requirements specific to our project. They promote consistency of naming across packages, components, classes, or functions: 'All test classes must have the suffix Test, e.g. FourierUtilitiesTest'. They ensure that others know who owns the copyright on our source code: 'All source code files must have a comment with the statement Copyright © My Organisation, 2010'. They ensure that others know about restrictions on our source code: 'All source code files should have a comment with the text "Licensed under the Apache License, Version 2.0".'
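A source file that follows the example conventions above might begin as shown below. FourierUtilities and its method are hypothetical; the header wording is taken from the example conventions, not from any real project's policy.

```java
/*
 * Copyright © My Organisation, 2010
 * Licensed under the Apache License, Version 2.0
 */

// Hypothetical utility class; under the example convention, its tests
// would live in a class called FourierUtilitiesTest.
public class FourierUtilities
{
  // Returns the Nyquist frequency (half the sampling rate), in the same units.
  public static double nyquistFrequency(double samplingRate)
  {
    return samplingRate / 2.0;
  }
}
```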

Many programming environments allow templates to be defined, which help us to conform to coding conventions for source code files. However, we must remember that templates are just tools: they cannot, by themselves, guarantee readable code.

Code analysis tools allow our coding conventions to be defined as rules. Our source code can then be analysed against these rules to check automatically for conformance. These tools can publish reports that highlight which rules are violated and where in the code the violations occur. Popular code analysis tools include Checkstyle for Java, StyleCop for C#, Perl::Critic for Perl, Pylint for Python, and codetools for R. For other languages, see Wikipedia's "List of tools for static code analysis". For large teams whose members use different text editors and IDEs, we recommend EditorConfig, which helps developers define and maintain consistent coding styles across different editors and IDEs.
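To give a feel for what "conventions as rules" means, here is a toy checker in Java. It is not how Checkstyle or any other tool is implemented or configured; it simply expresses two of the naming guidelines from earlier as regular expressions and reports each name that breaks them. Real tools also report the file and line on which each violation occurs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// A toy illustration (not the API of any real tool) of checking names against rules.
public class NamingRuleChecker
{
  // Rule: class names are in camel case starting with a capital, e.g. BlackHole.
  private static final Pattern CLASS_NAME = Pattern.compile("[A-Z][A-Za-z0-9]*");

  // Rule: constants are upper case with underscores, e.g. MAXIMUM_VALUE.
  private static final Pattern CONSTANT_NAME = Pattern.compile("[A-Z][A-Z0-9]*(_[A-Z0-9]+)*");

  // Returns one report line per violation, naming the rule and the offending name.
  public static List<String> check(List<String> classNames, List<String> constantNames)
  {
    List<String> violations = new ArrayList<>();
    for (String name : classNames)
    {
      if (!CLASS_NAME.matcher(name).matches())
      {
        violations.add("Class name not in camel case: " + name);
      }
    }
    for (String name : constantNames)
    {
      if (!CONSTANT_NAME.matcher(name).matches())
      {
        violations.add("Constant name not in upper case: " + name);
      }
    }
    return violations;
  }

  public static void main(String[] arguments)
  {
    List<String> report = check(List.of("Molecule", "blackHole"),
                                List.of("PI", "maximumValue"));
    report.forEach(System.out::println);
  }
}
```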

Source code and documentation

Certain languages have tools available that can automatically generate documentation from source code. This documentation can help others navigate their way around our code and understand what each component does. One example is Javadoc for Java. This takes Java source code and outputs a set of HTML pages with information about classes, methods, arguments, return types and exceptions. The page content is derived from the source code itself, and all pages are automatically cross-referenced. Tags can be provided in comments, and the use of a tag determines how that part of a comment is presented in the HTML. As a simple example, the tag {@link java.util.List} in a comment will become a hyperlink to the documentation for the List interface; links to web pages, such as http://www.google.com, are written as ordinary HTML <a href="..."> elements instead. Other examples of document generation tools include doxygen for C/C++, Fortran, C#, Java and PHP; NDoc for C#; f90tohtml for Fortran; Sphinx for Python; and roxygen2 for R. For other languages, see Wikipedia's "Comparison of documentation generators".
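As an illustration, here is the fibonacci function from the earlier examples with Javadoc comments added, in a class renamed DocumentedFunctions for this sketch. Running the javadoc tool over this file would turn the @param, @return, @throws and @see tags into labelled sections of the method's HTML page, and the {@link} tag into a cross-reference. The argument check is an addition for this example and is not in the earlier code.

```java
/**
 * Mathematical functions used in examples.
 * Use {@link #fibonacci(int)} to calculate terms of the Fibonacci sequence.
 */
public class DocumentedFunctions
{
  /**
   * Calculates a term of the Fibonacci sequence, taking terms 0 and 1 to be 1.
   * <p>
   * This uses the naive recursive algorithm, whose running time grows
   * exponentially with {@code n}.
   *
   * @param n index of the term to calculate; must not be negative
   * @return term {@code n} of the sequence
   * @throws IllegalArgumentException if {@code n} is negative
   * @see <a href="https://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci number</a>
   */
  public static int fibonacci(int n)
  {
    if (n < 0)
    {
      throw new IllegalArgumentException("n must not be negative: " + n);
    }
    if (n < 2)
    {
      return 1;
    }
    return fibonacci(n - 2) + fibonacci(n - 1);
  }
}
```

The HTML pages could then be generated with a command such as javadoc -d docs DocumentedFunctions.java, which writes them to a directory called docs.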

Be consistent

There is no single correct way to indent, use white-space or name things, nor is there a single correct set of coding conventions to use. Writing readable code depends very much on our programming language, the requirements of our project and what we consider readable. The golden rule is to be consistent. Once we've decided upon the conventions we will use (in agreement with the other members of our team, if applicable), we should document them and stick to them. At the same time, our conventions are not set in stone: if they need to be changed or improved, then we should do so. Our conventions are there to help and serve us, and following them will ultimately lead to readable code.

How can we tell when we've succeeded?

A code review, also known as a peer review or code inspection, is an effective way of assessing how readable our source code is. This involves our peers, or trusted colleagues, reviewing our code line by line. Not only is a code review a great way to get feedback on the readability of our code, it can also be a highly effective means of detecting errors early on. For more, see our guide on developing maintainable software.

Further reading

There are many resources on the subject of readable source code.