By Steve Crouch, SSI Research Software Group lead.
This guide is the second in the Unit Testing for Scale and Profit series.
In our last episode we looked at the benefits of unit testing, and how it can help us automate the verification of correct behaviour in our software. After designing and writing a number of tests using the pytest testing framework, we were able to run them and see whether those tests were successful, and by using a tool to measure unit test code coverage we were able to see to what extent they tested our code as a whole. But as our code increases in size, and particularly in complexity, we should expect our number of tests to increase too, which means more time writing tests. Fortunately there is something that can help with this burden which we'll look at in this episode: parameterised tests!
Let's say we’re starting to build up a number of tests for our code, and in some cases we're testing the same function, just with different parameters. However, continuing to write a new function for every single test case isn’t likely to scale well as our development progresses. So how can we make our job of writing tests more efficient?
In our last episode we looked at a Python implementation of the Factorial function and wrote some unit tests for it, and we'll be building on that in this episode. If you've already followed that episode, feel free to skip the next catch-up section. If you haven't followed that episode yet, and would like some background on writing basic unit tests, feel free to go back and follow that episode first to bring you up to speed. Otherwise, if you already have some experience of writing basic unit tests then read on for a quick catch-up.
Note: You will need Python 3.7 or above if you wish to follow the coding examples.
A quick catch-up from last time...
First, create a directory somewhere called mymath, and create a new file within it called factorial.py which contains this Python:
def factorial(n):
    """
    Calculate the factorial of a given number.

    :param int n: The factorial to calculate
    :return: The resultant factorial
    """
    if n < 0:
        raise ValueError('Only use non-negative integers.')

    factorial = 1
    for i in range(1, n + 1):
        factorial = factorial * i

    return factorial
You can run this code from within the Python interpreter with, for example:
>>> from mymath.factorial import factorial
>>> factorial(3)
6
Next, create another directory called tests (at the same level as the mymath directory) and place the following in a file called test_factorial.py in that directory:
import pytest
from mymath.factorial import factorial


def test_factorial_3():
    assert factorial(3) == 6


def test_factorial_5():
    assert factorial(5) == 120


def test_factorial_10():
    assert factorial(10) == 3628800


def test_factorial_negative1():
    with pytest.raises(ValueError):
        factorial(-1)

Lastly, you'll need to create a new Python virtual environment and install two packages within it: pytest (the unit testing framework) and pytest-cov (used to show the extent to which unit tests 'cover' the statements in your code), e.g. at the command line:
$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install pytest pytest-cov
NB: If on Windows (for example, in a Git Bash shell), for the second line you may need to use source venv/Scripts/activate instead.
Then you'll be able to run the unit tests using:
$ python3 -m pytest --cov=mymath.factorial tests/test_factorial.py
And you should see something like:
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /home/user
plugins: cov-3.0.0
collected 4 items

tests/test_factorial.py ....                                             [100%]

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                  Stmts   Miss  Cover
-----------------------------------------
mymath/factorial.py       7      0   100%
-----------------------------------------
TOTAL                     7      0   100%

============================== 4 passed in 0.04s ===============================
In which case, you're ready to go!
Parameterising our unit tests
In our last episode, we wrote a single function for every new test we needed. But when we simply want to use the same test code but with different data for another test, it would be great to be able to specify multiple sets of data to use with the same test code. Test parameterisation gives us this.
So instead of writing a separate function for each separate test, we can parameterise the tests with multiple test inputs. For example, in tests/test_factorial.py let us merge the first three tests that deal with positive integers into a single test function:
@pytest.mark.parametrize(
    "test, expected",
    [
        (3, 6),
        (5, 120),
        (10, 3628800)
    ])
def test_factorial_positive_integers(test, expected):
    assert factorial(test) == expected
Here, we use pytest's mark capability to add metadata to this specific test: in this case, marking that it's a parameterised test. parametrize() (note pytest's spelling, which differs from "parameterise") is actually a Python decorator. A decorator, when applied to a function, adds some functionality to it when it is called. Here, what we want to do is specify multiple input and expected output test cases, so that the function is called once for each of them automatically when this test is run.
We specify these as arguments to the parametrize() decorator, firstly indicating the names of the arguments that will be passed to the function (test, expected), and secondly the actual values that correspond to each of these names: the input data (the test argument), and the expected result (the expected argument). In this case, we are passing in three test cases to test_factorial_positive_integers(), which will be run sequentially.
So our first test will run this function on 3 (our first test argument), and check to see if it equals 6 (our first expected argument). Similarly, our second test will run this function with 5 and check it produces 120, and finally run it with 10 and check for 3628800 for the third test. If we run it now:
============================= test session starts ==============================
platform darwin -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /Users/user/tmp/tmp
plugins: cov-3.0.0
collected 4 items

tests/test_factorial.py ....                                             [100%]

---------- coverage: platform darwin, python 3.8.12-final-0 ----------
Name                  Stmts   Miss  Cover
-----------------------------------------
mymath/factorial.py       7      0   100%
-----------------------------------------
TOTAL                     7      0   100%

============================== 4 passed in 0.05s ===============================
The big plus here is that we don’t need to write separate functions for each of those tests, which can mean writing our tests scales better as our code becomes more complex and we need to write more tests that use the same code.
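The same idea extends to other groups of test cases, including the error case. The following is only a sketch: the factorial function is repeated inline (with its local variable renamed to result) so the example runs on its own, and the extra inputs, test names, and ids are illustrative choices rather than part of the original test suite.

```python
import pytest


def factorial(n):
    """Same logic as mymath/factorial.py, repeated so this sketch runs alone."""
    if n < 0:
        raise ValueError('Only use non-negative integers.')
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result


# pytest.param lets us attach a readable id to each case (ids are illustrative)
@pytest.mark.parametrize(
    "test, expected",
    [
        pytest.param(0, 1, id="zero"),
        pytest.param(1, 1, id="one"),
        pytest.param(3, 6, id="three"),
    ])
def test_factorial_small_values(test, expected):
    assert factorial(test) == expected


# A single argument name works too: each negative input should raise ValueError
@pytest.mark.parametrize("test", [-1, -5, -100])
def test_factorial_negative_integers(test):
    with pytest.raises(ValueError):
        factorial(test)
```

Running pytest with the -v flag lists each parameterised case separately, which makes it easier to see which input failed.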
How does code coverage fit into parameterisation?
Parameterisation is a really useful technique for scaling up the number of tests, but we have to be careful that our tests continue to test what is important and needs to be tested. For example, we could write hundreds of parameterised tests for a particular important function – and that might be quite easy and look impressive in terms of the number of tests. Another way to do that which goes further than parameterisation is called fuzzing, where inputs to tests are randomised within given constraints, instead of being explicitly defined. But regardless of how we generate our multitude of tests, the overall code coverage may remain low, with the extra tests not helping us much.
Code coverage can help here by helping us to identify how much of the code isn't being tested, but it isn't the whole answer. We should try to ensure that we continue to prioritise the writing of tests that verify the behaviour of code in other important areas as well.
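To make the fuzzing idea mentioned above a little more concrete, here is a minimal sketch of a randomised test. It assumes that Python's built-in math.factorial is an acceptable reference to compare against, and the input range and number of iterations are arbitrary choices. It is a very simple form of randomised testing, not a substitute for a dedicated fuzzing tool.

```python
import math
import random


def factorial(n):
    """Same logic as mymath/factorial.py, repeated so this sketch runs alone."""
    if n < 0:
        raise ValueError('Only use non-negative integers.')
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result


def test_factorial_random_inputs():
    rng = random.Random()  # unseeded, so inputs differ between runs
    for _ in range(200):
        # Constrain the random inputs to a range we consider valid
        n = rng.randint(0, 50)
        # Compare against a trusted reference implementation
        assert factorial(n) == math.factorial(n)
```

Note that, however many inputs are generated, this test only ever exercises the non-negative path through the function: the ValueError branch is never reached, which is exactly the kind of gap that a coverage report would reveal.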
What about testing against indeterminate output?
So we can write a lot of tests very easily with parameterisation. But what if your implementation depends on a degree of random behaviour? This can be desired within a number of applications in research, particularly in simulations (for example, molecular simulations) or other stochastic behavioural models of complex systems. So how can you test against such systems if the outputs are different when given the same inputs?
One way is to remove the randomness during testing. For those portions of your code that use a language feature or library to generate a random number, you can produce a known sequence of numbers when testing instead, to make the results deterministic and hence easier to test against. So you could encapsulate this different behaviour in separate functions, methods, or classes and call the appropriate one depending on whether you are testing or not. This is essentially a type of mocking, where you are creating a “mock” version that mimics some behaviour for the purposes of testing.
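One possible way to arrange this, sketched below, is to pass the random number generator into the code as an argument, so that a test can supply a stand-in that replays a known sequence. The function roll_dice and the class FixedSequence are hypothetical names invented for this illustration.

```python
import random


def roll_dice(n_rolls, rng=random):
    """Hypothetical code under test: sum the result of n_rolls six-sided dice."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls))


class FixedSequence:
    """Stand-in generator that replays a known sequence instead of random values."""

    def __init__(self, values):
        self._values = iter(values)

    def randint(self, low, high):
        # Ignore the requested bounds and return the next pre-set value
        return next(self._values)


def test_roll_dice_with_known_sequence():
    fake_rng = FixedSequence([1, 6, 3])
    # With the randomness removed, the expected total is known: 1 + 6 + 3
    assert roll_dice(3, rng=fake_rng) == 10
```

In normal use, roll_dice(3) falls back to Python's random module, so only the tests see the fixed sequence.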
Another way is to control the randomness during testing to provide results that are deterministic: the same each time. The standard random number generators in computing languages, including Python's random library, are not truly random but pseudo-random: the sequence of ‘random’ numbers is typically generated using a mathematical algorithm. A seed value is used to initialise an implementation's random number generator, and from that point, the sequence of numbers is actually deterministic. Many implementations just use the system time as the default seed, but you can set your own. By doing so, the generated sequence of numbers is the same each time, e.g. using Python's random library to randomly select a sample of ten numbers from the range 0 to 99:
import random

random.seed(1)
print(random.sample(range(0, 100), 10))
random.seed(1)
print(random.sample(range(0, 100), 10))

[17, 72, 97, 8, 32, 15, 63, 57, 60, 83]
[17, 72, 97, 8, 32, 15, 63, 57, 60, 83]
So since your program’s randomness is essentially eliminated, your tests can be written to test against the known output. One caveat to this is to be aware that, potentially, the underlying random number generator implementation may itself change leading to unexpected output in previously known cases, which has been true for Python. The good news is that Python 3 fortunately maintains backwards compatibility with its previous random number generator algorithms, and allows you to specify which version to use when specifying a seed. Another approach which gives you even more control would be to use a mock of a known random number generator algorithm which you can use solely for the purposes of testing instead. In any event, the trick is to ensure that the output being tested against is definitively correct!
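As a rough sketch of how seeding might look inside a test, the example below uses separate random.Random instances rather than the module-level generator, so that seeding in one test does not affect others. The function pick_sample and the seed string are hypothetical; the version argument to seed() is the mechanism Python provides for choosing the seeding algorithm.

```python
import random


def pick_sample(rng):
    """Hypothetical code under test: draw ten numbers from the range 0 to 99."""
    return rng.sample(range(0, 100), 10)


def test_same_seed_gives_same_sample():
    # Two generators initialised with the same seed produce the same sequence
    first = pick_sample(random.Random(1))
    second = pick_sample(random.Random(1))
    assert first == second


def test_seed_version_can_be_pinned():
    # Passing version explicitly records which seeding algorithm is expected
    rng_a = random.Random()
    rng_a.seed("experiment-42", version=2)
    rng_b = random.Random()
    rng_b.seed("experiment-42", version=2)
    assert pick_sample(rng_a) == pick_sample(rng_b)
```

These tests compare two seeded runs against each other; a test against stored, known output would instead compare pick_sample() with a list recorded from an earlier, verified run.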
The other thing you can do, while keeping the random behaviour, is to test the output data against expected constraints on that output. For example, if you know that all data should be within particular ranges, or within a particular statistical distribution type (e.g. normal distribution over time), you can simulate and test against that, conducting multiple test runs that take advantage of the randomness to fill the known “space” of expected results. Note that this isn't as precise or complete, and bear in mind this could mean you need to run a lot of tests which may take considerable time.
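A minimal sketch of this kind of constraint-based test follows. The two simulate functions are hypothetical stand-ins for real simulation code, and the sample size and tolerances are assumptions: they are deliberately generous so that the tests should very rarely fail by chance, at the cost of being less precise.

```python
import random
import statistics


def simulate_fractions(count, rng=random):
    """Hypothetical simulation output: values expected to lie in [0, 1)."""
    return [rng.random() for _ in range(count)]


def simulate_measurements(count, rng=random):
    """Hypothetical simulation output: normally distributed, mean 10, std dev 2."""
    return [rng.gauss(10.0, 2.0) for _ in range(count)]


def test_fractions_stay_in_range():
    values = simulate_fractions(10_000)
    # Every value must respect the known bounds, whatever the random draw
    assert all(0.0 <= value < 1.0 for value in values)


def test_measurements_match_expected_distribution():
    values = simulate_measurements(10_000)
    # Check summary statistics against the expected distribution, with tolerance
    assert abs(statistics.mean(values) - 10.0) < 0.2
    assert abs(statistics.stdev(values) - 2.0) < 0.2
```

Tighter tolerances need more samples to remain reliable, which is where the extra running time mentioned above comes from.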
Limits to automated testing
Like any other piece of experimental apparatus, a complex program requires a much higher investment in testing than a simple one. Putting it another way, a small script that is only going to be used once, to produce one figure, probably doesn’t need separate testing: its output is either correct or not. A linear algebra library that will be used by thousands of people in twice that number of applications over the course of a decade, on the other hand, definitely does.
It's also important to remember that no matter how many tests we are able to write, including making use of approaches like parameterisation, at a non-trivial scale of development unit testing cannot catch every bug in an application. To mitigate this, manual testing is also important: humans have a remarkable way of finding undesirable behaviours in software. Remember to test using as much input data as you can, since very often code is developed and tested against the same small sets of data. Increasing the amount of data you test against, drawn from numerous sources, gives you greater confidence that the results are correct. The key is to identify and prioritise testing against what will most affect the code's ability to generate accurate results.
Our software will inevitably increase in complexity as it develops. Using automated testing where appropriate can help us identify problems quickly and save us considerable time, especially in the long term, and allows others to verify against correct behaviour: but it isn't perfect!
Following on from our guide that introduced testing, we've looked at how we can use parameterisation to help us scale the number of tests we need to write, as well as the limitations of this approach and automated testing in general. The next guide in the series will look at how we can take automation even further, by using continuous integration to run our tests automatically on our source code repositories whenever we make changes to our code.
We touched on mocking as an approach to help us make our code deterministic (if we need to) when testing. Mocking is also really useful if our code makes use of an external service via an Application Programming Interface (API) or code library, where we want to test our code without having to use this external service each time: we can build a mock that mimics the service, accepting some input from our code and returning some test output to our code that represents output from the service. If you'd like to learn more about mocking, there are many guides out there including this comprehensive guide on using mocking in pytest.
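As a small illustration of mocking an external service, the sketch below uses Mock from Python's standard unittest.mock module, which works within pytest tests. The describe_weather function, the get_temperature method, and the shape of the data it returns are all invented for this example and do not correspond to any real service.

```python
from unittest.mock import Mock


def describe_weather(client, city):
    """Hypothetical code under test: formats data fetched from an external service."""
    reading = client.get_temperature(city)
    return f"{city}: {reading['celsius']:.1f} C"


def test_describe_weather_uses_service_reply():
    # The mock stands in for the real service client and returns canned data
    fake_client = Mock()
    fake_client.get_temperature.return_value = {"celsius": 18.5}

    assert describe_weather(fake_client, "Edinburgh") == "Edinburgh: 18.5 C"
    # We can also check that our code called the service as expected
    fake_client.get_temperature.assert_called_once_with("Edinburgh")
```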
For an overview of automation in general that is largely language agnostic, look at our guide on testing your software, which covers automation from the build process to continuous integration and what it can give you.
Want to discuss this post with us? Send us an email or contact us on Twitter @SoftwareSaved.