Building a long term sustainable future for your software through Outreach - Part II

Posted by s.aragon on 31 May 2019 - 9:02am
tree growing tall
Photo by Jeremy Bishop

By Mario Antonioletti, Edinburgh Parallel Computing Centre and The Software Sustainability Institute.

Read Part I of this post.

In this second post, we argue that in order to have a sustainable future you must not only employ good software techniques but also ensure that you create a future workforce that can develop and/or want to use your software.

One way to invest in the future is through outreach activities that get young people interested in your field and motivated to undertake further study in it so that, when the time is right, they are able to replace you. In particular, we draw on the experience and perspective gained at EPCC, a high-performance computing centre that runs the ARCHER UK National Supercomputing Service, and on the outreach activities that have been developed and used there.

The first post in this series covered how a small Raspberry Pi based cluster, Wee Archie, could, with some simple applications, be used to demonstrate the general principles of how supercomputers work and how they are used. Moreover, it could be taken to events, unlike the actual supercomputers. In this article, we describe one of the physical activities, in which you enact part or all of an algorithm (a series of steps) towards solving a given problem, and which can complement such an effort at little to no cost. We shall do this by describing the evolution of one of these activities, which implements a simple sort – we started by sorting numbers but moved on to using coloured objects.

Our first attempt at a sorting activity consisted of performing a parallel number sort using “message passing” techniques: that is, sorting a set of whole numbers randomly distributed amongst the participants so that all the numbers end up in either ascending or descending order along the line. Each participant was given a set of numbers, each written on a bit of card, and the participants were then arranged in a line. The objective was to sort all of these numbers globally, i.e. across the whole line. Each person first sorted their own numbers locally and then swapped (the message passing bit) their highest number with a neighbour in one direction, if their highest number was higher than that neighbour’s lowest number. They then did the opposite with the neighbour in the other direction. This continued until no one was able to do any more swaps, which meant the numbers were sorted across the line of individuals. The details here are not important. If you’re interested, you can find the detailed instructions here, but you can see that the process is rather involved. It requires someone to coordinate, and it does not work well in a high throughput environment such as a Science Festival, where you will usually see a large number of people for a relatively short amount of time, as opposed to a classroom environment where you would have the same number of people for a longer time. Although people seemed to enjoy participating in such an activity, they were not always any the wiser by the end of it. It required a bit of rethinking.
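For readers who prefer code to cards, the neighbour-swapping sort described above can be sketched roughly as follows. This is only an illustrative simulation, not the detailed instructions used at events: the function name `neighbour_exchange_sort`, the pairing order and the example numbers are assumptions made for the sketch, and it assumes every participant holds at least one card.

```python
def neighbour_exchange_sort(hands):
    """Simulate the card activity: hands[i] is the list of cards held by person i.

    Assumes every person holds at least one card (an assumption of this sketch).
    Returns the final hands and the number of rounds taken.
    """
    hands = [sorted(h) for h in hands]  # everyone sorts their own cards first
    rounds = 0
    swapped = True
    while swapped:
        swapped = False
        rounds += 1
        # Pair people up one way (0-1, 2-3, ...) and then the other (1-2, 3-4, ...),
        # mimicking "swap in one direction, then in the other".
        for start in (0, 1):
            for i in range(start, len(hands) - 1, 2):
                left, right = hands[i], hands[i + 1]
                if left[-1] > right[0]:
                    # The "message": exchange my highest card for your lowest
                    left[-1], right[0] = right[0], left[-1]
                    left.sort()
                    right.sort()
                    swapped = True
    return hands, rounds


# Hypothetical example: three people holding three cards each
final_hands, rounds = neighbour_exchange_sort([[9, 4, 7], [1, 8, 2], [6, 3, 5]])
print(final_hands)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Even in this small sketch the coordination is visible: every round needs all neighbours to compare and swap in step, which is what made the activity hard to run with a fast-moving crowd.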

We thus simplified the framework and tried to use what is referred to as an embarrassingly parallel example (there are no dependencies between the tasks, so these can be worked on independently, i.e. no communications are required): sorting coloured balls into the corresponding coloured bucket with a time constraint is such a task. Using a 30-second timer, we get an individual to sort as many coloured balls into buckets as they can. The task is very simple to explain, and even children who can’t yet read can have a go (as yet we have not encountered a problem with colour blindness). Once the time is up, we count the balls sorted in the buckets and then get others involved – their peers, family members, etc. – using multiple people to perform the same task at the same time. One thus demonstrates that, by working together, more can be done in the same time, which is exactly the principle under which parallel computing, used by all supercomputers, operates. The set-up is beautifully simple and can be used to illustrate many principles. Why does performance drop as more people become involved? Contention. You can also talk about speed-up curves if you plot the performance as you add people (see below), etc.
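The same structure can be written down in a few lines of code. The sketch below is only an analogy, with made-up function names (`sort_into_buckets`, `parallel_bucket_sort`) and a made-up list of balls: each worker deals with its own share of the balls without talking to anyone else, and the buckets are only combined at the end. It uses Python threads purely to show the shape of the task, not to demonstrate a real performance gain.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def sort_into_buckets(balls):
    """One worker's job: count how many balls of each colour it has placed."""
    return Counter(balls)


def parallel_bucket_sort(balls, workers):
    # Deal the balls out round-robin so every worker has an independent share
    shares = [balls[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(sort_into_buckets, shares))
    # Combining the buckets at the end is the only point where results meet
    return sum(partial, Counter())


# Hypothetical pile of balls
balls = ["red", "blue", "green", "red", "yellow", "blue", "red", "green"]
buckets = parallel_bucket_sort(balls, workers=3)
print(dict(buckets))
```

Because no worker needs anything from another while sorting, adding workers changes nothing in `sort_into_buckets`; in the physical activity the buckets themselves are shared, which is where the contention comes from.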

chart

Each dot shows the number of objects sorted in 30 seconds by 1, 2, 3, etc. people. As the number of people increases, the measured performance falls further below the straight line you would expect if every extra person contributed as much as the first.
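If you record the counts as you go, the speed-up and efficiency behind such a chart are easy to work out. The sketch below takes speed-up to mean the number of objects sorted by a group divided by the number sorted by one person in the same 30 seconds, and efficiency to mean that speed-up divided by the number of people. The function name `speed_up` is hypothetical and the counts are invented for illustration; they are not measurements from our events.

```python
def speed_up(counts):
    """Map each group size to (speed-up, efficiency) relative to one person.

    `counts` maps number of people -> objects sorted in the fixed time.
    """
    base = counts[1]
    return {p: (n / base, n / (base * p)) for p, n in sorted(counts.items())}


# Invented counts, for illustration only: objects sorted in 30 seconds
example_counts = {1: 20, 2: 38, 3: 51, 4: 60}

for people, (s, e) in speed_up(example_counts).items():
    print(f"{people} people: speed-up {s:.2f}, efficiency {e:.0%}")
```

With these invented numbers, four people sort three times as many objects as one person rather than four times as many, which is the kind of gap from the straight line that contention produces.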

The equipment is not expensive. We started off with a set of tarted up boxes, moved on to using trugs and to a more sophisticated set-up and replaced the balls with bean-bags.

[Images: boxes; boxes and buckets; later set-up]

Evolution, from left to right, of the props used for the sorting activity, starting from some very simple kit and moving gradually to a more sophisticated (and expensive) set-up.

Thus we can see how this activity has evolved into something that works well in many different environments. The sorting of coloured objects is self-explanatory and works well in high throughput environments such as science festivals (we have used it many times in such environments). It is enormously fun for the participants, and many different analogies can be drawn with the state of the art. Moreover, it fits nicely into a narrative with Wee Archie, discussed in the first article. The sorting demonstrates how parallelism allows you to achieve more in the same amount of time. We usually have some non-functional motherboards from past and current supercomputers that show how multiple processors, corresponding to people in the sorting activity, are used in the same way, to do more in the same amount of time. Finally, Wee Archie is a live machine that illustrates the same principles. To me, that makes for a compelling narrative.

We are constantly trying to improve and evolve our activities. I have only described one but there are several others. We have written these up and made them available on GitHub so that others can use them, share their experiences, develop new ones and, hopefully, contribute back. At some point, we also want to share the software-based activities developed for Wee Archie. We have also written up instructions on how you can build your own Raspberry Pi clusters. To achieve long-term sustainability for your software, you need not only to employ good software practice but also to think about encouraging others, especially the young, to appreciate your domain and the impact it may have on their lives and those of their peers, and ultimately to get them involved in this process.