Top tips on software for data visualisation

Posted by a.pawlik on 4 October 2013 - 1:43pm

By Jon Blower, Technical Director at Reading e-Science Centre

You've finished your analysis and now you would like to present it to the world, preferably in a pretty but informative picture. But where do you start and what software should you use? Here are some top tips to help you out.

1. Remember the three concepts of visualisation

Communication: when you have a particular idea or concept to get across to a wider audience. Good graphic design and a clear layout are very important.

Confirmation: when you have a known hypothesis to test, and you want to produce a picture to test whether it is correct. Repeatability is important, since you may wish to generate the same kind of picture many times using different data. If your picture forms an important part of your conclusions, it should be possible for a reviewer to check that your analysis has been done correctly. This is where scripting languages (particularly open-source ones) can really shine.

Exploration: interactivity is important when you have a new, perhaps complex, dataset that can be viewed in many different ways - especially if you’re not even sure yet what questions you need to ask of the data. Most scientists aren’t taught how to use interactive tools, so this area of visualisation is somewhat neglected. See below for some suggestions!

Different tasks require different kinds of tools: you shouldn’t expect your scientific analysis package to produce great infographics for a newspaper article. Similarly, interactive plots are sometimes unsuitable if you’re trying to get across a very specific message, because a user can follow their own path through the data, and this may not be the path you’re trying to guide them down! It’s worth spending some time to learn different kinds of visualisation software – fortunately it’s getting easier and easier to try out new packages.

2. Break away from Excel! 

Excel is great for quick plots and interactivity, but not so good if you need repeatability. Spend a little time learning the basics of plotting in an open-source scripting language like Python or R. The Flowing Data website has a great guide by Nathan Yau to get you started.

3. Separate your concerns

A typical script will have code for extracting data from a file, processing the data and visualising them. It’s worth splitting these three steps into separate functions so that your visualisation function doesn’t need to know anything about the original file format. This keeps your code flexible: if you later want to reuse the visualisation code for different data, you only have to replace the extraction step.
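As a rough illustration of this separation, here is a minimal Python sketch. The function names (load_csv, normalise, render_bars), the column names and the sample data are all made up for the example, and a text bar chart stands in for a real plotting library so that the script runs with nothing but the standard library.

```python
import csv
import io


def load_csv(text_stream):
    """Extract: read (label, value) pairs from CSV text with a header row.

    The column names "label" and "value" are assumptions for this example.
    """
    reader = csv.DictReader(text_stream)
    return [(row["label"], float(row["value"])) for row in reader]


def normalise(records):
    """Process: scale the values so that the largest becomes 1.0."""
    largest = max(value for _, value in records)
    return [(label, value / largest) for label, value in records]


def render_bars(records, width=20):
    """Visualise: draw a text bar chart from (label, fraction) pairs.

    This function only sees plain Python tuples, so it neither knows nor
    cares whether the data came from CSV, a database or anything else.
    """
    lines = []
    for label, fraction in records:
        bar = "#" * round(fraction * width)
        lines.append(f"{label:<8}{bar}")
    return "\n".join(lines)


# Hypothetical sample data, held in memory instead of a real file.
sample = io.StringIO("label,value\nalpha,2\nbeta,4\ngamma,1\n")
chart = render_bars(normalise(load_csv(sample)))
print(chart)
```

To plot data from a different source, you would write a new loader that returns the same list of pairs and leave normalise and render_bars untouched.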

4. Don’t automatically accept your software’s defaults on styling

Spend a little time learning some good-practice guidelines on how to make effective plots (three examples: Better Figures, Elegant Figures, Figured). In particular, watch out for colour scales: default scales are often unattractive, inappropriate for many kinds of data or unsuitable for the colour blind.
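One rough way to sanity-check a colour scale is to test whether its lightness changes steadily from one end to the other, since scales that rely on hue alone tend to be harder to read in greyscale or with colour-vision deficiencies. The sketch below uses the relative luminance formula from the WCAG 2.x accessibility guidelines. The two example scales are invented for illustration, and the check is only a heuristic, not a simulation of colour blindness.

```python
def channel_to_linear(value):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour, from 0 (black) to 1 (white)."""
    r, g, b = (channel_to_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def lightness_is_monotonic(scale):
    """Return True if luminance strictly rises or strictly falls along the scale."""
    lums = [relative_luminance(colour) for colour in scale]
    steps = [later - earlier for earlier, later in zip(lums, lums[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


# A rainbow-style scale: red, yellow, green, cyan, blue (hypothetical values).
rainbow = [(255, 0, 0), (255, 255, 0), (0, 255, 0), (0, 255, 255), (0, 0, 255)]

# A dark-to-light sequential scale (hypothetical values).
sequential = [(0, 0, 80), (0, 80, 160), (80, 160, 200), (200, 230, 200), (255, 255, 220)]

print(lightness_is_monotonic(rainbow))     # False
print(lightness_is_monotonic(sequential))  # True
```

The rainbow scale fails because yellow is much lighter than both of its neighbours, which can create an apparent boundary in the plot where the data have none.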

5. Get interactive!

It’s easier than ever to create interactive visualisations that help you to explore your data, but you may have to break out of your comfort zone a little. Here are some packages that will help:

  • CartoDB makes it very easy to generate interactive data-driven maps.
  • You can easily generate all kinds of interactive visualisations in JavaScript with d3.js and Flot.
  • Plots generated by Python’s Matplotlib and Java’s JFreeChart can also be interactive – check out the documentation for help.
  • There are plenty of other great tools out there. The Creative Bloq website provides a rundown of the top twenty visualisation tools.
  • JSFiddle.net is a very useful site for playing with JavaScript without the need to set things up yourself, and you can share your experiments with your friends.