Software for making sense of the data deluge

Posted by s.hettrick on 4 September 2012 - 9:30am

By Laura Moss, Agent and Clinical physicist, NHS Greater Glasgow & Clyde and the University of Aberdeen.

As the amount of data collected by sensors and electronic devices increases at an unprecedented rate, it is becoming increasingly difficult to make sense of what we collect. This problem is thrown into stark relief when one considers that we now create as much data in two days as our entire civilisation collected up to 2003 [1]. Large repositories of data are valuable resources for advancing our understanding of the world, but exploring the data and generating hypotheses from it is far from trivial.

One area that may help in the analysis of large datasets is information visualisation. This focuses on the use of graphical techniques to represent data in a form that people can understand, and can be described as “the use of computer-supported, interactive, visual representations of abstract data to amplify cognition” [2]. Visual data exploration aims at “integrating the human in the data exploration process [and] applying its perceptual abilities to the large data sets available in today’s computer systems” [3]. Visualising data allows a person to be directly involved in the exploration process, rather than relying only on computational data mining algorithms, which often lack important domain understanding.

Toolkits and programming languages have been developed to enable developers to generate visualisations by applying these techniques to their own data. Below we briefly describe three of these tools: Processing, D3, and the JavaScript InfoVis Toolkit. For a more complete list of available tools, see the data visualization website.

Processing

Processing is a popular programming language and development environment for creating graphical images, animations and interactions. The Processing language is built on Java and simplifies Java’s graphical programming elements. Processing was initially designed as a visual aid for students learning to program, but it is now used by a wide range of people, including professional designers.

[Image: Simple Processing example. Taken from: http://processing.org/learning/drawing/]

Benefits of Processing

Processing is open source and useful for learning, quick prototype development, and data exploration. Its large community of developers has produced a large number of libraries. Additionally, Processing.js provides a JavaScript implementation of the Processing language, in which <canvas> is used as the drawing area. However, as with all of these tools, Processing requires some initial programming knowledge, so it may not be as accessible to all users as some other tools.

D3

D3 (Data-Driven Documents) is a powerful tool, developed by the Stanford Vis Group, for producing data-driven visualisations. Unlike Processing, D3 does not introduce a new graphical representation. Instead, it uses existing web standards (CSS3, HTML5, SVG) to form its vocabulary. D3 is a freely available JavaScript library that allows you to bind data to the Document Object Model (DOM), so that documents can be manipulated efficiently based on data. Instead of modifying documents using the W3C DOM API, D3 operates on sets of nodes called selections and “provides numerous methods for mutating nodes: setting attributes or styles; registering event listeners; adding, removing or sorting nodes; and changing HTML or text content.”[5]
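
To make the idea of a selection more concrete, here is a minimal sketch in plain TypeScript. It is not D3 and does not use D3's API: ToyNode, ToySelection, makeNode and ValueFn are hypothetical names invented for this illustration, and matching data to nodes by position is an assumption made to keep the sketch short. It only shows the general pattern of binding one data value to each node and then changing the whole set of nodes through chained calls.

```typescript
// Toy model of a selection: a set of nodes that is changed as a group.
// This is not D3; ToyNode, ToySelection, makeNode and ValueFn are hypothetical.

interface ToyNode {
  tag: string;
  attrs: Record<string, string>;
  text: string;
  datum: number | undefined; // data value bound to this node, if any
}

// A function that derives a string from a node's datum and its position.
type ValueFn = (datum: number, index: number) => string;

function makeNode(tag: string): ToyNode {
  return { tag: tag, attrs: {}, text: "", datum: undefined };
}

class ToySelection {
  private nodes: ToyNode[];

  constructor(nodes: ToyNode[]) {
    this.nodes = nodes;
  }

  // Bind one data value to each node, matched by position (an assumed rule).
  data(values: number[]): ToySelection {
    this.nodes.forEach((node, i) => {
      node.datum = values[i];
    });
    return this;
  }

  // Set an attribute on every bound node, computed from that node's datum.
  attr(name: string, value: ValueFn): ToySelection {
    this.nodes.forEach((node, i) => {
      if (node.datum !== undefined) {
        node.attrs[name] = value(node.datum, i);
      }
    });
    return this;
  }

  // Set the text content of every bound node, computed from its datum.
  text(value: ValueFn): ToySelection {
    this.nodes.forEach((node, i) => {
      if (node.datum !== undefined) {
        node.text = value(node.datum, i);
      }
    });
    return this;
  }

  // Serialise each node as markup so the result can be inspected.
  render(): string[] {
    return this.nodes.map((node) => {
      const attrs = Object.keys(node.attrs)
        .map((key) => " " + key + '="' + node.attrs[key] + '"')
        .join("");
      return "<" + node.tag + attrs + ">" + node.text + "</" + node.tag + ">";
    });
  }
}

const paragraphs = new ToySelection([makeNode("p"), makeNode("p"), makeNode("p")])
  .data([4, 8, 15])
  .attr("class", (d) => (d > 5 ? "high" : "low"))
  .text((d) => "value " + d);

const markup: string[] = paragraphs.render();
console.log(markup.join("\n"));
```

The point of the sketch is that the data drives the document: changing the array passed to data() changes the attributes and text of every node without touching the nodes one by one.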

Benefits of D3

One advantage of D3 is that, rather than using toolkit-specific abstractions, it allows direct inspection and manipulation of the DOM. D3 also supports a very large range of visualisation methods, which makes it a very flexible toolkit. Exploiting these methods fully, however, requires a high level of skill with JavaScript and JSON.

JavaScript InfoVis

The JavaScript InfoVis Toolkit is another toolkit for creating interactive data visualisations on web pages. It is distributed under a BSD open-source licence. Like D3, the InfoVis Toolkit allows you to bind data to objects. It is relatively simple to include an InfoVis-generated image in your web page: select a div container in your HTML in which to place a <canvas> widget, which then displays the chosen visualisation. Data is represented as a JSON structure and passed to the visualisation.
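
As a rough illustration of what such a JSON structure might look like, the sketch below builds a small tree in TypeScript and serialises it. The node shape (id, name, data, children) is an assumption based on the style of the toolkit's tree examples and should be checked against the toolkit's documentation. The "size" key, the sample values, and the names VisNode, countNodes and sumLeaves are hypothetical and exist only for this example. No toolkit calls are made.

```typescript
// Assumed node shape: id, name, data and children. Verify against the
// toolkit's documentation before relying on it.
interface VisNode {
  id: string;
  name: string;
  data: Record<string, number>;
  children: VisNode[];
}

// Hypothetical sample data: disk usage by folder, in megabytes.
const diskUsage: VisNode = {
  id: "root",
  name: "projects",
  data: {},
  children: [
    {
      id: "thesis",
      name: "thesis",
      data: {},
      children: [
        { id: "figures", name: "figures", data: { size: 120 }, children: [] },
        { id: "text", name: "text", data: { size: 5 }, children: [] },
      ],
    },
    { id: "photos", name: "photos", data: { size: 300 }, children: [] },
  ],
};

// Count every node in the tree, including the root.
function countNodes(node: VisNode): number {
  return node.children.reduce((total, child) => total + countNodes(child), 1);
}

// Add up one numeric field over the leaves of the tree.
function sumLeaves(node: VisNode, key: string): number {
  if (node.children.length === 0) {
    const value = node.data[key];
    return value === undefined ? 0 : value;
  }
  return node.children.reduce((total, child) => total + sumLeaves(child, key), 0);
}

// The serialised form is what would be handed to a visualisation.
const payload: string = JSON.stringify(diskUsage);

console.log(countNodes(diskUsage));
console.log(sumLeaves(diskUsage, "size"));
console.log(payload.length > 0);
```

Checking a structure like this before passing it on (for example, counting nodes or totalling a field) is a cheap way to catch malformed data early, since hierarchical layouts such as treemaps depend on every node being well formed.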

Benefits of JavaScript InfoVis

This toolkit is useful if you need a slightly more unusual visualisation, as it supports a rather eclectic list of data representation methods, including Treemaps, SpaceTrees, and HyperTrees, as well as more basic charts (e.g. bar charts).

References

[1] Schmidt, E. Techonomy Conference (2010).
[2] Card, S., Mackinlay, J., Shneiderman, B. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann (1999).
[3] Keim, D. Information Visualization and Visual Data Mining. IEEE Transactions on Visualization and Computer Graphics, Vol 8, No 1, Jan-Mar 2002.
[4] Rao, R., Card, S. K. The Table Lens: Merging Graphical and Symbolic Representations in an Interactive Focus+Context Visualization for Tabular Information. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Celebrating Interdependence (1994), pp. 318-322.
[5] http://d3js.org/