Crowd-sourced computer networks

By Phil Fowler, Postdoctoral Researcher at the Department of Biochemistry, University of Oxford.


What is crowd-sourcing? Well, it is not a great name, at least for the type of crowd-sourcing I'm going to talk about: it sounds like something a film producer would do. The type I'm interested in harnesses thousands of individual computers to run a series of complicated calculations or simulations. So what types of problems is it used for, and where did it come from?

Crowd-sourcing has its origins in screensavers such as SETI@home, which analyses data collected by radio telescopes for patterns that might indicate intelligent life, and the Screensaver Lifesaver project [1], which searched for molecules that inhibit a range of drug targets. Both of these successful projects encouraged anyone with a computer to download some software (a client) that processes data from a project server. The client runs when the user has stopped interacting with the computer, so it does not interfere with the user's work. If the user resumes using the computer, by pressing a key or moving the mouse, the client suspends itself; hence the analogy with screensavers. These projects grew out of other, more homogeneous distributed grids that I won't talk about here.
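To make the screensaver-style behaviour concrete, here is a minimal Python sketch of the run/suspend rule described above. It is not the actual code of SETI@home or any other project: the `VolunteerClient` class, its `tick` method, the `simulate` helper and the 60-second idle threshold are all hypothetical, and the time since the last keyboard or mouse input is assumed to be supplied by the operating system.

```python
from dataclasses import dataclass


@dataclass
class VolunteerClient:
    """Hypothetical volunteer-computing client that only computes while the machine is idle."""

    idle_threshold: float = 60.0  # assumed: seconds without keyboard/mouse input before work resumes
    progress: int = 0             # work steps completed on the current work unit
    running: bool = False

    def tick(self, seconds_since_input: float) -> None:
        """Called periodically with the time since the user's last keyboard or mouse input."""
        # Any recent input suspends the client; a long enough quiet period lets it run.
        self.running = seconds_since_input >= self.idle_threshold
        if self.running:
            self.progress += 1  # stand-in for one chunk of real computation


def simulate(idle_readings, threshold=60.0):
    """Feed a sequence of idle-time readings to a fresh client and record what it did."""
    client = VolunteerClient(idle_threshold=threshold)
    states = []
    for reading in idle_readings:
        client.tick(reading)
        states.append("run" if client.running else "suspend")
    return client.progress, states
```

For example, `simulate([0, 30, 90, 150, 5, 70])` returns `(3, ['suspend', 'suspend', 'run', 'run', 'suspend', 'run'])`: work only advances on readings at or beyond the threshold, and a fresh key press immediately pauses it again.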

The term crowd-sourcing didn't appear until around 2006. It is a broad term that covers a wide variety of disparate activities, including crowd-funding. Perhaps a more accurate, but wordy, description of the projects above would be massively distributed heterogeneous volunteer computing grids (even if you cross out the odd word, you can see why this isn't the title of this post).

Distributed computing projects have been very successful in my field of computational biophysics, starting with Screensaver Lifesaver, then Folding@home and, more recently, FoldIt. The distributed network of computers running Folding@home has at various times been the fastest computational entity on the planet, breaking the 1 petaFLOPS barrier back in 2007. Each computer simulates how a particular protein chain moves over a short period of time (and, crucially, whether it changes conformation). The data are then sent back to the central Folding@home server, which clusters and analyses the results before deciding on the next round of simulations. Gradually, a kinetic model of how the long, floppy chain of a small protein folds up to form the native, compact, active structure can be constructed, using a technique known as Markov State Modelling.
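The Markov State Modelling step can be illustrated with a deliberately simplified Python sketch. It assumes the server has already clustered the returned conformations into a small number of discrete states, so each short simulation becomes a list of state labels; the function names, the three-state example and the lag of one step are hypothetical. Transitions are counted across all trajectories and each row is normalised into probabilities, which is the simplest possible estimator. This is not Folding@home's actual pipeline, and dedicated tools typically add refinements such as enforcing detailed balance.

```python
def transition_matrix(trajectories, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from many short discretised trajectories.

    Each trajectory is a list of state labels (e.g. cluster indices assigned by the
    server). Transitions are counted at the given lag and pooled across all
    trajectories, so many short runs contribute to a single model.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for i in range(len(traj) - lag):
            counts[traj[i]][traj[i + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never observed as a starting point get a row of zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def propagate(p, matrix, steps):
    """Evolve a probability distribution over states forward by `steps` lag times."""
    n = len(p)
    for _ in range(steps):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return p
```

With states 0, 1 and 2 standing for unfolded, intermediate and folded, the trajectories `[[0, 0, 1, 1, 2], [0, 1, 2, 2, 2], [1, 0, 0, 1, 2]]` give rows of roughly `[0.4, 0.6, 0.0]`, `[0.2, 0.2, 0.6]` and `[0.0, 0.0, 1.0]`, and propagating a fully unfolded start for two lag times gives about `[0.28, 0.36, 0.36]`. Pooling is the key idea: each run only needs to cover a few transitions, and the model stitches them together.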

FoldIt tackles the same problem, that of how proteins fold, but turns it into a game. Instead of running calculations, players drag different parts of the protein chain around on their screens to get a better fit (the best way to understand the game is to try it yourself). Remarkably, the community of players has, in a few cases, been able to improve on predictions made by a more conventional technique [2]. This is all the more impressive because it was done during CASP, a blind competition in which the final folded structures were kept secret.

Such crowd-sourced approaches allow anyone to participate in a research project, which naturally encourages public engagement and the open communication of science. The aims of distributed computing grids often overlap with those of the open-science movement; for example, the players of FoldIt were formally listed among the authors of recent papers from that project [2].

There are obvious technical challenges in setting up and maintaining a massive heterogeneous distributed computing grid, and these have been eased by the development of an open-source framework, BOINC. This framework, spun out of the SETI@home project, has become the de facto standard, allowing a wide range of crowd-sourced networks to be set up in the past few years. The availability of a trusted, mature software framework is tremendously important for this field, as the client must be reliable, unobtrusive and secure.

Although BOINC clients can be installed on a wide range of operating systems, this means that a project has to maintain a large number of different versions of its client. Recently, projects have started to embed the BOINC client inside a minimal Linux virtual machine which, given the appropriate virtualisation software, can run on any of the major operating systems. Computers now also include games consoles, tablets and smartphones. It is difficult to see how compute-intensive tasks could run on mobile devices, but games-based approaches like FoldIt may be even better suited to them.

There is every reason to expect that innovative crowd-sourcing approaches will continue to emerge and help answer important scientific questions. This will be driven by the desire to engage with the public, and by the relentless increases in both the computational power of consumer devices and network speeds. To achieve these benefits, scientists must reach out to and work with computer scientists to build and maintain the software that underpins a crowd-sourced computer network.

References

1. Richards WG (2002). Virtual screening using grid computing: the screensaver project. Nat Rev Drug Discovery, 1 (July), 1–5.

2. Eiben CB, Siegel JB, Bale JB, Cooper S, Khatib F, Shen BW, Foldit Players, Stoddard BL, Popovic Z, Baker D (2012). Increased Diels-Alderase activity through backbone remodeling guided by Foldit players. Nature Biotechnology, 30(2), 190–192. doi:10.1038/nbt.2109

Posted by a.pawlik on 11 October 2013 - 11:07am