By Mike Jackson.
As part of our work with the particle physicists of the MAUS project, I was tasked with extending their data analysis software to use a distributed task queue, so that data analysis tasks can be farmed out to worker nodes and executed in parallel. MAUS recommended using Celery - an asynchronous task queue based on distributed message passing and implemented in Python. I set about my task with some trepidation, dreading battles with Linux RPMs and building open-source projects from source (which always befall me when not using Java software), not to mention the joys of configuring distributed computing software. However, I found myself very pleasantly surprised!
Celery allows jobs, or tasks, to be executed by one or more Celery workers, which can run across multiple servers. Each worker can also be configured to execute one or more tasks concurrently. A Celery client creates and submits a task, which is dispatched to an available worker for processing. Clients can submit tasks synchronously or asynchronously, and Celery provides support for querying the status of each submitted task.
Celery has one prerequisite: it needs a message broker to handle message exchange between clients and workers. The recommended message broker is RabbitMQ, an implementation of the AMQP (Advanced Message Queuing Protocol) open standard. RabbitMQ's tag line is "Messaging that just works" and, for this user, it lived up to that claim. A Linux yum install just worked, and in a matter of minutes I'd started my RabbitMQ server, created a username and password and a virtual host, and granted the user permissions on the virtual host. Virtual hosts, usernames and passwords allow the message broker to be partitioned into discrete sets for use by different applications.
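The whole setup amounts to a handful of commands - something like the following sketch, in which the username, password and virtual host name are placeholders of my own choosing, not values from any particular deployment:

```shell
# Install and start the broker (package name as on yum-based distributions).
sudo yum install rabbitmq-server
sudo rabbitmq-server -detached

# Create a user and a virtual host, then grant the user full
# configure/write/read permissions on that virtual host.
sudo rabbitmqctl add_user myuser mypassword
sudo rabbitmqctl add_vhost myvhost
sudo rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"
```

The three `".*"` arguments to `set_permissions` are regular expressions controlling, respectively, which resources the user may configure, write to and read from.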
Celery proved equally simple to install: a single invocation of Python's easy_install package manager and it was downloaded and ready. Following the project's first-steps tutorial, I wrote a simple Celery configuration file specifying the RabbitMQ host, virtual host, username and password, and the name of a Python file that would contain the tasks to be executed by the workers. This configuration file is used both by the workers and by the client.
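A minimal sketch of such a configuration file, assuming the RabbitMQ credentials and virtual host from the setup above (all values are placeholders, and the setting names follow the style Celery used at the time):

```python
# celeryconfig.py - a minimal Celery configuration sketch.
# Broker details should match your RabbitMQ server and the
# user/vhost created when setting it up.
BROKER_HOST = "localhost"       # host running RabbitMQ
BROKER_PORT = 5672              # RabbitMQ's default AMQP port
BROKER_USER = "myuser"
BROKER_PASSWORD = "mypassword"
BROKER_VHOST = "myvhost"

# Module(s) containing the task definitions the workers should load.
CELERY_IMPORTS = ("tasks",)
```

Both `celeryd` (the worker) and the client pick this file up from the current directory, which is what lets the two sides agree on the broker and the available tasks.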
I wrote a Python script containing an example task - a simple add function - and wrote a simple client to invoke the task (two whole lines!). I ran the celeryd command to start a worker. In a separate window I then ran my client and, voilà, it worked, just like that. But this was with client and worker on the same host. How would it fare "in reality"?
I defined a Celery task to execute a common MAUS data analysis pipeline and modified their data analysis framework to invoke this pipeline via Celery. I deployed the MAUS software on three separate hosts at EPCC, started RabbitMQ and a Celery worker on one of these, started another worker on a second host, invoked the client on a third and, voilà, I could see the data being processed by both workers in parallel.
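In outline, the three-host deployment looked something like this - the host labels and the client script name are hypothetical, stand-ins for the actual MAUS invocation:

```shell
# Host 1: message broker plus one worker (run from the MAUS deployment).
rabbitmq-server -detached
celeryd --loglevel=INFO &

# Host 2: a second worker, from an identical MAUS deployment.
celeryd --loglevel=INFO &

# Host 3: the Celery client driving the analysis (script name illustrative).
python run_analysis.py
```

All three hosts share the same Celery configuration file, so the workers and the client all talk to the one broker on host 1.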
I won't taint this happy story with the battles I had in trying to reconfigure tasks dynamically (which Celery doesn't seem to support as much as I wanted it to). To conclude, my experiences with Celery (and RabbitMQ) proved to be smooth, painless and remarkably simple. And I haven't yet explored Celery's support for routing messages, retrying tasks, queue monitoring and management.