An open-source software package for single-molecule fluorescence: PyFRET

Author(s)

Rebecca Murphy

Posted on 27 June 2014

Posted by s.hettrick on 27 June 2014 - 9:50am

By Rebecca Murphy, a PhD student working for David Klenerman and Sophie Jackson at the Department of Chemistry, University of Cambridge.

Poor documentation, bugs, errors and spaghetti code: the problems in scientific software development are well known. My research field, single-molecule fluorescence (smFRET), faces an additional challenge: a complete lack of any standard software for data analysis. Each research group maintains its own codebase, with software written in programming languages ranging from C++ to LabVIEW.

Whenever methodological improvements are published, we scramble to implement our own versions, based on sketchy outlines crammed into the supplementary methods of high-profile papers. Without access to source code, it is hard to verify that published analyses perform as reported, or that our own attempted reimplementations behave as they should.

Frustrated by the time wasted debugging bloated analysis scripts and embarrassed by the poor quality of my own code, I decided to create an open-source library for data analysis. PyFRET, a fully open-source Python library for smFRET data analysis, provides some simple tools for data processing. I hope that it will be used as a general data analysis tool, as a benchmark against which novel analyses can be compared, and as a library to which new techniques can be added for general use by our research community.

As a first-time creator of open-source software, I thought that my greatest difficulties would be with writing good, well-structured, usable code. However, with plenty of experience analysing smFRET data, I already knew exactly what my library needed to do. Instead, other challenges presented themselves.

Firstly, there were decisions about what exactly should be included and what data formats pyFRET should support. So far, pyFRET provides the minimal functionality required for analysis of smFRET data and supports direct parsing of the data files used in my research group, but does not implement all of the more advanced algorithms for data analysis. A minimal release allows fast feedback from other users about what else is needed, what is unnecessary and what is hard to use. Releasing a small, well-tested library allows me to make quick improvements to the codebase and to add required features as needed.

Secondly, I needed to learn how to package software. PyPI, the Python Package Index, was an obvious starting point, but understanding the requirements for PyPI hosting was challenging. Tarek Ziadé's The Hitchhiker's Guide to Packaging, a tutorial for novice distributors, proved an invaluable resource. I also discovered (sadly only after four uploads of broken code to PyPI) that there is a PyPI testing site that allows debugging before live release.

Packaging was one thing, documentation another. For Python users, Sphinx, a documentation generator that builds HTML pages and PDFs from reStructuredText, is a very nice tool for creating high-quality documentation. Unfortunately, I found myself utterly confused by its own documentation. Eventually, after some desperate googling ("Why is sphinx so hard to use?"), I found a tutorial "for dummies", which allowed me to get started. PyFRET's tutorial, hosted on ReadTheDocs, now looks good, but still lacks a detailed reference (coming soon).

My final difficulty came from dependencies. PyFRET uses three other Python libraries - NumPy, SciPy and Matplotlib - for numerical programming and plotting data. For a new user, dependencies present a large barrier to using the software. I had hoped that the pip install system used by PyPI would detect and automatically install missing packages. Unfortunately, Matplotlib does not play nicely with pip install, and first attempts to install pyFRET on a colleague's laptop ended in failure. We were saved by Anaconda, a free distribution of 125 of the most common Python libraries for scientific programming. Installing Anaconda is a single-step process, converting a nightmare of error messages into a few simple clicks.
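One way to catch this kind of failure early is to check which dependencies are importable before installing or running anything. The snippet below is a minimal sketch using only the Python standard library; it is not part of pyFRET, and the function name `missing_packages` is hypothetical, chosen purely for illustration.

```python
from importlib.util import find_spec

# The three dependencies named in the post (top-level import names).
REQUIRED = ("numpy", "scipy", "matplotlib")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment.

    Hypothetical helper for illustration; it is not part of pyFRET.
    Only top-level package names are expected here.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

Running this on a fresh machine lists what still needs to be installed, which may be easier to act on than an installer traceback.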

Overall, I was surprised by the number of challenges involved in taking an already mature project and packaging it for open-source release. Although each individual step - writing clear documentation, packaging, handling dependencies - does not have to be difficult, learning them all at once made for a steep learning curve.

PyFRET now meets nine of the 11 points on the Software Sustainability Institute's Release checklist, and initial feedback from my research group has been positive. But there is still more work to do. After all, the greatest test of scientific software is its use in research. The next challenge is to collect and analyse some real data using pyFRET. Extensive experimentation awaits.