Reducing the Distance between theory and practice

Posted by m.jackson on 17 September 2014 - 11:42am

By Mike Jackson, Software Architect.

Clever theory about how to estimate the density or abundance of wildlife is of limited value unless biologists and conservationists can readily apply it. Distance sampling is a widely used methodology for estimating animal density or abundance, and the Distance project provides software, Distance, for the design and analysis of distance sampling surveys of wildlife populations. Biologists, students, and decision makers use Distance to better understand animal populations without needing degrees in statistics or computer science. Distance places statistical theory into the hands of practitioners.

The Distance project includes members from the Research Unit for Wildlife Population Assessment (RUWPA), part of the Centre for Research into Ecological and Environmental Modelling (CREEM), and the School of Mathematics and Statistics at the University of St. Andrews, and the National Marine Mammal Laboratory, Seattle, USA.

Distance consists of a Visual Basic interface on top of analysis engines written in FORTRAN, the statistical programming language R, and ESRI MapObjects. These all draw on a shared Microsoft Access database. Distance is distributed as both a Windows-based program, which provides a GUI, and a suite of packages for R. Distance is hosted on GitHub, with a forum on Google Groups and a Bugzilla bug tracker.

The project has held a number of strategy meetings to discuss the future of Distance, and its members now have a vision of the additional functionality they wish to add. However, they seek fresh insight on tackling a number of challenges. Paramount among these is that the time from the publication of a manuscript to the incorporation of the corresponding features into Distance is becoming so long that Distance is in danger of stagnating. Distance has evolved over the years and, as a result, uses a number of legacy components that are no longer supported by their vendors and are becoming increasingly difficult to maintain. In addition, understanding of the inner workings of much of Distance, and of how its components are assembled into a release, resides in the mind of a single developer, who now has other commitments that prevent them from devoting as much time to the project as they have in the past. The other major challenge the project wishes to address is how best to manage user support without placing undue demands on the team. At present, users are reluctant to submit queries via issue trackers, preferring to e-mail project members directly.

To help resolve these challenges, Distance project members Eric Rexstad and David Miller applied to the Research Software Group for help as part of our open call. We are now working with Distance to undertake a comprehensive review of both the Distance software and how the project manages the development of the software. The review will focus on the software's ease of use, its resources for developers (including its GitHub repositories and build-install-and-test processes), and its architecture. The architectural review will serve both to identify options for improving maintainability and extensibility and to capture the extensive technical knowledge of Distance currently held by the single developer. We will also provide recommendations on project governance, specifically looking at "how to be a good and friendly Distance developer", a code contribution policy, and a more systematic approach to managing feature requests and bug reporting. These activities are intended to help the project reduce the time between developing good theory and publishing papers on the one hand, and delivering usable code into the hands of investigators on the other.

We look forward to reporting on our collaboration.

For more details please see our "who do we work with" page on Distance.