Reducing the time between developing good theory and publishing papers, and delivering usable code into the hands of investigators.
Clever theory about how to estimate the density or abundance of wildlife is of limited value unless biologists and conservationists can readily exploit and apply it. Distance sampling is a widely used methodology for estimating animal density or abundance, and the Distance project provides software, Distance, for the design and analysis of distance sampling surveys of wildlife populations. Biologists, students, and decision makers use Distance to better understand animal populations without needing degrees in statistics or computer science. Distance places statistical theory into the hands of practitioners.
The Distance project includes members from the Research Unit for Wildlife Population Assessment (RUWPA), part of the Centre for Research into Ecological and Environmental Modelling (CREEM), and from the School of Mathematics and Statistics at the University of St. Andrews. It also includes members from the National Marine Mammal Laboratory in Seattle, USA.
Distance has its origins in distance sampling methods developed at the University of St. Andrews in the early 1990s. Its funders have included EPSRC, JISC, the Wildlife Conservation Society, and the US Office of Naval Research's Marine Life Sciences program, amongst others.
To date, over 1200 published papers have made use of these methods. Many of these papers are biological in nature and describe the use of Distance to answer questions of applied biological interest. The project's research extends these methods to a wider range of situations. According to Google Scholar, the project's suggested citation (Thomas et al. 2010) has been cited 585 times since publication.
Distance itself has been downloaded over 30,000 times in the past 15 years. Its primary users are biologists seeking to analyse field data and students being trained in wildlife population assessment methods. The project also delivers training workshops in St. Andrews and, occasionally, in other countries. Over the past 20 years, these workshops have had more than 1000 participants from over 50 countries.
Distance consists of a Visual Basic interface on top of analysis engines written in FORTRAN and in the statistical programming language R. It also uses ESRI MapObjects, a GIS component library. All of these components draw on a shared Microsoft Access database. Distance is distributed both as a Windows program, which provides a GUI, and as a suite of packages for R.
Distance is hosted on GitHub, with a forum on Google Groups and a Bugzilla bug tracker.
Distance is currently developed by one full-time programmer, one part-time programmer, and five academics who take part in software development to varying degrees. This level of effort is expected to continue for the next 18 months. Funding is continuously sought to support future development.
Questions about the assessment and management of animal populations are becoming increasingly complex, and so is the theory needed to answer them. As a result, the time between the publication of a manuscript and the incorporation of the corresponding features into Distance is becoming so long that Distance is in danger of stagnating. There are a number of reasons for this.
Distance includes legacy components from many sources. Their vendors no longer support a number of these components, and the project finds it increasingly difficult to maintain them all itself. In addition, there is no up-to-date documentation on how the components fit together. Much of the knowledge of how Distance works internally, and of how to assemble its components into a release, rests with one individual.
Distance's GUI is written in Visual Basic 6. Bundling it with R is a challenge, and Microsoft has now ended support for Visual Basic 6. The project is considering various open source alternatives, and is also considering restricting the GUI to a simple subset of Distance's features. However, the GUI is seen as essential for encouraging biologists to take up Distance.
Both users and project members send feature requests and bug reports by e-mail to specific project members. Bugzilla receives little use, and some project members are resistant to using GitHub issue trackers. A greater challenge is encouraging users to submit queries via issue trackers rather than by e-mail. The project wants to understand how to promote the use of issue trackers and, more generally, how to organise its software development more effectively.
The project has held a number of strategy meetings about the future of Distance and has a vision of the additional functionality it would like to add. However, it is seeking fresh insight on how to tackle these challenges.
To help address these challenges, Distance project members Eric Rexstad and David Miller applied to the Institute's open call for help from the Research Software Group with software development and open source best practice.
We are now working with Distance to:
- Review the Distance code and components, its GUI, documentation, GitHub repository, testing process, and build and installation procedures.
- Review Distance's architecture to identify options for improving maintenance and ease of adding new features.
- Capture the extensive technical knowledge of Distance held by a developer whose other commitments now prevent them from devoting as much time to the project as they have in the past.
- Write a "how to be a good and friendly Distance developer" set of guidelines, as a code contribution policy.
- Advise the project on governance, specifically on how to move beyond an e-mail-based system of selecting work items, for example by using a GitHub issue tracker.
Our collaboration is intended to deliver a number of benefits. Most importantly, there are statistical questions that Distance cannot address at present, even though methods for them have already been published. Making Distance more extensible and modular could help to reduce the time between the publication of a method and the release of its implementation within Distance.
Capturing the extensive technical knowledge of Distance currently held by a single developer helps to increase the project's truck factor. It also makes this knowledge more accessible to current and potential Distance developers. Developers interested in extending, modifying, or fixing Distance will be able to understand its architecture more readily, implement their changes, and know how to contribute those changes back to Distance if they wish. This can also make it easier to bring newly funded developers up to speed with Distance development.
Finally, the Distance project will be able to manage feature requests and bug reports in a more open and systematic way.