By Steve Crouch.
Avoid dependency problems when developing software
When your software needs more functionality, there's no need to waste your time re-inventing the wheel. Instead, you can give your software the functionality it needs by reusing other people's software, such as code libraries or packages. Of course, if you rely on someone else's code, you become reliant on them not changing the code so that it stops working with your software. You can defend against these dependency problems with some forethought and a bit of research.
In this guide, we will describe how to choose software and develop code to avoid dependency problems.
Why write this guide?
This guide is an extension of an AskSteve! post, which itself resulted from a question raised at a workshop for SeIUCCR Community Champions.
Why it's important to take version dependencies into account
A poor choice of dependent software can have expensive consequences. You may have developed a fully functional software system in your development environment, only to find that deployment in the target environment yields a host of compatibility problems and runtime errors that render your software useless. For software that depends on a host of packages, resolving these issues can be incredibly time-consuming and stressful, particularly if deployment is left to the last minute!
Problems can occur if your software relies on a specific version of dependent software, and that dependent software is not available on a user's machine. This often happens when scaling out the deployment of your software. If the software you developed in-house is picked up and used by a user community, it will inevitably be used on a wider selection of operating systems and environments than you have used to test your software. Alternatively, problems can occur if your software is used on a bespoke system, such as a state-of-the-art production grid infrastructure.
If the package is available on a user's machine, but the required version is not, you will find that the code will fail due to a versioning conflict. This might happen during deployment (because the version isn't available) or during execution (due to incompatible library APIs between versions). In many ways a version dependency failure during execution is worse, because the dependency problem might go unnoticed for some time.
The infamous ClassNotFoundException in Java is a good example. It can occur at runtime when the required class is not visible on the classpath of the class loader, or when an application doesn't use a class loader API correctly. The problem may not surface until an infrequently used code path finally tries to load the missing class.
We live in a world where software platforms are regularly updated. This increases the risk that a dependent package could change underneath your deployed software and cause your (previously working) software to fail. If you think it's bad when you find these problems at the last minute, it will be even worse should your users find them!
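One way to surface such problems at start-up, rather than in an infrequently used code path, is a small environment check. The following is a minimal sketch in Python, not a definitive implementation. The function names `release_tuple` and `check_environment`, and the package names used with them, are hypothetical. The version comparison is deliberately simplified (pre-release suffixes are ignored); a real project might prefer a dedicated version-parsing library.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Turn '1.10.0rc1' into (1, 10).

    Simplification: only the leading numeric part is used, and trailing
    zeros are dropped so that '2' and '2.0' compare as equal.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = [int(part) for part in match.group().split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check_environment(minimums, lookup=metadata.version):
    """Return a list of problems; an empty list means every check passed.

    minimums maps a distribution name to the lowest acceptable version.
    lookup is injectable so the check can be tested without installing anything.
    """
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if release_tuple(installed) < release_tuple(minimum):
            problems.append(f"{name} {installed} is older than required {minimum}")
    return problems
```

Calling `check_environment({"somepackage": "1.2"})` (a hypothetical package name) at start-up, and reporting any returned problems, lets users see a clear message straight away. Note that the sketch checks for a minimum version rather than one exact version, which keeps the dependency as loose as possible.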
Develop your code defensively
In general, it's a very good idea to develop your code defensively: when making implementation decisions, keep in mind that the underlying software can change. The main things to keep in mind are:
Always avoid using deprecated interfaces: support for these may disappear in future versions!
Try to keep the development and intended deployment environment dependencies as similar as possible
Do your best to avoid specific software version dependencies
Try to develop your code in an operating system 'agnostic' manner, for instance by making sure it is able to run on Unix in general rather than just GNU/Linux, where command parameter support can differ
Set up a test infrastructure that allows testing under various environments that represent those of your target users. One example is the NMI Build and Test System, as used by Globus. Cloud computing infrastructures, such as Amazon's EC2, also commonly provide access to virtual machines based on a variety of operating systems.
Test your software on the target platform regularly wherever possible. If the software is to be deployed on an infrastructure, see if you can obtain access for testing
Loosely couple your code to dependent software: use fewer higher-level interfaces where possible as opposed to many low-level function calls
Use abstractions where appropriate: separate your business logic from the underlying dependent software by creating an abstraction layer that allows you to readily plug-in different implementations
Allow the software (or even the end users themselves) to sanity-check that the environment is correct
Of course, always adopt good software maintenance practices
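The advice above on loose coupling and abstraction layers can be sketched as follows. This is an illustrative Python example under stated assumptions, not a prescribed design: the names `Serializer`, `JsonSerializer`, `KeyValueSerializer` and `round_trip` are hypothetical, and the standard-library `json` module stands in for whatever dependent software you might use.

```python
import json
from typing import Any, Dict, Protocol


class Serializer(Protocol):
    """The only interface the business logic is allowed to depend on."""

    def dumps(self, record: Dict[str, Any]) -> str: ...

    def loads(self, text: str) -> Dict[str, Any]: ...


class JsonSerializer:
    """Adapter wrapping the dependent software (here, the json module)."""

    def dumps(self, record):
        return json.dumps(record, sort_keys=True)

    def loads(self, text):
        return json.loads(text)


class KeyValueSerializer:
    """Alternative implementation with no library dependency.

    Handles flat string-to-string records only; it exists to show that the
    implementation can be swapped without touching the business logic.
    """

    def dumps(self, record):
        return "\n".join(f"{key}={value}" for key, value in sorted(record.items()))

    def loads(self, text):
        return dict(line.split("=", 1) for line in text.splitlines() if line)


def round_trip(record, serializer: Serializer):
    """Business logic: uses the Serializer interface and nothing else."""
    return serializer.loads(serializer.dumps(record))
```

Because `round_trip` never imports `json` directly, replacing the dependency means writing one new adapter class rather than hunting for low-level calls scattered through the code.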
Should you include the dependent software?
Sometimes you can include the dependent software itself within your own deployment. It is common practice to do this in open source software, with things like libraries (Java JAR files or even Perl libraries) or applications, but sometimes this is neither a good idea nor possible for technical, effort or licensing reasons. It also means you have more to maintain and your software becomes more internally complex, and you can still end up with conflicts if the bundling is not done correctly.
For example, suppose the target platform is the UK NGS and the dependent software is a high-level scientific application such as DL_POLY for chemistry. DL_POLY's licence explicitly forbids redistribution, and the software is already installed on the NGS, so bundling it would be neither permitted nor necessary.
In addition, whilst a developer may view the dependencies as separate products, the user just sees one package. So if they run into problems (even with the dependencies), by default it'll be you they ask!
A sensible alternative to bundling dependent packages is to use a dependency manager, such as Maven or Ivy for Java.
Where will the software be used?
Before choosing dependent software packages, ask yourself a few questions about where the software will be used:
If your software is intended for an end-user, where will the software be deployed: which operating systems, platforms and environments will be used? Does the software need to operate cross-platform (e.g. Linux, Windows)?
If you will eventually deploy your software on a large production-level infrastructure, such as an HPC, Grid or cloud environment, what are the constraints for deploying software? Can you readily deploy, or arrange for the deployment of, other third-party packages you might need if they're not already available?
If you're developing a software library, which development platforms are commonly used in your community?
What is the expected lifetime of your software? You don't want support for dependent software to evaporate during the lifetime of your software!
Accounting for the intended user community and their deployment environments allows you to be better informed when selecting a software package, and a version, to use for a given task.
Choosing the right software
With tight deadlines, it's all too easy to reach for the nearest software library or package, especially if it's already installed on your machine or it's the first thing you come across that works. Before you use the nearest software, there are some things you should consider. More detailed information about this aspect can be found in our guide Choosing the right open-source software for your project, but in summary:
Has your community converged on using a particular software package?
Are all the features you need, and think you will need, included?
Does the software have evidence of a sustainable future (e.g. is there a roadmap)?
What are the mechanisms for supporting the software (community support, direct email, dedicated support team), and how long will the support be available? The more support, the better
Is there evidence of a large user community on user forums or e-mail list archives?
What is its deprecation or versioning policy? Does it have one? If not, then it may be more unstable and features may disappear without warning between versions, especially if releases are frequent.
What is the frequency of releases?
Does the software support open standards? If it does, it will be easier to replace the software should it come to the end of its lifetime
Are there examples of using the software successfully in the manner you want to use it?
Does the version you intend to use come from a forked open-source project, or is it from the original source project? If both exist, which source is more appropriate?
Are there any alternatives to the software?
With respect to the software licence, do you have the right to use the software in its intended production environment, or the right to distribute it along with your software?
It's certainly worth finding out what your community will expect. For example, if you're developing software for a production grid infrastructure, or something similarly sophisticated, you could save a lot of time by finding out which packages are already installed.
Choosing the right version
You've identified some appropriate software - but which version to use? Of course, using the latest stable release (and not a development or snapshot release) is often the best course of action. But there are some things to consider:
Are there any known issues or bugs with the version that could cause problems?
Are there any features you need in the version that are likely to be deprecated in the future?
If the version installed on the production environment is likely to be upgraded, is that upgrade likely to cause any compatibility problems in the future?
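One way to find out whether your code relies on features that are already marked for removal is to watch for deprecation warnings while exercising it. The following is a minimal Python sketch, assuming the dependent software announces deprecations through Python's standard `warnings` mechanism (not every package does). The function `legacy_sum` is a hypothetical stand-in for a deprecated third-party call, and `uses_deprecated_api` is a hypothetical helper name.

```python
import warnings


def legacy_sum(values):
    """Hypothetical stand-in for a third-party function marked as deprecated."""
    warnings.warn(
        "legacy_sum is deprecated; use sum instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return sum(values)


def uses_deprecated_api(func, *args, **kwargs):
    """Return True if calling func emits a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones normally hidden by default filters.
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return any(issubclass(w.category, DeprecationWarning) for w in caught)
```

A check like this can sit in a test suite so that a newly deprecated call is noticed when the dependency is upgraded. A similar effect can be had for a whole run by starting the interpreter with `python -W error::DeprecationWarning`, which turns such warnings into errors.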
The key is to be aware of dependencies in your software and its intended operating environment, and to take into account the changeability of the software's operating environment over time. Medieval castles were built to defend against hostile forces and environments - why not do the same with your software?