By Neil Chue Hong.
Once your software has left the confines of your own machine, four things are needed for its successful development: a website, a mailing list, an issue tracker and a code repository.
Although most of the infrastructure needed by your project can be set up on your own systems, there are many tools and services that can help you to develop, maintain and publish your software. This guide provides an overview of the different options for repositories, and looks at some of the decisions you will need to make before choosing a repository. Other SSI guides take a more detailed look at specific repositories.
We've also written a blog post about one of our staff members' experiences of choosing a code repository. It provides further information about which repository you should choose.
Why write this guide?
In September 2010, we received a lot of questions about repositories following the news of the impending closure of NeSCForge (a repository run by the National e-Science Centre). We wrote this guide to answer those questions, and to help people choose an appropriate repository for their project. We updated this guide in July 2013 to reflect the increasing popularity of distributed revision control systems, particularly Git. We updated it again following the March 2015 announcement of the closure of Google Code, and once more in January 2018 following the closure of CodePlex and to reflect the changing popularity of different repositories within the research community.
Which repository is right for your project?
The first step when choosing a repository is to list your requirements. To help with this process, we have listed the factors that you should consider at the end of this guide. The next step is to decide whether to use a hosted service or an institutional repository, or to run the infrastructure yourself.
What hosted services are available?
Hosted services are generally used when your software project is working with collaborators and committers spread across more than one institution. Some of the more popular public hosted services (according to the number of hosted users and projects) are listed below.
- GitHub (Supports: Git/SVN; Established: 2008; Users: >24m; Repositories: >69m including clones/forks)
GitHub provides a more developer-focussed environment (as opposed to a project-focussed one). It has developed a strong following in the sciences (NumPy and SciPy are both hosted on GitHub), and has started to host data as well (you can use git-annex to manage very large data files). It operates a fair-use bandwidth and storage policy, so excessive use may be throttled. It provides paid organisational accounts which include private repositories. It also offers researchers and educators free silver plans that include private repositories.
- Bitbucket (Git/Mercurial; 2008; 5m; >330k)
Bitbucket is hosted by Atlassian, which is known for its collaborative development products, JIRA and Confluence. Its unique selling point is that it offers unlimited private repositories for up to 5 users, which can be appealing for research and prototyping projects.
- GitLab (Git; 2011; 100k; >500k)
GitLab is a web-based Git repository manager with wiki and issue tracker functionality. As well as hosting projects, it is often installed within an institution or group to provide a local repository that is separate from gitlab.com.
- SourceForge (CVS/SVN/Git/Mercurial/Bazaar; 1999; >3.7m; >500k)
SourceForge used to be the best-known software project hosting site, but lost its crown as the largest to GitHub in May 2011. It provides most of the features you would expect from a repository and provides services to help recruit new developers. Some users have found that the server can be a little sluggish at times of high demand, and it is primarily supported by advertisements, which may not be appropriate for all projects. SourceForge does not allow access from some countries, most notably Iran and Syria.
- Launchpad (Bazaar, CVS/SVN/Git/Mercurial import only; 2004; >3.9m; >40k)
Launchpad is hosted by Canonical and lists some significant projects as users, such as Ubuntu and MySQL. It provides a system (Blueprints) for feature and specifications tracking and the Soyuz release-management system.
- Assembla (SVN/Git; 2005; 800k; >500k)
Assembla has a strong following amongst smaller companies and has extensive project-management facilities in addition to software-development services.
- Savannah (CVS/SVN/Git/Mercurial/Bazaar; 2000; >77k; >3k)
Savannah hosts the majority of GNU software and some non-GNU software. Savannah's focus is on hosting for free software projects. To ensure that only free software is hosted, Savannah implements very strict hosting policies, including a ban against the use of non-free formats (such as Macromedia Flash).
There is a comparison of the features of many open source software hosting sites available on Wikipedia.
In general, almost all of these repositories cater for projects with open-source licences, so they are probably not suitable if you have a closed-source code base or a mixed-licence product. In addition, the quality of service you receive may be a trade-off: a large user base brings stability, but can also make the service more impersonal.
One exception is Bitbucket, which allows the hosting of both private and public repositories, even under its free plan. GitHub also supports private repositories under its paid plans, and offers free silver plans with private repositories to researchers and educators.
There are also services provided for a particular large community. CCPForge provides a GForge-based repository primarily for the Collaborative Computational Projects (CCPs). This repository hosts both open-source and closed-source projects. The project must include a significant contribution from a UK research group and must be performing publicly funded scientific research. It is worth noting that, although CCPForge has a policy of keeping multiple backups, it does not guarantee safe storage of data.
Many organisations run their own version-control services, mailing-list managers and services that provide the full forge-like infrastructure. In general, these services are mainly useful if the committers and developers on your project are based at the organisation that hosts the service - although institutional repositories can usually handle a few external collaborators.
The main advantage of an institutional repository is that it is easy to work out who can help when you need something done. On the other hand, if your project has reached a truly global scale, it may not be appropriate for it to be tied to a specific institution (even if this is legally the case - see our guides on contribution licences).
Running your own infrastructure
It is relatively easy to set up and run your own revision-control system, such as CVS, SVN or Git. It is also possible to run your own software repository using packages such as GitLab, Trac, GForge, Savane (which powers SourceForge and Savannah), Codendi and LibreSource.
Running your own infrastructure requires a commitment of some time to set up and maintain the installation. However, it gives you the most control over the repository and its customisation. Typically, setting up your own repository is worthwhile if you are already running other infrastructure for your project and you are expecting to host more projects in the future.
However, with the increasing popularity of distributed version control systems like Git and Mercurial, the fear that a third-party repository provider will suddenly close, leaving you without access to your source code, is no longer a good reason to run your own infrastructure. With distributed revision control systems, every clone holds a full copy of your code and commit history, so as long as you have a good backup regime, you are free to move repository providers with relative ease.
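As a rough illustration of moving between providers, the sketch below builds the two Git commands commonly used to copy a repository, with all of its branches, tags and history, to a new host: `git clone --mirror` followed by `git push --mirror`. The URLs, the directory name and the function names are hypothetical placeholders, and the sketch assumes that an empty repository already exists at the new provider.

```python
import subprocess
from pathlib import Path


def mirror_commands(old_url, new_url, workdir):
    """Return the Git commands that copy every branch and tag to a new host."""
    mirror = str(Path(workdir) / "repo-mirror.git")
    return [
        # A bare mirror clone fetches all refs, not just the default branch.
        ["git", "clone", "--mirror", old_url, mirror],
        # Push every ref from the local mirror to the new (empty) repository.
        ["git", "-C", mirror, "push", "--mirror", new_url],
    ]


def migrate(old_url, new_url, workdir="."):
    """Run the commands in order, stopping at the first failure."""
    for command in mirror_commands(old_url, new_url, workdir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical URLs: print the plan rather than running it.
    for command in mirror_commands(
        "https://old-host.example/project.git",
        "https://new-host.example/project.git",
        "/tmp",
    ):
        print(" ".join(command))
```

Note that this only moves what Git itself stores. Issue tickets, wikis and mailing-list archives live outside the repository and need to be exported separately.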
Choosing a repository for your software project is not unlike choosing where to host a website. There are many options, from running it all yourself to paying for a fully hosted service. The option you choose will depend on your circumstances: in particular, the functionality you require, the amount of effort required to manage the project, the popularity of the service amongst the community you work in, and the size and diversity of contributors to your project.
The most important point to keep in mind when choosing a repository is that a repository only serves its purpose in the present. You must regularly review the provision from your repository in case you need to migrate to another service in the future.
Factors to consider when choosing a repository
- What functionality do you need now?
  - Version-control system, including web interface for online code-browsing
  - Mailing lists, list management and archives
  - Bug/issue tracker
  - Basic web server for project/software pages
  - Software package hosting/publishing
  - Statistics reporting (e.g. number of commits, number of downloads)
  - Project/release management
  - Access control (e.g. setting up project-level roles)
- How easy is it to upgrade to additional functionality in the future?
- What is your preferred version-control system, e.g. CVS, SVN, Git, Mercurial?
- Is it important to have your code publicly available?
- Are all your code committers local?
- How easy is it to integrate other things you run separately (e.g. a website) with the repository?
- How good is the support for your IDEs of choice?
- Is there support for authentication systems such as OpenID or SSH keys?
- What additional forge, social-networking or project-management functionality do you want from the site (e.g. GitHub is good for social coolness)?
- Where are similar projects to yours hosted?
- What's the speed of upload/download?
- How easy is it to back up the entire repository (code, mailing lists, issue tickets, ...)?
- How established and stable is the repository?
- How good is the user support?
- How much effort do you have to put into repository maintenance?
- Would it be better to use more than one repository, e.g. code stored in GitHub and a link to Assembla for its extra tools?
- What are the Service Level Agreements for uptime, downtime, time to fix outages and bandwidth?
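
One way to make the factors above concrete is a simple weighted checklist: give each factor a weight that reflects how much it matters to your project, rate each candidate against it, and compare the totals. The sketch below is only an illustration of that idea. The factor names, weights, candidate names and ratings are made-up placeholders, not an assessment of any real service.

```python
def score_candidates(weights, ratings):
    """Rank candidates by the weighted sum of their ratings.

    weights: factor -> importance to your project (higher matters more)
    ratings: candidate -> {factor: rating}; missing factors count as 0
    """
    totals = {
        name: sum(weight * scores.get(factor, 0)
                  for factor, weight in weights.items())
        for name, scores in ratings.items()
    }
    # Highest total first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only.
    weights = {"issue tracker": 3, "private repositories": 2, "ease of backup": 1}
    ratings = {
        "hosted service A": {"issue tracker": 2, "private repositories": 1, "ease of backup": 2},
        "self-hosted B": {"issue tracker": 1, "private repositories": 2, "ease of backup": 1},
    }
    for name, total in score_candidates(weights, ratings):
        print(f"{name}: {total}")
```

A total like this is a prompt for discussion rather than a verdict: a single hard requirement, such as the need to host closed-source code, can rule out a candidate regardless of its score.
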
If you are trying to migrate your project from one repository to another, you might also want to consider two extra factors:
- How easy will it be to transfer not just your code, but your community, to the new site, e.g. do you have mailing list archives, wikis or user accounts to move?
- Do you need to keep the revision history associated with your code, or can you start afresh?
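
If you do need to keep the revision history of a Git repository, one possible check after a migration is to compare the output of `git ls-remote` for the old and new hosts: each line pairs a commit hash with a branch or tag name, so any name that is missing, or that points at a different commit, has not been transferred intact. The following is a minimal sketch under that assumption. The function names are hypothetical, and the two arguments are expected to hold the text you captured by running `git ls-remote` against each host.

```python
def parse_ls_remote(output):
    """Map each ref name to its commit hash from `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line has the form "<commit hash><TAB><ref name>".
        commit, ref = line.split(maxsplit=1)
        refs[ref.strip()] = commit
    return refs


def refs_not_migrated(old_output, new_output):
    """Return refs that are missing or point elsewhere on the new host."""
    old_refs = parse_ls_remote(old_output)
    new_refs = parse_ls_remote(new_output)
    return sorted(ref for ref, commit in old_refs.items()
                  if new_refs.get(ref) != commit)
```

An empty result means that every branch and tag on the old host exists on the new one and points at the same commit.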