Open Source Community, Simplified

Author(s)
Neil Chue Hong

Director

Posted on 28 February 2011

Steve Bennett, a Research Data Analyst from the Australian National Data Service, who I met when giving a talk on software sustainability at e-Research Australasia last year, pointed me at a great article by Max Kanat-Alexander from the Bugzilla Project on Code Simplicity which examines the factors leading to the growth of an open source community.

Max notes that: 

"Growing and maintaining an open-source community depends essentially on three things:

  • Getting people interested in contributing
  • Removing the barriers to entering the project and contributing
  • Retaining contributors so that they keep contributing

If you can get people interested, then have them actually contribute, and then have them stick around, you have a community. Otherwise, you don’t."

Importantly, he also makes the observation that:

"If you are just starting a project or need to improve the community of an existing project, you should address these points in reverse order. If you get people interested in a project before you do the later two steps, then people won’t be able to enter and won’t stick around when they do enter. You won’t actually expand your community. So first, we want to be sure that we can retain both existing and new contributors. Once we’ve done that, then we want to remove the barriers to entry, so that interested people can actually start contributing. Only then do we start worrying about getting people interested."

The article goes on to provide some very useful tips about how to achieve those three points, and I wondered how some of these might apply in the research software space. Although much research software is open-source licensed, it doesn't always follow a pure open-development model. I got fellow SSI'er Mike Jackson to comment based on his experience, particularly with the OGSA-DAI project:

Don’t freeze the trunk for long periods

For most of the research software projects we've worked with, I think this still holds true. There's one caveat I'd add though: their "create a release branch" idea has merits but there is, as they admit, the overhead of double committing bug fixes to both trunk and branch, which can become confusing and runs the risk of a developer forgetting one of the two commits.

We've never really had this as an issue on the OGSA-DAI project, because we've not really embraced the whole trunk/branch thing and have been (reasonably) content to have changes right up to a week or so before a release. A comprehensive automated test system really helps in this respect, since as a release manager you can just say "when it's next green we'll release".

Turnover is inevitable

With the OGSA-DAI project, our contributors are primarily researchers who need OGSA-DAI for their research. Naturally, researchers tend to move on when their project ends, because they lack the funding or time to continue to contribute. The number of people who contribute outwith work (belonging to that "happy to program as a hobby" group) is small indeed. This is a peril of the researcher/developer: in general you want to spend your time on the research, not the development, and so once the code is acceptable or the research focus changes, the researcher moves on to new projects.

Respond to contributions immediately

Before OGSA-DAI moved to the open-development model, our policy was that major contributions to the code were subject to a strict contribution review. Contributors were thanked, but reviews were not necessarily completed - and were instead sacrificed on the altar of deadlines and left hanging on a Bugzilla ticket. Of course, this doesn't help the contributor who wonders 'why haven't they done anything with it?'. An agreement at the outset as to how to handle all contributions and prioritise them might have ensured that all contributions made it into OGSA-DAI.

Max Kanat-Alexander states that contributors 'don't (usually) mind having to revise a contribution. They even generally don't mind revising it several times' which is a fair point. But, on the other hand, it is time consuming, and can be demoralising to keep having to review the same contribution multiple times ('I'm a software developer, not a programming teacher'), especially in the face of pressing deadlines. I suspect that this points to a greater issue concerning the level of software-development experience in the research community.

Be extremely kind and visibly appreciative

It's easy to thank contributors; correcting people on their faults less so. It can be daunting to tell someone, especially a stranger, that their code is flawed! It also comes down to the individual habits of the developers - sometimes it may seem easier (see the multiple iterations point above) to just quietly fix it.

Encourage a total absence of personal negativity

I think we've managed this on the OGSA-DAI project though it was never explicitly addressed at the outset. The distinction between job and hobby may make a difference here, because we're representing our employer in our interactions with the community. A hobbyist has no such restraint.

It could be useful for new projects to explicitly think about and set out their "ensuring cordial relations with our community" vision. This is particularly important if you are likely to get a high proportion of student contributors: they are more likely to ask RTFM-type questions, and indeed the hardest group to respond to are those who are effectively asking you to do their project assignments.

Certain requests can tempt a sarcastic response. While this may prove to be a good release from frustration, it will also appear extremely negative to someone browsing e-mail archives - especially if they come across the response out of its context of a dozen inane emails from the same person. Certainly, I've seen responses of this sort in other open-source project archives.

Provide a list of easy, starting projects

A list of easy, starting projects can prove especially useful for research-software projects, because they can serve as a basis for MScs and also, from a project perspective, in inducting new team members. Of course, providing such information does not guarantee that a potential contributor will read it. (We've had people ask to work on OGSA-DAI without any explanation as to why they want to work on the project, their motivation, interests or even any sign that they knew what OGSA-DAI does!)

Create and document communication channels

Max Kanat-Alexander makes a good point about using communication channels. This is very important! Communication is of great relevance to any project that's moved to becoming open source. We still make some decisions face-to-face without emailing our public committers list, even after the fact - old habits die hard!

Running an IRC channel can create an expectation that there might always be someone there, and, just because someone is there, that they can help right now. There is the risk that the channel will be hijacked as contributors go off-topic, because they have moved to new projects, but continue to hang around on IRC. On the other hand, development of contributors can help build a sense of a proper community, and it is a great equaliser when a team is clustered with groups of developers from across the world.

Excellent, complete, and simple documentation describing exactly how a contribution should be performed

Documentation is one of the hardest things to crack. We've had a documented contribution process for OGSA-DAI for a while, dating to the days before we were an open-source project. It could be improved - currently it's "read the governance model and sign a form". It would be useful to clarify the levels of contribution, what these mean and how to do it:

  • Add/correct documentation via mailing list or ticket: you'll be added to the acknowledgements.
  • Submit patch via email or ticket: you retain copyright and will be added to the acknowledgements.
  • Complete contributor form and contribute component via email or repository: you retain copyright, and are added as co-author.

The way that copyright of contributions is handled can affect the inclination to contribute. The OGSA-DAI model used to be that all copyright would be signed over to the University of Edinburgh as the project custodians. Now, contributors retain copyright but grant us permission to use/exploit as we want, which is a more lenient model that gives the contributors ownership over their contribution.

Another barrier we've encountered is the "we would contribute but our code isn't ready, it's not quite good enough". In some cases no amount of pleading or reassurances can overcome it (and probably proves a bit awkward for both parties!). I'm not sure why this arises or how it could be overcome, but persistence is probably the key. Keep in contact with the potential contributors and persuade them, where appropriate, that they write good code.

In some cases trust and ownership become issues. Contributors may be reluctant to hand over their contributions because they want to retain ownership and control of their code. In an academic context, this might arise from a fear that authorship will be falsely attributed or proper acknowledgement may not be given. Certainly I can think of one example where OGSA-DAI code was taken, the existing copyright removed and another organisation's copyright added.

Contributors may also fear that their code will be changed for the worse, and this will reflect badly upon them as the original authors (especially if the authors of the changes aren't recorded). In the academic and research world, your reputation is your greatest, and often only, asset.

Make all this documentation easy to find

I find myself disagreeing with Max when he says that "having everything documented and clearly stated on a public website meant that we no longer had to personally explain it all, every time, to every new contributor." The best, most complete documentation in the world is not much use if people don't bother to read it! In my experience, the same questions can come up again and again. In this case, it's time to review the documentation. Perhaps it isn't easy enough to find, or structured in an easy-to-understand way?

Getting People Interested

There are four motivations for contributors:

  1. They like helping
  2. They enjoy being part of a community
  3. They want to give something back
  4. They think that something is wrong and they need/want to fix it

I think that virtually all of our contributors are in the last category. There are a few, particularly those picked up through initiatives like the Google Summer of Code and research visits, who fall into the second category. In general, it appears that in the research community you'll gain more by reinventing something that is wrong and needs fixing than by contributing a fix, so even the fourth category can be difficult to find and retain.

Be a super-popular product

One advantage of the academic/research origins of OGSA-DAI was that we evolved out of a desire to research distributed data management. We were one of a number of such projects, some of which included us as a necessary project for their research. Would early uptake, and continued funding, have been that way if we'd been a company or a group of hobbyists starting up out of the blue?

In general, it's not easy to be a super-popular product in academia - it will very much depend on whether you're in a super-popular research domain (popular, that is, at a given time). That said, software such as R, BLAST and Taverna have built up strong and passionate followings by starting small and focussing on the people they already have on board.

Write in a popular programming language

The difficulty with programming languages is that what is popular in general may not be popular within a specific research community. This can also lead to problems when you have a turnover of contributors. Steve Crouch from the SSI has been covering the question of "To C or Not to C – Which Programming Language Should I Use?" on the Ask Steve blog.

Getting out and about

One big advantage of being an academic, open-source project is that meetings, conferences, workshops and the like are part of the job and are funded. This is particularly good for getting out into different user communities, but could also lead to conflict between the desire to develop and the desire to disseminate.

Overall, there are many excellent points made in Max Kanat-Alexander's Code Simplicity article which are relevant to open-source projects based in academia. If you can deal with all the odd student requests and the contributors drifting away to pursue other research, and overcome the issues of trust and confidence, then you too can develop into a successful, strong open-source or open-development project.
