By Simon Hettrick, Deputy Director.
Over the last few months, we’ve been working on improving our understanding of the size of the research software community. In previous posts, I’ve discussed our plans for this research. Although we've not yet finished our analysis, we thought that it would be interesting to release some early results. First of all, how much money do the Research Councils invest into research that relies on software? The answer: at least around 30% of their total budget for project grants - or £840 million in the 2013-14 financial year.
The UK Research Councils and Technology Strategy Board (TSB) have been investing, at a minimum, around 30% of their total budget for project grants into software-reliant research, which amounts to £840 million in the financial year 2013-14. We expect the actual investment to be significantly larger than this figure because software is rarely discussed in the title or abstract of a grant, and these are the data on which this research relies.
Investment per council into software-reliant research is relatively stable, with an average increase of 3.3% over the last four years. One notable exception is the TSB which, despite a significant increase in total research investment, has invested a lower percentage of those funds into software-reliant research over the last three years. Another is the AHRC, the only council to have seen a double-digit increase (an average increase of almost 20% per year) in investment in software-reliant research over the last four years, although the NERC is close with an average increase of 9.2% per year over the same period. Further research is required to identify whether these significant differences are caused by a single large research project, or whether they are representative of a change in policy at the Research Council in question.
The code and data can be found at DOI:10.5281/zenodo.2553194.
This post discusses early results from our investigation into funding of software-reliant research. It is based on an analysis of the data that is available on Gateway to Research. This data includes project funding by the Research Councils, but it may not include investment by other means (e.g. direct funding or funding through a centre).
The analysis is based on searching grant titles and abstracts for terms that indicate a reliance on software. This means the analysis will overlook any grants in which the software reliance is only identifiable in the main body of the grant application. Post-analysis of the results is yet to be conducted to determine a confidence level for the search terms we have used.
The data on Gateway to Research relates to funding from the AHRC (Arts and Humanities Research Council), BBSRC (Biotechnology and Biological Sciences Research Council), EPSRC (Engineering and Physical Sciences Research Council), ESRC (Economic and Social Research Council), MRC (Medical Research Council), NERC (Natural Environment Research Council), STFC (Science and Technology Facilities Council) and TSB (Technology Strategy Board).
Tracking down the data
It’s very easy to find information on the accounts of a Research Council, or to find the size of an investment in a particular theme, because these numbers are made publicly available. It’s somewhat more difficult to find out how much funding goes towards something as ubiquitous as software, which is used extensively in research but is rarely viewed as the star of the show.
The RCUK funds a website called Gateway to Research (GtR), which was developed, as described on the site, to "search and analyse information about publicly funded research". It provides access to around 50,000 research projects funded by the seven Research Councils and the Technology Strategy Board (TSB). The website provides the title and abstract for all grants and other data, such as the start and end date of the project and the size of the award (i.e. the amount of funding).
Rather than use the application programming interface (API), we found it easier to simply download all data from GtR as a comma-separated values (CSV) file and then interrogate the data locally. Once we’ve published this first round of results, we are keen to take a second look at the API, because this would allow us to automate our analysis and keep our results up to date when new data is made available.
We downloaded the titles and abstracts for all projects held on the GtR website and then set to searching them for software-related terms. A script was used to find terms related to software use in the title and abstract of each bid, namely: software, software developer, software development, programming, program, computational, HPC, simulation, modeling, data visualisation. We are assuming that the existence of these terms in the title or the abstract of a grant shows that the project is likely to rely on software as a fundamental part of the research workflow.
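The search described above can be sketched roughly as follows. This is a minimal illustration, not the script we actually used: the function names are hypothetical, and the choice to match whole words case-insensitively is an assumption (a plain substring search would behave differently, for example by matching "program" inside "programme").

```python
import re

# Search terms exactly as listed in the post.
SEARCH_TERMS = [
    "software", "software developer", "software development",
    "programming", "program", "computational", "HPC",
    "simulation", "modeling", "data visualisation",
]

# Assumption: whole-word, case-insensitive matching. Longer terms are tried
# first so "software development" is reported rather than just "software".
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(SEARCH_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def matched_terms(title, abstract):
    """Return the sorted, lower-cased search terms found in a grant's text."""
    text = f"{title}\n{abstract}"
    return sorted({m.group(0).lower() for m in _PATTERN.finditer(text)})


def is_software_reliant(title, abstract):
    """Flag a grant if any search term appears in its title or abstract."""
    return bool(matched_terms(title, abstract))
```

With whole-word matching, a title such as "A programme of medieval history" is not flagged, whereas "Computational fluid dynamics" is.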
We have attempted to select terms that are likely to be used only in relation to software, but some of the terms have more than one definition. An obvious example is program. Even though we are reviewing UK applications, it is possible that program may have been used instead of programme to indicate a planned series of tasks or events rather than lines of computer code. What's more, there may be other uses of terms like modeling or simulation that do not require software. Consequently, the next step is to assess how well the search terms are identifying software-reliant grants. It's not feasible to pick through all of the 13,000 grants that were identified, so we instead propose to review a randomly selected group of 1-5% of the identified grants and use the results of this review to temper the overall result.
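One way the proposed review could work is sketched below. This is only an illustration under stated assumptions: the helper names are hypothetical, and scaling the flagged total by the fraction of reviewed grants confirmed as software-reliant is one possible way to "temper" the result, not a method we have settled on.

```python
import random


def sample_for_review(grant_ids, fraction=0.05, seed=0):
    """Pick a reproducible random sample of flagged grants for manual review."""
    ids = list(grant_ids)
    if not ids:
        return []
    # Always review at least one grant, never more than exist.
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return random.Random(seed).sample(ids, k)


def estimated_precision(review_labels):
    """Fraction of reviewed grants judged genuinely software-reliant.

    review_labels is an iterable of booleans, one per reviewed grant.
    """
    labels = list(review_labels)
    if not labels:
        raise ValueError("no reviewed grants")
    return sum(labels) / len(labels)


def tempered_total(flagged_total, precision):
    """Scale the flagged investment by the estimated precision (assumption)."""
    return flagged_total * precision
```

For example, a 1% sample of 13,000 flagged grants means reviewing 130 grants by hand; if three quarters of them turn out to be genuinely software-reliant, the flagged total would be scaled by 0.75.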
The funding data available from GtR is simply the award amount and the start and end date of the project, which necessitated another script to separate that data into financial years and then relate total funding over a particular financial year to the funding invested into software-reliant projects. Here we had to assume that funding is spent evenly over the duration of the project. This is rarely the case, but given the infinite variations in how money can be spent, it is the only workable assumption.
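The even-spending assumption can be illustrated with a short sketch. It is not the script we used: the function names are hypothetical, and it assumes a financial year running from 1 April to 31 March, with the award split in proportion to the number of project days that fall in each financial year.

```python
from datetime import date, timedelta


def financial_year_label(d):
    """Label the financial year containing d, e.g. '2013-14'.

    Assumption: the financial year runs from 1 April to 31 March.
    """
    start_year = d.year if d.month >= 4 else d.year - 1
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def spread_award(award, start, end):
    """Split an award across financial years in proportion to project days.

    Both start and end dates are treated as part of the project.
    """
    total_days = (end - start).days + 1
    if total_days <= 0:
        raise ValueError("end date is before start date")
    shares = {}
    cursor = start
    while cursor <= end:
        fy_start_year = cursor.year if cursor.month >= 4 else cursor.year - 1
        fy_end = date(fy_start_year + 1, 3, 31)
        chunk_end = min(end, fy_end)
        days = (chunk_end - cursor).days + 1
        shares[financial_year_label(cursor)] = award * days / total_days
        cursor = chunk_end + timedelta(days=1)
    return shares
```

Summing these shares over all projects gives the total for each financial year, and summing over only the flagged projects gives the software-reliant total for the same year.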
How much is the UK spending on software-reliant projects?
First of all, we can look at the research investment into all projects (i.e. the total research investment made by the Research Councils). We've performed this analysis over the financial years from 2010-11 to 2013-14.
If you relate this data to the published accounts of the Research Councils, there are a number of discrepancies. This is because the Research Councils account for the total award of a project in the financial year in which the project is funded. By contrast, as discussed above, we are spreading the cost of the total award equally over the duration of the project.
Looking at these results, we can see that research funding has been fairly steady over the last few years, with most councils' annual investment drifting by only a few percent. The TSB is a notable exception, showing double-digit growth over the last two financial years. The AHRC has also grown its investment by an average of around 16% over the last four years.
Next we'll look at the investment into software-reliant projects.
Again, we see a fairly even investment into software-reliant projects. With the exception of the ESRC, STFC and TSB, every council has increased its spending on software-reliant projects over the last four years. Averaging over all research councils and all years shows a 3.3% increase in investment in software-reliant projects.
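An average increase of this kind can be computed in several ways. The sketch below shows one of them, the mean of the year-on-year percentage changes pooled over all councils; the function names and the figures in the usage note are made up for illustration, and this is not necessarily the exact calculation behind the 3.3% quoted above.

```python
def year_on_year_changes(series):
    """Percentage change between each pair of consecutive yearly values."""
    return [(later - earlier) / earlier * 100
            for earlier, later in zip(series, series[1:])]


def mean_change(series_by_council):
    """Mean year-on-year percentage change pooled over all councils.

    series_by_council maps a council name to its yearly investment figures,
    listed in chronological order.
    """
    changes = [change
               for series in series_by_council.values()
               for change in year_on_year_changes(series)]
    if not changes:
        raise ValueError("need at least two years of data for one council")
    return sum(changes) / len(changes)
```

For instance, a hypothetical council whose investment goes from 100 to 110 and then back to 99 has changes of +10% and -10%, so its mean change is zero even though it ends slightly lower than it started.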
To gain a greater understanding of the relative investment in software, we need to look at the percentage spend of each council on software-reliant research.
From the above graph, we can see that the STFC, until recently, invested the most in relative terms into software, but it has now been overtaken by the EPSRC. In the last financial year, the EPSRC and STFC may have invested a similar proportion of their funds into software-reliant projects (about 50%) but, due to their significantly different budgets, the absolute figures are rather different: the STFC invested £62 million whereas the EPSRC invested £416 million.
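The distinction between relative and absolute spend can be made concrete with a small sketch. The function name is hypothetical and the figures in the usage note are invented for two imaginary councils; they are not taken from our data.

```python
def software_share(software_spend, total_spend):
    """Percentage of each council's total spend that is software-reliant.

    Both arguments map a council name to a spend for the same financial year.
    """
    return {council: software_spend[council] / total_spend[council] * 100
            for council in software_spend}


# Made-up figures (in millions of pounds) for two imaginary councils.
example_software = {"Council A": 50.0, "Council B": 400.0}
example_total = {"Council A": 100.0, "Council B": 800.0}
example_shares = software_share(example_software, example_total)
```

Here both imaginary councils spend 50% of their budget on software-reliant projects, yet one invests eight times as much as the other in absolute terms.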
The TSB is an interesting case. Of all of the funders analysed, the TSB is the only one to show a significant increase in total research investment, yet concurrent with this overall increase, its relative spend on software-reliant research has dropped significantly - by about 10% each year following the 2011-12 financial year. We have to conclude that the TSB's increased spending is not going towards software, but further investigation will be required to understand where the funding is being invested.
Our findings indicate that the Research Councils spend around 30% of their research budget on software-reliant projects. This is likely to be the lower bound on spending because, in our experience, it is rare for software to be discussed openly in the title or abstract of a bid. For this reason, we expect to have missed many software-reliant projects. Even on that basis, an annual expenditure of £840 million on software-reliant projects is a huge investment to be made in an area that in our opinion does not attract enough maintenance spending and is given little visible support within the research community.
Software is fundamental to research
If you believe that software is fundamental to research - and we believe that the current spending on software-reliant projects supports this view - then please sign our petition.
This blog post is merely an early summary of the research we have been conducting. Further investigation is needed to determine the confidence level of our results. Later in the year, the final results will be released, along with the data (where appropriately licensed) and the software we developed.
The next blog in this series will investigate the job market in academia for software developers: how many jobs are available, how much are software developers being paid, and what job title are they likely to end up with.
Thanks to Mario Antonioletti for dealing with the download of data from Gateway to Research, Steve Crouch for putting together the search scripts and for the analysis of financial data, and Devasena Inupakutika for handling the graphing.