Research Data Management Survey Results: Software

Patrick McCann
Monday 5 September 2016

As previously reported, over the summer we conducted a survey on research data management practices at the University and we’ve been analysing the responses.
Included in the survey were a few questions about software development activities. Firstly, we asked:

Do you write, develop or maintain programs, scripts or other code as part of your research?

This question was intentionally broad. Many of the people who engage in the kinds of activities we are interested in here don’t necessarily consider themselves to be programming or developing software.
299 people answered this question, and 124 of them (41%) answered ‘Yes’, stating that they write, develop or maintain programs, scripts or other code.
It would be a bit of a leap to suggest that this is representative of researchers at the University in general – response rates varied across the schools and the way in which the survey was promoted may have skewed the results. Nevertheless it would seem that a significant number of researchers at the University are engaged in these kinds of activities.
Of those 124, 52 are academics and 45 are postgraduate students. Unsurprisingly, the Schools most represented among those writing code are Physics and Astronomy, Biology, Mathematics and Statistics, and Computer Science. However, code is being written for research in Schools right across the University, including Management, International Relations, Divinity, and Philosophical, Anthropological and Film Studies.

Figure: Survey respondents who “write, develop or maintain programs, scripts or other code as part of your research?” by role.

Figure: Survey respondents who “write, develop or maintain programs, scripts or other code as part of your research?” by School.

The next question asked those who had indicated that they write code to state which “programming languages” they used. Following on from the broad nature of the preceding question, a rather liberal definition of “programming language” was used, with the list of over 30 options drawn from technologies that members of the Research Computing Network had mentioned when responding to the initial request for expressions of interest. As one respondent noted, HTML isn’t strictly a programming language, but we are interested in researchers who work with such technologies. A number of further languages were mentioned in a free text area with this question.
Looking at which Schools use each language, we can see that any language used by more than 3 respondents is used across more than one School; any used by more than 7 is used across at least 3 Schools.
Figure: Numbers using each “Programming Language”.

The popularity of MATLAB, Python and R is very much in line with a rough analysis published by Simon Hettrick of the Software Sustainability Institute which identified those three languages as the most popular among Research Software Engineers (RSEs). Each of them is used across at least 6 Schools at the University.
Looking at the same numbers the other way around, we can see that in any School with more than one developer there is a range of languages in use. The graph is quite busy, but it illustrates the diversity of activity, even if the details regarding individual languages are difficult to discern.
Figure: Diversity of “programming languages” in use in each School.

We also asked of those who indicated that they write code:

Do you use a code repository or version control software (e.g. Git, Mercurial, Subversion) to manage your code?

Only 35 of the 124 (28%) said ‘Yes’. These results are particularly interesting when looked at on a per-School basis.

Figure: Comparison of number of developers per School with number of those who use a code repository or version control software to manage their code.

Almost all of the developers from Computer Science manage their code using this type of technology. Elsewhere, the drop-off is considerable: in no other School do more than 50% of developers manage their code in this way. Among those who do, public GitHub repositories are the most popular solution.
Figure: Solutions used by respondents to manage their code.

These few questions in this survey have established that there is a considerable number of researchers right across the University engaged in software development activities using a wide range of technologies. However, it would seem that there is work to do in order to promote and support sustainable software development and hence reproducible research. More work is needed to better understand the requirements of developers at the University with regard to code repositories and a range of other issues, and then to see how best to meet those requirements. This survey only scratched the surface, but 82 of the 124 gave permission for us to follow up with them regarding these issues, and we intend to take them up on that before long.
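For anyone who wants to check the headline proportions, here is a minimal sketch in Python. The counts are the ones reported in this post; the `percentage` helper and the variable names are only illustrative and are not part of the survey analysis itself.

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)


# Counts taken from the survey results reported in this post.
answered = 299             # people who answered the "do you write code" question
write_code = 124           # of those, people who answered 'Yes'
use_version_control = 35   # of the 124, people using a repository or version control
agreed_to_follow_up = 82   # of the 124, people who gave permission for follow-up

headline = {
    "write code (of those who answered)": percentage(write_code, answered),
    "use version control (of those who write code)": percentage(use_version_control, write_code),
    "agreed to follow-up (of those who write code)": percentage(agreed_to_follow_up, write_code),
}

for label, value in headline.items():
    print(f"{label}: {value}%")
```

These are simple proportions of the people who responded. As noted earlier, they should not be read as estimates for researchers at the University as a whole.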
