#lovedataweek: Data Needs Software

Patrick McCann
Friday 15 February 2019

As part of Love Data Week 2019, we thought it would be worth taking a brief moment to think about software.
Software is not data, but software is essential for working with data – whether that’s creating it, collecting it, processing it or analysing it. In many cases, the software used for these purposes will be off-the-shelf – but we know that many researchers write code themselves.
Sharing research data, particularly that underpinning research outputs, is a great way to make research verifiable and reproducible, not to mention driving up citations and encouraging collaboration. However, it is possible to further enhance each of these benefits by making the software you write to work with that data available too.
We’re working on formalising processes and guidance around this, but we can give some pointers in the meantime.
Many researchers developing software do so in public GitHub repositories, which gives a degree of visibility straight away. (You don’t have to use GitHub, but you should be using a version control system for your software. If you need help with that, get in touch via [email protected] or keep an eye out for a Software Carpentry workshop on CAPOD’s PDMS.) “Public” does not equate to “open”, however – you need to include a licence for that, preferably in the form of a LICENCE file in the root of your Git repository. Find out more in the Software Sustainability Institute (SSI) guide to How To Choose a Software Licence by Mike Jackson, or take a look at choosealicense.com to compare the most common options.
You’ve applied a suitable licence – your software is now open! This is wonderful, but it’s possible to go further. GitHub and similar platforms are great tools for collaboration, but they are not archives – why not archive your software in Zenodo and get a DOI in a few simple steps? You can then create a record in Pure pointing to it, making explicit the relationships with your papers and datasets.
Actually, just wait a moment. It’s really easy to archive your software in Zenodo, but it’s worth thinking about whether you’ve supplied enough information about your software. As detailed in the SSI guide on What to Deposit, again by Mike Jackson, you should have at a minimum:

  • A LICENCE file (no problem, we sorted that out a couple of paragraphs ago)
  • A README file – you’ve probably got one of these already, but make sure it includes:
    • A name
    • A description
    • A contact
    • A link to the project website / source code repository
  • A CONTRIBUTORS file detailing everyone who has contributed to the software

You are encouraged to go further by providing more metadata, documentation, test code and data… but the minimum is those three short text files.
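If you would like a quick way to check a repository against that minimum before archiving it, something like the following Python sketch could help. It is only an illustration: the function name `missing_deposit_files` is our own invention, and the decision to accept either spelling of LICENCE and any file extension (README.md, LICENSE.txt and so on) is an assumption rather than anything the SSI guide prescribes.

```python
from pathlib import Path

# Minimum files from the list above, mapped to the lower-case base names
# this sketch accepts. Accepting both "licence" and "license", and ignoring
# file extensions, is an assumption made for convenience.
REQUIRED = {
    "LICENCE": {"licence", "license"},
    "README": {"readme"},
    "CONTRIBUTORS": {"contributors"},
}


def missing_deposit_files(repo_root):
    """Return the required file names not found in the top level of repo_root."""
    # Collect the base names (without extension) of top-level files only.
    stems = {p.stem.lower() for p in Path(repo_root).iterdir() if p.is_file()}
    # A requirement is missing when none of its accepted names is present.
    return [name for name, accepted in REQUIRED.items() if not stems & accepted]
```

Calling `missing_deposit_files(".")` from the root of your repository returns an empty list when all three files are present. It only checks that the files exist, so you will still need to confirm yourself that the README includes a name, description, contact and link.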
Once you’ve got those, go ahead and archive the repository in Zenodo and create the record in Pure. The Research Computing and Research Data Management teams are here to help if needed.
Mike and the SSI have also supplied guidance on What Not to Deposit, and How to Describe a Software Deposit – and there’s much more concerning research software on their website.
