treating code as a first-class research object/citizen

This is the first in a series of posts over the coming months about treating code as a fundamental component – or a first-class citizen – of modern-day research. Research is becoming increasingly reliant on code and analysis, and we’ve come a long way in getting data recognized as a “research object”. But what about the software needed to regenerate analyses? How do we shift the conversation to also recognize the code used to conduct the experiment as a critical piece of research?

The Software Sustainability Institute in the UK has some excellent posts outlining the ideal world many of us are striving for in open science. Here’s an excerpt from their longer post on this entitled “Publish or be damned” that I found especially interesting. (Do give the full post a read when you have a chance, too. It outlines a number of the key issues we need to be cognizant of.)

And, as always, I welcome your thoughts (keep them constructive, please. 🙂 )


The Research Software Impact Manifesto

As those involved in the use and development of software used in research, we believe that:

  1. Open science is a fundamental requirement for the overall improvement and achievement of scientific research.
  2. Open science is built on the tenets of reuse, repurposing, reproducibility and reward.
  3. Software has become the third pillar of research, supporting theory and experiment.
  4. Current mechanisms for measuring impact do not allow the impact of software to be properly tracked in the research community.
  5. We must establish a framework for understanding the impact of software that both recognises and rewards software producers, software users and software contributors; and encourages the sharing and reuse of software to achieve maximum research impact.

To enable this, we subscribe to the following principles:

  • Communality: software is considered as the collective creation of all who have contributed
  • Openness: the ability of others to reuse, extend and repurpose our software should be rewarded
  • One of Many: we recognise that software is an intrinsic part of research, and should not be divorced from other research outputs
  • Pride: we shouldn’t be embarrassed by publishing code which is imperfect, nor should other people embarrass us
  • Explanation: we will provide sufficient associated data and metadata to allow the significant characteristics of the software to be defined
  • Recognition: if we use a piece of software for our research we will acknowledge its use and let its authors know
  • Availability: when a version of software is “released” we commit to making it available for an extended length of time
  • Tools: the methods of identification and description of software objects must lend themselves to the simple use of multiple tools for tracking impact
  • Equality: credit is due to both the producer and consumer in equal measure, and due to all who have contributed, whether they are academics or not

This does not rescind the values of the current credit system, but reinforces them by acknowledging that there are many forms of output that can lead to indicator events.


3 thoughts on “treating code as a first-class research object/citizen”

  1. I was wondering if you have any ideas on how to support this view of software for open science, and how one could construct “methods of identification and description of software objects” so that they integrate with tools for tracking impact?

    Is this a matter of making software directly citable, or do you think further measures need to be taken?

    In effect, some software is already being recognized as a research object, though only indirectly. For instance, some pieces of software have a paper associated with them, and that paper is what actually gets cited, whether the software is a fully fledged desktop application, a software library, etc.

    The problem still remains, though. It is very difficult to distinguish between a paper describing a piece of software and other types of publications. It is not impossible: I would say it would be doable (and interesting) to use machine learning techniques to classify publications into different categories. In practice, unfortunately, it is an ugly and infeasible solution, as many paywalls do not allow you to process the contents of the publications they index.

    I also have to recognise that you’re right about the software needed to do analyses. Sometimes people cite it, sometimes they don’t.

    Overall this is essentially the same problem we had with data which is now citable (hell yeah). It would be awesome to have some infrastructure supporting the indexing of both software and the data and literature associated with it.

    1. The start of my blog post (which Kaitlin references) considers some of the challenges in making software directly citable, by comparing it with dataset citation.

      I think your question about tool integration is a great one. A big challenge is balancing working with existing tools against creating a newer, better way of doing things. An obvious start would be to associate DOIs with the specific versions of software that were used to generate the results in a paper. This is perfectly doable just now, by depositing an archived (e.g. ZIP) copy of the software into an institutional repository or a repository like FigShare, which provides DOIs for you.

      However, this isn’t brilliant, as it means you’re taking additional copies each time. It actually does deal with new versions relatively well, as you can provide successor/predecessor information in your repository. The next step would be to associate a DOI with a specific unique identifier in a code repository, e.g. a release in GitHub (which already provides permanent identifiers). The final stage is understanding what the useful citable granularity of a piece of software is. Given that we haven’t solved this for regular papers, this might take longer!
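      As a rough illustration of the "archive a ZIP and record its version" step, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name `archive_release`, the field names in the metadata record, and the idea of leaving `doi` empty until a repository assigns one. It does not talk to FigShare, GitHub or any other service, and it assumes the output directory sits outside the source directory.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def archive_release(source_dir, version, out_dir, predecessor_doi=None):
    """Zip source_dir and write a small JSON metadata record beside the archive.

    Hypothetical helper: the record layout is illustrative, not a standard.
    out_dir is assumed to live outside source_dir.
    """
    source = Path(source_dir).resolve()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archive_path = out / f"{source.name}-{version}.zip"

    # Add files in sorted order so the archive layout is predictable.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(source).as_posix())

    # A checksum lets a reader confirm they hold the exact deposited copy.
    checksum = hashlib.sha256(archive_path.read_bytes()).hexdigest()

    record = {
        "title": source.name,
        "version": version,
        "archive": archive_path.name,
        "sha256": checksum,
        "doi": None,  # filled in by hand once the repository assigns one
        "predecessor_doi": predecessor_doi,  # links this deposit to the previous version
    }
    record_path = out / f"{source.name}-{version}.json"
    record_path.write_text(json.dumps(record, indent=2))
    return record
```

      Calling `archive_release("my-analysis", "1.2.0", "deposits", predecessor_doi=...)` for each release gives one archive and one record per version, which is exactly the "additional copies each time" cost mentioned above.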

      My colleague Mike Jackson has also written a great guide to describing and citing scientific software under current constraints and tooling:
