[Experiment] Exploring code review for science

Over the years, science has evolved many quality control mechanisms, the best known of which is probably peer review. This first appeared nearly 300 years ago, and was initially put into place to allow trusted peers to scrutinize new findings, ideas, and societal implications as part of the publication process.

Here at the Mozilla Science Lab, we’re interested in finding ways of pushing the limits of what we think of as “science on the web” and instilling better practices for open, reproducible research. We aim to do this through community building, skills training (like Software Carpentry), and building prototypes and running pilots with other groups and organizations.

Our latest experiment is exploring a model for code review in science. As research becomes increasingly dependent on computation, we want to see whether the code review practices that are now commonplace in open source software development will also work in research.

Why code review?

The code used to produce the results in a paper is not usually reviewed when a paper is published, beyond a basic “sense-check” (or, as a colleague put it to me, “plausibility versus defensibility, not reproducibility”).  As code follows the trajectory of data in being integrated and recognized as a first-class research object, we want to work with the scientific community to figure out how scientists can and should check each other’s computational methods without increasing the time and cost of reviews unsustainably.

How this will work

The staff at PLOS Computational Biology helped us identify a set of already-published papers containing snippets of MATLAB, Python, R, and C++ code that may be suitable for code review. We’ve put that code in front of a group of Mozilla engineers who regularly do code reviews for Firefox, for mobile, and for other programs.

These engineers aren’t scientists (though some may be interested in the underlying research).  Instead, they are skilled professionals who don’t have the domain expertise of the scientists, who in turn are not full-time software developers.

Over the course of the next month, each Mozilla engineer will review code from one or two papers. Their reviews, and their insights into the review process, will be captured in ReviewBoard, so that we have a single collation point for the findings. Reviewers will be able to flip through the rest of the paper should they so choose, but this is really about the code snippet itself.
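To give a sense of what that collation could look like in practice, here is a minimal sketch (an assumed workflow, not a description of the project’s actual tooling) of pulling review commentary back out of a Review Board instance with the RBTools Python API client; the server URL, the credentials, and the choice to collect only each review’s top-level comment are illustrative assumptions.

    # Minimal sketch (assumed workflow): collect review commentary from a
    # Review Board instance using the RBTools Python API client.
    # The server URL and credentials below are placeholders.
    from rbtools.api.client import RBClient

    client = RBClient('https://reviewboard.example.org/',
                      username='analyst', password='secret')
    root = client.get_root()

    # Walk the review requests on the server (first page of results) and
    # print each review's top-level comment, giving a flat record of the
    # feedback that can later be analyzed for patterns.
    for request in root.get_review_requests():
        for review in request.get_reviews():
            print(request.id, review.body_top)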

Once they’ve finished reviewing, we’ll analyze the findings, looking for patterns and interesting outliers, and examine how the commentary on the code itself was delivered, with the aim of using that as a model for code review in the future.

What we’re testing

This experiment is a means to explore the following:

– What does code review from a software engineer outside of academia look like? How do they approach the task?

– To what extent is domain knowledge needed to do a successful code review? Is the code parseable by someone outside of that discipline?

– What lessons can be learned about code review, possibly to influence and enhance traditional peer review?

– Does this process surface issues around best practice in writing software and code? If so, what are those issues?

– Following the review, how useful is the markup to the author? Does this feedback help them in their work? How can we change those norms?

We’ll be writing more about this experiment as the reviews continue, and hope to have our analysis completed next month. It’s just a first step, but we believe that this could help change the way researchers, administrators, publishers and funders think of code in scholarly research, and possibly unveil some deep-seated issues surrounding how code is recognized, maintained and shared in the scientific community.

 —

Have a question, or would you like to join in on the conversation? Drop us a line at [email protected] or add your comments here. We’d love to hear your thoughts. Also, have an idea you’d like to see explored? Let us know.

And special thanks to our Mozilla volunteers, Marian Petre from the Open University, Mike Hoye from Mozilla, and Greg Wilson from Software Carpentry.

18 thoughts on “[Experiment] Exploring code review for science”

  1. Very interesting.

    In the field I used to be in (astronomy/hydrodynamics) a lot of the codes were open in one way or another, so usually if a paper came out pointing to an interesting code, people would run it on the standard test problems. It helps that hydrodynamics has a few simple standard problems with analytic solutions, so people could get an idea of whether a code was good without formally reviewing it. Not to mention that most software developers I’ve met would not be too into reviewing code written in FORTRAN.

    I definitely look forward to seeing where this project goes.

  2. Wow, this is great! You may see some commonalities in where the reviewers suggest improvements, which could then become the focus areas for training of scientists. Is that something you intend to look for?

    1. Sure is. Also curious what else they surface, as my assumption is that most code in science isn’t written for production. Curious to see if that has any effect here.

  3. I also replied to Rosemary on the PLOS Biologue and will copy it here.

    I direct an NSF research coordination network, the Network for Computational Modeling in Social and Ecological Sciences (CoMSES Net). One of our initiatives is to develop a model code library and to establish procedures and best practices for peer evaluation of this code. You might want to take a look at this (http://www.openabm.org) and let me know if you’d like to coordinate on this.

    Michael Barton

  4. An interesting experiment; I am looking forward to seeing the results.

    It might be interesting to do another round of code review, but by trained software developers working specifically on scientific software (yes, I know they are not that numerous). Every domain has its coding conventions and domain-specific idioms, so it would be interesting to see if developers from different backgrounds come to different conclusions.

    1. Thanks, Konrad! We’ve thought about putting the code snippets in front of some of our Software Carpentry instructors (computationally savvy, and mostly practicing scientists). Stay tuned!

  5. Thanks to all for conducting this experiment. We are always interested in new ways of leveraging the open content of PLOS Computational Biology to further scholarship. Keep these ideas coming.
    Best,
    Phil Bourne
    Founding Editor in Chief

  6. Great to see Mozilla undertaking this. I’m particularly interested to see if this can be done objectively, or whether it requires subjective judgement based on the field the paper is from.

    You might be interested in these two related blog posts from Carl Boettiger on reviewing scientific code and the discussion in the comments.

    http://carlboettiger.info/2013/06/13/what-I-look-for-in-software-papers.html#comment-930028211
    http://carlboettiger.info/2013/07/09/reviewing-software-revisited.html
