open science manuscripts and “web-native” scholarship

Lately I’ve been thinking a lot about the notion of “web-native” scholarship and its various permutations. The idea harkens back to some of the messaging and work we were doing in 2006 and 2007 at Science Commons, exploring how to take the principles of working in a networked, collaborative, and open environment – like the Web enables – and apply them to science.

Jason Priem of Impact Story has recently been using the term “web-native”, as well, in his presentations on alternative ways of measuring “impact” and reputation in the digital age (“altmetrics”, for those new to the idea). (For those of you who haven’t checked out his latest slide decks on the topic, you’re in for a treat.)

But for this post, I wanted to point to something recently posted by Carl Boettiger, on manuscript reviews and GitHub. It’s one of many recent posts pushing the boundaries of review and scholarly publishing by breaking apart and repackaging the components to present something arguably more useful, accessible, and reproducible.

Carl has been thinking for a while about how best to package an open science manuscript for maximum reuse – doing some really cool things to make his work accessible and reproducible in ways that transcend current traditional publishing submission systems.

Now, not everything Carl has detailed here may resonate with those who are unfamiliar with GitHub (heck, I’m still learning), but I think he touches on some neat hacks on the system that I’d personally like to see taken further.

In his post, he suggests submitting a cover letter of sorts along with his manuscript for review, where he can document some of the open and reproducible features and point them out to reviewers. Among them (emphasis mine):

  1. The manuscript has been written on GitHub. Consequently, the full drafting and revision history is available, along with graphs of author contributions (which omit authors without GitHub accounts and may be distorted by trivial line changes)
  2. The manuscript has been written with all the code necessary to reproduce the results embedded as a knitr dynamic document. This helps ensure the analysis is always in sync with the results presented in the manuscript and that the research is reproducible. The analysis, figures, and manuscript can be reassembled from scratch by typing make pdf in the repository directory.
  3. Code to replicate the analysis and produce each of the figures shown can be found at: (Version-stable link to the appropriate GitHub pages? Deposit in Figshare/Dryad first?)
  4. Data to replicate the analysis and data shown in each of the figures can be found at: (Easiest to link to GitHub, since the code and data already reside there. Alternatively I could deposit these in Figshare or Dryad first…)
  5. The manuscript, code, data, and documentation are available as an R package in the GitHub repository.
  6. The issues tracker associated with the manuscript’s repository provides a record of this research, including lines of investigation that were resolved into the results presented here, lines that were closed as dead-ends or null results, and outstanding issues for further investigation.
  7. The daily lab notebook entries accompanying this research can be found under the project-tag between dates of XX and XX.
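To make the knitr point concrete for readers who haven’t seen a dynamic document: the R code lives directly in the manuscript source, and its results are woven in at build time. A minimal hypothetical snippet in the .Rnw (LaTeX) flavor – the variable and chunk names here are my own illustration, not taken from Carl’s manuscript:

```latex
% Inline result: the mean is computed from the data each time the
% document is built, so the text can't drift out of sync with the code.
The mean growth rate was \Sexpr{round(mean(rates), 2)} per year.

% A named code chunk that regenerates a figure on every build:
<<growth-figure, echo=FALSE>>=
plot(years, abundance, type = "l",
     xlab = "Year", ylab = "Abundance")
@
```

Because the numbers and figures are regenerated from the code on every build, the manuscript can never silently disagree with the analysis that produced it.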

Something about being able to type in “make pdf” and have your results reassembled gets me really excited. There have been a number of others working towards that goal, including the folks at Run My Code. Now, this is a bit beyond what we can expect from researchers on the whole: taking the time to document each step in a digital lab notebook, making all of their data available in something like figshare or Dryad with the proper metadata to be of use to others, and getting the code into a state where you can just hit a button and replicate the analysis. And then there’s having an “issues tracker” associated with the manuscript’s repository (which I still think would be a brilliant idea for review systems to utilize).
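For the curious, that make pdf target could be as simple as a few lines of Makefile. This is a sketch under my own assumptions (a single manuscript.Rnw source, compiled with pdflatex) – Carl’s actual repository may organize things differently:

```make
# Sketch of a Makefile for a knitr-based manuscript.
# Assumes the source lives in manuscript.Rnw (a hypothetical file name).

pdf: manuscript.pdf

# Run the embedded R code and weave the results into LaTeX
manuscript.tex: manuscript.Rnw
	Rscript -e 'knitr::knit("manuscript.Rnw")'

# Compile the woven LaTeX into the final PDF (twice, to resolve references)
manuscript.pdf: manuscript.tex
	pdflatex manuscript.tex
	pdflatex manuscript.tex

clean:
	rm -f manuscript.tex manuscript.pdf *.aux *.log
```

The nice part is that make tracks the dependencies: if the code or data change, the affected pieces and the PDF are rebuilt; if nothing changed, nothing is re-run.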

I think the greatest challenge for Carl in getting reviewers to jump onboard will be making this approachable for time-strapped lay reviewers – those who may never have pulled from a repo or navigated an issues tracker. But proofs of concept like this show what proper open science publication could look like, which is really exciting.