treating code as a first-class research object/citizen

This is the first in a series of posts over the coming months about treating code as a fundamental component – or a first-class citizen – of modern-day research. Research is becoming increasingly reliant on code and analysis, and we’ve come a ways in getting data recognized as a “research object”. But what about the software needed to regenerate analyses? How do we shift the conversation to also recognize the code used to conduct the experiment as a critical piece of research?

The Software Sustainability Institute in the UK has some excellent posts outlining the ideal world many of us are striving for in open science. Here’s an excerpt from their longer post on this entitled “Publish or be damned” that I found especially interesting. (Do give the full post a read when you have a chance, too. It outlines a number of the key issues we need to be cognizant of.)

And, as always, I welcome your thoughts (keep them constructive, please. 🙂 )

***

The Research Software Impact Manifesto

As those involved in the use and development of software used in research, we believe that:

Open science is a fundamental requirement for the overall improvement and achievement of scientific research.
Open science is built on the tenets of reuse, repurposing, reproducibility and reward.
Software has become the third pillar of research, supporting theory and experiment.
Current mechanisms for measuring impact do not allow the impact of software to be properly tracked in the research community.
We must establish a framework for understanding the impact of software that both recognises and rewards software producers, software users and software contributors; and encourages the sharing and reuse of software to achieve maximum research impact.

To enable this, we subscribe to the following principles:

Communality: software is considered as the collective creation of all who have contributed
Openness: the ability of others to reuse, extend and repurpose our software should be rewarded
One of Many: we recognise that software is an intrinsic part of research, and should not be divorced from other research outputs
Pride: we shouldn’t be embarrassed by publishing code which is imperfect, nor should other people embarrass us
Explanation: we will provide sufficient associated data and metadata to allow the significant characteristics of the software to be defined
Recognition: if we use a piece of software for our research we will acknowledge its use and let its authors know
Availability: when a version of software is “released” we commit to making it available for an extended length of time
Tools: the methods of identification and description of software objects must lend themselves to the simple use of multiple tools for tracking impact
Equality: credit is due to both the producer and consumer in equal measure, and due to all who have contributed, whether they are academics or not

This does not rescind the values of the current credit system, but reinforces them by acknowledging that there are many forms of output that can lead to indicator events.

[On the move] What to teach biologists about computing?

This week, Greg and I are in Annapolis for a workshop convened by Software Carpentry comrade and Michigan State professor Titus Brown on what to teach biologists about computing and data. You can read the full background of the meeting here in the proposal, or a distilled version here. One of the main focuses will be how to extend Software Carpentry, but we’ll also be looking at assessment, motivation, and other comparable projects (and if/how to join forces).

The hashtag for the meeting is #w2tbac, and Greg (@gvwilson) and I (@kaythaney) (and others) will be live tweeting. We’ll be posting our notes following the meeting. Stay tuned for more. And feel free to add your thoughts here in the post.

Eating our own dogfood – the WiSE bootcamp

This past Monday and Tuesday, a team of stellar women hosted the first ever Software Carpentry bootcamp specifically for women in science and engineering. Software Carpentry, part of the Mozilla Science Lab, is led by Greg Wilson and a vast network of volunteer instructors and helpers who teach two-day skills bootcamps to researchers around the world. Its volunteers have run over 90 of these bootcamps at dozens of sites in the last 18 months for over 2,500 scientists, and the site provides open access material online for self-paced instruction.

This past event, known as “WiSE” for short, was the largest bootcamp to date, with 120 women participating and a team of 7 instructors and 19 (19!) helpers there to assist with live troubleshooting in each of the three skill level rooms.

In an effort to truly eat our own dogfood at the Science Lab, and to show my support for a cause I believe in, I decided to participate in this bootcamp as a student. I’ve worked in science and technology for the last eight years, managing semantic web projects, software development, information architecture, but don’t identify myself as a day-to-day programmer. What better way to learn than by actually going through the bootcamp myself, eh?

A full room for the WiSE bootcamp at Microsoft, June 24-25, 2013. Photo by @OpenHelix

The two days were loosely structured to cover the same broad concepts but at varying levels of expertise in each room – on day 1, exploring how to use the shell and version control through Git; day 2, delving into Python (taught through the IPython Notebook in our room) and showing how to query databases using SQLite (via the Firefox plug-in). You can read more about the instruction here, as well as search through the lesson material on their Github repository (look under “boot-camps”).

Outside of the instruction itself, it was incredible to hear some of these women’s stories as to why they came to such a workshop. Many of the instructors were self-taught themselves, learning Python and other computing skills out of necessity to move forward with their research, having not been introduced to these concepts and frameworks in traditional educational settings (at least for the sciences). That gap was echoed by the participants, articulated by one of the women who recently started a postdoc position as “My advisor said I needed to know Python, and told me to find a way to learn it.”

She flew in from Colorado specifically to get those skills to bring back to the lab so she could be competitive in her research and move forward. Her alternative? Sit with a book and try to teach herself. While ambitious, that shouldn’t be the only option.

And there were countless others like her, looking for some way to close the gap in understanding so as to have the skills needed to apply to modern day research. The room was full of women from all walks of research, from environmental scientists and microbiologists to those working with civic data following Hurricane Sandy and others doing financial modelling. Some had experience using the shell, but were encountering Python for the first time. Others had done some database work, but were new to version control. Hats off to the instructors and helpers for navigating through a diverse set of skill levels to troubleshoot and help over the course of 2 days. They made it look effortless.

I continue to be struck by the demand for this sort of technical understanding in order to be competitive and successful (subjectively speaking) in research, yet puzzled as to the lack of formal instruction at the undergraduate or even graduate level. To me, these are the core competencies needed for 21st century research – transferable skills that result in net positives, regardless of whether or not someone continues along a STEM path. Laying the groundwork for others to easily build upon these skills and extend their understanding is, to me, a no-brainer. And with the increasing call for more open, reproducible science, this sort of understanding can help us shift practice, rather than entrench the next generation of Nobel Laureates, researchers and educators in dated practices, because we don’t have time/money/bandwidth/excuses to evolve.

All in all, the event left me inspired and hopeful, not only as a woman in science, but as someone working closely with the Software Carpentry team. But that doesn’t mean that the problem is solved. The team continues to tailor and hone the curriculum, responding to feedback from pre- and post-assessment surveys as well as comments from instructors on what worked and what didn’t. We at the Science Lab are also continually looking at how we can better extend the reach of these events, as well as thinking of other “core competencies” that we’d like to introduce, even beyond those offered through Software Carpentry.

I’d love to hear your thoughts, on the bootcamp itself or, more broadly, on other skills you think are indispensable for modern-day researchers, but that may be unaddressed by the current educational system. Feel free to leave them in the comments here, or contact us via Twitter. We’ll be setting up a mailing list in the coming weeks and starting community calls (completely open to the public), which I’ll post here, and encourage you all to join.

Also, for more on the WiSE bootcamp, here are a few posts from instructors and participants, as well as a post on how you can stay involved. And a massive hat-tip again to the 25 (female!) volunteers who came and helped teach 120 students this week. Bravo.

[On the move] Education reform, the White House, and learning Git

The last week has been chock full of interesting meetings and travels, starting in Washington, DC discussing how to transform STEM education and now wrapping up at a Software Carpentry bootcamp in Cambridge, MA for women in science and engineering (more on that later).

The first stop was the AAAS in DC for a small workshop on transforming STEM education at the university level. The event brought together funders, institutional heads, and practitioners, working to push systemic change in the university. There was a strong focus on retention rates (there’s a steep dropoff in STEM undergrads after year 2), diversity and curriculum design. More on that to come in another post.

Then, on to the White House for the Champions of Change event on Open Science, where 13 stellar members of the community were recognised for their contributions. The Mozilla Science Lab was honored to be a part of the day’s celebrations, co-hosting the Open Science poster session and reception in the Indian Treaty Room with the Office of Science and Technology Policy. Many have written about the day’s event, but perhaps my favorite post is by my former colleague John Wilbanks, about making “open” stand for something. We’re thrilled to have been able to support the event (and getting to throw get-togethers in the White House doesn’t hurt for a Thursday either. 😉 )

Then, on to Boston. To round off the trip, I participated in a Software Carpentry (part of the skills portion we’re building out at the Science Lab) workshop for women in science and engineering, held this past Monday and Tuesday. The two-day bootcamp provided training on using the shell, Git, IPython (with the lesson taught in the IPython Notebook) and SQLite. Over 120 women participated in the event, one of their largest to date. Stay tuned for a separate post. Needless to say, I head out today with a brain well-exercised and buzzing with ideas from the last 10 days.

announcing the mozilla science lab

I’m thrilled to announce that I’ve joined Mozilla to build and direct their new open science initiative – the Mozilla Science Lab. The project is supported by the Alfred P. Sloan Foundation.

I’m excited to be returning to my open roots, as well as continuing to push the boundaries of what “digital research” can and should look like, and further explore how we can make the web work for science.

Why Mozilla?

Openness, empowerment and disruption are baked into Mozilla’s DNA. Their belief in the power of the open web and drive to explore new ways the technology can transform is inspiring. They truly believe that we all should be able to innovate in the digital world, regardless of our level of technical proficiency – that we should be able to be more than passive consumers. This is incredibly important for science, especially as we grapple with a daunting skills gap at the university level that is, in many cases, discouraging researchers from participating, innovating, or even in some cases, continuing to do science.

Mozilla cares deeply about “digital literacy”, and it’s time we explore what that means for science, especially given discussion about the “skills gap” in funding circles and at the policy level. I started to unpack this a bit back in January in a piece on Radar – teasing out some of the core competencies I think we’re neglecting in basic science education. We’ll be discussing that more here on the blog in the coming weeks, as well.

The first member of my team is Greg Wilson, founder of Software Carpentry, a program that teaches basic computational literacy to researchers to help them be more productive. I’ve long admired Greg’s work in this space, in providing an entry point for students to learn things like version control, data management, basic scripting. In the last year alone, they’ve run over 70 events for more than 2,200 attendees – all led by volunteers – and are on track to double both numbers in the coming twelve months. More importantly, Software Carpentry is our first step in exploring what “digital literacy” ought to be for researchers and what they need to know to actually do it.

We also want to find ways of supporting and innovating with the research community – building bridges between projects, running experiments of our own, and building community. We have an initial idea of where to start, but want to start an open dialogue to figure out together how to best do that, and where we can be of most value.

I’ll be writing more here on the blog in the coming months as we ramp up development of the program (hint: we have some cool stuff planned. 😉 ). Stay tuned for more in the coming weeks about how you can get involved. You can also check out our wireframe here at wiki.mozilla.org/ScienceLab or follow us @MozillaScience.

open science manuscripts and “web-native” scholarship

Lately I’ve been thinking a lot on the notion of “web-native” scholarship and its various permutations. The idea harkens back to some of the messaging and work we were doing in 2006 and 2007 at Science Commons, exploring how to take the principles of working in a networked, collaborative, and open environment – like the Web enables – and applying them to science.

Jason Priem of Impact Story has recently been using the term “web-native”, as well, in his presentations on alternative ways of measuring “impact” and reputation in the digital age (“altmetrics”, for those new to the idea). (For those of you who haven’t checked out his latest slide decks on the topic, you’re in for a treat.)

But for this post, I wanted to point to something recently posted by Carl Boettiger, on manuscript reviews and GitHub. It’s one of many recent posts on pushing the boundaries of review and scholarly publishing by breaking apart and repackaging the components to arguably present something more useful, accessible and reproducible.

Carl has been thinking for a while on how to best package an open science manuscript for maximum reuse – doing some really cool things to help make his work accessible and reproducible in ways that transcend current traditional publishing submission systems.

Now, not everything Carl has detailed here may resonate with those who are unfamiliar with GitHub (heck, I’m still learning), but I think he touches on some neat hacks on the system that I’d personally like to see taken further.

From his post, he suggests submitting a cover letter of sorts along with his manuscript for review where he can document some of the open and reproducible features, to point them out to reviewers. Among them (emphasis mine):

The manuscript has been written on Github. Consequently the full drafting and revision history is available, along with graphs of author contributions (which omit authors without Github accounts and may be distorted by trivial line changes)
The manuscript has been written with all the code necessary to reproduce the results embedded as a knitr dynamic document. This helps ensure the analysis is always in synch with the results presented in the manuscript and that the research is reproducible. The analysis, figures, and manuscript can be reassembled from scratch by typing make pdf in the repository directory.
Code to replicate the analysis and produce each of the figures shown can be found at: (Version-stable link to the appropriate Github pages? Deposit in Figshare/Dryad first?)
Data to replicate the analysis and data shown in each of the figures can be found at: (Easiest to link to Github, since the code and data already reside there. Alternatively I could deposit these in Figshare or Dryad first…)
The manuscript, code, data, and documentation are available as an R package in the Github repository.
The issues tracker associated with the manuscript’s repository provides a record of this research, including lines of investigation that were resolved into the results presented here, lines that were closed as dead-ends or null results, and outstanding issues for further investigation.
The daily lab notebook entries accompanying this research can be found under the project-tag between dates of XX and XX.

Something about being able to type in “make pdf” and have your results be reassembled gets me really excited. There have been a number of others who are working towards that goal, including the folks at Run My Code. Now, this is a bit beyond what we can expect from researchers on the whole, in terms of taking the time to document each step in a digital lab notebook, make all of their data available in something like figshare or Dryad with the proper metadata to be of use to others, and have the code in a state where you can just hit a button and replicate the analysis. And then there’s having an “issues tracker” associated with the manuscript’s repository (which I still think would be a brilliant idea for review systems to utilize).
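
For anyone who hasn’t used make before, the core idea behind a command like “make pdf” is simple: a target gets rebuilt whenever it is missing or older than any of the files it depends on. Here’s a minimal sketch of that rule in Python. To be clear, the function name needs_rebuild and the file names are hypothetical ones I’ve made up for illustration; this is not taken from Carl’s repository or his actual Makefile, and real make does a good deal more (chained rules, running the build commands themselves).

```python
import os
import tempfile
import time


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def demo():
    """Walk through the missing, fresh, and stale cases with throwaway files."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file names, chosen only for this illustration.
        source = os.path.join(workdir, "manuscript.Rmd")
        target = os.path.join(workdir, "manuscript.pdf")

        with open(source, "w") as handle:
            handle.write("analysis and text\n")
        # No PDF yet, so a rebuild is required.
        missing = needs_rebuild(target, [source])

        with open(target, "w") as handle:
            handle.write("rendered output\n")
        # Set timestamps explicitly so the demo doesn't depend on clock resolution.
        now = time.time()
        os.utime(source, (now - 10, now - 10))
        os.utime(target, (now, now))
        # The PDF is newer than its source, so nothing needs doing.
        fresh = needs_rebuild(target, [source])

        # Pretend the source was edited after the PDF was rendered.
        os.utime(source, (now + 10, now + 10))
        stale = needs_rebuild(target, [source])

    return missing, fresh, stale


if __name__ == "__main__":
    print(demo())
```

The appeal of this rule for reproducibility is that the figures and the PDF can never quietly drift away from the code and data that produced them: change an input, and the outputs are flagged for regeneration.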

I think the greatest challenge for Carl in getting reviewers to jump onboard for this is going to be making this approachable for lay reviewers who are time-strapped, for those who may have never pulled an item from a repo or navigated an issues tracker. But these sorts of proofs of concept show what proper open science publication could look like, which is really exciting.

2000, the year formerly known as “the future”

How quickly we forget how far we’ve come. A fun look back at how we interacted with technology in the year 2000, before smartphones, (seemingly) ubiquitous connectivity, and digital footprints.

On being grownups about childish behavior

Quite possibly the best summation of the polarisation (and in many cases, absolute idiocy / insidiousness) of the Adria Richards / PyCon commentary. Well said, Tom.

a much welcomed twist to the Bechdel test

In the lead up to International Women’s Day this Friday, I’ll be speaking to 6th form (16/17 year olds) and university students tonight at Nokia, sharing my tales of being a woman in science / tech. It’s part of the Remarkable Women programme I’m a part of, celebrating women in business, government, science and technology in the UK. (Fun sidepoint, I’m the “open science technologist” in their compilation. Quite like that.)

Now, having long been one of the only folks in the room without a Y chromosome, I know firsthand some of the “hardships” in being a woman in science, business and technology. I’ve got the stories of rampant misogyny, outrageous pickup lines, and foibles just like the rest. But I also know what an incredibly slippery slope it is to start dissecting the “why”, or to make broad, sweeping generalisations about gender, sector differences, etc.

It’s complicated. For example, I, personally, had male mentors – really tough, frighteningly intelligent ones, as a matter of fact. No one had to put me in touch with a female role model to help me understand where I fit in this space or console me when I got thrown the token think-you’re-the-assistant-heckler – I sought that guidance through other means. And that worked for me. But unfortunately, some of the dialogue around gender bias and even themes for these events seems to skip that point (or, to one extreme, act as if I’ve been slighted by having that experience). Not all of us fit that mold – and that’s also something to celebrate.

This post on a twist to the Bechdel test echoes many of the points I raise that are sadly, oft neglected from “Women in X” meetings.

There’s also this brilliant “Dear future women in technology …” post by @bitchwhocodes which really resonates.

I’m going to see if I can adhere to the points raised in the first linked post for tonight’s event, lead by example and all that. I wonder if they’ll notice.

For more on International Women’s Day (technically Friday, March 8th), visit their website.

From quantum mechanics to NFL predictions

Fun collection of IPython notebooks and resources, including ones on laser models, analysis of the financial crisis, a look at the mining disaster and more. There are even a few broader resources (e.g., “Git for Scientists: A Tutorial”) sprinkled in.

kaitlin thaney

Author: kaythaney