There’s an interesting Twitter chat going on, stemming from Titus Brown’s recent blog post asking how to find a postdoctoral appointment where a student can do open science.
What could a student ask a potential employer (and mentor) to help shed light on the culture of the lab?
The post brought to mind a discussion we had in the summer of 2013 at a meeting convened at SESYNC in Annapolis, MD about what to teach biologists about computing. We were discussing how best to assess the skill set of graduate students applying to join a lab, with a tilt towards the practices associated with open science (good data and software management skills, a proclivity to post their work so it can be shared and communicated, etc.).
In the breakout, Titus, Nirav Merchant (University of Arizona), Marian Petre (Open University), and I brainstormed an activity-based assessment so we could get a better baseline of what applicants can already do. Here’s the rough sketch of what we came up with:
0. Here’s a data sample. What would you need to fix in order to make it so you and others could use it?
1. Name and organize 3 data files (e.g., .csv, .dat, .txt).
2. Run this program on one of these files.
3. How would you capture that process for someone else to use?
4a. Suppose you change the program. How do you convey that information?
4b. Suppose someone sends you a changed version of the file/program. How do you interact with it?
5. How would you know that your program is doing what you want it to do?
6. How would you make your files available to others?
7. What additional data would you want to include (re: Ethan White paper)?
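To make task 5 a little more concrete, here is a minimal sketch in Python of one kind of answer an applicant might give: check the program against a tiny input whose result can be worked out by hand. The function `column_mean`, the column name `height_cm`, and the sample data are all hypothetical and made up for illustration; they are not from the breakout discussion.

```python
import csv
import io


def column_mean(csv_text, column):
    """Return the mean of a numeric column in CSV text, skipping blank cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader if row[column].strip()]
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return sum(values) / len(values)


def test_column_mean():
    # A tiny, hand-checkable input (hypothetical data):
    # (2.0 + 4.0) / 2 == 3.0, and the blank cell for site B is skipped.
    sample = "site,height_cm\nA,2.0\nB,\nC,4.0\n"
    assert column_mean(sample, "height_cm") == 3.0


if __name__ == "__main__":
    test_column_mean()
```

The point is less the specific function than the habit: a small known-answer check that someone else can rerun after the program changes.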
Do these tasks suit our needs? How do they map to various disciplines? What are we missing?
Thoughts, comments and suggestions are welcome; I’d love to hear them.
(Also, for more on that meeting – which feels like forever ago – read Titus’ summary post. Lots of good stuff in there.)
3 thoughts on “Assessing open practice in science – an idea”
I would also add these questions:
– How do you know your program is not breaking the whole system (memory profiling, code optimization)? We ask this question a lot when we run code on clusters; we don’t want to make those sysadmins angry 🙂
– What kinds of plots can you generate to explain your data, and how do you capture meaningful plots?
– How do you share your results with non-programmers?
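For the memory question, one possible starting point is sketched below in Python using the standard library's `tracemalloc` module. The helper `peak_memory_bytes` and the stand-in workload `build_squares` are hypothetical names invented for this illustration, and `tracemalloc` only reports memory allocated through Python's own allocators, so treat the number as a rough indicator rather than the full footprint a cluster scheduler would see.

```python
import tracemalloc


def peak_memory_bytes(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced by tracemalloc while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises.
        tracemalloc.stop()
    return result, peak


def build_squares(n):
    # Hypothetical stand-in for a real analysis step.
    return [i * i for i in range(n)]


if __name__ == "__main__":
    squares, peak = peak_memory_bytes(build_squares, 100_000)
    print(f"{len(squares)} values, peak of about {peak / 1e6:.1f} MB traced")
```

Running a check like this on a small input before submitting a large job gives a rough sense of how memory use grows with input size.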
Some references from the excellent PLOS Computational Biology collection, Ten Simple Rules:
– Ten Simple Rules for Better Figures: http://journals.plos.org/ploscollections/article?id=10.1371/journal.pcbi.1003833
– Ten Simple Rules for the Care and Feeding of Scientific Data: http://journals.plos.org/ploscollections/article?id=10.1371/journal.pcbi.1003542
– Ten Simple Rules for Effective Computational Research: http://journals.plos.org/ploscollections/article?id=10.1371/journal.pcbi.1003506
One important point is context-setting, i.e. describing the data itself and the problem it is attempting to address or solve. I think an issue with many scientists is they zoom in on detail and don’t provide the bigger picture that would help people understand what the data is, why it was collected, and so on. Open data is more likely to be re-used if people ‘get’ what it is. Understanding that need to tell a story is a useful attribute for any scientist. Communication is key.