Intro to Open Data on P2PU

The second of P2PU’s Open Science course videos is now online, this one focusing on “Open Data”. I was delighted to be able to join this module, alongside course-master Billy Meinke, Heather Piwowar from Impact Story, Ross Mounce from OKFN and Wouter van de Bos from Max Planck. You can check out more of the video modules here: https://p2pu.org/en/courses/5/content/1370/.

Have a watch, ping us at @MozillaScience with any questions (or leave them in the comments), and if you’re feeling *really* creative, try remixing / splicing the video using some of the Webmaker tools:

https://webmaker.org/en-US/tools/#popcorn-maker

And let us know what you come up with!

[Experiment] Exploring code review for science

Over the years, science has evolved many quality control mechanisms, the best known of which is probably peer review. This first appeared nearly 300 years ago, and was initially put into place to allow trusted peers to scrutinize new findings, ideas, and societal implications as part of the publication process.

Here at the Mozilla Science Lab, we’re interested in finding ways of pushing the limits to what we think of as “science on the web” and instilling better practices for open, reproducible research. We aim to do this through community building, skills training (like Software Carpentry), and building prototypes and running pilots with other groups and organizations.

Our latest experiment is exploring a model for code review in science. As research becomes increasingly dependent on computation, we want to see whether the code review practices that are now commonplace in open source software development will also work in research.

Why code review?

The code used to produce the results in a paper is not usually reviewed when a paper is published, beyond a basic “sense-check” (or, as a colleague put it to me, “plausibility versus defensibility, not reproducibility”).  As code follows the trajectory of data in being integrated and recognized as a first-class research object, we want to work with the scientific community to figure out how scientists can and should check each other’s computational methods without increasing the time and cost of reviews unsustainably.

How this will work

The staff at PLOS Computational Biology helped us identify a set of already-published papers that include snippets of code in MATLAB, Python, R, and C++ that may be suitable for code review. We’ve put the code in front of a group of Mozilla engineers who regularly do code reviews for Firefox, mobile and for other programs.

These engineers aren’t scientists (though some may be interested in the underlying research).  Instead, they are skilled professionals who don’t have the domain expertise of the scientists, who in turn are not full-time software developers.

Over the course of the next month, each Mozilla engineer will review code from one or two papers. Their reviews, and their insights into the review process, will be captured in ReviewBoard, so that we can have a core collation point for the findings. Reviewers will have the ability to flip through the rest of the paper should they so choose, but this is really about the code snippet itself.

Once they’ve finished reviewing, we’ll analyze the findings, look for any patterns or interesting outliers, and examine how the actual commentary on the code itself was done, with the aim of using that as a model for code review in the future.

What we’re testing

This experiment is a means to explore the following:

– What does code review from a software engineer outside of academia look like? How do they approach the task?

– To what extent is domain knowledge needed to do a successful code review? Is the code parseable by someone outside of that discipline?

– What lessons can be learned about code review, possibly to influence and enhance traditional peer review?

– Does this process surface issues around best practice in writing software and code? If so, what are those issues?

– Following the review, how useful is the markup to the author? Does this feedback help them in their work? How can we change those norms?

We’ll be writing more about this experiment as the reviews continue, and hope to have our analysis completed next month. It’s just a first step, but we believe that this could help change the way researchers, administrators, publishers and funders think of code in scholarly research, and possibly unveil some deep-seated issues surrounding how code is recognized, maintained and shared in the scientific community.

 —

Have a question or would like to join in on the conversation? Drop us a line at sciencelab@mozillafoundation.org or add your comments here. We’d love to hear your thoughts. Also, have an idea you’d like explored? Let us know.

And special thanks to our Mozilla volunteers, Marian Petre from the Open University, Mike Hoye from Mozilla and Greg Wilson from Software Carpentry.

Intro to Open Science on P2PU: Watch, engage, remix.

The first of P2PU’s Open Science course videos is now online. (I’ll be joining to speak on Open Data in a few weeks.)

Have a watch, ping us at @MozillaScience with any questions (or leave them in the comments), and if you’re feeling *really* creative, try remixing / splicing the video using some of the Webmaker tools:
https://webmaker.org/en-US/tools/#popcorn-maker

And let us know what you come up with!

Session call open for this year’s #MozFest

The Mozilla Festival (or “MozFest” as we affectionately call it) is coming to London October 25-27, and planning is in full swing. This year we’ll have an entire track on “Science and the Web” and we want you to help us shape the program.

MozFest is where many of Mozilla’s best and most innovative ideas come from, bringing together over 1,000 inventors, hackers, creatives (and now those from the research community) to share ideas and explore how we can forge the future of the web together.

This year’s focus is on the mentors, catalysts and change agents who care about using the web to make a difference in their respective fields and communities. Individual sessions are organized under the following themes:
  • Build Webmaker Together: The web is wild. Co-design its future with hackable projects, new memes and creativity remixed with digital tools.
  • Connect Your City: Champion digital making and bring together local communities.
  • Look Who’s Watching: Learn how to control who gets your data. Help others protect their privacy and develop long-term solutions to tracking.
  • Make the Web Physical: Take sensors, actuators and more to meld the virtual with the actual and make the web work your way.
  • Open Games: Ready, Set, Go! Join a community of game makers who use the web as a platform to build open games.
  • Science and the Web: Transform how we use the web–a scientist’s invention–to explore, experiment and build on each other’s research.
  • Skills and Badges: Challenge conventional skillsharing. Recognize and verify learning in new ways to increase opportunities and make the most of the web.
  • Source Code for Journalism: Hack the news: learn, teach, and make journalism that’s native to the open web.
  • Teach the Web: Let’s teach the world the web. Discover how to inspire learners and spread digital literacy with hands-on making.
  • Webmaking for Mobile: Become a maker in the booming world of the mobile web.

For the “Science and the Web” track, we’re looking for sessions that show how the Web can transform (and is transforming) science, and that help introduce the MozFest crowd to some of the folks and organizations leading the charge. This isn’t your average conference, where you present to a group. Preference is given to hands-on and collaborative sessions (in terms of groups involved as well as the session content itself). Or, as we say, “Less yack, more hack.” 🙂

A few areas to explore as a starter, in case you’re looking for ideas:

  • Hacking digital scholarship: How can we use the web to push the limits of how we share knowledge in the sciences?
  • Data sharing: Openness is key to advancing discovery. What role does data play and what cool things can we do with open data in research?
  • Citizen Science and engagement: From LHC@Home and Zooniverse to things like Microryza, how can we use the web to make research more accessible and use the wisdom of the crowd?
  • Tools for better web science: Come show off your open tool to help change the way we do science.
  • Badges / Altmetrics for research: How can we use new forms of assigning credit to research to facilitate sharing, collaboration and interoperability?
  • Code and data literacy: Hands-on training with tools or other practices to help us better enable a community of digital researchers.

This is just a start, and feel free to stray from this list. We’re currently pulling together the program, and the session call is open through the end of August for you to submit your ideas. Check out last year’s program, and get those proposals in! To submit, fill out this short form (and be sure to select the “Science and the Web” track).

We look forward to your submissions, and don’t forget to get your tickets!

[Interview] The power of the browser as a platform for scientific data visualisation

I recently came across a web-based visualization project called “Luminosity”, led by Amit Kapadia, a developer on the citizen science project Zooniverse, and a recent winner of the Mozilla Ignite app challenge. Ignite is an open innovation challenge hosted by Mozilla and the National Science Foundation (and a sister project to the Science Lab) that funds app development showing the potential of gigabit fiber networks.

Luminosity sets out to better enable researchers to create scientific visualizations based in the browser. The app is focused on astronomical data, exploring the power of using the web as a platform for the sciences.

At the Science Lab, making the web work for research is in part about creating open, innovative tools to help advance discovery. Sometimes these tools are open source by design, other times not, but they still contribute to open science.

Amit’s approach with Luminosity takes that work to another level, exploring using the browser as the platform – a space Mozilla is quite involved in 😉 – to create scientific visualizations. I asked Amit if he would share his thinking about the possibilities for not only better visualization tools in the sciences, but also about his design decision in making his app browser-based.

Kay Thaney: Thanks for joining me today. Let’s start with Luminosity. Tell us a bit more about the problem the app sets out to solve, and how you address it.

Amit Kapadia: Luminosity is an experiment to bring science-grade tools to the web.  The browser is evolving into a full-fledged platform where all types of applications can exist.  Many software applications in the sciences rely on older technologies, often sitting on a 10-year-old code base.  Luminosity is an attempt at a fresh start.  In its current implementation, the app allows astronomical data to be imported and visualized in the browser, emphasizing a modern user experience, performance, and a collaborative workflow.

Kay Thaney: One thing that struck me from the description was the web-native design. In today’s world, most of us interact with the web through browser-based applications. Is Luminosity’s design unique when it comes to the sciences, or are others taking the same approach towards data visualization tools for research?

Amit Kapadia: The functionality of Luminosity is typically expected from a native desktop application.  Putting aside the science for the moment, there are many applications, such as Photoshop or Final Cut Pro, that we know as being strictly desktop applications.  Over the last few years, browser technology has advanced tremendously, such that the functionality of these heavyweight applications can be executed in a browser.  Proofs of concept are beginning to emerge, but far fewer in the sciences.  One example from the sciences is Slice:Drop, an application for visualizing 3D medical data.

Kay Thaney: Did you encounter any challenges in taking a browser-based approach? Why aren’t more startups and tool developers in this space creating tools in this manner?

Amit Kapadia: There are difficulties in taking a browser-based approach.  Many disciplines, such as astronomy, have existing toolsets that have matured over the years.  These are often well-established libraries written in low-level languages.  Migrating to the browser requires porting these libraries to JavaScript, so that, for example, scientific formats can be read in the browser.  This is a difficult task, but offers the reward of using the browser as a scientific platform.

Only now are developers beginning to utilize the full power of modern JavaScript engines.  The gap between native and JavaScript performance is narrowing thanks to the large investment of Mozilla and Google.  It would serve the science community well to take advantage of this technology.  Many developers are still not aware of the level of performance.  Hopefully that changes, and we’ll see more desktop applications migrating to the browser.
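For readers who want a concrete sense of what "reading a scientific format" involves, here is a minimal sketch of parsing the header of a FITS file, the astronomy format Luminosity supports. It is written in Python for brevity rather than in the JavaScript the app itself uses, and it is not taken from Luminosity's code. It relies only on two facts about the format: headers are made of 80-character ASCII "cards", and files are padded to 2880-byte blocks. The function names (`parse_fits_header`, `convert_value`, `make_card`) and the sample header are hypothetical, and the parser deliberately ignores complications such as slashes inside quoted strings and continued values.

```python
BLOCK_SIZE = 2880  # FITS files are organised in 2880-byte blocks
CARD_SIZE = 80     # each header record ("card") is 80 ASCII characters


def convert_value(text):
    """Turn the text of a header value into a Python value (simplified)."""
    if text.startswith("'"):
        return text.strip("'").rstrip()  # quoted string, trailing pad removed
    if text == "T":
        return True
    if text == "F":
        return False
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text  # leave anything unrecognised as raw text


def parse_fits_header(raw):
    """Read `KEYWORD = value / comment` cards until the END card."""
    header = {}
    for offset in range(0, len(raw), CARD_SIZE):
        card = raw[offset:offset + CARD_SIZE].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY and blank cards carry no value
        value_text = card[10:].split("/", 1)[0].strip()
        header[keyword] = convert_value(value_text)
    return header


def make_card(text):
    """Pad a line of text to a full 80-character card (for the demo only)."""
    return text.ljust(CARD_SIZE).encode("ascii")


# A made-up header for a 1024 x 768 floating-point image.
sample = b"".join(make_card(line) for line in [
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                  -32 / 32-bit floating point",
    "NAXIS   =                    2",
    "NAXIS1  =                 1024",
    "NAXIS2  =                  768",
    "END",
]).ljust(BLOCK_SIZE, b" ")

header = parse_fits_header(sample)
print(header)
```

A real port also has to decode the binary pixel data that follows the header, which is where most of the work (and the performance question Amit raises) lies.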

Kay Thaney: At the Science Lab, we’re strong supporters of providing not only better tools but ones that can help make concepts such as data visualisation more accessible and understandable to researchers. Looking beyond astronomy, do you think Luminosity could be applied to another discipline’s data problems (say, ecology or the life sciences)?

Amit Kapadia: Many of the initial ideas behind Luminosity have been specific to astronomy.  Currently the web app only supports a file format known as FITS (Flexible Image Transport System), a format archiving nearly 40 years of astronomical data.  Although the app reads an arcane format, the real problem that Luminosity addresses is visualizing complex imagery.  Astronomical images have a very high dynamic range, requiring tailored tools to best visualize the data.  Medical and satellite images share the same problem, storing their data in formats such as DICOM and GeoTIFF.  In the future, Luminosity could be extended to offer more domain-specific tools, while leveraging its current capability of displaying high dynamic range images.  The plotting features, utilizing D3, are already generic.  Other disciplines could take advantage once other file formats are supported.

The most experimental feature of Luminosity is the collaborative workflow.  Collaborative image viewing has been implemented for a subset of data, allowing a group of researchers across multiple institutions to view and manipulate an image together.  Collaboration is the most straightforward idea to apply across different disciplines, including education.  Cizzle, another recent submission of Mozilla Ignite, is developing a web app for students to better understand our Solar System.  Between these tools, one could imagine researchers digitally visiting a classroom for a data exploration activity.
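To illustrate the high dynamic range problem Amit mentions, here is a small sketch comparing a plain linear mapping of pixel values to an 8-bit display range with an asinh stretch, one common choice for astronomical images. This is an assumption-laden illustration in Python, not Luminosity's actual rendering code: the function names, the `softening` parameter and the sample pixel values are all made up for the example.

```python
import math


def linear_stretch(pixels):
    """Map pixel values to 0..255 in direct proportion to their value."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)
    return [round(255 * (p - lo) / (hi - lo)) for p in pixels]


def asinh_stretch(pixels, softening=0.01):
    """Map pixel values to 0..255, boosting faint values with asinh.

    `softening` is a hypothetical tuning knob: smaller values lift
    faint pixels more strongly.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)
    scale = math.asinh(1.0 / softening)  # value reached by the brightest pixel
    return [
        round(255 * math.asinh((p - lo) / (hi - lo) / softening) / scale)
        for p in pixels
    ]


# Made-up pixel values spanning five orders of magnitude.
raw = [0, 10, 100, 1000, 100000]
linear = linear_stretch(raw)
stretched = asinh_stretch(raw)
print("linear:", linear)
print("asinh: ", stretched)
```

With the linear mapping, everything except the brightest pixel ends up crushed near black, while the asinh stretch keeps the order of the values and still leaves the faint ones distinguishable. The same idea applies to the medical and satellite imagery mentioned above.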

Kay Thaney: What is next for Luminosity?

Amit Kapadia: Luminosity is still experimental.  In the short term I’d like to continue building collaborative features, so that researchers spread across a large distance can visualize data together.  Another item on the to-do list is visualizing three-dimensional volumes, and luckily WebGL provides a solution.  The application will continue to utilize only open standards, and I’ll continue to experiment with ways to approach the performance of native code.

Kay Thaney: Thanks, Amit. We look forward to hearing more about Luminosity’s development in the future.

Want to check out the code behind the project? Visit Amit’s GitHub page and have a look. You can also see the tool in action in this video.

Treating code as a first-class research object/citizen

This is the first in a series of posts over the coming months about treating code as a fundamental component – or a first-class citizen – of modern-day research. Research is becoming increasingly reliant on code and analysis, and we’ve come a long way in getting data recognized as a “research object”. But what about the software needed to regenerate analyses? How do we shift the conversation to also recognize the code used to conduct the experiment as a critical piece of research?

The Software Sustainability Institute in the UK has some excellent posts outlining the ideal world many of us are striving for in open science. Here’s an excerpt from their longer post on this entitled “Publish or be damned” that I found especially interesting. (Do give the full post a read when you have a chance, too. It outlines a number of the key issues we need to be cognizant of.)

And, as always, I welcome your thoughts (keep them constructive, please. 🙂 )

***

The Research Software Impact Manifesto

As those involved in the use and development of software used in research, we believe that:

  1. Open science is a fundamental requirement for the overall improvement and achievement of scientific research.
  2. Open science is built on the tenets of reuse, repurposing, reproducibility and reward.
  3. Software has become the third pillar of research, supporting theory and experiment.
  4. Current mechanisms for measuring impact do not allow the impact of software to be properly tracked in the research community.
  5. We must establish a framework for understanding the impact of software that both recognises and rewards software producers, software users and software contributors; and encourages the sharing and reuse of software to achieve maximum research impact.

To enable this, we subscribe to the following principles:

  • Communality: software is considered as the collective creation of all who have contributed
  • Openness: the ability of others to reuse, extend and repurpose our software should be rewarded
  • One of Many: we recognise that software is an intrinsic part of research, and should not be divorced from other research outputs
  • Pride: we shouldn’t be embarrassed by publishing code which is imperfect, nor should other people embarrass us
  • Explanation: we will provide sufficient associated data and metadata to allow the significant characteristics of the software to be defined
  • Recognition: if we use a piece of software for our research we will acknowledge its use and let its authors know
  • Availability: when a version of software is “released” we commit to making it available for an extended length of time
  • Tools: the methods of identification and description of software objects must lend themselves to the simple use of multiple tools for tracking impact
  • Equality: credit is due to both the producer and consumer in equal measure, and due to all who have contributed, whether they are academics or not

This does not rescind the values of the current credit system, but reinforces them by acknowledging that there are many forms of output that can lead to indicator events.

Announcing the Mozilla Science Lab

I’m thrilled to announce that I’ve joined Mozilla to build and direct their new open science initiative – the Mozilla Science Lab. The project is supported by the Alfred P. Sloan Foundation.

I’m excited to be returning to my open roots, as well as continuing to push the boundaries of what “digital research” can and should look like, and further explore how we can make the web work for science.

Why Mozilla?

Openness, empowerment and disruption are baked into Mozilla’s DNA. Their belief in the power of the open web and their drive to explore new ways the technology can transform are inspiring. They truly believe that we all should be able to innovate in the digital world, regardless of our level of technical proficiency – that we should be able to be more than passive consumers. This is incredibly important for science, especially as we grapple with a daunting skills gap at the university level that is, in many cases, discouraging researchers from participating, innovating, or even, in some cases, continuing to do science.

Mozilla cares deeply about “digital literacy”, and it’s time we explore what that means for science, especially given discussion about the “skills gap” in funding circles and at the policy level. I started to unpack this a bit back in January in a piece on Radar – teasing out some of the core competencies I think we’re neglecting in basic science education. We’ll be discussing that more here on the blog in the coming weeks, as well.

The first member of my team is Greg Wilson, founder of Software Carpentry, a program that teaches basic computational literacy to researchers to help them be more productive. I’ve long admired Greg’s work in this space, in providing an entry point for students to learn things like version control, data management, and basic scripting. In the last year alone, they’ve run over 70 events for more than 2,200 attendees – all led by volunteers – and are on track to double both numbers in the coming twelve months. More importantly, Software Carpentry is our first step in exploring what “digital literacy” ought to be for researchers and what they need to know to actually do it.

We also want to find ways of supporting and innovating with the research community – building bridges between projects, running experiments of our own, and building community. We have an initial idea of where to start, but want to start an open dialogue to figure out together how to best do that, and where we can be of most value.

I’ll be writing more here on the blog in the coming months as we ramp up development of the program (hint: we have some cool stuff planned. 😉 ). Stay tuned for more in the coming weeks about how you can get involved. You can also check out our wireframe here at wiki.mozilla.org/ScienceLab or follow us @MozillaScience.