Since the dawn of computer programming, software developers have been aware of the rapidly growing complexity of code as its size increases. Keeping in mind all the details in a few hundred lines of code is not trivial, and understanding someone else’s code is even more difficult because many higher-level decisions about algorithms and data structures are not visible unless the authors have carefully documented them and keep those comments up to date.
My most recent paper submission (preprint available) is about improving the verifiability of computer-aided research, and contains many references to the related subject of reproducibility. A reviewer asked the same question about all these references: isn’t this the same as for experiments done with lab equipment? Is software worse? I think the answers are of general interest, so here they are.
A recent article in “The Atlantic” has been the subject of many comments in my Twittersphere. It’s about scientific communication in the age of computer-aided research, which requires communicating computations (i.e. code, data, and results) in addition to the traditional narrative of a paper. The article focuses on computational notebooks, a technology introduced in the late 1980s by Mathematica but which has become accessible to most researchers only since Project Jupyter (formerly known as the IPython notebook) started to offer an open-source implementation supporting a wide range of programming languages. The gist of the article is that today’s practice of publishing science in PDF files is obsolete, and that notebooks are the future.
It is widely recognized by now that software is an important ingredient to modern scientific research. If we want to check that published results are valid, and if we want to build on our colleagues’ published work, we must have access to the software and data that were used in the computations. The latest high-impact statement along these lines is a Nature editorial that argues that with any manuscript submission, authors should also submit the data and the software for review. I am all for that, and I hope that more journals will follow.
Over the last few years, I have repeated a little experiment: Have two scientists, or two teams of scientists, write code for the same task, described in plain English as it would appear in a paper, and then compare the results produced by the two programs. Each person/team was asked to do a maximum amount of verification and testing before comparing to the other person’s/team’s work.
In discussions about computational reproducibility (or replicability, or repeatability, according to the preference of each author), I often see the argument that reproducing computations may not be worth the investment in terms of human effort and computational resources. I think this argument misses the point of computational reproducibility.
Two currently much discussed issues in scientific computing are the sustainability of research software and the reproducibility of computer-aided research. I believe that the communities behind these two ideals should work together on taming their common enemy: software collapse. As a starting point, I propose an analysis of how the risk of collapse affects sustainability and reproducibility.
The importance of reproducibility in computer-aided research (and elsewhere) is by now widely recognized in the scientific community. Of course, a lot of work remains to be done before reproducibility can be considered the default. Doing computational research reproducibly must become easier, which requires in particular better support in computational tools. Incentives for working and publishing reproducibly must also be improved. But I believe that the Reproducible Research movement has made enough progress that it’s worth considering the next step towards doing trustworthy research with the help of computers: verifiable research.