Since the dawn of computer programming, software developers have been aware of the rapidly growing complexity of code as its size increases. Keeping in mind all the details in a few hundred lines of code is not trivial, and understanding someone else’s code is even more difficult because many higher-level decisions about algorithms and data structures are not visible unless the authors have carefully documented them and keep those comments up to date.
My most recent paper submission (preprint available) is about improving the verifiability of computer-aided research, and contains many references to the related subject of reproducibility. A reviewer asked the same question about all these references: isn’t this the same as for experiments done with lab equipment? Is software worse? I think the answers are of general interest, so here they are.
It is widely recognized by now that software is an important ingredient to modern scientific research. If we want to check that published results are valid, and if we want to build on our colleagues’ published work, we must have access to the software and data that were used in the computations. The latest high-impact statement along these lines is a Nature editorial that argues that with any manuscript submission, authors should also submit the data and the software for review. I am all for that, and I hope that more journals will follow.
It’s hard to find an aspect of modern life that is not influenced in some way by software. Some of it is very visible, for example the Web browser I start on my computer. Other software is completely invisible, such as the software controlling my car’s diesel engine. Some software is safety critical, for example flight control software in airplanes. Other software is used in a much more futile way, such as playing games. I could go on listing characteristics in which different software packages differ, but I will leave it at that - I don’t really expect anyone to disagree about the ubiquity and diversity of software in our increasingly digital world.
Over the last few years, I have repeated a little experiment: Have two scientists, or two teams of scientists, write code for the same task, described in plain English as it would appear in a paper, and then compare the results produced by the two programs. Each person/team was asked to do a maximum amount of verification and testing before comparing to the other person’s/team’s work.