Konrad Hinsen's Blog

Sustainable software and reproducible research: dealing with software collapse

Two currently much discussed issues in scientific computing are the sustainability of research software and the reproducibility of computer-aided research. I believe that the communities behind these two ideals should work together on taming their common enemy: software collapse. As a starting point, I propose an analysis of how the risk of collapse affects sustainability and reproducibility.

What I call software collapse is what is more commonly referred to as software rot: the fact that software stops working eventually if is not actively maintained. The rot/maintenance metaphor is not appropriate in my opinion because it blames the phenomenon on the wrong part. Software does not disintegrate with time. It stops working because the foundations on which it was built start to move. This is more like an earthquake destroying a house than like fungi or bacteria transforming food, which is why I am trying out the term collapse.

The software stacks used in computational science have a multi-layer structure that seems to be nearly universal. At the bottom, there is non-scientific infrastructure, such as operating systems, compilers, and support code for I/O, user interfaces, etc. All of this software is used by scientists in the same way as by other computer users. The predominant view is that this software is external to scientific computing, much like computer hardware. One exception is infrastructure software for high-performance computing, which like the hardware it runs on is often designed specifically for use in science and engineering.

The second layer is scientific infrastructure. Here we find libraries and utilities used for research in many different disciplines, such as LAPACK, NumPy, or Gnuplot. The people developing this software tend to be researchers or research software engineers, i.e. people with a scientific background. The methods (algorithms, data structures) implemented in these packages are typically well-known and stable. This does not exclude ongoing research on improving the implementations, but from the users’ point of view, the job done by the software remains the same, often for several decades.

The third layer contains discipline-specific research software. These are tools and libraries that implement models and methods which are developed and used by research communities. Often the developers are simply a subset of the user community, but even if they aren’t, they work in very close contact with their users, who provide essential feedback not only on the quality of the software, but also on the directions that future development should take.

The fourth and final layer is project-specific software, which is whatever it takes to do a computation using software building blocks from the lower three levels: scripts, workflows, computational notebooks, small special-purpose libraries and utilities. At the end of a project, such software may become the starting point for software specific to another project, but it is rarely reused without modification, and rarely used by anyone except the members of the project that developed it.

Computational models and methods often move down the stack in the course of time. They are developed initially within a specific project, then the more widely useful ones become part of discipline-specific software, and some of them may find adoption in other fields of research and become a part of the scientific infrastructure layer.

Software in each layer builds on and depends on software in all layers below it, meaning that changes in any lower layer can cause it to collapse.

The reproducible research community focuses on the fourth layer, the project-specific software. Traditionally, the main obstacle to reproducibility was that this layer was not published, and sometimes even deleted by its authors at the end of a project. This layer also contains algorithms executed by a human user, e.g. by entering commands one by one into the computer. This ephemeral software is typically not even recorded. Fixing these problems is mainly a matter of creating an awareness of their importance, and much progress has been made in this respect. But the problem of layer–4 software collapsing due to changes in the lower levels remains largely unsolved. Project-specific software is particularly vulnerable to collapse because it is almost never maintained, since its active days are over.

The sustainable software community is mainly interested in layer 3, the discipline-specific community software. Its development is fragile because the importance of this software is not yet recognized by institutions and funders, unlike the scientific infrastructure software one layer below. Moreover, this software is often developed by scientists with insufficient training in software engineering techniques. There are essentially two tasks that need to be organized and financed: preventing collapse due to changes in layers 1 and 2, and implementing new models and methods as the scientific state of the art advances. These two tasks go on in parallel and are often executed by the same people, but in principle they are separate and one could concentrate on just one or the other.

The common problem of both communities is collapse, and the common enemy is changes in the foundations that scientists and developers build on. The options they have for dealing with this are about the same as for house owners facing the risk of earthquakes:

  1. Accept that your house or software is short-lived. In case of collapse, start from scratch.
  2. Whenever shaking foundations cause damage, do repair work before more serious collapse happens.
  3. Make your house or software robust against perturbations from below.
  4. Choose stable foundations.

House owners generally opt for strategies 3 or 4, or a mixture of them. Strategies 1 and 2 are unattractive because house owners might well be injured or killed during a collapse.

Most software developers, in science or elsewhere, prefer strategies 1 or 2. In many business settings, this makes sense because software is short-lived or rapidly evolving anyway, due to changing requirements and newly appearing possibilities. In science, these motivations exist as well, but must be weighed against the need for preservation of the scientific knowledge embodied by scientific software. You may not care about losing the Web browser you used long ago, given that there’s a better one now. But if ten years from now, doubts come up about the analysis of LIGO data, you want to be able to go back to the analysis code and check what exactly was done at the time.

A difference between the sustainable software and the reproducible research communities is that the former privileges strategy 2, continuous repair, whereas the latter dreams of strategy 4, stable foundations. Strategy 2 is in fact easier to adopt, given that most of the software industry is applying it. Strategy 4 is seen as unrealistic by many, because stable foundations are hard to find, and the few we have impose unpleasant restrictions. But if developers in layer 3 adopt the continuous-repair strategy, this leaves only one option for the code in layer 4 - accept that it is short-lived. This is more or less what we see happening at the moment. For a recent discussion, see this blog post by C. Titus Brown and the discussion following it.

In one of the comments there, Daniel S. Katz proposes a cost-benefit analysis, which to the best of my knowledge has not been attempted until now. However, I think it should be done globally, rather than for an individual research project. A move towards stable foundations (strategy 4) is likely to require a large up-front investment, but lower development costs later on, for scientific code in all layers. It might well be interesting for nothing else but reducing global development costs, not even counting the hard to evaluate benefit of long-term reproducibility.

It’s also worth looking at why software foundations are shaking all the time. Why can’t we just keep on using the same software forever, if we are happy with the way it works?

One reason is the bottom layer of our software stack, which we share with non-scientific software. There are market incentives for shaking up the foundations of commercial software, which then cause collateral damage elsewhere, such as in science. For example, some markets rely on planned obsolescence and never-ending change to create continuous customer demand. Smartphones are a good example. Also, a company controlling a software platform might benefit from changing it a bit all the time in order to retain control and customer attention. Finally, security problems in systems software are discovered regularly, and their fixes can send ripples up the software stack. All this makes it difficult to find stable foundations to build on. However, it is clearly not impossible. After all, banks have been keeping their COBOL software alive for decades. At worst, we could build our own bottom layer instead of sharing it with other application domains. One advantage of scientific software in that respect is that it has few if any security concerns to deal with.

Unfortunately, we also have home-made quakes in our software stack, due to changes in layers 2 and 3. In the fast-paced development of layer 3, collateral damage sometimes leads to collapse in layer 4. I suspect much of this could be avoided with some more attention on stability, plus extensive testing. What’s worse is a widespread attitude that considers stability impossible anyway and concludes that one more breaking change is not such a big problem after all. This is particularly harmful for the scientific infrastructure of layer 2. I’ll just mention my two-year-old rant about NumPy as an example. In view of the systematic non-maintenance of layer–4 software, this is an inappropriate attitude in the world of scientific computing in my opinion.

As a final remark, strategy 3 does not seem to exist in the software world. There are no proven techniques for making a program robust against changes in its foundations. Software interfaces are much too rigid for that. I vaguely remember Alan Kay speaking about more lenient interface mechanisms - if anyone has a reference to share, please leave a comment! A recent presentation by Rich Hickey, the creator of the Clojure language, also contains useful ideas for dealing with change in interfaces (executive summary: add new features, but don’t remove or change existing ones), but it’s more of a move towards strategy 4 than strategy 3. More generally, I would like to see more research and development along these lines. Robustness is a major design principle in other engineering domains, and software would benefit from a larger dose as well.