Is reproducibility good for scientific progress? (a paper review)


A few days ago, a discussion in my Twitter timeline caught my attention. It was about a very high-level model of the process of scientific research, whose conclusions include the claim that reproducibility does not improve the convergence of the research process towards truth. The Twitter discussion set off some alarm bells for me, in particular the use of the term "reproducibility" in the abstract, without any indication of which of its many interpretations and application contexts was meant. But that's just the Twitter discussion; let's turn to the more relevant question of what to think of the paper itself (preprint on arXiv).

The core of the work presented in that paper is a stochastic model of the process of scientific research. There is some phenomenon described by a "true" mathematical model. Scientists do not know this model, but they can obtain data points from it; that is how experiments are represented. Scientists do, however, have full access to their own models of reality. At each time step, a scientist generates a new model according to some strategy and evaluates its quality to see if it is "better" (in a well-defined sense) than the current consensus model of the community. One of the strategies is replication of prior work.
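
To make this more concrete, here is a minimal Python sketch of such a process. It is emphatically not the authors' model: the quality criterion, the noise level, the two strategies, and all parameter values are my own assumptions, chosen only to illustrate the general structure of "propose a model, evaluate it, adopt it if it beats the consensus".

```python
# A minimal sketch of a stochastic research process of this kind,
# NOT the paper's actual model. All names, distributions and the
# quality criterion are assumptions made for illustration.

import random

TRUE_VALUE = 0.7          # the "true" model, unknown to the scientists
NOISE = 0.2               # measurement noise of a single experiment

def experiment():
    """A data point drawn from the true model plus noise."""
    return random.gauss(TRUE_VALUE, NOISE)

def quality(model, n_data=20):
    """Assumed quality criterion: negated mean squared error against
    freshly generated data, so that larger is better."""
    return -sum((experiment() - model) ** 2 for _ in range(n_data)) / n_data

def innovate(consensus):
    """Strategy 1: propose a new model by perturbing the consensus."""
    return consensus + random.gauss(0.0, 0.1)

def replicate(consensus, archive):
    """Strategy 2: re-evaluate a previously proposed model."""
    return random.choice(archive) if archive else consensus

def run(steps=1000, p_replicate=0.2):
    consensus = 0.0
    consensus_q = quality(consensus)
    archive = []
    for _ in range(steps):
        if random.random() < p_replicate:
            candidate = replicate(consensus, archive)
        else:
            candidate = innovate(consensus)
            archive.append(candidate)
        q = quality(candidate)
        if q > consensus_q:      # adopt if judged better than the consensus
            consensus, consensus_q = candidate, q
    return consensus

if __name__ == "__main__":
    print("final consensus:", run())
```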

Such highly simplified high-level models are easy to criticize because of the huge number of simplifying assumptions. And yet, in other branches of science (such as physics), simple toy models have proven to be very useful. In particular, they can help identify mechanisms that are also present in more realistic (and thus more complex) descriptions of the same phenomena. However, toy models require reality checks as well, in the form of validation, even if validation is qualitative rather than quantitative. This is in my opinion one of the weak spots of this paper: validation is limited to a few basic sanity checks. Given the scarcity of empirical data on the scientific process, this isn't really surprising.

As for the specific issue of reproducibility, the model presented in the paper has a major weakness in that it completely ignores the issues that motivate reproducibility checks and replication studies in real life. Scientists, like all humans, are prone to mistakes and biases. The collective process of scientific research therefore includes verification steps that reduce the impact of mistakes and bias. Peer review is probably the best known one, but reproducibility checks and replication studies fall into this category as well. It is then not surprising that a model without mistakes and bias predicts little utility for verification measures.

However, this is merely a criticism of the model as currently proposed. It should be possible to include mistakes and bias without profound changes to the basic idea of modelling scientific research as a stochastic process. Confirmation bias is perhaps the simplest case: let authors of original research overestimate the benefit of their work (as part of the evaluation criterion S in the paper) and let replicators underestimate it. As for mistakes, a crude technique would be to let some percentage of scientists generate two new models, evaluate the first one, but report the second one as having been tested. A mistake detected in a replication study would then lead to the erasure of the replicated study from the process of consensus formation.
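
As an illustrative sketch rather than a faithful extension of the paper's model, here is how confirmation bias and mistakes could enter such a simulation. The bias values, the mistake probability, the tolerance, and all function names are assumptions of mine.

```python
# A rough illustration of the two suggested modifications, again with
# made-up numbers and names, not taken from the paper.

import random

TRUE_VALUE, NOISE = 0.7, 0.2
AUTHOR_BIAS = 0.05        # authors overestimate the quality of their own work
REPLICATOR_BIAS = -0.05   # replicators underestimate it
P_MISTAKE = 0.1           # fraction of studies that report the wrong model

def quality(model, n_data=20):
    """Negated mean squared error against noisy data (larger is better)."""
    data = (random.gauss(TRUE_VALUE, NOISE) for _ in range(n_data))
    return -sum((x - model) ** 2 for x in data) / n_data

def perceived_quality(model, role):
    """Confirmation bias as a role-dependent shift of the evaluation."""
    bias = AUTHOR_BIAS if role == "author" else REPLICATOR_BIAS
    return quality(model) + bias

def publish(consensus):
    """Mistakes: a scientist sometimes evaluates one model but reports
    a different one as having been tested."""
    tested = consensus + random.gauss(0.0, 0.1)
    q = perceived_quality(tested, "author")
    if random.random() < P_MISTAKE:
        reported = consensus + random.gauss(0.0, 0.1)  # the wrong model
    else:
        reported = tested
    return reported, q

def replication_flags_mistake(reported, claimed_q, tolerance=0.1):
    """A replication that finds a very different quality flags the study,
    which would then be erased from consensus formation."""
    return abs(perceived_quality(reported, "replicator") - claimed_q) > tolerance
```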
