Over the last years, an interesting metaphor for information and knowledge curation is beginning to take root. It compares knowledge to a landscape in which it identifies in particular two key elements: streams and gardens. The first use of this metaphor that I am aware of is this essay by Mike Caulfield, which I strongly recommend you to read first. In the following, I will apply this metaphor specifically to scientific knowledge and its possible evolution in the digital era.
Dear software engineers,
Many of you were horrified at the sight of the C++ code that Neil Ferguson and his team wrote to simulate the spread of epidemics. I feel with you. The only reason why I am less horrified than you is that I have seen a lot of similar-looking code before. It is in fact quite common in scientific computing, in particular in research projects that have been running for many years. But like you, I don’t have much trust in that code being a faithful and trustworthy implementation of the epidemiological models that it is supposed to implement, and I don’t want to defend bad code in science.
In his 1962 classic “The Architecture of Complexity”, Herbert Simon described the hierarchical structure found in many complex systems, both natural and human-made. But even though complexity is recognized as a major issue in software development today, the architecture described by Simon is not common in software, and in fact seems unsupported by today’s software development and deployment tools.
Malleable systems are software systems that are designed to be modified and extended by their users, eliminating the usually strict borderline between developers and users. Making scientific software more malleable is a goal that I have been pursuing for 25 years, starting with a shift from Fortran to Python as my main programming language, and a simultaneous shift from writing programs to writing toolkits, such as my Molecular Modelling Toolkit first published in 1997. Therefore I was pleased to discover the Malleable Systems Collective, which has just published a post in which I examine what is probably the most successful malleable system in the history of software: Emacs. If you care about users having more influence on their software, check out their site!
One question I have been thinking about in the context of reproducible research is this: Why is all stable software technology old, and all recent technology fragile? Why is it easier to run 40-year-old Fortran code than ten-year-old Python code? A hypothesis that comes to mind immediately is growing code complexity, but I’d expect this to be an amplifier rather than a cause. In this pose, I will look at another candidate: the dominance of Open Source communities in the development of scientific software.
It’s the season when everyone writes about the past year, or even the past decade for a year number ending in 9. I’ll make a modest contribution by summarizing my experience with Pharo after one year of using it for projects of my own.
A coffee break conversion at a scientific conference last week provided an excellent illustration for the industrialization of scientific research that I wrote about in a recent blog post. It has provoked some discussion on Twitter that deserves being recorded and commented on a more permanent medium. Which is here.
Over the last few years, I have spent a lot of time thinking, speaking, and discussing about the reproducibility crisis in scientific research. An obvious but hard to answer question is: Why has reproducibility become such a major problem, in so many disciplines? And why now? In this post, I will make an attempt at formulating an hypothesis: the underlying cause for the reproducibility crisis is the ongoing industrialization of scientific research.
A while ago I wrote about my ideas for a successor of today’s computational notebooks. Since then I have made some progress on a prototype implementation, which is the topic of this post. Again I have made a companion screencast so that you can get a better idea of how all this works in practice.
A few days ago, a discussion in my Twitter timeline caught my attention. It was about a very high-level model for the process of scientific research whose conclusions included the affirmation that reproducibility does not improve the convergence of the research process towards truth. The Twitter discussion set off some alarm bells for me, in particular the use of the term “reproducibility” in the abstract, without specifying which of its many interpretations and application contexts everybody referred. But that’s just the Twitter discussion, let’s turn to the more relevant question of what to think of the paper itself (preprint on arXiv).