In my last post, I have discussed the two main types of scientific models: empirical models, also called descriptive models, and explanatory models. I have also emphasized the crucial role of equations and specifications in the formulation of explanatory models. But my description of scientific models in that post left aside a very important aspect: on a more fundamental level, all models are stories.
It is often said that science rests on two pillars, experiment and theory. Which has lead some to propose one or two additional pillars for the computing age: simulation and data analysis. However, the real two pillars of science are observations and models. Observations are the input to science, in the form of numerous but incomplete and imperfect views on reality. Models are the inner state of science. They represent our current understanding of reality, which is necessarily incomplete and imperfect, but understandable and applicable. Simulation and data analysis are tools for interfacing and thus comparing observations and models. They don’t add new pillars, but transforms both of them. In the following, I will look at how computing is transforming scientific models.
One question I have been thinking about in the context of reproducible research is this: Why is all stable software technology old, and all recent technology fragile? Why is it easier to run 40-year-old Fortran code than ten-year-old Python code? A hypothesis that comes to mind immediately is growing code complexity, but I’d expect this to be an amplifier rather than a cause. In this pose, I will look at another candidate: the dominance of Open Source communities in the development of scientific software.
It’s the season when everyone writes about the past year, or even the past decade for a year number ending in 9. I’ll make a modest contribution by summarizing my experience with Pharo after one year of using it for projects of my own.
A while ago I wrote about my ideas for a successor of today’s computational notebooks. Since then I have made some progress on a prototype implementation, which is the topic of this post. Again I have made a companion screencast so that you can get a better idea of how all this works in practice.
Regular readers of this blog may have noticed that I am not very happy with today’s state of computational notebooks, such as they were pioneered by Mathematica and popularized by more recent free incarnations such as Jupyter, R markdown, or Emacs/OrgMode. In this post and the accompanying screencast (my first one!), I will explain what I dislike about today’s notebooks, and how I think we can do better.
One of the more interesting things I have been playing with recently is Pharo, a modern descendent of Smalltalk. This is a summary of my first impressions after using it on a small (and unfinished) project, for which it might actually turn out to be very helpful.
There is an important and ubiquitous process in scientific research that scientists never seem to talk about. There isn’t even a word for it, as far as I now, so I’ll introduce my own: I’ll call it knowledge distillation.
In today’s scientific practice, there are two main variants of this process, one for individual research studies and one for managing the collective knowledge of a discipline. I’ll briefly present both of them, before coming to the main point of this post, which is the integration of digital knowledge, and in particular software, into the knowledge distillation process.
Since the dawn of computer programming, software developers have been aware of the rapidly growing complexity of code as its size increases. Keeping in mind all the details in a few hundred lines of code is not trivial, and understanding someone else’s code is even more difficult because many higher-level decisions about algorithms and data structures are not visible unless the authors have carefully documented them and keep those comments up to date.
My most recent paper submission (preprint available) is about improving the verifiability of computer-aided research, and contains many references to the related subject of reproducibility. A reviewer asked the same question about all these references: isn’t this the same as for experiments done with lab equipment? Is software worse? I think the answers are of general interest, so here they are.