Konrad Hinsen's Blog

Welcome to my digital garden!

A few years ago, I discovered Mike Caulfield’s essay The Garden and the Stream: A Technopastoral, and understood why I wasn’t happy with my blog.

The dependency hubs in Open Source software

A few days ago, Google announced its experimental project Open Source Insights, which permits the exploration of the dependency graph of Open Source software. My first look at it ended in disappointment: in its initial stage, the site considers only the package universes of Java, JavaScript, Go, and Rust. That excludes most of the software I know and use, which tends to be written mainly in C, C++, Fortran, and Python. But I do have a package manager that has all the dependency information for most of the software that I care about: Guix. So I set out to do my own exploration of the Guix dependency graph, with a particular focus: identifying the hubs of the Open Source dependency network.
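To make the idea of a “hub” concrete, here is a minimal sketch in Python. It assumes the dependency information has already been extracted into a plain dictionary mapping each package to its direct dependencies (how to obtain that from Guix is not shown), and it assumes one possible measure of “hubness”: the number of packages that depend on a given package, directly or indirectly. The package names in the toy graph are hypothetical, not real Guix data.

```python
from collections import defaultdict

def transitive_dependents(deps):
    """Map each package to the set of packages that depend on it,
    directly or indirectly. `deps` maps a package to its direct dependencies."""
    # Invert the graph: for each package, record who depends on it directly.
    reverse = defaultdict(set)
    for pkg, direct in deps.items():
        for d in direct:
            reverse[d].add(pkg)
    result = {}
    for pkg in set(deps) | set(reverse):
        # Depth-first walk up the inverted graph, collecting every dependent once.
        seen = set()
        stack = list(reverse[pkg])
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(reverse[p])
        result[pkg] = seen
    return result

def hubs(deps, top=3):
    """Rank packages by number of transitive dependents (ties broken by name)."""
    counts = {pkg: len(s) for pkg, s in transitive_dependents(deps).items()}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

# Hypothetical toy dependency graph, for illustration only.
toy = {
    "app-a": ["libfoo", "python"],
    "app-b": ["libfoo"],
    "python": ["libc", "zlib"],
    "libfoo": ["libc"],
    "zlib": ["libc"],
    "libc": [],
}

print(hubs(toy))  # → [('libc', 5), ('libfoo', 2), ('zlib', 2)]
```

Counting transitive rather than direct dependents is a design choice: a low-level library like `libc` in the toy graph may have only a few direct users, but nearly everything ends up depending on it, which is what makes it a hub. Other measures, such as direct in-degree or betweenness centrality, would rank packages differently.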

The structure and interpretation of scientific models, part 2

In my last post, I discussed the two main types of scientific models: empirical models, also called descriptive models, and explanatory models. I also emphasized the crucial role of equations and specifications in the formulation of explanatory models. But my description of scientific models in that post left aside a very important aspect: on a more fundamental level, all models are stories.

The structure and interpretation of scientific models

It is often said that science rests on two pillars, experiment and theory. This has led some to propose one or two additional pillars for the computing age: simulation and data analysis. However, the real two pillars of science are observations and models. Observations are the input to science, in the form of numerous but incomplete and imperfect views on reality. Models are the inner state of science. They represent our current understanding of reality, which is necessarily incomplete and imperfect, but understandable and applicable. Simulation and data analysis are tools for interfacing observations with models, and thus for comparing them. They don’t add new pillars, but transform both of the existing ones. In the following, I will look at how computing is transforming scientific models.

Some comments on AlphaFold

Many people are asking for my opinion on the recent impressive success of AlphaFold at CASP14, perhaps incorrectly assuming that I am an expert on protein folding. I have actually never done any research in that field, but it’s close enough to my research interests that I have closely followed the progress that has been made over the years. Rather than reply to everyone individually, here is a public version of my comments. They are based on the limited information on AlphaFold that is available today. I may come back to this post later and expand it.

The four possibilities of reproducible scientific computations

Computational reproducibility has become a topic of much debate in recent years. Often that debate is fueled by misunderstandings between scientists from different disciplines, each having different needs and priorities. Moreover, the debate is often framed in terms of specific tools and techniques, in spite of the fact that tools and techniques in computing are often short-lived. In the following, I propose to approach the question from the scientists’ point of view rather than from the engineering point of view. My hope is that this point of view will lead to a more constructive discussion, and ultimately to better computational reproducibility.

The landscapes of digital scientific knowledge

Over the last few years, an interesting metaphor for information and knowledge curation has begun to take root. It compares knowledge to a landscape and identifies two key elements in it: streams and gardens. The first use of this metaphor that I am aware of is this essay by Mike Caulfield, which I strongly recommend reading first. In the following, I will apply this metaphor specifically to scientific knowledge and its possible evolution in the digital era.

An open letter to software engineers criticizing Neil Ferguson’s epidemics simulation code

Dear software engineers,

Many of you were horrified at the sight of the C++ code that Neil Ferguson and his team wrote to simulate the spread of epidemics. I feel for you. The only reason why I am less horrified than you is that I have seen a lot of similar-looking code before. It is in fact quite common in scientific computing, in particular in research projects that have been running for many years. But like you, I don’t have much trust in that code being a faithful implementation of the epidemiological models that it is supposed to implement, and I don’t want to defend bad code in science.

Wanted: a hierarchically modular software architecture

In his 1962 classic “The Architecture of Complexity”, Herbert Simon described the hierarchical structure found in many complex systems, both natural and human-made. But even though complexity is recognized as a major issue in software development today, the architecture described by Simon is not common in software, and in fact seems unsupported by today’s software development and deployment tools.

Emacs as a malleable system

Malleable systems are software systems that are designed to be modified and extended by their users, eliminating the usually strict borderline between developers and users. Making scientific software more malleable is a goal that I have been pursuing for 25 years, starting with a shift from Fortran to Python as my main programming language, and a simultaneous shift from writing programs to writing toolkits, such as my Molecular Modelling Toolkit first published in 1997. Therefore I was pleased to discover the Malleable Systems Collective, which has just published a post in which I examine what is probably the most successful malleable system in the history of software: Emacs. If you care about users having more influence on their software, check out their site!