Konrad Hinsen's blog

<h1>The low-hanging fruit in computational reproducibility (2023-11-30)</h1>
<p>Yesterday I participated in the <a href="https://www.ouvrirlascience.fr/international-workshop-software-pillar-of-open-science/" >International workshop “Software, Pillar of Open Science”</a>, organized by the <a href="https://www.ouvrirlascience.fr/home/" >French Committee for Open Science</a>. In the course of the various presentations and discussions (both in public and during coffee breaks), I realized that something has always been absent from such events: the vast majority of scientists.</p>
<!-- more -->
<p>What prompted this insight was the juxtaposition of two observations: during the introduction, the importance of software in research ("92% of all researchers say they rely on software"), and during the panel on reproducibility, the difficulties resulting from the complexities of today's software stacks.</p>
<p>Here's a provocative proposition: we can solve computational reproducibility for a big majority of those 92% of researchers by buying them a license for <a href="https://www.wolfram.com/mathematica/" >Mathematica</a>.</p>
<p>It's not Open Source, and that's bad for Open Science. I agree. But it does everything that most of those researchers need, it's very easy to install and run, and it's stable. You can run 20-year-old Mathematica code in today's version, and get the same results in the vast majority of cases. No reproducibility issues.</p>
<p>It's worth asking how a commercial company can solve a problem that highly qualified academic researchers have been discussing for a decade and continue to declare difficult. My answer to this question is threefold: (1) commercial licenses provide the resources for ensuring <a href="https://science-in-the-digital-era.khinsen.net/#The%20sustainability%20doughnut%20of%20scientific%20software" >the floor of the sustainability doughnut</a>, (2) the contractual producer-client relation provides the information necessary for ensuring <a href="https://science-in-the-digital-era.khinsen.net/#The%20sustainability%20doughnut%20of%20scientific%20software" >the ceiling of the sustainability doughnut</a>, and (3) their audience is very different from the participants at software-for-open-science events.</p>
<p>The last aspect is my key message here. All the activities around software in Open Science are organized by and for people who work in computational science, meaning that computation is their principal tool of scientific inquiry. A large proportion of them has a degree in computer science. On the other hand, most of the 92% of researchers who depend on software do <a href="https://science-in-the-digital-era.khinsen.net/#Computer-aided%20research" >computer-aided research</a> but <em>not</em> computational science. Their main tools are instruments or mathematical theories. They use computers as auxiliary tools, mostly for routine data analysis tasks.</p>
<p>The people who contribute to Open Source projects for scientific software have, overall, the same profile as the participants of software-for-open-science events. They develop and document their software for this kind of profile as well. The invisible others can and do use this software, but it's far too complex for them. It's above the sustainability ceiling. And since the invisible others are invisible to the developers, they have no way to make their needs heard. A commercial company, in contrast, knows all of its clients (they pay for their license every year), cares about them (for the same reason), and regularly asks them about their needs and their degree of satisfaction.</p>
<p>I brought up this issue during the panel on sustainability, and discovered that others have been thinking about it as well, for example panel member <a href="https://sloan.org/about/staff/joshua-m-greenberg" >Josh Greenberg</a> from the Sloan Foundation (whom I'd also like to thank for an insightful discussion after the event). That's very promising. And here's my proposal for a first step in this direction: let's work on diversity and inclusion in Open Science. Make sure that all of the 92% of software-using researchers are represented.</p>
<h1>This blog gets a facelift (2023-11-16)</h1>
<p>Regular visitors to my blog have probably noticed that it looks different now. However, the visual changes are only a side effect of a more profound change: I now use a different static site generator, <a href="https://github.com/coleslaw-org/coleslaw" >coleslaw</a>.</p>
<!-- more -->
<p>I had been wanting for a while to replace Disqus with a less invasive commenting system, and Disqus' recent announcement that it would insert ads into the comments on my blog was what finally motivated me to invest some time into getting this done.</p>
<p>The first task was to find a replacement for Disqus. One of my criteria was to allow commenting from the Fediverse, to remove the need for creating yet another account on yet another site just in order to be able to comment. The other criterion was not to depend on some third party service that might disappear or turn evil one day. In reply to <a href="https://scholar.social/@khinsen/111345801956217482" >a question on Mastodon</a>, <a href="https://neuromatch.social/@mstimberg/111346021003557962" >Marcel Stimberg</a> pointed me to a <a href="https://carlschwan.eu/2020/12/29/adding-comments-to-your-static-blog-with-mastodon/" >post by Carl Schwan</a> explaining how to use replies to a post-related toot as a channel for commenting. That looked just fine: no need for anyone to set up new accounts, just a one-time investment for updating my blog-generation code.</p>
<p>Next, I explored how to implement this technique in the static site generator I was using before, <a href="https://github.com/greghendershott/frog" >Frog</a>. It turned out to be more complicated than I expected, because Frog allows only a fixed set of metadata fields on a post. Adding a field is certainly not impossible, but I'd have had to make changes to many places in the code to add parsing code for the new field and then pass its optional value around from function to function until its final destination in HTML rendering.</p>
<p>Before attacking such a major code surgery, I checked out other static site generators on a few-hour train ride, looking for one that supports arbitrary metadata or, better yet, is more hackable than Frog. After all, I might want to make other changes in the future, so having a codebase that I feel comfortable hacking on is likely to be valuable. Given my recently renewed interest in Common Lisp (see <a href="/posts/2023/10/09/deconstructing-the-mastodon-client.html" >this post</a> for the reasons), I quickly settled on Coleslaw as a candidate to take a closer look at.</p>
<p>Coleslaw has a fixed set of metadata fields as well, but that set is defined by the slots of a class. Just add a slot, and you have a new metadata field. Very hackable! Moreover, the codebase is reasonably small, and while it's not a model of clarity, the ability to explore the code in a live programming environment makes it rather easy to get into, contrary to the more static and debug-hostile Racket code of Frog.</p>
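<p>To make the hackability concrete, here is a minimal sketch of the idea in CLOS. The class and slot names are invented for illustration and do not match Coleslaw's actual code; the point is only that a metadata field is nothing more than a slot on a class:</p>
<pre><code>;; Minimal sketch, not Coleslaw's real post class: in CLOS, adding a
;; metadata field amounts to adding a slot (names are invented here).
(defclass post ()
  ((title   :initarg :title   :reader post-title)
   (date    :initarg :date    :reader post-date)
   (content :initarg :content :reader post-content)
   ;; the new metadata field, e.g. the toot used as a comment thread
   (comment-toot :initarg :comment-toot :initform nil
                 :reader post-comment-toot)))
</code></pre>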
<p>So that's why you are now looking at a Coleslaw-generated blog. It's my personal modified fork for now. I may look into factoring out my add-ons as plugins and submit them upstream, but this is absolutely not a high-priority project. Many people have their own fork of Coleslaw with similar personalizations, and that looks just fine. The forks are even very discoverable via GitHub. I'd prefer having discoverability <em>beyond</em> a single forge, but I don't think that's doable today.</p>
<p>Even though the blog looks very different, the contents of the posts have not changed, and the URLs remain identical as well. That took another ten minutes of hacking on Coleslaw. The URLs of the RSS and Atom feeds have also remained the same. I have exported the comments from Disqus and added them as static HTML on the posts. You can no longer add comments on the old posts, but at least read the existing ones. As a bonus, I also imported the posts from my very first blog at wordpress.com, because Coleslaw comes with a Wordpress importer that makes this a very straightforward operation.</p>
<p>The visual presentation of the pages isn't really to my taste, but I am not sure I'll be able to come up with something significantly better with my current rudimentary knowledge of CSS. I'll leave that for a future facelift session, which may of course never happen.</p>
<h1>Following branching conversations on Mastodon (2023-11-05)</h1>
<p>This post is a follow-up to my previous one, <a href="http://blog.khinsen.net/posts/2023/10/09/deconstructing-the-mastodon-client/" >Deconstructing the Mastodon client</a>. My topic is a scenario that traditional Mastodon clients handle rather badly but my home-grown solution handles very well: lengthy and branching conversations.</p>
<!-- more -->
<p>Such conversations happen all the time on social networks. Someone posts an interesting question or observation, on which many others comment. Then comments are added to comments, and soon the replies form a branching tree that grows over a few days, sometimes even weeks. Keeping up to date with such a conversation is not supported by any Mastodon client I know of. Worse, due to the way Mastodon implements federation, some replies may never arrive on your instance.</p>
<p>What I did in the past is put a bookmark on the initial toot, and then check for new replies once per day or so. Once a conversation reaches dozens of toots, just checking for new ones already takes a noticeable effort. And although I know how to check for replies outside of my own instance, in practice I hardly ever do it because it's too laborious.</p>
<p>A <a href="https://git.sr.ht/~khinsen/malleable-message-manager/tree/main/item/examples/conversations.lisp" >simple script</a> that I run once per day makes this a lot easier. I still mark interesting conversations as bookmarks. But now it's my script that copies the whole tree into a mail folder, skipping toots that are already present. New additions to the tree thus show up as unread mails in my inbox, just like replies in a mailing list. Better yet, my script retrieves the whole tree twice: once from my own instance, and once by retrieving each toot from the instance it was posted to, checking on that instance for replies. Neither approach is sufficient on its own: my instance doesn't see all replies, but the foreign instances from which I retrieve toots won't show me non-public toots.</p>
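<p>In case you wonder what the core of such a script looks like: it is little more than a de-duplicating walk over the reply tree. Here is a minimal sketch, not the real script; toots are represented as plists, and the function that retrieves the replies to a toot is passed in as an argument because the real code gets them from the Mastodon API.</p>
<pre><code>;; Sketch of the de-duplicating tree walk (not the real script).
;; Each toot is a plist with at least an :id. FETCH-DESCENDANTS is any
;; function returning the direct and indirect replies to a toot.
(defun new-toots-in-conversation (root fetch-descendants seen-ids)
  "Return the toots in ROOT's reply tree not yet recorded in SEEN-IDS."
  (loop for toot in (cons root (funcall fetch-descendants root))
        for id = (getf toot :id)
        unless (gethash id seen-ids)
          collect toot
          and do (setf (gethash id seen-ids) t)))

;; Example with a stub fetcher that returns no replies:
;; (new-toots-in-conversation '(:id "1" :content "root toot")
;;                            (lambda (toot) (declare (ignore toot)) '())
;;                            (make-hash-table :test #'equal))
</code></pre>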
<p>None of this is rocket science, but it's a nice illustration of the possibilities that open up once you take control over your personal information environment. I wish this were easier, and thus accessible to more people. But it won't get easier as long as most computer users find it perfectly normal that a small technophile elite defines what everyone else is able to do in their digital lives. So if you are reading this and think "nice, but that's above my level of competence", the very least you should do is express your desire to be able to do such things on your own. On Mastodon, for example.</p>
<h1>Deconstructing the Mastodon client (2023-10-09)</h1>
<p>Ever since I joined Twitter in 2011, and then moved to Mastodon in 2022, I have been unhappy with the timeline view proposed by both of these communication platforms as their main interface. Now I have finally done something about it: I wrote my own Mastodon client. Or perhaps rather a non-client, because the concept of "the client" is a big part of what I disliked.</p>
<!-- more -->
<p>My use of social networks can be broken down into three categories:</p>
<ol>
<li>conversations, mostly public but sometimes private</li>
<li>keeping up to date with the work of a small number of people or institutions</li>
<li>staying in touch with communities I consider myself a part of, and following topics I find interesting</li>
</ol>
<p>These are not clearly separated categories. It's often messages from category 2 that start conversations, and occasionally messages from category 3. But most of my daily use of Mastodon consists of</p>
<ol>
<li>participating in ongoing conversations</li>
<li>reading the feeds of accounts I care about specifically</li>
<li>scanning all the other news feeds sporadically and often superficially, depending on how much time and interest I have at the moment</li>
</ol>
<p>A timeline view mixing all messages from all accounts I follow is somewhat acceptable for (3), but no good for (1) and (2). Mastodon proposes lists for (2), and notifications to help with (1), but neither mechanism is satisfying for me. Lists in particular suffer from an awkward user interface. Moreover, I do (3) exclusively on mobile devices (on the bus etc.), (1) almost exclusively on the desktop (as I don't like typing on on-screen keyboards), and (2) alternating between multiple devices.</p>
<p>There are, of course, many Mastodon clients, so I tried out a few of them. For a while, I used Fedilab on Android (for me: phone and e-ink tablet) for activity (3), and the default Web client and/or <a href="https://elk.zone/" >Elk</a>, mainly on the desktop, for (1) and (2). It was a workable setup, but not a satisfying one. In addition to the cumbersome list interface, what I found missing was synchronization across my multiple devices. For (2), I'd need to be able to efficiently access all messages I hadn't seen before, on any of my devices (two mobile, two desktop). As a long-time Emacs user, I also tried <a href="https://codeberg.org/martianh/mastodon.el" >mastodon.el</a>, which is nice but, like Emacs, desktop-only, and thus doesn't help with my multi-device issues.</p>
<p>At some point I realized that what I wanted is not a better Mastodon <em>client</em>, but a better Mastodon <em>workflow</em>. What I care about is a data structure, a stream of toots, that is accessible via an HTTP API. I want to split this stream into several streams according to various criteria. For some substreams, I want to make sure I don't miss any message. For others, I need an interface to scan all messages when I feel like it, or search for specific keywords when I don't have time for scanning everything.</p>
<p>Can I get such interfaces to Mastodon streams without writing my own client? Yes, by repurposing existing software. Small streams of which I don't want to miss anything are much like e-mail (after spam filtering of course!). High-volume streams that I scan or search are much like RSS feeds. There is a lot of good software for managing e-mail and RSS feeds, for all platforms I use and even exotic platforms that I don't use (yet?). There are also good infrastructure tools in this space, in particular for e-mail. <a href="https://isync.sourceforge.io/" >isync</a>, for example, takes care of IMAP(S), letting me work with local files (Maildir) and not worry about networks, certificates, and their various modes of failure.</p>
<p>It actually takes surprisingly little software to transform Mastodon streams into e-mail and RSS feeds, if you can resist the temptation to overengineer. A toot is a snippet of HTML with optional attachments (images, video, audio). That's also what a MIME message happens to be. A near-perfect match. RSS items are HTML snippets as well. No attachments, but you can include the same preview images that Mastodon clients display with toots. If you can find support libraries for mail, RSS, and the Mastodon API in a programming language that you know well enough, this becomes a very manageable side project.</p>
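<p>To make the toot-to-mail mapping concrete, here is a stripped-down sketch that wraps an HTML snippet into a single-part message using nothing but string formatting. It is not what my actual code does (which relies on a proper mail library, more on that below), and a faithful version would also add attachments as MIME parts and encode non-ASCII headers:</p>
<pre><code>;; Sketch only: wrap an HTML snippet into a minimal mail message.
;; A real implementation would handle attachments and header encoding.
(defun toot-as-mail (from subject date-string html-body)
  "Return a single-part HTML mail message as a string."
  (format nil "From: ~a~%Subject: ~a~%Date: ~a~%MIME-Version: 1.0~%Content-Type: text/html; charset=utf-8~%~%~a~%"
          from subject date-string html-body))
</code></pre>
<p>Dropping such a message into a uniquely named file under the <code>new/</code> subdirectory of a Maildir is all it takes for a mail client to show it as an unread message, and isync then takes care of the IMAP side.</p>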
<p>If your preferences match mine, meaning you are happy to use Common Lisp for such a job, you can use <a href="https://sr.ht/~khinsen/malleable-message-manager/" >my code</a> as a starting point for your own Mastodon experiments. Its main support libraries are <a href="https://github.com/Shinmera/tooter" >tooter</a> for the Mastodon API, and <a href="https://github.com/40ants/mel-base" >mel-base</a> for e-mail. RSS is trivial if you have XML support, for which I use <a href="https://github.com/shinmera/plump" >plump</a>. My RSS aggregator is <a href="https://newsblur.com/" >Newsblur</a>, which has a reasonable Web interface for the desktop and a very nice Android app. For e-mail, I use K9 on Android, and Emacs on the desktop, but I am pretty sure any other e-mail client would work fine as well. The most time-consuming aspect turned out to be mel-base, a library that's insufficiently documented and not quite up to date, lacking support in particular for subject lines and account names containing Unicode characters.</p>
<p>If you have followed along so far, you have probably noticed that my non-client supports nothing but reading toots. Each of my transformed toots ends with a link that opens it in the default Web client, where I can reply, boost, or like. The Web client is also what I use for administrative tasks. Bonus: I add another link to each toot that opens it in the instance of its author, where I have access to the full reply chain, of which my own instance often captures only a subset. A very simple solution to one of Mastodon's unfortunate limitations that are due to federation.</p>
<p>The hopefully generalizable lesson from this project is that it is possible to improve one's personal computing environment with reasonable effort, under the condition of accepting an initial learning curve for some technologies. The important question then is how to identify technologies that are worth learning, which I interpret as technologies that are likely to be useful again for other software personalization efforts. A first draft of a list of criteria:</p>
<ol>
<li><strong>Choose <a href="https://boringtechnology.club/" >boring technology</a>.</strong> You want well-known, well-documented, and stable infrastructure to build on. No surprises, no tech churn. Your learning effort should be a good investment.</li>
<li><strong>Choose small-scale rather than enterprise-grade technology.</strong> Your problems and challenges are very different from Microsoft's. Prefer small software stacks.</li>
<li>Corollary 1: <strong>choose carefully who you turn to for advice.</strong> Most conference talks, blog posts, StackOverflow discussions, etc. come from software professionals. Better listen to people like yourself (but no, I have no advice on where to find them, nor how to judge their competence).</li>
<li>Corollary 2: <strong>consider old technology.</strong> Most modern software development tools are designed for software professionals. Tools for small-scale development were common in the 1980s and 1990s, before computers became commodities. Technology from that era that's still supported today may well be your best bet. I am a happy user of <a href="https://www.gnu.org/s/emacs/" >Emacs</a>, Smalltalk (more precisely <a href="https://pharo.org/" >Pharo</a> with <a href="https://gtoolkit.com/" >Glamorous Toolkit</a> as my preferred user interface), and Common Lisp (more precisely <a href="https://www.sbcl.org/" >SBCL</a>). Python is from the 1990s as well, but since it was widely adopted by software professionals in the 2000s, its ecosystem suffers too much from tech churn for my taste.</li>
<li><strong>Build on general protocols and file formats rather than specialized ones.</strong> Hierarchical filesystems rather than the Dropbox API. E-mail rather than Matrix. HTML, XML, and JSON files rather than JavaScript libraries or Web APIs.</li>
<li><strong>Consider debuggability.</strong> Delegate hard-to-debug stuff (e.g. networking, in particular with encryption) to other software. Choose tools that support debuggability. Debugging is a lot easier if you can build your own problem-specific debugging tools, which in turn is best supported by development tools that are extensible and focus on rapid feedback. Smalltalk systems are best in class in this respect, and Glamorous Toolkit even turned this into a design principle, called "Moldable Development".</li>
</ol>
<p>Unfortunately, there is one more aspect to making good choices that is hard to generalize: you need some expertise in figuring out which problems you can solve yourself with reasonable effort and which are so hard that your efforts are better spent on delegating or circumventing them. Data synchronization is in this second category, but like most people I learned this the hard way (years ago), while trying to do it myself and losing both time and data in the process.</p>
<p><br></p>
<p>After a few weeks of using my setup, I am fully satisfied with it. I also note that my original ideas about defining my personal algorithmic feeds have evolved substantially with practical experience. Once I have taken care of conversations (they go to e-mail) and the small set of accounts I follow closely (a low-volume RSS feed), I ended up splitting the remaining toots (i.e. most of my timeline) by topics in the crudest imaginable way: substring search. It's not perfect but definitely good enough. There's always room for improvement. My main failure so far is in removing all the cat-related toots from my feeds. That may actually require AI-based image recognition. Some problems are hard!</p>
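<p>For the curious, the crude splitting really is just substring search, along these lines (a sketch; the topic keywords are invented for illustration, not my actual configuration):</p>
<pre><code>;; Sketch of topic assignment by substring search. The keyword lists
;; are invented for illustration; the real ones are personal.
(defparameter *topics*
  '(("reproducibility" "reproducib" "replicat")
    ("lisp"            "common lisp" "scheme" "racket")
    ("emacs"           "emacs" "org-mode")))

(defun topics-of (toot-text)
  "Return the names of all topics whose keywords occur in TOOT-TEXT."
  (let ((text (string-downcase toot-text)))
    (loop for (name . keywords) in *topics*
          when (some (lambda (k) (search k text)) keywords)
            collect name)))

;; (topics-of "A new Common Lisp library for reproducible builds")
;; =&gt; ("reproducibility" "lisp")
</code></pre>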
<p>I'd love to hear about similar projects in this space (tell me <a href="http://scholar.social/@khinsen" >on Mastodon</a>!). The only one I am aware of is <a href="https://steampipe.io/blog/mastodon" >Jon Udell's Steampipe-based client</a>. Steampipe provides an SQL/database view on many Web services, which is perfect for doing non-trivial queries. That's something my own setup doesn't address at all. It's not something I feel a need for right now, but I may well add Jon's client to my toolbox one day.</p>
<h1>Welcome to my digital garden! (2022-08-31)</h1>
<p>A few years ago, I discovered Mike Caulfield's <a href="https://hapgood.us/2015/10/17/the-garden-and-the-stream-a-technopastoral/amp/" >The Garden and the Stream: A Technopastoral</a> and understood why I wasn't happy with my blog.</p>
<!-- more -->
<p>Blogs are streams, timelines of posts. Each post has a timestamp, and is considered "finished". Later changes are technically possible, but culturally limited to corrections. A blog post is considered a published essay, and therefore comes with a date of publication. I am much more interested in gardens, which are collections of essays that are revised and improved over long time periods.</p>
<p>It took me a while to actually set up a digital garden and populate it with some content, but I eventually did it. I won't say much about it because it speaks for itself. It's just one click away: <a href="https://science-in-the-digital-era.khinsen.net/" >https://science-in-the-digital-era.khinsen.net/</a></p>
<p>Does this mean the end of this blog? No, but posts will become even rarer. A blog is still the best place to make announcements, or to comment on events. But I am a researcher, not a journalist. The fundamental job of a researcher is to curate and extend knowledge collections. That's what I will do from now on in my own little garden.</p>
<h1>The dependency hubs in Open Source software (2021-06-10)</h1>
<p>A few days ago, Google announced its experimental project <a href="https://deps.dev/" >Open Source Insights</a>, which permits the exploration of the dependency graph of Open Source software. My first look at it ended with a disappointment: in its initial stage, the site considers only the package universes of Java, JavaScript, Go, and Rust. That excludes most of the software I know and use, which tends to be written mainly in C, C++, Fortran, and Python. But I do have a package manager that has all the dependency information for most of the software that I care about: <a href="https://guix.gnu.org/" >Guix</a>. So I set out to do my own exploration of the Guix dependency graph, with a particular focus: identifying the hubs of the Open Source dependency network.</p>
<!-- more -->
<p>This was also a good opportunity to test the practical utility of <a href="https://github.com/khinsen/guix-gtoolkit" >a new GUI for Guix</a> that I have been working on recently as a side project. In fact, I added this dependency hub analysis to that GUI, so now you can access it with a simple click.</p>
<p>Software being the complex beast that it is, I have to start by properly defining the subjects of my inquiry. What exactly do I mean by "package", "dependency", and "dependency hub"?</p>
<p>The term <em>package</em> is widely used to describe a unit of development and distribution in software systems, but every package manager has a slightly different notion of what a package actually is. A package could be "Python", or "Python 3.8.2", or "Python 3.8.2 built with gcc 7.5, version X of dependency Y, ...". Guix adopts the last, most fine-grained, definition. This is a good choice when you want to do reproducible software builds, but it is not very useful for analyzing dependency graphs. So I chose the level of name + version number, meaning that I consider "Python 3.8.2" a different package from "Python 3.8.1". That's of course debatable as well. But in Guix, it is rare to have multiple versions of a piece of software coexist at the same time. When it does happen, there is a good reason, typically a significant evolution in the software that makes different dependents prefer different versions. An example is Python 2 vs. Python 3, or the different major versions of gcc. In those cases, looking at their dependencies and dependents separately does make sense.</p>
<p>The term <em>dependency</em> is also widely used with different meanings. The two most common ones are <em>runtime dependency</em> and <em>build dependency</em>. A runtime dependency of package X is a package that must be installed on the computer to <em>use</em> package X. In contrast, a build dependency is a package that is required in order to <em>build</em> package X, where <em>building</em> means anything required to turn source code into something executable. Think of it as a generalization of <em>compiling</em>. Usually the build dependencies are roughly a superset of the runtime dependencies: there are packages you need to build package X, e.g. a compiler, but which are then no longer required for using package X. It's the build dependencies that matter for the evolution of software systems, so that's the definition I used in my analysis.</p>
<p>Unfortunately, the complexity of defining dependencies doesn't end there. Many packages have <em>optional</em> dependencies. When they are available, some additional functionality is enabled. Do you count them or not? My pragmatic take is that I trust the Guix developers to have made good choices. So for me, a dependency is whatever it takes to build a package in Guix.</p>
<p>This leaves the notion of a <em>dependency hub</em> to be defined. In network science, a hub is a node that has an exceptionally high number of connections to other nodes, such that a large share of the information propagating through the network passes through the hubs. A software dependency graph differs from most networks in that its edges have a direction: A depending on B is not the same as B depending on A. This leads to several <em>a priori</em> reasonable definitions for hubs: 1. packages that have many dependencies, 2. packages that have many dependents, and 3. packages for which the sum of dependencies plus dependents is high. Let's immediately eliminate the last definition, as I see no interest in it. Definition 1 identifies the packages that are particularly <em>vulnerable</em> to <a href="https://hal.archives-ouvertes.fr/hal-02117588/document" >software collapse</a>, definition 2 the packages that can most easily <em>cause</em> software collapse.</p>
<p>The latter characteristic corresponds best to the capture of information flow as the defining feature of network hubs, and it also happens to be what I am most interested in. The information that flows in the network is requests for change. Nodes receive such requests from dependents, who are in fact the software's clients or users. They typically ask for improved or extended functionality. Nodes also receive requests from dependencies, when they implement changes that break backward compatibility and then ask <em>their</em> dependents to adapt to these changes. The nodes that potentially receive and send many requests for change are thus the nodes who have the most dependents. They are the hubs in the dependency network. Note, however, that the asymmetry in the dependency relation still matters. Nodes can ignore requests for change coming from their dependents, but they cannot ignore requests coming from their dependencies. It's called "dependency" for a reason!</p>
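<p>If you want to try this kind of analysis on your own package manager's data, the core computation is a simple graph traversal: for every package, walk its build-dependency closure, and count, for each dependency reached, how many packages reached it, directly or indirectly. Here is a sketch in Common Lisp on an invented toy graph; my real analysis works on Guix package objects inside Glamorous Toolkit rather than on a plain table like this:</p>
<pre><code>;; Sketch: count dependents (direct and indirect) from a table of direct
;; build dependencies. The toy data is invented; the real analysis uses
;; Guix package objects inside Glamorous Toolkit.
(defun dependent-counts (deps)
  "DEPS maps a package name to its list of direct dependencies.
Return a table mapping each package to its number of transitive dependents."
  (let ((counts (make-hash-table :test #'equal)))
    (loop for package being the hash-keys of deps
          do (let ((reached (make-hash-table :test #'equal)))
               ;; depth-first walk over PACKAGE's dependency closure
               (labels ((visit (p)
                          (dolist (d (gethash p deps))
                            (unless (gethash d reached)
                              (setf (gethash d reached) t)
                              (visit d)))))
                 (visit package))
               ;; PACKAGE is a dependent of every package it reached
               (loop for d being the hash-keys of reached
                     do (incf (gethash d counts 0)))))
    counts))

;; Toy example: a depends on b, b depends on c.
;; (let ((deps (make-hash-table :test #'equal)))
;;   (setf (gethash "a" deps) '("b")
;;         (gethash "b" deps) '("c")
;;         (gethash "c" deps) '())
;;   (gethash "c" (dependent-counts deps)))  ; =&gt; 2
</code></pre>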
<p>At this point, I can take a break from theory and show you the results of my analysis. The top twenty hubs in the Guix dependency graph are:
<table>
<tr>
<th>Package</th> <th>Number of dependents</th>
</tr>
<tr>
<td>perl 5.30.2</td> <td>7964</td>
</tr>
<tr>
<td>pkg-config 0.29.2</td> <td>7938</td>
</tr>
<tr>
<td>zlib 1.2.11</td> <td>7414</td>
</tr>
<tr>
<td>ncurses 6.2</td> <td>7337</td>
</tr>
<tr>
<td>libffi 3.3</td> <td>6687</td>
</tr>
<tr>
<td>xz 5.2.4</td> <td>6535</td>
</tr>
<tr>
<td>readline 8.0</td> <td>6503</td>
</tr>
<tr>
<td>libxml2 2.9.10</td> <td>6302</td>
</tr>
<tr>
<td>expat 2.2.9</td> <td>6170</td>
</tr>
<tr>
<td>libunistring 0.9.10</td> <td>6150</td>
</tr>
<tr>
<td>bzip2 1.0.8</td> <td>6070</td>
</tr>
<tr>
<td>tzdata 2019c</td> <td>6068</td>
</tr>
<tr>
<td>Python 3.8.2</td> <td>6061</td>
</tr>
<tr>
<td>bash 5.0</td> <td>6042</td>
</tr>
<tr>
<td>gettext 0.20.1</td> <td>5768</td>
</tr>
<tr>
<td>m4 1.4.18</td> <td>5621</td>
</tr>
<tr>
<td>libgpg-error 1.37</td> <td>5518</td>
</tr>
<tr>
<td>libgcrypt 1.8.5</td> <td>5514</td>
</tr>
<tr>
<td>libxslt 1.1.34</td> <td>5479</td>
</tr>
<tr>
<td>gmp 6.2.0</td> <td>5363</td>
</tr>
</table>
If you want more, <a href="../../../../static/hubs.json" >here</a> is the full list as a JSON file, sorted by decreasing number of dependents.</p>
<p>If you have thought a bit about what to expect before looking at this table, you have probably included programming languages such as <tt>perl</tt> or <tt>python</tt> in this list. But perhaps you did not expect to see utilities such as <tt>pkg-config</tt> or <tt>bzip2</tt>. Remember these are <em>build</em> dependencies. The very first step in building a package, <em>any</em> package, is unpacking its source code. Many of the packages in my top-twenty list represent boring but essential infrastructure software. The software equivalent of the power grid and the road network: stuff that everybody just takes for granted. Such packages rarely get into the news, except when something goes seriously wrong, as in the case of the <a href="https://heartbleed.com/" >Heartbleed bug</a> affecting OpenSSL. Which, by the way, is at position 634 in my list. It would be much higher up in a network defined by different criteria, of course. There's more to software than build dependencies.</p>
<p>One motivation for writing this post was to point out a common fallacy in reasoning about Open Source software. A popular argument is that Open Source gives you the freedom to change software to fit your needs, by creating and maintaining your own fork. Or paying someone else to do it for you, if you are not an accomplished hacker yourself. The source code is there for anyone to grab, after all, and the license allows modification and redistribution.</p>
<p>This argument was valid in the 1980s. There were few packages, few dependencies, and a much higher percentage of computer users had programming experience. Today, you can perhaps maintain your own fork of Perl, but you cannot fork its hub position in the network, nor can you reasonably maintain forks of its 7964 dependents. If the Perl maintainers introduce a breaking change, those 7964 dependents will either adapt or disappear. Hypothetically, a large number of them could together envisage maintaining their own fork. But there are no good coordination mechanisms among developers of unrelated Open Source projects, and therefore this doesn't happen in practice.</p>
<p>In an <a href="https://blog.khinsen.net/posts/2020/02/26/the-rise-of-community-owned-monopolies/" >earlier post</a>, I have written about community-owned monopolies in the Open Source universe. In that post, I wrote that for software users, there is no practical difference between Microsoft killing Windows 7 and the Python community killing Python 2, even though the former is proprietary and commercial, whereas the latter is Open Source. The reason is that both pieces of software are hubs in dependency networks. Microsoft and the Python developer community are two very different institutions, with very different goals, values, policies, legal status, etc. But that hardly matters for the average software user, whose work depends on a complex web of interacting pieces of software. At the level of that web, it's the information flow patterns that determine evolution. Requests for change, or non-change. Average software users have practically no way to make their needs heard by the people who manage the hubs. Even the best-intentioned altruistic Open Source hub maintainer cannot possibly keep every user's interests in mind, because there is no way to even be aware of them. A web of software is a very different beast than a single project. <a href="http://robotics.cs.tamu.edu/dshell/cs689/papers/anderson72more_is_different.pdf" >More is different.</a></p>
<p>In the almost 40 years since the beginnings of the Open Source movement, the mode of governance of Open Source projects has evolved significantly. Most importantly, all the people involved have realized that governance matters and must be consciously organized, rather than evolve through cumulative random accidents of history, which almost inevitably leads to a <a href="https://en.wikipedia.org/wiki/The_Tyranny_of_Structurelessness" >tyranny of structurelessness</a> in the long run. Now we must develop an awareness of similar issues at the level of the <em>web</em> of Open Source projects, followed by the development and implementation of better information flow and decision structures.</p>
<p>I will conclude this post with a technical remark. I did my dependency hub analysis using a relatively new tool in the software world, called the <a href="https://gtoolkit.com/" >Glamorous Toolkit</a>, to which I added an <a href="https://github.com/khinsen/guix-gtoolkit" >interface to Guix</a>. This toolbox significantly lowers the cost of developing new tools. In the screenshot below, you see on the left the user interface of my analysis. It's an additional view on the Guix package catalog, complementing various other views that are already in place. On the right, you see the complete code for this analysis, including the user interface (which also gives access to the list of dependents, not just the number). In contrast to traditional scripts, there is no overhead for reading data or writing out the results. My code works on data structures that are already in place. What is not obvious from the screenshot is that you get the right-hand panel via alt-click from the left-hand one, meaning that users of my little analysis tool always have direct access to the code. It isn't obvious either that modifying the code on the right will immediately update the view on the left, making development highly interactive. If you think notebooks are great, try Glamorous Toolkit. But be warned that you might then realize that notebooks are no longer the state of the art.</p>
<div class="figure">
<img src="../../../../static/guix-gtoolkit-dependency-hubs.png" alt="" width="100%"/>
<p class="caption"></p>
</div>
<h1>The structure and interpretation of scientific models, part 2 (2021-01-08)</h1>
<p>In <a href="https://blog.khinsen.net/posts/2020/12/10/the-structure-and-interpretation-of-scientific-models/" >my last post</a>, I discussed the two main types of scientific models: empirical models, also called descriptive models, and explanatory models. I also emphasized the crucial role of equations and specifications in the formulation of explanatory models. But my description of scientific models in that post left aside a very important aspect: on a more fundamental level, all models are stories.</p>
<!-- more -->
<p>To illustrate my point, I will take up my running example from part 1: celestial mechanics. Newton's model for our solar system is, as I said, composed of several equations, the most famous of which, <em>F</em> = <em>m</em> ⋅ <em>a</em>, many readers will probably remember from a high-school physics class. But that equation means nothing on its own. It just says that there are three quantities, one of which being the product of the other two.</p>
<p>The minimal story required to make sense of this equation provides a definition of the three quantities involved. For acceleration (the <em>a</em>), this may look superficially simple: it's the second derivative of an object's position in time. The concepts of position and time are part of our everyday intuition, so that's the easy part. Velocity is an intuitive everyday concept as well, but its precise relation to position as a time derivative is not. For acceleration, nothing short of calculus will do. In fact, Newton invented calculus along with his physical theory! Defining mass (the <em>m</em>) and force (the <em>F</em>) is not a trivial task either. Both concepts are rooted in our everyday intuition about the world, but their role in Newton's law of motion requires a much more precise understanding. If you have doubts about this, try explaining the difference between <em>mass</em> and <em>weight</em> to someone who doesn't have a scientific education.</p>
<p>From this big-picture point of view, equations such as <em>F</em> = <em>m</em> ⋅ <em>a</em> are tiny pieces of our scientific models. They are the tips of icebergs whose massive underwater parts are the stories defining the underlying concepts and linking them to our intuition about the world, often through multiple and increasingly abstract layers. We tend to forget about these stories, because once we have understood them well enough, what we actually work with are the equations. But this works only for the well-established models whose stories are now found in textbooks. New research continuously introduces new models, often as small variants or extensions of existing ones. Their stories are told in scientific publications.</p>
<p>Historically, <a href="https://en.wikipedia.org/wiki/History_of_mathematical_notation" >mathematical notation</a> was introduced as a convenient shorthand for use in plain-language stories. The lengthy phrase "force equals mass times acceleration" thus became <em>F</em> = <em>m</em> ⋅ <em>a</em>. The transition to symbolic equations encouraged the development of formal methods in mathematics, starting with algebraic transformations of simple equations. This approach was so successful that equations became the main focus of interest in science. Later, other formal representations were added for the non-numerical aspects of models, graphs being the prime example. The most recent addition to the collection of formal notations for scientific models is software. Today, scientists spend most of their time working with the formalized parts of scientific models, such as equations or algorithms, to the point of neglecting the stories that give them meaning.</p>
<p>What happens when people use the equations of scientific models without a proper understanding of their stories is nicely illustrated by the joke about the physics student who combines Einstein's <em>E</em> = <em>m</em> ⋅ <em>c²</em> with Pythagoras' <em>a²</em> + <em>b²</em> = <em>c²</em> to deduce <em>E</em> = <em>m</em> ⋅ (<em>a²</em> + <em>b²</em>). It works as a joke among physicists because in their community, everybody knows the two inputs and the contexts from which they are taken. For other people, there is nothing funny about this reasoning, and it can even look convincing. Such superficial use of scientific models without understanding their context is actually quite common in today's research: the inappropriate use of statistical inference methods is a major cause of the <a href="https://en.wikipedia.org/wiki/Replication_crisis" >reproducibility crisis</a>.</p>
<p>Computing technology has played a big role in alienating scientists from their models. Most obviously, computers have made it possible to apply scientific models and methods as black-box tools: in an automated fashion, without understanding them. But the attitudes of the software industry, whose development tools computational science has inherited, have also contributed to this tendency. The focus of the software industry is on professional developers making tools for others that almost magically solve some of their problems. Users then get a manual, or hands-on training, for learning how to use the tool, but the inner workings of the tool are something they shouldn't even have to think about. A good tool is one that minimizes learning requirements. Applied to science, this implies that users shouldn't have to know the stories behind the models. Everyone with a dataset should be able to do statistical inference with a few mouse clicks and get a nice visualization. But without the stories, we can easily draw wrong conclusions from nice graphics.</p>
<p>After a long period of separation of tools and stories, computational notebooks are now bringing some of the stories back. The enthusiastic adoption of notebooks by computational scientists is perhaps the best evidence for the importance of stories in science. But today's notebooks capture only the surface stories of a research project. It's tips of icebergs again. The typical notebook makes use of a large number of code libraries that are based on non-trivial scientific models, but the reader of the notebook remains completely unaware of them. Ideally, these models, with their stories, should be only a few clicks away.</p>
<p>So what would an electronic representation of scientific models look like, ideally? It's a collection of cross-referencing stories. In the celestial mechanics example, there's a story about positions, velocities, and accelerations, which refers to a story about time and to a story about derivatives. There is another story that explains mass. The story of Newton's law of motion, which also introduces the concept of force, can then refer to these more fundamental stories. If this description reminds you of Wikipedia, or in fact of any Wiki, you are right. Wikis are also collections of cross-referencing stories. What is missing in Wikis is a machine-readable version of the formalized parts of our models. Which, as I explained in <a href="https://blog.khinsen.net/posts/2020/12/10/the-structure-and-interpretation-of-scientific-models/" >part 1</a>, needs to allow at least equations, specifications, and algorithms for its ingredients. Another feature that is missing in today's Wikis, although some people are working on it, is the possibility to integrate computational tools in the form of code snippets. Their role would be to give access to visualizations, simulations, and other exploration tools.</p>
<p>My own experiments in this domain are <a href="https://github.com/khinsen/leibniz/" >Leibniz</a>, a digital scientific notation for embedding machine-readable formal models into human-readable stories, and the <a href="https://github.com/activepapers/activepapers-pharo" >Pharo edition of ActivePapers</a>, which integrates datasets and computational tools into a Wiki-like collection of stories. Both ingredients require more work, and then need to be combined. There remains a lot of work to do.</p>
<h1>The structure and interpretation of scientific models (2020-12-10)</h1>
<p>It is often said that science rests on two pillars, experiment and theory. This has led some to propose <a href="https://physicsworld.com/a/the-third-pillar-of-science/" >one</a> or <a href="https://www.hpcwire.com/2019/04/18/is-data-science-the-fourth-pillar-of-the-scientific-method/" >two</a> additional pillars for the computing age: simulation and data analysis. However, the <em>real</em> two pillars of science are observations and models. Observations are the input to science, in the form of numerous but incomplete and imperfect views on reality. Models are the inner state of science. They represent our current understanding of reality, which is necessarily incomplete and imperfect, but understandable and applicable. Simulation and data analysis are tools for interfacing and thus comparing observations and models. They don't add new pillars, but transform both of them. In the following, I will look at how computing is transforming scientific models.</p>
<!-- more -->
<h2>Empirical models</h2>
<p>The first type of scientific model that people construct when figuring out a new phenomenon is the <em>empirical</em> or <em>descriptive</em> model. Its role is to capture observed regularities, and to separate them from noise, the latter being small deviations from the regular behavior that are, at least provisionally, attributed to imprecisions in the observations, or to perturbations to be left for later study. Whenever you fit a straight line to a set of points, for example, you are constructing an empirical model that captures the linear relation between two observables. Empirical models almost always have parameters that must be fitted to observations. Once the parameters have been fitted, the model can be used to <em>predict</em> future observations, which is a great way to test its generality. Usually, empirical models are constructed from generic building blocks: polynomials and sine waves for constructing mathematical functions, circles, spheres, and triangles for geometric figures, etc.</p>
<p>The use of empirical models goes back a few thousand years. As I have described in <a href="https://blog.khinsen.net/posts/2017/12/19/data-science-in-ancient-greece/" >an earlier post</a>, the astronomers of antiquity who constructed a model for the observed motion of the Sun and the planets used the same principles that we still use today. Their generic building blocks were circles, combined in the form of epicycles. The very latest variant of empirical models is machine learning models, where the generic building blocks are, for example, artificial neurons. Impressive success stories of machine learning models have led some enthusiasts to proclaim <a href="https://www.wired.com/2008/06/pb-theory/" >the end of theory</a>, but I hope to be able to convince you in the following that empirical models of any kind are the beginning, not the end, of constructing scientific theories.</p>
<p>The main problem with empirical models is that they are not that powerful. They can predict future observations from past observations, but that's all. In particular, they cannot answer what-if questions, i.e. make predictions for systems that have never been observed in the past. The epicycles of Ptolemy's model describing the motion of celestial bodies cannot answer the question how the orbit of Mars would be changed by the impact of a huge asteroid, for example. Today's machine learning models are no better. Their latest major success story as I am writing this is <a href="https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology" >AlphaFold predicting protein structures from their sequences</a>. This is indeed a huge step forward, as it opens the door to completely new ways of studying the folding mechanisms of proteins. It is also likely to become a powerful tool in structural biology, if it is actually made available to biologists. But it is not, as DeepMind's blog post claims, "a solution to a 50-year-old grand challenge in biology". We still do not know what the fundamental mechanisms of protein folding are, nor how they play together for each specific protein structure. And that means that we cannot answer what-if questions such as "How do changes in a protein's environment influence its fold?"</p>
<h2>Explanatory models</h2>
<p>The really big success stories of science are models of a very different kind. <em>Explanatory</em> models describe the underlying mechanisms that determine the values of observed quantities, rather than extrapolating the quantities themselves. They describe the systems being studied at a more fundamental level, allowing for a wide range of generalizations.</p>
<p>A simple explanatory model is given by the <a href="https://en.wikipedia.org/wiki/Lotka%E2%80%93Volterra_equations" >Lotka-Volterra equations</a>, also called predator-prey equations. This is a model for the time evolution of the populations of two species in a predator-prey relation. An example is shown in this plot (Lamiot, CC BY-SA 4.0 <a href="https://creativecommons.org/licenses/by-sa/4.0">https://creativecommons.org/licenses/by-sa/4.0</a>, via Wikimedia Commons):</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/5/5b/Milliers_fourrures_vendues_en_environ_90_ans_odum_1953_en.jpg" alt="predator-prey" width="600"/></p>
<p>An empirical model would capture the oscillations of the two curves and their correlations, for example by describing the populations as superpositions of sine waves. The Lotka-Volterra equations instead describe the interactions between the population numbers: predators and prey are born and die, but in addition predators eat prey, which reduces the number of prey in proportion to the number of predators, and contributes to a future increase in the number of predators because they can better feed their young. With that type of description, one can ask what-if questions: What if hunters shoot lots of predators? What if prey are hit by a famine, i.e. a decrease in their own source of food? In fact, the significant deviations from regular periodic change in the above plot suggest that such "outside" events are quite important in practice.</p>
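<p>For readers who want to see the model itself: writing <em>x</em> for the prey population and <em>y</em> for the predator population, the equations are d<em>x</em>/d<em>t</em> = <em>αx</em> − <em>βxy</em> and d<em>y</em>/d<em>t</em> = <em>δxy</em> − <em>γy</em>, where the four parameters quantify prey reproduction, predation, predator reproduction, and predator death. A few lines of code are then enough to play with what-if scenarios by changing parameters or initial populations. Here is a sketch using a crude fixed-step integration rather than a proper ODE solver; the parameter values are arbitrary illustrations:</p>
<pre><code>;; Sketch: fixed-step (Euler) integration of the Lotka-Volterra equations.
;; Parameter values in the example calls below are arbitrary illustrations.
(defun lotka-volterra (x0 y0 alpha beta delta gamma dt steps)
  "Return the trajectory as a list of (time prey predators) triples."
  (let ((x x0) (y y0) (trajectory '()))
    (dotimes (i (1+ steps) (nreverse trajectory))
      (push (list (* i dt) x y) trajectory)
      (let ((dx (* dt (- (* alpha x) (* beta x y))))
            (dy (* dt (- (* delta x y) (* gamma y)))))
        (incf x dx)
        (incf y dy)))))

;; What if hunters shoot half the predators at the start?
;; Compare (lotka-volterra 10.0 5.0 1.1 0.4 0.1 0.4 0.001 100000)
;; with    (lotka-volterra 10.0 2.5 1.1 0.4 0.1 0.4 0.001 100000).
</code></pre>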
<p>Back to celestial mechanics. The decisive step towards an explanatory model was made by Isaac Newton, after two important preparatory steps by Copernicus and Kepler, who put the Sun at the center, removing the need for epicycles, and described the planets' orbits more accurately as ellipses. Newton's laws of motion and gravitation fully explained these elliptical orbits and improved on them. More importantly, they showed that the fundamental laws of physics are the same on Earth and in space, a fact that may seem obvious to us today but wasn't in the 17th century. Finally, Newton's laws have permitted the elaboration of a rich theory, today called "classical mechanics", that provides several alternative forms of the basic equations (in particular <a href="https://en.wikipedia.org/wiki/Lagrangian_mechanics" >Lagrangian</a> and <a href="https://en.wikipedia.org/wiki/Hamiltonian_mechanics" >Hamiltonian</a> mechanics), plus derived principles such as the conservation of energy. As for what-if questions, Newton's laws have made it possible to send artefacts to the moon and to the other planets of the solar system, something which would have been unimaginable on the basis of Ptolemy's epicycles.</p>
<p>So far I have cited two explanatory models that take the form of differential equations, but that is not a requirement. An example from the digital age is given by <a href="https://en.wikipedia.org/wiki/Agent-based_model" >agent-based models</a>. There is, however, a formal characteristic that is shared by all explanatory models that I know, and that distinguishes them from empirical models: they take the form of specifications.</p>
<h2>Specifications and equations vs. algorithms and functions</h2>
<p>Let's look at a simple problem for illustration: sorting a list of numbers (or anything else with a well-defined order). I have a list <code>L</code>, with elements <code>L[i]</code>, <code>i=1..N</code> where <code>N</code> is the length of the list <code>L</code>. What I want is a sorted version which I will call <code>sorted(L)</code>. The <em>specification</em> for <code>sorted(L)</code> is quite simple:</p>
<ol>
<li><code>sorted(L)</code> is a list of length <code>N</code>.</li>
<li>For all elements of <code>L</code>, their multiplicities in <code>L</code> and <code>sorted(L)</code> are the same.</li>
<li>For all <code>i=1..N-1</code>, <code>sorted(L)[i] ≤ sorted(L)[i+1]</code>.</li>
</ol>
<p>Less formally: <code>sorted(L)</code> is a list with the same elements as <code>L</code>, but in the right order.</p>
<p>This specification of <code>sorted(L)</code> is complete in that there is one unique list that satisfies it. However, it does not provide much help for actually constructing that list. That is what a sorting <em>algorithm</em> provides. There are many known algorithms for sorting, and you can learn about them from <a href="https://en.wikipedia.org/wiki/Sorting_algorithm" >Wikipedia</a>, for example. What matters for my point is that (1) given the specification, it is not a trivial task to construct an algorithm, (2) given a few algorithms, it is not a trivial task to write down a common specification that they satisfy (assuming of course that it exists). And that means that specifications and algorithms provide complementary pieces of knowledge about the problem.</p>
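<p>The distinction is easy to make concrete in code. In the sketch below (Common Lisp, for lists of numbers), the first function merely <em>checks</em> the specification: it can verify any proposed result, but gives no recipe for constructing it. The second function is one of the many algorithms whose output satisfies the specification.</p>
<pre><code>;; The specification as a checkable predicate: it verifies a proposed
;; result but gives no recipe for constructing it.
(defun satisfies-sorted-spec-p (l sorted)
  (and (= (length l) (length sorted))                           ; condition 1
       (every (lambda (x) (= (count x l) (count x sorted))) l)  ; condition 2
       (every #'&lt;= sorted (rest sorted))))                       ; condition 3

;; One algorithm (among many) whose output satisfies the specification.
(defun insertion-sort (l)
  (labels ((insert (x sorted)
             (cond ((null sorted) (list x))
                   ((&lt;= x (first sorted)) (cons x sorted))
                   (t (cons (first sorted) (insert x (rest sorted)))))))
    (reduce (lambda (sorted x) (insert x sorted)) l :initial-value '())))

;; (satisfies-sorted-spec-p '(3 1 2) (insertion-sort '(3 1 2)))  ; =&gt; T
</code></pre>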
<p>In terms of levels of abstraction, specifications are more abstract than algorithms, which in turn are more abstract than implementations. In the example of sorting, the move from specification to algorithm requires technical details to be filled in, in particular the choice of a sorting algorithm. Moving on from the algorithm to a concrete implementation involves even more technical details: the choice of a programming language, the data structures for the list and its elements, etc.</p>
<p>In the universe of continuous mathematics, the relation between equations (e.g. differential equations) and the functions that satisfy them is exactly the same as the relation between specifications and algorithms in computation. Newton's equations can thus be seen as a specification for the elliptical orbits that Kepler had described a bit earlier. Like in the case of sorting, it is not a trivial task to derive Kepler's elliptical orbits from Newton's equations, nor is it a trivial task to write down Newton's equations as the common specification of all the (approximatively) elliptical orbits in the solar system. The two views of the problem are complementary, one being closer to the observations, the other providing more insight.</p>
<p>One reason why specifications and equations are more powerful is that they are modular. Two specifications combined make up another, more detailed, specification. Two equations make up a system of equations. An example is given by Newton's very general law of motion, which is extended by his law of gravitation to make a model for celestial mechanics. The same law of motion can be combined with different laws defining forces for different situations, for example the motion of an airplane. In contrast, there is no way to deduce anything about airplanes from Kepler's elliptical planetary orbits. Functions and algorithms satisfy <em>complete</em> specifications, and conserve little information about the <em>components</em> from which this complete specification was constructed.</p>
<h2>A challenge for computational science</h2>
<p>Computational science initially used computers as a tool for applying structurally simple but laborious computational algorithms. The focus was on efficient implementations of known algorithms, later also on developing efficient algorithms for solving well-understood equations. The steps from specification to algorithm to implementation were done by hand, with little use of computational tools.</p>
<p>That was 60 years ago. Today, we have computational models that are completely unrelated to the mathematical models that go back to the 19th century. And when we do use the foundational mathematical models of physics and chemistry, we combine them with concrete system specifications whose size and complexity require the use of computational tools. And yet, we still focus on implementations and to a lesser degree on algorithms, neglecting specifications almost completely. For many routinely used computational tools, the implementation is the only publicly accessible artefact. The algorithms they implement are often undocumented or not referenced, and the specifications from which the algorithms were derived are not written down at all. Given how crucial the specification level of scientific models has been in the past, we can expect to gain a lot by introducing it into computational science as well.</p>
<p>To do so, we first need to develop a new appreciation for <a href="https://f1000research.com/articles/3-101/v2" >scientific models as distinct from the computational tools that implement them</a>. We then need to think about how we can actually <a href="https://peerj.com/articles/cs-158/" >introduce specification-based models into the workflows of computational science</a>. This requires designing computational tools that let us move freely between the three levels of specification, algorithm, and implementation. This is in my opinion the main challenge for computational science in the 21st century.</p>
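<p>A modest but immediately available step in that direction is to keep the specification alive as executable code, and to confront implementations with it automatically. A hand-rolled property check along these lines (a sketch of mine, standard library only) might look like this:</p>
<pre><code>import random
from collections import Counter

def sorted_spec(L, M):
    """Executable specification of sorting."""
    return Counter(M) == Counter(L) and all(a &lt;= b for a, b in zip(M, M[1:]))

def check_against_spec(implementation, spec, n_trials=1000):
    """Confront an implementation with its specification on random inputs."""
    for _ in range(n_trials):
        L = [random.randint(-100, 100) for _ in range(random.randint(0, 50))]
        if not spec(L, implementation(L)):
            return L  # a counterexample
    return None  # no counterexample found

assert check_against_spec(sorted, sorted_spec) is None
</code></pre>
<p>Property-based testing libraries and proof assistants push this idea much further, but even such a simple loop keeps the specification level explicit next to the implementation, instead of leaving it implicit in the author's head.</p>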
<h2>Finally...</h2>
<p>Some readers may have recognized that the title of this post is a reference to two books, <a href="https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book.html" >Structure and Interpretation of Computer Programs</a> (with a <a href="https://sarabander.github.io/sicp/html/index.xhtml" >nice though unofficial online version</a>) and <a href="https://mitpress.mit.edu/books/structure-and-interpretation-classical-mechanics" >Structure and Interpretation of Classical Mechanics</a> (also <a href="https://tgvaughan.github.io/sicm/toc.html" >online</a>). The second one is actually somewhat related to the topic of this post: it is a textbook on classical mechanics that uses computational techniques for clarity of exposition. More importantly, both books focus on inducing a deep understanding of their topics, rather than on teaching superficial technical details. This humble blog post cannot pretend to reach that level, of course, but its goal is to spark developments that will culminate in textbooks of the same quality as its two inspirations.</p>
<h3>Comments retrieved from Disqus</h3>
<ul>
<li><i>Konrad Hinsen:</i><p>A recommended follow-up read: <a href="https://www.quora.com/What-is-declarative-programming/answer/Alan-Kay-11" rel="nofollow noopener" title="https://www.quora.com/What-is-declarative-programming/answer/Alan-Kay-11">What is declarative programming?<br></a> by Alan Kay. His "what" and "how" is almost the same distinction as "specification" vs "algorithm".</p></li>
</ul>
Some comments on AlphaFold2020-12-022020-12-02Konrad Hinsen<p>Many people are asking for my opinion on the recent <a href="https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology" >impressive success of AlphaFold at CASP14</a>, perhaps incorrectly assuming that I am an expert on protein folding. I have actually never done any research in that field, but it's close enough to my research interests that I have closely followed the progress that has been made over the years. Rather than reply to everyone individually, here is a public version of my comments. They are based on the limited information on AlphaFold that is available today. I may come back to this post later and expand it.</p>
<!-- more -->
<p>First of all, the GDT scores obtained by AlphaFold are impressive, which is of course the reason for all the buzz at the moment. The GDT score measures how close a predicted structure is to the experimentally determined one. It is defined on a scale from 0 to 100 and can roughly be interpreted as the percentage of amino acid residues that were placed correctly. For about 2/3 of the proteins in this year's competition, AlphaFold achieved a GDT score in the 90s, whereas in the not so distant past, a score in the 70s was already considered very good. Which exact techniques were used to obtain the predicted structures is not something I can comment on: as far as I know, no technical details have been made public so far. Nor is AlphaFold a publicly available program or service that scientists could explore or apply to their own work. So all we know for now is that DeepMind, the company behind AlphaFold, has figured out a way to obtain good scores at CASP14. In the following I will assume that this is not just good luck, and that the method is applicable to a much larger class of proteins than the CASP candidates.</p>
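<p>For readers who want a more concrete idea of the score, here is a rough sketch (my own, much simplified) of how a GDT_TS-style value can be computed once the predicted and the experimental structure have been superimposed. The real evaluation also searches for the superposition that maximizes the score, which is omitted here.</p>
<pre><code>import math

def gdt_ts_like(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average, over several distance cutoffs (in Angstrom), of the fraction of
    residues whose C-alpha atom lies within the cutoff of its experimental
    position. Both structures are given as aligned lists of (x, y, z) tuples."""
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [sum(d &lt;= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)
</code></pre>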
<p>The scores obtained by AlphaFold are clearly a sign of significant progress. But does it mean that we have "a solution to a 50-year-old grand challenge in biology", as the press release claims? That depends on what exactly one considers that challenge to be.</p>
<p>If the challenge of protein folding is taken to be a purely pragmatic one, i.e. being able to predict structure from sequence, then AlphaFold is a candidate for a solution. How much of a solution will depend on further evaluations that remain to be done, on a larger range of proteins. CASP is limited to proteins for which experimental structures are (just) available. But some proteins resist experimental structure determination, for example because they have no well-defined structure at all. A robust structure prediction tool would have to identify such cases, rather than predict bogus structures. Allosteric proteins, which are proteins that can take more than one stable structure, provide another set of interesting test cases. A third case of interest is protein pairs that differ minimally in their sequence but importantly in structure. The goal of evaluating the robustness of a tool is to understand how it behaves at best, at worst, and for important edge cases, such that its users can judge the trustworthiness of its results.</p>
<p>For many scientists, including myself, having a black-box structure prediction tool is not sufficient to declare the protein folding problem solved. A solution requires an in-depth understanding of the mechanisms that determine protein structure. Whether or not AlphaFold can contribute to identifying these mechanisms is a question that scientists can only start to examine, and only if AlphaFold becomes sufficiently accessible and inspectable for critical examination by outside experts. I hope this will happen, and in fact I am optimistic that it will happen: the problem is important enough to deserve a serious effort by everyone involved. AlphaFold is not the end of the quest for a solution of the protein folding problem, but it could well turn out to be the beginning of a new chapter in the story.</p>
The four possibilities of reproducible scientific computations2020-11-202020-11-20Konrad Hinsen<p>Computational reproducibility has become a topic of much debate in recent years. Often that debate is fueled by misunderstandings between scientists from different disciplines, each having different needs and priorities. Moreover, the debate is often framed in terms of specific tools and techniques, in spite of the fact that tools and techniques in computing are often short-lived. In the following, I propose to approach the question from the scientists' point of view rather than from the engineering point of view. My hope is that this point of view will lead to a more constructive discussion, and ultimately to better computational reproducibility.</p>
<!-- more -->
<p>The format of my proposal is inspired by the well-known <a href="https://www.gnu.org/philosophy/free-sw.en.html" >"four freedoms" that define Free Software</a>. The focus of reproducibility is not on legal aspects, but on technical ones, and therefore my proposal is framed in terms of <em>possibilities</em> rather than freedoms.</p>
<h2>The four essential possibilities</h2>
<p>A computation is reproducible if it offers the four essential possibilities:</p>
<ol>
<li>The possibility to inspect all the input data and all the source code that can possibly have an impact on the results.</li>
<li>The possibility to run the code on a suitable computer of one's own choice in order to verify that it indeed produces the claimed results.</li>
<li>The possibility to explore the behavior of the code, by inspecting intermediate results, by running the code with small modifications, or by subjecting it to code analysis tools.</li>
<li>The possibility to verify that published executable versions of the computation, proposed as binary files or as services, do indeed correspond to the available source code.</li>
</ol>
<p>All of these possibilities come in degrees, measured in terms of the effort required to actually do what is supposed to be possible. For example, inspecting the source code of a computation is much easier for a notebook containing the top-level code, with links to repositories of all dependencies, than for a script available from the authors on request. Moreover, the degree to which each possibility exists can strongly vary over time. A piece of software made available on an institutional Web site is easily inspectable while that site exists, but inspectability drops to zero if the Web site closes down.</p>
<p>The reproducibility profile of a computation therefore consists of four time series, each representing one of the possibilities expressed on a suitable scale with its estimated time evolution. The minimum requirement for the label "reproducible" is a non-zero degree for all four possibilities for an estimated duration of a few months, the time it takes for new work to be carefully examined by peers.</p>
<h2>Rationale</h2>
<p>The possibility to inspect all the source code is required to allow independent verification of the software's correctness, and in particular to check that it does what its documentation claims it does.</p>
<p>The possibility to run the code is required to allow independent verification of the results.</p>
<p>The possibility to explore the behavior of the code is a <em>de facto</em> requirement to fully accomplish the goals of the first possibility. For all but the most trivial pieces of software, inspection of the source code is not enough to convince oneself that it does what it is claimed to do.</p>
<p>The possibility of verifying the correspondence of source code and executable versions is motivated by the complexity of today's software build procedures. Mistakes can as easily be introduced in the build process as in the source code itself. This point is well made by Ken Thompson's Turing Award speech <a href="https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf" >Reflections on Trusting Trust</a>, if you replace mischief by mistake in his arguments.</p>
<h2>Discussion in the context of the state of the art</h2>
<p>The possibility to inspect all the source code is a criterion that is in principle widely accepted, although many people fail to realize its wide-ranging consequences. "All the source code that can possibly have an impact on the results" actually means a <em>lot</em> of software. It includes many libraries, but also language implementations such as compilers and interpreters. Moreover, inspecting a dependency first of all requires precisely identifying it. This remains a difficult task, and therefore most published computations today do not offer the first essential possibility, no matter how much effort a reader is willing to invest.</p>
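<p>To get a feeling for how much software this covers, even a quick inventory of a single Python environment is instructive. The sketch below (mine, standard library only) lists the installed Python packages and the interpreter version; note everything it still misses: the C libraries the interpreter and the packages link against, the compilers that built them, the operating system, and so on.</p>
<pre><code>import sys
import importlib.metadata

# The Python-level dependencies that are directly identifiable...
for dist in sorted(importlib.metadata.distributions(),
                   key=lambda d: (d.metadata["Name"] or "").lower()):
    print(dist.metadata["Name"], dist.version)

# ...plus the interpreter itself, which is only one more layer of the stack.
print("Python", sys.version)
</code></pre>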
<p>It is tempting to introduce another degree of compliance by requiring that only the most relevant parts of the total source code be inspectable. However, that defeats the whole purpose of independent verification. Who decides what is relevant? Usually the author of the computation. But if the code declared to be irrelevant by the author is not inspectable, we have to take the author's word for its irrelevance.</p>
<p>The possibility to run the code is also a widely accepted criterion, though not everyone accepts the additional requirement of executability "on a suitable computer of one's own choice". Software made available as a service (e.g. in the cloud) is considered sufficient for reproducibility by some researchers. Executability is much more susceptible to decay over time than inspectability of the source code, and this is one of the main topics of debate today. Is long-term reproducibility needed? Is it achievable? The answers vary across disciplines. There is unfortunately a strong tendency toward self-censorship here: many scientists believe that long-term reproducibility is not realistic and <em>therefore</em> should not be asked for. This conclusion does not follow, and it is better to frame the question as a trade-off: what is a reasonable price to pay for long-term reproducibility, in a given discipline?</p>
<p>The possibility to explore the behavior of the code is rarely mentioned in discussions of reproducibility. And in fact, exploring the behavior of non-trivial code written by someone else is such a difficult task that many scientists prefer not to require anyone to do it. I am not aware of any scientific journal that expects reviewers of submitted work to check the code of any computation for correctness or at least plausible correctness, which in practice requires examining its behavior. And yet, the scientific method requires <em>everything</em> to be inquirable. It may not be a realistic expectation today, but it should at least be a goal for the future.</p>
<p>Since code explorability is rarely required or even discussed, there is no clear profile of practical implementations either. It's a criterion that requires expert judgement, the expert being a fellow researcher from the same discipline as the author of a computation. It is the software analog of a "well-written" paper, which is a paper that a reader can easily "get into".</p>
<p>The possibility of verifying the correspondence of source code and executable versions is also rarely mentioned. It is also the least fundamental one of the four essential possibilities, because in principle it can be abandoned if a computation is fully reproducible from source code. In practice, however, that is rarely a realistic option. The size and complexity of today's software assemblies make it impractical to re-build everything from source code, a process that can take many hours. Nearly all software assemblies we run in scientific computing contain some components obtained in pre-built binary form. While it is perfectly OK for most people, most of the time, to use such pre-built binaries, inquirability requires the possibility to check that these binaries really correspond to the source code that the authors of a computation claim to have used. This is a possibility where a low degree can be quite acceptable.</p>
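<p>At its simplest, such a check amounts to comparing a cryptographic hash of the binary one actually runs with a checksum published alongside the source code. A minimal sketch (the file name and the expected digest below are placeholders, not real artefacts):</p>
<pre><code>import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical artefact and published checksum, for illustration only.
expected = "published checksum goes here"
print(sha256_of("analysis-tool-1.2.3.bin") == expected)
</code></pre>
<p>Reproducible builds, which rebuild the binary from source and compare it bit for bit, provide a much stronger version of the same guarantee; but even a simple published checksum already gives the fourth possibility a non-zero degree.</p>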
<h2>Please comment!</h2>
<p>As I said, the goal of this blog post is to start a discussion. Your comments are valuable, possibly more so than the post itself. How important are the four possibilities in your own discipline? How well can they be realized within the current state of the art? Are there additional possibilities you consider important for reproducibility?</p>
<p>Check also the comments on Twitter by exploring the replies to <a href="https://twitter.com/khinsen/status/1329832546474061824" >this tweet</a>.</p>
<h2>Notes added after publication</h2>
<h3>2020-11-22</h3>
<p><a href="https://twitter.com/jermdemo/status/1329866889867059200" >Jeremy Leipzig</a> points out
<a href="https://icerm.brown.edu/topical_workshops/tw12-5-rcem/icerm_report.pdf" >the 2012 ICERM workshop document</a>, whose appendix A discusses several levels of reproducibility. Its last level ("open or reproducible research") covers in a general way the four possibilities I discuss above. The lower levels describe research output in which at least one of the four possibilities is not provided.</p>
<h3>2020-11-23</h3>
<p><a href="https://twitter.com/ivotron/status/1329873600472621057" >Ivo Jimenez</a> refers to <a href="https://www.niso.org/standards-committees/reproducibility-badging" >ongoing work</a> at NISO (National Information Standards Organization, USA) to define recommended practices, and <a href="https://twitter.com/npch/status/1330453823568171008" >Neil Chue Hong</a> says they will be out soon.</p>
<p><a href="https://twitter.com/ivotron/status/1330612647763570690" >Ivo Jimenez</a> also mentions an interesting collection of <a href="https://sysartifacts.github.io/" >resources on artifact evaluation for computer systems conferences</a>.</p>
<h3>Comments retrieved from Disqus</h3>
<ul>
<li><i>Roberto Di Cosmo:</i><p>Thanks for this nice post: I like the classification, and I love the acknowledgment of the difficulty to have a "one size fits all" solution when it comes to reproducibility, as the dimension of the problem and the resources available to address it really vary a lot across disciplines, and even inside discipline. A nice example of a <i>"scientific journal that expects reviewers of submitted work to check the code of any computation for correctness or at least plausible correctness, which in practice requires examining its behavior"</i> is Image Processing OnLine (<a href="https://ipol.im" rel="nofollow noopener" title="https://ipol.im">https://ipol.im</a>) that goes a long way along the road to reproducibility.</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for mentioning IPOL! I haven't been able to find reviewing guidelines on their Web site, but I will contact the team to find out what exactly their reviewing process evaluates.</p></li>
</ul>
</li>
<li><i>Nicolas Rougier:</i><p>In terms of code interactivity, I find the <a href="https://distill.pub/" rel="nofollow noopener" title="https://distill.pub/">https://distill.pub/</a> journal to be really good even though I imagine it's a lot of work for authors. But it's really nice to be able to play with the model. In my own domain (computational neuroscience) I dream of having really interactive model where you can test what happens if you modify this or that parameter or simply change the random seed. I suspect this won't come anytime soon since most journals do not even really care about the code, but who knows.</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for that nice example, which illustrates possibility #3: the possibility to explore how a computation works. Much of the work by Bret Victor (<a href="http://worrydream.com/)" rel="nofollow noopener" title="http://worrydream.com/)">http://worrydream.com/)</a> is similar to the <a href="http://distill.pub" rel="nofollow noopener" title="distill.pub">distill.pub</a> you cite. But as you say, these are very much examples of hand-crafted presentation software, and thus require a huge investment by the authors. Making such presentations more accessible should be one priority in method and tool development. Jupyter widgets are one step in that direction.</p></li>
</ul>
</li>
</ul>