<?xml version="1.0"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"> <channel> <title>Konrad Hinsen&#039;s blog</title> <link>https://blog.khinsen.net/</link> <atom:link href="https://blog.khinsen.net//rss.xml" rel="self" type="application/rss+xml" /> <language>en-us</language> <pubDate>2026-05-19T15:59:14.556064+02:00</pubDate> <item> <title>Automating science</title> <link>https://blog.khinsen.net/posts/2026/05/19/automating-science.html</link> <pubDate>2026-05-19</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2026/05/19/automating-science.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ <p>The advent of AI agents based on large language models (LLMs) has put the idea of automating the intellectual and cognitive work of researchers on the table. A lively, sometimes even heated discussion is already going on. A frequently missing piece in this debate is the question why we, individually and as a society, actually do science. I will examine this question first, and then consider what it implies for introducing automation into science.</p>

<!-- more -->

<h2>Science</h2>

<p>First of all, what <em>is</em> science? Any short definition is necessarily a caricature, but I hope that the following caricature is at least a useful one: science is a collective process that aims at accumulating reliable knowledge about the world we live in, emphasizing doubt and epistemic humility in order to counterbalance human cognitive biases. In other words: science proceeds with the assumption that anything can turn out to be wrong, and that the default answer to any question is &quot;we don't know&quot;. Everything we think we know can be questioned and revised in the light of new evidence or new critical examination.</p>

<p>Next, why do we do science? This depends very much on who that &quot;we&quot; is. Science started in the 16th century as a leisure activity by wealthy or sponsored people to satisfy their curiosity. Nowadays, science is funded by governments in order to support economic growth and policy decisions, which is a much more utilitarian stance. And yet, individual researchers are still largely motivated by curiosity. But curiosity and utilitarianism are not as distinct as they may seem. From an evolutionary perspective, it makes sense for organisms living in a complex world to use spare resources for acquiring knowledge that may be useful in an uncertain future. Curiosity is thus a trait that helps make people and societies more robust.</p>

<p>Scientific institutions have the role of maintaining and extending the collective knowledge base, which is a complex network of interconnected pieces of information. There is knowledge about the world we live in, of course, but also knowledge about how to make observations and how to interpret their results, plus theories and knowledge about these theories, and a lot more. And all that knowledge comes with a judgement of its reliability attached to it, which is important because the ultimate goal of science is obtaining reliable knowledge.</p>

<p>If you imagine the collective knowledge base as a huge library full of books and journals, or as a collection of Web sites that look like Wikipedia, you are missing an important piece. Information archives are important for science, but they cannot capture everything required to interpret this information. The archives contain marks on paper, or bit patterns for the digital ones. The procedural knowledge required to make sense of these information snippets, and to relate them to the world we live in, is embodied in practicing scientists. You cannot learn chemistry from reading chemistry books alone. At some point, you have to manipulate chemicals, touch them, smell them, mix them, and see what happens. More generally, if you want to learn everything required to understand and contribute meaningfully to the chemistry literature, you need to work as an apprentice to an experienced chemist. If for some reason all chemists die, the books and Web sites will become unintelligible. This is not abstract theory. We have written documents from the past that nobody can read any more because nobody living today knows the language and writing system used by the authors.</p>

<p>Most popular narratives about science concentrate on the task of extending the collective knowledge base, by making new observations about the world or new theories and models explaining such observations. Maintenance gets a lot less attention. It consists essentially of two processes: training the next generation of scientists, and re-examining existing knowledge, in the light of new information or new ideas for representing this knowledge. A modern textbook on classical mechanics looks very different from Isaac Newton's &quot;Principia Mathematica&quot;, but it describes the same theoretical framework. What has changed is the notation and the presentation, making the material easier to understand and apply, and easier to integrate with other theories in physics but also in other fields using the same mathematical notation. And the more a theory is applied, integrated, and tested, the more reliable it becomes. The best evidence for the reliability of Newton's mechanics (when applied within its limits of applicability) is the fact that it underlies a huge part of the technology we use every day. Centuries of refinement have turned Newton's intellectual exercise into knowledge that we can rely on. Maintenance matters!</p>

<p>Not all scientific knowledge has been revised and applied for centuries. How then do we judge its reliability? That's an important question that is not examined often enough, in particular in the ongoing AI-for-science debate. An early and still relevant technique is double-checking. If multiple researchers do similar work and obtain similar results, their results strengthen each other's reliability. And if the results disagree, the causes of the differences can be explored systematically. The simple version of double-checking that I have described here works only for studies of simple systems, where &quot;similar work&quot; and &quot;similar results&quot; are well-defined concepts. But the idea can be extended to more complex systems, where one would examine the coherence of findings from a large number of individual studies.</p>

<h2>Trust</h2>

<p>But there remains an important condition: a judgement of reliability requires a detailed understanding of all the studies involved. Nobody can have that level of competence in more than one or two narrow domains. And yet, everybody doing research needs to rely on results from other domains and disciplines. A biologist performing data analysis is rarely a trained statistician, for example. And a physicist performing numerical simulations is rarely a trained numericist. All researchers nowadays need to trust the reliability judgements of experts in other domains. And that's also what decision makers in politics and industry do in order to figure out which scientific findings they should turn into plans for action.</p>

<p>Human societies rely on webs of trust, because trust is the foundation for cooperation. In today's industrial societies, this web links together individuals, institutions, ideas, technologies, and physical objects, via numerous mechanisms such as reputation, certification, accountability, or punishment by law. Consider why you trust the train that you take to work every morning to transport you safely. Your trust builds on a trust in the engineers who designed the train, scientific findings that the engineers relied on, the workers that built the train, laws that define safety-related obligations, government agencies that oversee the respect of these obligations, and a lot more.</p>

<p>A large part of the boring grunt work of parliaments and government agencies is maintaining this web of trust, in contact with the scientific web of trust, among others. The web of trust behind train safety has grown over centuries, since long before the first railways were built. A society's web of trust is a big part of its <a href="https://en.wikipedia.org/wiki/Social_capital" >social capital</a>.</p>

<p>Digital technology remains a challenge for the web of trust, because it evolves much faster than traditional trust mechanisms can adapt. The one technology that is almost completely exempt from legal and contractual obligations concerning safety and reliability is software. Major perturbations such as the <a href="https://en.wikipedia.org/wiki/2024_CrowdStrike-related_IT_outages" >CrowdStrike incident</a> have contributed to a growing awareness about this problem, but so far nothing much has changed at the legal level. Software vendors are not sanctioned for negligence, nor even for intentional malice (such as <a href="https://en.wikipedia.org/wiki/Grok_sexual_deepfake_scandal" >Grok producing deepfake porn</a>).</p>

<p>In science, digital technologies have likewise been adopted enthusiastically and uncritically. The publication and quality control process, which has been based on journal publications and peer review since about the 1960s, is no longer adequate for today's research work, which, thanks to digital technology, now features large collaborations, big datasets, and complex computational analyses. The <a href="https://en.wikipedia.org/wiki/Replication_crisis" >replication crisis</a> is in large part the result of this mismatch between the imagined value of peer review as a quality control mechanism and its real value as a rough credibility check. As with the safety issues I mentioned above, we are only starting to understand and correct for this evolution. And while we are grappling with these issues, LLMs are causing another earthquake in the foundations of the scientific web of trust.</p>

<h2>Automation</h2>

<p>To what degree can science possibly be automated? Let's start with the highest imaginable level: fully automated science. That would be a machine that supplies supposedly reliable knowledge via some sort of interface, perhaps a supercharged chatbot. You could ask the machine a question, and it would enter into a dialog to request additional input from you, before in the end giving you an answer. This answer could well be &quot;I don't know yet, ask again a month from now while I do some more research&quot;. Obviously this machine would have to be more than a bunch of computers. It would have to interact with the real world, making observations, setting up experiments, etc. Think of a network of computers and robots if you want a concrete image.</p>

<p>Would you trust such a machine to provide reliable answers?</p>

<p>Would you agree on having the machine do experiments on you? Would you trust its affirmation that these experiments are in your best interest?</p>

<p>For most of us, answering such questions comes down to trusting others who we perceive as experts or authorities, or who are involved in designing or operating the machine. What would be the profile of an expert whose affirmations about the machine's reliability you would trust? Which institution would you trust to issue a certification for the science machine?</p>

<p>If you have some expertise in science or engineering yourself, you might want to start by inspecting the processes that led to the creation of the machine. That's a good start. You might end up becoming one of those experts that the rest of us rely on. But if the machine will do all future science, then there won't be human scientists left a few decades later. And maybe no engineers either. So... who will take over your job as an expert? Why would your grandchildren trust the machine? And who will keep the machine running? It can't look after itself, as a living organism does.</p>

<p>The good news is that nobody talking about automating science actually proposes this extreme level of automation. The bad news is the obvious conclusion that many people who propose automating science are unaware of many of the aspects of the process they wish to automate. My proposal: when discussing automation, always say explicitly where you see the interface between machines and humans. It's always there, somewhere. As long as there are humans interested in accumulating reliable knowledge, there will be a science process run by humans, who delegate specific tasks to machines. As we have been doing for quite a while already, e.g. when using <a href="https://en.wikipedia.org/wiki/DNA_sequencer" >DNA sequencers</a>, or when deploying software on a computer. Automation, in science and elsewhere, has been with us for a few centuries, since the beginning of industrialization.</p>

<p>There are three main motivations for automating a task, as compared to having humans perform it:</p>

<ul>
<li><p><em>Economy.</em> Machines make many things cheaper than humans do, at least in our current economic model that ignores externalities such as resource depletion and environmental pollution. Often the machines produce less useful or less versatile products, but at a so much lower price that the trade-off looks favorable. As an example, consider buying an industrially produced chair as compared to making a chair yourself, or having one tailor-made by a craftsperson.</p></li>
<li><p><em>Quality.</em> Machines do a better job at producing certain items. Staying in the carpentry theme, consider nails. Humans have made and used nails since prehistoric times, but with the arrival of industrially made nails, human-made nails have disappeared. Machines do a better job at tasks that require high precision. They make nails that are both better <em>and</em> cheaper.</p></li>
<li><p><em>Complexity.</em> Some artefacts are so complex that industrial production is the only viable option. Consider a modern car with its mechanical and electronic complexity. I doubt that anyone has ever even tried to make such a car using nothing but human labor.</p></li>
</ul>

<p>In the current debate on automating science, the only motivation I see cited is economy: LLMs would allow us to do more science given the same number of people and the same resources. Most proponents of LLMs for science (e.g. <a href="https://sakana.ai/ai-scientist/" >this one</a>, to give a concrete example) conveniently gloss over what &quot;more science&quot; actually means. They use the same bibliometric proxies whose <a href="https://sfdora.org/" >inadequacy for research assessment is finally being recognized</a>: more science means more papers. Some largely LLM-written papers have already been accepted in scientific journals, so the claim that LLMs can write papers that can pass peer review is credible. However, &quot;passing peer review&quot; is not the same as &quot;useful contributions to science&quot;. In other words, the problem is not so much LLMs as an outdated quality assessment process from the 1950s that has not kept up with the enormous changes in research over the last 70 years.</p>

<p>If we want to update our quality assessment, the question we should focus on is: how can we assess the reliability of knowledge that we obtain with the help of LLMs? Again this is not a new question. It's a question that we have asked about every single scientific instrument or experimental setup since the dawn of science. The goal is not to eliminate unreliable information sources. They often contribute useful information, and in some cases, such as in the beginnings of a new field of research, all available information may be of low reliability. That's fine, as lack of reliability can be compensated by diversity and coherence. The sum of many information sources is often more reliable than any single one on its own. But it does matter that we can estimate the reliability of each information source. Which, for experimental setups, we usually can.</p>
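
<p>One common way to make the idea of combining sources concrete is inverse-variance weighting. The following is a minimal sketch, not something taken from a specific study: it assumes that the sources are independent, unbiased measurements of the same quantity, and that each comes with a known standard deviation. The function name <code>combine_estimates</code> and all numbers are hypothetical.</p>

```python
import math


def combine_estimates(values, sigmas):
    """Combine independent estimates by inverse-variance weighting.

    Assumes (for illustration only) that all sources measure the same
    quantity, are unbiased, and are statistically independent.
    Returns the combined value and its standard deviation.
    """
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    value = sum(w * v for w, v in zip(weights, values)) / total
    sigma = math.sqrt(1.0 / total)
    return value, sigma


# Three hypothetical low-reliability sources for the same quantity.
values = [9.6, 10.3, 10.1]
sigmas = [0.5, 0.4, 0.6]
combined_value, combined_sigma = combine_estimates(values, sigmas)
print(combined_value, combined_sigma)
```

<p>Under these assumptions, the combined uncertainty is smaller than that of the best single source. But the calculation is possible only because an uncertainty estimate for each source was available as input.</p>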

<p>It is much harder to estimate the reliability of computed information, due to the complexity of software. And so... like society at large (see the last section), when it comes to software, scientists have mostly suspended the doubt that used to be their trademark. In parallel to developing computational methods, we should have developed processes for establishing trust in them, but we didn't. Only with the arrival of LLMs did we realize that establishing trust is an important and difficult problem. Well, better late than never. Let's start. My contributions so far are <a href="https://hal.science/hal-05274018v1" >this opinion piece</a> about reviewing research software, and <a href="https://osf.io/preprints/metaarxiv/nt96q_v2" >this preprint</a> that analyzes the reviewability of software and AI tools. Unfortunately, in the meantime we will have to deal with the ongoing massive assault on our journals by LLM-generated submissions, most of which are likely to be of bad quality.</p>

<p>My prediction is that, once the excitement about &quot;automating science&quot; has died off, we will forget this idea and concentrate on using LLMs under human supervision for well-defined tasks in which they have proven to be useful, reliable, and cost-effective. The last part is rarely discussed, but it's important to keep in mind that today's AI operators run at a huge loss in order to encourage massive adoption of their product. They won't do this forever, so prices will increase, while research budgets for non-AI topics are diminishing in many countries. Nevertheless, LLMs could turn out to be a good trade-off for specific tasks in software development, in data analysis, or in the presentation of results.</p>

<p>However, establishing responsible LLM use in research is possible only if researchers can try out and evaluate these tools before committing to their use and making themselves dependent on them. This cannot happen in a few weeks. Nor can it happen in the current strongly polarized atmosphere where people are divided into two opposing camps, one crying &quot;Only AI can save us!&quot; and the other one replying &quot;AI is the devil!&quot; More than anything else, we need to remember the self-doubting attitude inherent to science, and admit that anyone's views on LLMs may need a revision.</p>

<p>It is also doubtful whether responsible use is possible at all with today's generation of LLMs. In addition to the ethical issues, which I will address in the next section, there is a contradiction between the complete opacity of these models and the transparency requirements of science. This means you can use them only if you can audit the results in some way, as you can in some software development settings. It doesn't help that the companies that control these models openly support a government that is actively destroying scientific institutions. There is absolutely no reason to trust these companies to support science; at best, we can hope that they completely ignore research applications in developing their tools. The minimum condition for LLMs that are safe for science would be a disclosure of the training data set and the tweaks that happen after the ingestion of the training data. There are various projects to construct LLMs under such conditions, but they don't seem to be ready for practical applications.</p>

<h2>Taking a step back</h2>

<p>In the last section, I have looked at automation in science by LLMs with a narrow focus on the topic. That means I have taken into account the properties of science as far as automation is concerned, and the properties of LLMs as far as they relate to scientific research. I have <em>not</em> taken into account other characteristics of science and LLMs. This narrow-focus view is our culture's default way of analyzing things. It keeps complexity to a minimum, which is helpful. But it also hides potentially relevant aspects from view. Which is why I will now adopt a wider focus: taking into account more aspects, though necessarily in less detail.</p>

<p>Science doesn't happen in a monastery located on a remote island. It is embedded in industrial societies, where it has ties to philosophy, politics, education, industry, and a lot more. LLMs didn't fall from the sky. They were developed by people inside organizations, tied to philosophy, politics, education, industry, and a lot more. LLMs are deployed on physical machines that need to be designed, built, and operated. All that means that adoption of LLMs in science implies also bringing some of the LLM context into science. And if science one day becomes a major user of LLMs, in terms of quantity or prestige, the reverse will happen as well.</p>

<p>There are two major criticisms of today's LLMs that derive from this wide-focus view:</p>

<ul>
<li><p>The impact of LLM use on the natural environment, via their enormous resource requirements.</p></li>
<li><p>The process by which LLMs were created: much of the training material was used without permission from its authors, and the unpleasant human labor involved (screening for atrocities) was outsourced to people who are too poor to be able to refuse the job.</p></li>
</ul>

<p>Many scientists reject LLMs for these reasons, much like they reject experiments on animals or excessive air travel to conferences: not because LLMs are bad for science, but because they are bad for entities outside of science, be they humans, non-human organisms, or the entire biosphere. Expressed in the jargon of economics, LLM use has significant negative <a href="https://en.wikipedia.org/wiki/Externality" >externalities</a>.</p>

<p>Negative externalities are of course not specific to LLMs. Much of what we do in daily life (and even more so in scientific research) comes with negative externalities that we prefer not to think about because we cannot really do much about them. Climate change is the most visible one: the metabolism of our societies runs on fossil fuels, without which we couldn't even feed the human population of the planet, let alone provide the material security and comfort that we are used to. The ubiquity of negative externalities in our lives is probably the main reason why so many people, including scientists, do not see a particular issue with the negative externalities of LLMs. It's just one more item on the long list of negative externalities that we accept in order to go on with life. In this light, LLM rejection is comparable to <a href="https://en.wikipedia.org/wiki/Veganism" >veganism</a> or <a href="https://en.wikipedia.org/wiki/Flight_shame" >flight shame</a>: a conscious rejection of social norms in order to make at least a first step away from industrial societies' path towards ever increasing resource consumption and exploitation of other living beings.</p>

<p>Can these ethical issues with LLMs be overcome? In theory, yes. There are ideas for eliminating every single one. Resource consumption can be reduced by designing more efficient hardware. Less generalist LLMs could be trained with less training material, which could then be gathered with permission of its authors. Screening for atrocities becomes a non-issue when training domain-specific LLMs for science. The whole training process can be made transparent. Unfortunately, in the current economic context, none of this is likely to happen. And even in the best imaginable scenario, it would take years to decades to develop ethical LLMs for science.</p>

<p>Such a delay, however, is unacceptable to those putting forward ethical arguments <em>for</em> LLM use: an acceleration of knowledge acquisition in ethically relevant domains such as health research. What if LLMs can help us cure cancer more rapidly? Is it ethically defensible <em>not</em> to do this? Most of these arguments are fallacious. Whereas the ethical arguments <em>against</em> LLMs are based on real observed negative externalities, the ethical arguments <em>for</em> LLM use that I have seen so far are based on speculation about hypothetical benefits. I have not seen anyone outline a credible path to an accelerated development of cancer treatments with the help of LLMs. The best you can say is that it is not logically impossible. My suspicion is that proponents of such arguments severely underestimate what it takes to develop cancer treatments. Experiments and clinical trials take a lot of time, which is not compressible by computation of any kind. And never forget the trust issue: in the end, a practicing oncologist must trust new treatments before they can actually make a difference.</p>

<p>There are, however, quite probably some less sensational contexts in which LLM use does speed up research that has credible societal benefits. And therefore the argument &quot;my LLM use is ethically justifiable because the benefits outweigh the negative externalities&quot; cannot be rejected in general. However, so far I haven't seen any attempt to estimate such a trade-off, let alone the combination of net ethical benefit and reliable outcomes.</p>

<h2>And the verdict is...</h2>

<p>Let me end this post with my personal conclusion: I do not use LLMs for any aspect of my scientific research. Not for writing articles (nor blog posts), not for writing software, not for information retrieval, not for anything else. My research has always been more methodological than applied, and over the years it has moved more and more towards foundational questions such as reproducibility, in the topic space of metascience and philosophy of science. I consider these topics important, but not urgent. They don't justify contributing to massive harm elsewhere, nor putting quantity above quality - quite the contrary!</p>

<p>What I haven't made up my mind about yet is the use of LLM-written software. LLM use in software development comes in many shades, from code cleanup to 100% vibe coding. The latter is incompatible with the transparency requirements of science anyway, except for code snippets small enough to be audited by humans. My provisional policy is to take a critical look at LLM-supported software before adopting it. Yes, that's vague, but the only way I see to refine my policy is through practice, and that takes time! What I will not do, however, is completely reject LLM use by others. That would imply no longer collaborating with many of my colleagues, and that's a bad idea: nobody has anything to gain from the scientific community splitting into pro-AI and anti-AI camps.</p>

<h2>Recommended reading</h2>

<ul>
<li><p><a href="https://irisvanrooijcogsci.com/2025/08/12/ai-slop-and-the-destruction-of-knowledge/" >AI slop and the destruction of knowledge</a> by cognitive scientist Iris van Rooij. She illustrates with a concrete example what happens when LLM-generated erroneous information is incorporated unchecked into formerly trusted scientific knowledge repositories. This is happening in many places right now, and it is not clear how we will ever manage to clean up these knowledge repositories again, assuming we even decide to do it.</p></li>
<li><p><a href="https://www.nature.com/articles/d41586-026-01100-y" >Scientists invented a fake disease. AI told people it was real</a> by science journalist Chris Stokel-Walker illustrates the other side of the knowledge destruction process: fake information clearly marked as such is nevertheless absorbed by LLMs and contributes to their output.</p></li>
<li><p><a href="https://www.nature.com/articles/d41586-025-03504-8" >How much of the scientific literature is generated by AI?</a> by science journalist Miryam Naddaf. A report on the magnitude of LLM use in article preparation, and why it is quite difficult to estimate.</p></li>
<li><p><a href="https://www.science.org/content/article/ai-agents-may-be-skilled-researchers-not-always-honest-ones" >AI agents may be skilled researchers—but not always honest ones</a> by science journalist Nicola Jones. Another story about AI agents that are intended to automate aspects of research but do so unreliably.</p></li>
<li><p><a href="https://artificialbureaucracy.substack.com/p/context-widows" >Context Widows</a> by Kevin Baker. He explains in much more detail than I did above how the goal displacement from &quot;quality contribution to science&quot; to &quot;citation metrics&quot; in the 1960s and 1970s prepared the ground for an exploitation of the new goals via LLMs, while the initial goal of quality contributions to science is silently abandoned.</p></li>
<li><p><a href="https://www.popularbydesign.org/p/academics-need-to-wake-up-on-ai-part-4c6" >Academics Need to Wake Up on AI, Part III</a> by sociologist Alexander Kustov. His key point is that today's LLMs can produce research papers that are no worse than many human-written ones that pass peer review. He concludes that in the context of today's incentives and funding criteria, academics cannot afford not to use LLMs without losing out to their competitors who do. This confirms Kevin Baker's point about goal displacement.</p></li>
<li><p><a href="https://bsky.app/profile/mjcrockett.bsky.social/post/3mkuqwkk7ls2d" >A Bluesky thread</a> by cognitive scientist Molly Crockett, pointing out the disequilibrium in science funding that prioritizes AI development while defunding everything else. Quote: &quot;We risk all of science if we rush to build “AI Scientists”, before we understand the value of human science.&quot;</p></li>
<li><p><a href="https://arxiv.org/abs/2602.10181v1" >Why do we do astrophysics?</a> by astrophysicist David W. Hogg. He argues that for non-utilitarian fields such as his own, automating research work makes no sense because the main value of the research is not the findings but the maintenance of the research community. Many of his arguments are of interest to utilitarian perspectives as well, so this is worth reading even if you care only about &quot;useful&quot; science.</p></li>
<li><p><a href="https://davidbessis.substack.com/p/the-fall-of-the-theorem-economy" >The fall of the theorem economy</a> by mathematician David Bessis. His main point is that the value of theorem proving in mathematics is not the catalog of proven theorems but the insight gained from coming up with the proof. Theorem proving by LLMs doesn't provide this value. He predicts that the increasing use of LLMs will lead to a shift of evaluation criteria, away from valuing proofs as a proxy for the work that went into constructing them. Similar arguments can be made in other disciplines, e.g. theoretical physics.</p></li>
<li><p><a href="https://ergosphere.blog/posts/the-machines-are-fine/" >The machines are fine. I'm worried about us.</a> by physicist and mathematician Minas Karamanis. He worries about the consequences of students using LLMs to speed up their PhD work. The students don't learn much about research, and the supervisors could well be tempted to use LLMs directly and stop bothering with students. In either case, we lose the next generation of scientists able to do research, with or without LLMs.</p></li>
<li><p><a href="https://www.digitalcultureandeducation.com/volume-162" >Against the uncritical adoption of ‘AI’ technologies in academia</a> by a multidisciplinary team of researchers. A very detailed and well-documented analysis of the numerous issues that make many past and present AI technologies problematic in research and education.</p></li>
</ul>
 ]]></description> </item><item> <title>Preparing for scientific deepfakes</title> <link>https://blog.khinsen.net/posts/2026/03/06/scientific-deepfakes.html</link> <pubDate>2026-03-06</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2026/03/06/scientific-deepfakes.html</guid> <category><![CDATA[ computer-aided research ]]></category><category><![CDATA[ reproducible research ]]></category> <description><![CDATA[ <p>By now, most scientists have probably seen figures, tables, and even entire journal articles made by so-called &quot;generative AI&quot;, containing more or less subtle mistakes or inconsistencies. What I haven't seen yet, but expect to see soon, is the scientific equivalent of deepfakes: made-up results that come with made-up code that reproduces them. This is likely to become a new challenge for reproducible research.</p>

<!-- more -->

<p>Like just about any other community that depends strongly on code, scientific computing is increasingly polarized concerning the use of &quot;generative AI&quot;, here meaning <a href="https://en.wikipedia.org/wiki/Large_language_model" >large language models</a> (LLMs) generating code from natural-language prompts. The two opposing camps are <em>LLM enthusiasts</em>, who believe that scientists should embrace these tools in order to reduce the human effort spent on software development and get more research done, and <em>LLM skeptics</em>, who doubt the reliability of vibe-coded software, dislike its black-box nature, and fear for the credibility of scientific findings. Many researchers also have ethical objections to the use of LLMs, but they are mostly unrelated to the quality of vibe-coded software. The discussion between the two camps, and the people in between who remain undecided, focuses on community-developed software libraries that are maintained over many years. What I will discuss in the following is the impact of LLM-based coding agents on the top layer of the scientific software stack, i.e. the highly <a href="https://gwern.net/doc/technology/2004-03-30-shirky-situatedsoftware.html" >situated software</a> that computes results for a specific research project, such as the figures and tables that are presented in a journal article.</p>

<p>Among the many differences between community-owned tools and libraries on one hand, and project-specific code on the other hand, the most relevant here is the different means of <a href="https://osf.io/preprints/metaarxiv/nt96q_v2" >establishing trust</a>. Community-level software performs well-defined wide-spectrum tasks that have some form of specification outside of the software, be it in its documentation or in theoretical papers that describe the implemented methods. At least in principle, such software can be evaluated even if it is a black box, for example via a test suite. For project-level software, there usually is no source of ground truth that could be used to verify the obtained results. The code transforms project-specific inputs, e.g. experimental data, into project-specific outputs, e.g. plots. There is most often no way to run it unchanged on other, well-known inputs, in order to check if it reproduces the corresponding well-known outputs.</p>

<p>One of the fundamental tenets of the reproducible research movement is that such code (and its inputs) must always be published along with the results, in order to permit verification of the latter. And this actually happens more and more often. Papers are complemented by scripts, notebooks, or workflows that interested readers can examine and re-run to see if they find the same results. However, the code and input data are considered &quot;supplementary material&quot;, and with few exceptions not subjected to any form of review (see <a href="https://blog.khinsen.net/posts/2025/04/11/reviewing-software.html" >my earlier post on this topic</a> for a more in-depth discussion).</p>

<p>This project-specific code is a very tempting target for vibe coding. Much of its actual work is delegated to libraries, and today's coding agents know popular libraries very well. Better, in fact, than the average researcher. Why bother to look up API details if your LLM assistant knows how to use the library? And when used in good faith for simple problems of data analysis and presentation, the resulting code is typically short and quite readable, fulfilling its intended role as executable documentation of the research project.</p>

<p>But suppose that good faith is lacking. There are researchers who make up data, with or without the help of generative AI. Maybe they want to back up a made-up claim with false but credible reasoning, or maybe they just want some publishable result without regard for its veracity. In either case, generative AI will fulfill their wishes, producing not only the fake results but also the code that generates them. And all it takes to cover up is a more elaborate prompt. You can ask a coding agent to make a modified version of a respected library that returns the desired fake results, and then write an unremarkable script that calls this library. You then ask your coding agent to write a <a href="https://guix.gnu.org/" >Guix</a> recipe for building and running your code, using the modified library. That's what I call a scientific deepfake: a software assembly that looks good superficially and works flawlessly, but does something nasty in the depths of the software stack that nobody is likely to look at. If you read Ken Thompson's <a href="https://doi.org/10.1145/358198.358210" >Reflections on trusting trust</a>, you will see that the nastiness can be pushed down arbitrarily far into the software stack. Doing so takes ever-increasing effort, but LLM coding agents will make it ever more doable in practice.</p>

<p>What this means for reproducible research is that automated checks for reproducibility become meaningless. In the pre-LLM era, a carefully constructed software assembly that reproduced a figure at the click of a button was valuable evidence for building trust. In the vibe coding era, it means nothing. Whatever criteria an automated check applies, it is possible to vibe-code a software stack that satisfies all of them while also producing a predefined result. Even a check for suspect modifications deep in the software stack is impossible to do reliably, since patches may well be done for fixing bugs or improving performance. The quality of being suspect is not formalizable. Only a human reviewer can judge if a modification is legit or not.</p>
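<p>As a minimal sketch of this argument, consider a purely illustrative automated check that re-runs an analysis and compares a hash of its output with the published one. All names below (<code>digest</code>, <code>honest_analysis</code>, <code>deepfake_analysis</code>, <code>reproduces</code>) are hypothetical and not taken from any real tool. The point is only that the check passes for both analyses.</p>

```python
import hashlib


def digest(values):
    # Hash a list of numbers in a canonical text form.
    text = ",".join(f"{v:.6f}" for v in values)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def honest_analysis(raw):
    # Hypothetical honest pipeline: the result is computed from the inputs.
    return [sum(raw) / len(raw)]


def deepfake_analysis(raw):
    # Hypothetical faked pipeline: ignores the inputs, returns a predefined result.
    return [0.05]


def reproduces(analysis, raw, published_digest):
    # The automated check: re-run the code and compare with the published output.
    return digest(analysis(raw)) == published_digest


raw = [0.2, 0.4, 0.6]

# Each author publishes the digest of their own output along with the code.
honest_ok = reproduces(honest_analysis, raw, digest(honest_analysis(raw)))
fake_ok = reproduces(deepfake_analysis, raw, digest(deepfake_analysis(raw)))

# Both checks succeed: bitwise reproducibility says nothing about veracity.
print(honest_ok, fake_ok)
```

<p>In this toy case the fake is obvious from reading three lines of code. The scenario described above differs only in that the equivalent of <code>deepfake_analysis</code> is buried in a patched dependency that nobody reads.</p>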

<p>If scientific deepfakes become widespread, we might even see a reversal of today's best-practice attitudes. A complex software assembly that just works but is too large to be inspected may be seen as suspect, whereas a short script whose outputs are robust under version changes of its dependencies may become the new gold standard for credibility.</p>
 ]]></description> </item><item> <title>Explorable explorable explanations</title> <link>https://blog.khinsen.net/posts/2025/11/12/explorable-explorable-explanations.html</link> <pubDate>2025-11-12</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2025/11/12/explorable-explorable-explanations.html</guid> <category><![CDATA[ computer-aided research ]]></category> <description><![CDATA[ <p>A much cited essay by Bret Victor, <a href="https://worrydream.com/ExplorableExplanations/" >&quot;Explorable Explanations&quot;</a>, argues for supporting and encouraging <em>active reading</em> in communicating ideas. Explanatory text should thus be complemented by interactive visualizations and computational demonstrations, allowing the reader to actively engage with the ideas. If you haven't read Victor's essay yet, please do so now, and then come back here. It's not very long. What I am going to discuss is a variation on Victor's proposal, and I won't repeat his well-presented arguments.</p>

<!-- more -->

<p>The starting point for my own work on this topic was the observation that scientific communication, traditionally centered around journal articles, is exactly of the kind that would benefit from more interactivity and potential for engagement, including <em>critical</em> engagement, as described by Victor. That observation has been made many times by many people in many contexts. Anyone who has worked with interactive data management or visualization tools on a computer sooner or later comes to the conclusion that we could do a lot better today than exchange information on printed pages (or their electronic equivalent, the PDF file). Many have proposed <a href="https://en.wikipedia.org/wiki/Notebook_interface" >computational notebooks</a>, such as the popular <a href="https://en.wikipedia.org/wiki/Project_Jupyter#Jupyter_Notebook" >Jupyter notebooks</a>, as a more interactive update to the traditional article. Some notebook-based tools, most notably <a href="https://quarto.org/" >Quarto</a>, are even explicitly advertised as publishing technology. I have <a href="https://doi.org/10.59350/w7hbc-twt65" >expressed my disagreement before</a>, and proposed <a href="https://doi.org/10.59350/646qp-jv671" >some</a> <a href="https://doi.org/10.59350/ead50-57j29" >ideas</a> for improving on notebooks. Now is the time to follow up with a working prototype.</p>

<p>Bret Victor's explorable explanations are a much more promising approach, in my opinion, because they focus on guiding the reader towards exploring the most salient ideas, rather than forcing them to follow the logic of a sequential computer program from top to bottom. Moreover, their interactive elements are designed for non-destructive experimentation,
whereas a notebook encourages its readers to explore by <em>modifying</em> the author's work, and thus easily losing track of what the author had written initially.</p>

<p>However, Victor's examples, implemented as Web pages embedding interactive components written in JavaScript, have some serious shortcomings as well when seen through the lens of scientific communication (which was not his use case). The most important one is the opacity of the interactive components. Consider his &quot;digital filter&quot; example. The interactive component is a great way to understand the relation between inputs, outputs, and filter parameters. But you can't easily see the computations done behind the scenes. You have to trust the author to have implemented the equations governing the filter correctly in JavaScript code. Unless you are a seasoned Web developer, in which case you will look at the HTML source code in your browser, go through the list of script files, find and look up <a href="https://worrydream.com/ExplorableExplanations/Script/filter.js" ><tt>filter.js</tt></a>, and figure out how that code works and how it relates to the equations it implements. For a more complex computation using canned library code, this approach becomes completely hopeless. Just like for a notebook, of course. Both notebooks and explorable explanations are <em>shallow</em>, meaning that they make a surface layer of computation very accessible but don't provide access to the code that they build on.</p>

<p>For scientific communication, we need to take one more step. We need <em>explorable</em> explorable explanations. The explanation itself, including its interactive elements, must be explorable as well. The source code of all visualizations, animations, or interactive tools must be accessible in a straightforward way. Readers must be able to dig as far down into the software stack as they deem necessary for their understanding.</p>

<p>This has been the core objective of my <a href="https://hyperdoc.khinsen.net/" >HyperDoc</a> project, which I have <a href="https://doi.org/10.59350/g0hyh-qzx40" >written about before</a>. For an example of an explorable explanation in HyperDoc, see <a href="https://hyperdoc.khinsen.net/94FE4-micrograd" >Micrograd</a>, an explanation of neural networks built on top of an automatic differentiation engine. Unlike Bret Victor's examples, an explorable explanation in HyperDoc is <a href="https://en.wikipedia.org/wiki/Hypertext" ><em>hypertext</em></a>, i.e. it consists of smaller interlinked nodes. These nodes can be narratives, i.e. text with embedded images, tables, code, etc., like for example <a href="https://hyperdoc.khinsen.net/94FE4-micrograd/5756A-automatic-differentiation" >this one</a>. But the hypertext nodes can also be code pages, such as <a href="https://hyperdoc.khinsen.net/94FE4-micrograd/12A2A-reverse-mode-automatic-differe" >this one</a>, which is perfectly ordinary code written in <a href="https://en.wikipedia.org/wiki/Common_Lisp" >Common Lisp</a> <small>(if you wonder about this somewhat unusual choice of programming language for scientific computing, see <a href="https://hyperdoc.khinsen.net/693AE-hyperdoc/B1DAF-goals-and-values" >HyperDoc's goals and values</a>)</small>. Interactive tools are another kind of node, as for example <a href="https://hyperdoc.khinsen.net/94FE4-micrograd/8499D-moons-dataset-explorer" >this one</a>, which lets you explore the impact of the two parameters of a synthetic dataset. Moreover, any piece of computed data can be a node in this hypertext as well, as for example <a href="https://hyperdoc.khinsen.net/94FE4-micrograd/1FB3A-classifier-model" >this multi-layer perceptron</a> (click on the individual neurons to see their parameters).</p>

<p>There is also something similar to a notebook: the <a href="https://hyperdoc.khinsen.net/94FE4-micrograd/B499A-neural-network-training-playgr" >playground</a>. It is basically a text editor in which you can write code and comments, and execute one or several expressions by clicking the &quot;Eval&quot; button, or typing Shift-Enter. The result is then shown in a new pane to the right. As the author of the HyperDoc, I have provided initial playground code that serves as a guide for exploration. You can always go back to that code via the &quot;Reset&quot; button, but otherwise, your changes are persistent on your computer, so you can come back to it later. Unfortunately, code execution in the playground is disabled on the Web server, for security reasons: it would let anyone execute arbitrary code, potentially breaking the server, storing illicit material, running bitcoin mining code, etc. Until I get around to implementing a sandbox, you will have to <a href="https://codeberg.org/khinsen/hyperdoc-demo" >install everything on your own machine</a> to use the playground.</p>

<p>Another important feature of HyperDoc is alt-clicking to see &quot;behind the scenes&quot;, which usually means &quot;see the source code&quot;. On the <a href="https://hyperdoc.khinsen.net/94FE4-micrograd/1FB3A-classifier-model" >multi-layer perceptron</a>, alt-click the tab labelled &quot;Layers&quot; to see how the &quot;Layers&quot; view is implemented. <small>You can also do that with the &quot;Graph&quot; tab for the &quot;Graph&quot; view, but that will show you a rather complicated piece of code, since getting <a href="https://graphviz.org/" >Graphviz</a> to arrange nodes exactly as you want can be a rather messy affair.</small> Once you have a piece of code before your eyes, move the mouse cursor around. Whenever some code element is shown with a grey background when you hover over it, clicking on it will show you where it comes from. You can navigate the entire codebase in this way, including libraries. Symbols shown in bold are part of the Common Lisp standard. If you click on them, you get to the corresponding page in the language standard. There is a lot more you can usefully alt-click, so go ahead and play with it!</p>

<p>The relatively small hypertext nodes that are shown side-by-side as you navigate through the links support <em>deep</em> active exploration. When a narrative points to a dataset, for example, you can explore the dataset interactively while still having the explanation in the narrative in view. Note that you <em>can</em> also embed a dataset or a code snippet inside a narrative, if you consider that the best way to present your work. I do it for small code snippets in some places. But unlike with notebooks, or with Bret Victor's explorable explanations, this is only one way to present things, and it's neither the default nor the most straightforward way.</p>

<p>There is a second, less obvious advantage to structuring the presentation in terms of small freely interlinkable nodes: modularity, reusability, and comprehensibility of implementation. Web pages are usually written for a potentially large number of anonymous readers. Most of these readers engage only superficially with the material. Scientific communication is different. Readers are peers. They read my work and I read theirs. Author-reader relations are not always individually symmetric, of course, but overall, everyone is both author and reader. Providing straightforward access to all the code, nicely structured into small units, encourages readers to learn not only about your subject, but also learn from your coding and presentation techniques. Given favorable legal conditions, readers that stumble upon a nice visual presentation can copy the code and adapt it for their own use in their own work.</p>

<p>This explains the title of this post: I want explorable explanations to be meta-explorable. I want them to be <a href="https://en.wikipedia.org/wiki/Tools_for_Conviviality" >convivial technology</a> for scientific computing, rather than a technology controlled by a small group of experts. My current prototype doesn't achieve that goal yet, for various reasons that I will discuss in a separate post, but I believe that it already shows that convivial scientific computing is not just a pipedream.</p>
 ]]></description> </item><item> <title>Explaining software and computational methods</title> <link>https://blog.khinsen.net/posts/2025/06/25/hyperdoc.html</link> <pubDate>2025-06-25</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2025/06/25/hyperdoc.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>How can we document software and computational analyses in such a way that others can convince themselves of their validity, and build on them for their own work? The question has been around for many years, and a number of attempts have been made to provide partial answers. This post provides a brief review and describes my own tentative answer, inviting you to play with it.</p>

<!-- more -->

<p><a href="https://en.wikipedia.org/wiki/Explainable_artificial_intelligence" >Explainable AI</a> is a hot topic today. Many people, in particular in research, are worried about the rapid adoption of AI-based techniques that provide <a href="https://quoteinvestigator.com/2024/10/22/question-answer/" >answers that cannot be questioned</a>, because nobody is able to figure out how the AI's output is derived from its huge inputs. However, most researchers seem to overlook that we have exactly the same problem with traditional software. Even if your full software stack is Open Source, its size and complexity make it unrealistic for anyone to compile a list of all the models, hypotheses, and assumptions that its various authors baked into it. In thirty years of practicing computational science, I have seen quite a few unreasonable assumptions in source code, sometimes even pointed out in comments, but out of sight of the vast majority of users who run the code but never look at it. This is why I advocate that <a href="https://blog.khinsen.net/posts/2025/04/11/reviewing-software.html" >we should peer-review research software</a>.</p>

<p>However, reviewing software source code is feasible only if it is actually written to be understandable by fellow researchers, who are mostly not software professionals. We won't get reviewable code by simply adopting software engineering methods from the software industry, as we have mostly done over the last decades. Fortunately, people started thinking about explainable and understandable software more than 40 years ago.</p>

<h2>Literate programming</h2>

<p>The well-known computer science textbook <a href="https://mitpress.mit.edu/9780262510875/structure-and-interpretation-of-computer-programs/" >Structure and Interpretation of Computer Programs</a> (see <a href="http://sarabander.github.io/sicp/" >here</a> for a free online version) states one of its goals in the preface to the first edition, which was published in 1984:</p>

<blockquote>
<p>First, we want to establish the idea that a computer language is not
just a way of getting a computer to perform operations but rather
that it is a novel formal medium for expressing ideas about
methodology. Thus, programs must be written for people to read, and
only incidentally for machines to execute.</p>
</blockquote>

<p>One can argue that the authors achieved that goal, because the code they present is indeed very readable. However, it consists of textbook examples, and its target audience is computer science students. Writing a large software system in such a way that its <em>users</em> could understand it is a much more challenging task, which the book does not address.</p>

<p>That same year, Donald Knuth <a href="https://doi.org/10.1093/comjnl/27.2.97" >introduced the idea of literate programming</a>. He summed up the motivation in one sentence:</p>

<blockquote>
<p>Let us change our traditional attitude to the construction of
programs: Instead of imagining that our main task is to instruct a
<em>computer</em> what to do, let us concentrate rather on
explaining to <em>human beings</em> what we want a computer to do.</p>
</blockquote>

<p>Knuth developed the <a href="https://en.wikipedia.org/wiki/Web_(programming_system)" >WEB programming system</a> to support this new style of writing software, and applied it himself, most notably to his well-known and widely used programs <a href="https://www.bookfinder.com/search/?isbn=978-0201134377&submitBtn=Search&mode=isbn&st=sr&ac=qr" >TeX</a> and <a href="https://isbndb.com/book/0201134381" >Metafont</a>, but also <a href="https://www-cs-faculty.stanford.edu/~knuth/programs.html" >many smaller and more specialized ones</a>. TeX and Metafont are not particularly large software systems by today's standards, but they are not textbook examples either. One feature of literate programming as advocated by Knuth is writing code and narrative together, in a single file, and using tools to extract these two aspects for further processing. Writing literate code imposes a new writing style for code, and that is an important part of the exercise.</p>

<p>Many literate programming tools have been created after WEB, but few of them were ever adopted by anyone other than their authors. Perhaps the most widely used one today is the <a href="https://orgmode.org/worg/org-contrib/babel/intro.html" >Babel</a> package that is part of <a href="https://orgmode.org/" >Org Mode</a> for <a href="https://www.gnu.org/software/emacs/" >Emacs</a>. It offers many features that go beyond Knuth's original concept, so it is hard to say if its users really apply literate programming as intended by Knuth.</p>

<p>A major limitation of literate programming is the restriction to static artifacts, i.e. program source code and documentation. The code is never executed and need not even be executable. This limits the scope of exploration. For any but the most trivial software, developing a good understanding of its operation requires running examples on various inputs and inspecting the outputs, in addition to reading the code. Traditional literate programming tools do not prevent this in any way, but they do not support it either.</p>

<h2>Computational notebooks</h2>

<p>Just a few years after literate programming, and possibly inspired by it, the first <a href="https://en.wikipedia.org/wiki/Notebook_interface" >notebook interfaces</a> appeared in two commercial software packages: <a href="https://www.mathcad.com/" >MathCAD</a> and <a href="https://www.wolfram.com/mathematica/" >Mathematica</a>. Many similar interfaces followed, out of which the most popular one today is <a href="https://jupyter.org/" >Jupyter</a>, which goes back to 2011 (then under the name &quot;IPython notebook&quot;). While many people consider literate programming and notebook interfaces roughly the same, there are a few important differences. Most of all, literate programming is technology for <em>publishing</em> documented software, whereas notebooks are primarily interactive interfaces for <em>performing</em> explorative computations. They are in fact an extension of the much older <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop" >read-eval-print loop</a>, to which they add documentation, preservation of outputs, and usually data visualization. Another important difference is that literate programming can document arbitrary units of software but no input or result data, whereas notebooks document a specific computation, i.e. a linear chain of expressions or commands, including their inputs, outputs, and intermediate results.</p>

<p>The restriction to a linear narrative embedding a linear sequence of code snippets is the main limitation of notebooks. They cannot explain complex software assemblies that would require multiple independent narratives. And they don't give access to the library code that is called from the notebook's code snippets. If you see <code>print(median(data))</code> in a notebook written by your former student who you remember for always confusing mean and median in your statistics class, you would probably like to click on <code>median</code> and see the code behind it, but that's not part of a typical notebook interface.</p>
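<p>A motivated reader can of course look up the code behind a name by hand. The following sketch assumes, purely for illustration, that <code>median</code> comes from Python's standard <code>statistics</code> module and that the data is a small made-up list. It uses the standard <code>inspect</code> module to retrieve the source code that a click on <code>median</code> would ideally show.</p>

```python
import inspect
import statistics

# Made-up data, for illustration only.
data = [1, 2, 3, 4, 100]

# What the notebook cell shows: a call and its output.
print(statistics.median(data))

# What the notebook interface does not offer at a click:
# the definition behind the name.
source = inspect.getsource(statistics.median)
print(source)
```

<p>This works for one function written in Python, but it requires the reader to know the technique, to modify the notebook, and to repeat the procedure for every further function called from the retrieved code.</p>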

<p>As the name suggests, notebooks were inspired by the <a href="https://en.wikipedia.org/wiki/Lab_notebook" >lab notebooks</a> that experimentalists use to keep track of what they did and what they observed, and they fulfill a similar role for doing computer-aided research. Neither kind of notebook is intended for publication in the traditional academic sense, although both can be shared publicly in the context of <a href="https://en.wikipedia.org/wiki/Open-notebook_science" >open-notebook science</a>. Computational notebooks are sometimes advertised as publishing technology, and even as the successor of the traditional scientific article (see <a href="https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/" >here</a> for example). And there are publishing systems that take notebooks as inputs, e.g. <a href="https://quarto.org/" >Quarto</a>. The term &quot;notebook&quot; is probably overused today, meaning different things to different people. What matters for my purposes is the distinction between a log of a research activity and a publication written for explaining research to someone else. I am interested in the latter, which is not well supported by most of today's notebook variants.</p>

<h2>Explorable explanations</h2>

<p>A different take on the question of how to make computational techniques understandable is Bret Victor's essay &quot;<a href="https://worrydream.com/ExplorableExplanations/" >Explorable Explanations</a>&quot;. It focuses on interactive exploration of a computational model by the reader of a pedagogical narrative. Technically, it's a narrative with embedded interactive programs. The code of the latter is usually hidden; it's not the topic of the explanation. Many notebook interfaces nowadays also provide interactive elements for readers, e.g. Jupyter's <a href="https://jupyter.org/widgets" >widgets</a>, which work the same way: they provide functionality, but their implementation is opaque to the user, unlike the code in the notebook itself.</p>

<p>There are in general three categories of code in computational explanations:</p>

<ol>
<li>Code that you want the reader to look at, and maybe run.</li>
<li>Code that you want the reader to run, and maybe look at.</li>
<li>Lower-level support code.</li>
</ol>

<p>Notebooks focus on the first kind, explorable explanations focus on the second kind. Neither one makes an effort to let you access the third kind.</p>

<h2>Explainable software systems</h2>

<p>Finally, there has been some work in software engineering on making software systems as a whole explainable to non-developers (see <a href="https://doi.org/10.1109/VISSOFT55257.2022.00009" >here</a> for an example). This work is about complex software systems, which cannot be described by any single narrative. An explainable software system combines multiple techniques and technologies to achieve its goal: narratives with embedded computations, like a notebook, but also interactive elements, like an explorable explanation, plus views on data or on processes that are automatically generated from a working system, rather than hand-crafted separately as part of documentation. This is the most comprehensive approach. It also removes the distinction between author and reader. Everyone experiences the system through the same interface, though different people typically focus on different parts of it.</p>

<p>After a few years of experience with <a href="https://gtoolkit.com/" >Glamorous Toolkit</a>, which is the software platform underlying the work cited above, I am convinced that this is a good approach for making computational work explainable (for the author) and explorable (for everyone else). It removes the main limitation of notebooks, where explanation is restricted to a single top-level narrative. It also removes the opacity of the code from explorable explanations, because the implementation of interactive elements is easily accessible. From a bird's-eye view, you have a large codebase and any number of narratives that can refer to code, transclude code, and embed the results of running code, in addition to embedding other media, such as video clips. That's a very expressive medium for communicating computational ideas. It can be seen as a variant of <a href="https://en.wikipedia.org/wiki/Hypermedia" >hypermedia</a> that focuses on code. Crafting high-quality explanations is still a challenging task, but then, that is true for writing high-quality pedagogical material in traditional media (such as books) as well.</p>

<p>However, I also came to the conclusion that Glamorous Toolkit is not a good medium for communicating computational science. It is great for communication within an organization, among a relatively small number of people who know each other and work on a common project at the same time. We have this kind of collaboration in research, of course, but we also have the long-term loose-knit open and anonymous collaboration that happens in a scientific community, involving hundreds of people, spread all over the globe, working on a challenging topic for several decades. An important communication concept in this mode of collaboration is the <em>publication</em>, which traditionally is a narrative published in a journal and then available for everyone to read and build on. In the digital era, this requires an update, which so far has not really happened. Publications moved from printed pages to PDF files, but we still struggle to integrate digital datasets, software, and interactive exploration tools.</p>

<p>Glamorous Toolkit is a comprehensive solution for all that's missing from the traditional article, but it lacks the concept of a publication, i.e. a <a href="https://en.wikipedia.org/wiki/Version_of_record" >version of record</a> that everyone can consult, and that everyone can refer to, knowing that whoever follows the reference will access exactly the same information. Publication adds two requirements to an explainable software system. One is durability: a publication must remain usable and present itself identically over a few decades, which is the time scale of evolution of most scientific disciplines. The second one is an asymmetry between authors and explorers (formerly called readers). Researchers consult at least 100 times more publications than they co-author. Exploring a publication must be easy, with no technical obstacles such as software installation. You cannot expect the same level of engagement from an explorer of a publication as from a collaborator in a team. You can, on the other hand, expect authors to invest more effort into polishing their work for the benefit of many anonymous explorers.</p>

<h2>HyperDoc</h2>

<p>In today's computing environments, the only technical choice for computational publications is making them available on a Web server, for access with a standard Web browser, because a Web browser is the only piece of software that you can reasonably expect every explorer to have available. That leaves a lot of options for designing and implementing such a publication system. Exploring them, and developing experience with authoring computational publications in such a setting, is the goal of my latest project: <a href="https://hyperdoc.khinsen.net/" >HyperDoc</a>. The link points to a server that runs a demo with a few small publications: two software packages whose documentation is based on HyperDoc, and two computational publications, one of which (<a href="https://hyperdoc.khinsen.net/5B51C-acute-respiratory-infection-in" >a tiny data paper</a>) is cited and reused by the other one (<a href="https://hyperdoc.khinsen.net/377F4-influenza-epidemics-in-france" >a simple data analysis</a>).</p>

<p>The user interface is heavily inspired by Glamorous Toolkit. The browser shows a lineup of panes, each of which represents an object in memory, for which there can be any number of views that are accessible via the tabs. Hypertext pages and code pages are just particular types of objects, with primary views that render them for reading. Visualizations and interactive elements are implemented as views on classes of data objects. By holding the alt key when clicking on the tab for a view, you get to see the source code rather than the output of the view. That's how all code is instantly inspectable without cluttering the interface.</p>

<p>The basic publishing unit is called a HyperDoc. It has <em>text pages</em> (written in HTML or Markdown) and <em>code pages</em>, which are plain source code files. Code pages are rendered with some useful decoration, most notably links from function and class names to the source code that defines these items, but also links to text pages. Text pages can contain links to other pages, but also links to the value of a computed expression, which is how you link to e.g. visualizations. Instead of linking, you can also embed (or, if you prefer fancier jargon, <a href="https://en.wikipedia.org/wiki/Transclusion" >transclude</a>) other views.</p>

<p>The source code for a HyperDoc typically resides in a version-controlled repository. See <a href="https://codeberg.org/khinsen/ari-incidence-france" >here</a> for an example. The code is written in Common Lisp (more on that later), and the whole repository is a standard Common Lisp code repository that should look familiar to any Common Lisp programmer. The only particularity is that it also contains hypertext pages, and needs to conform to a few conventions. Having a HyperDoc represented as a source code repository means that installation, distribution, and archiving are taken care of by existing infrastructure for software. Software citation mechanisms such as <a href="https://codemeta.github.io/" >CodeMeta</a> are applicable as well: the &quot;Authors&quot; view on a HyperDoc shows author information from a CodeMeta file.</p>
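<p>As a rough illustration of what such an &quot;Authors&quot; view has to do, here is a minimal Python sketch that extracts author names from a CodeMeta record. It is not HyperDoc's actual implementation, which is written in Common Lisp. The sample record and the function name <code>author_names</code> are invented for this example; the only things assumed from the CodeMeta vocabulary are the <code>author</code>, <code>givenName</code>, and <code>familyName</code> properties.</p>

<pre><code class="language-python">import json

# A made-up CodeMeta record; a real one lives in a codemeta.json file.
SAMPLE_CODEMETA = """
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "example-hyperdoc",
  "author": [
    {"@type": "Person", "givenName": "Alice", "familyName": "Example"},
    {"@type": "Person", "givenName": "Bob", "familyName": "Sample"}
  ]
}
"""


def author_names(codemeta_text):
    """Return the authors of a CodeMeta record as 'Given Family' strings."""
    record = json.loads(codemeta_text)
    authors = record.get("author", [])
    # CodeMeta allows a single author object instead of a list.
    if isinstance(authors, dict):
        authors = [authors]
    names = []
    for person in authors:
        parts = [person.get("givenName", ""), person.get("familyName", "")]
        names.append(" ".join(part for part in parts if part))
    return names


print(author_names(SAMPLE_CODEMETA))
</code></pre>

<p>The point of the sketch is that author metadata stays in one machine-readable file in the repository, and any view on it is derived rather than maintained by hand.</p>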

<p>One important element for exploration is unfortunately missing from the online demo: the &quot;Playground&quot; view that permits explorers to run arbitrary code in the context of a data object, e.g. to inspect data by running code snippets. Allowing the execution of arbitrary code on a server is a major security risk, which is why the playground is disabled. You can, however, install everything on your own computer (e.g. via <a href="https://guix.gnu.org/" >Guix</a>, or <a href="https://www.quicklisp.org/beta/" >Quicklisp</a> plus <a href="https://ultralisp.org/" >Ultralisp</a>) and then use the playground. A safe public playground requires <a href="https://en.wikipedia.org/wiki/Sandbox_(computer_security)" >sandboxing</a>, which isn't implemented yet.</p>

<h2>Common Lisp</h2>

<p>As I already mentioned, HyperDoc as well as all the demo code is written in <a href="https://common-lisp.net/" >Common Lisp</a>. It is one of the very few programming languages that fulfill my two primary criteria:</p>

<ol>
<li>Strong support for code introspection, making it possible e.g. to access
 the source code of a function or a class.</li>
<li>A stable language definition and library ecosystem, for durability.</li>
</ol>

<p>Criterion 1 is about simplicity of implementation for HyperDoc. It is possible to implement HyperDoc in or for static languages such as Fortran or C++, but the effort would be much higher. For a single-person exploratory project, it looks prohibitive. If HyperDoc turns out to be attractive to enough people, more elaborate implementations become doable.</p>

<p>Criterion 2 is about suitability for scientific publication. Technically, HyperDoc would be  almost as easy to implement in Python as in Lisp. But Python and its scientific library ecosystem have undergone too many breaking changes in recent years to be able to support a long-term publication infrastructure. You can always <em>run</em> old Python code in a snapshotted environment. That's how <a href="https://blog.khinsen.net/posts/2025/06/20/computational-reproducibility.html" >reproducibility</a> works. But being a good citizen of scientific publishing also requires that people can <em>build on</em> code and data from a ten-year-old publication. That requires both old and new HyperDocs to run on top of a shared collection of libraries. I <a href="https://github.com/activepapers/activepapers-python" >gave up on the Python edition of ActivePapers</a> for a reason. Otherwise, HyperDoc would have become a user interface layer for ActivePapers.</p>

<h2>Next steps</h2>

<p>The <a href="https://hyperdoc.khinsen.net/" >HyperDoc demo</a> contains four very small and simple HyperDocs. The challenge I have set myself for the next phase of the project is to implement examples that people would actually want to explore, i.e. the HyperDoc equivalents of journal articles, monographs, or textbooks. This might include integrating <a href="https://leibniz.khinsen.net/" >Leibniz</a>, my equally experimental Digital Scientific Notation.</p>

<p>If you want to help with this project, the most useful contribution right now would be feedback on the four HyperDocs in the online demo. I am well aware of one difficulty: few people reading this post are likely to be familiar with the Common Lisp language. I made the choice <em>not</em> to explain Common Lisp in these HyperDocs. Just like a journal article tacitly supposes prior knowledge of English and math notation, my HyperDocs tacitly assume some knowledge of their computational notation. Otherwise, they would be more of a Common Lisp tutorial than explanations of computational data analyses. My hope is that the careful choice of names makes the code at least superficially comprehensible. Again, that's something I'd appreciate feedback on.</p>

<p>The best places for feedback are the issue trackers of the code repositories: <a href="https://codeberg.org/khinsen/hyperdoc" >HyperDoc</a> itself and the three other demo HyperDocs:</p>

<ul>
<li>the <a href="https://codeberg.org/khinsen/tabular-data" >Tabular data</a> library</li>
<li>the data paper &quot;<a href="https://codeberg.org/khinsen/ari-incidence-france" >Acute respiratory infections in France</a>&quot;</li>
<li>the analysis paper &quot;<a href="https://codeberg.org/khinsen/influenza-epidemics-in-france" >Influenza epidemics in France</a>&quot;</li>
</ul>
 ]]></description> </item><item> <title>Why computational reproducibility matters</title> <link>https://blog.khinsen.net/posts/2025/06/20/computational-reproducibility.html</link> <pubDate>2025-06-20</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2025/06/20/computational-reproducibility.html</guid> <category><![CDATA[ reproducible research ]]></category><category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>Thirty years after my first contact with computational (ir)reproducibility, I am happy to note that many things have improved. Reproducibility, computational and otherwise, is increasingly recognized as an important aspect of scientific quality control, and mostly considered worth striving for. However, I also note that more and more people, including reproducibility activists, have lost contact with the day-to-day reality in which reproducibility matters. Reproducibility is becoming an item on a checklist, and its precise incarnation the subject of political bickering aimed at making it easy to check off that item. So let's take a look at <em>why</em> computational reproducibility matters for researchers.</p>

<!-- more -->

<p>If you have read papers from the field of cryptography, you probably know <a href="https://en.wikipedia.org/wiki/Alice_and_Bob" >Alice and Bob</a>. They used to invest a lot of effort into communicating privately, making sure that nobody else could listen to them. But then, Alice and Bob discovered Open Science, and now they talk to each other in public spaces, in addition to publishing all their data and code of course. And now we know that they are researchers in  computational biophysics!</p>

<p>Let's listen to Alice and Bob as they meet at a conference.</p>

<p><em>Alice:</em> I have computed the equilibrium distance between the ligand and the active site of our pet protein. <strong>It's 0.9 nm.</strong></p>

<p><em>Bob:</em> I have computed the same distance, but <strong>I find 1.1 nm.</strong></p>

<p>For any Open Science practitioner, the next step should be obvious.</p>

<p><em>Alice:</em> Uhhh... Well... I will look at your code, and you look at mine. Let's meet again tomorrow.</p>

<p><em>Bob:</em> OK!</p>

<p>The next day, they meet again.</p>

<p><em>Alice:</em> I couldn't compile your code. Look at this error message!</p>

<p><em>Bob:</em> It works for me! You use Debian 12? I still run Debian 9. That's surely what makes the difference. But I also have good news: I managed to run your code on my machine. The only problem is that... I get <strong>0.8 nm.</strong></p>

<p><em>Alice:</em> I use <code>libode</code> version 3.4. The documentation says it must be
compiled with <code>gcc 10</code> or later. You probably have an older <code>gcc</code>.</p>

<p><em>Bob:</em> Uhhh... Well... I will have to install a virtual machine with Debian 12, and you with Debian 9. Shall we meet again in a week?</p>

<p><em>Alice:</em> OK!</p>

<p>A week later, no solution is in sight.</p>

<p><em>Alice:</em> Under Debian 9, I managed to run your code. I get 1.1 nm, like you do. But I don't understand why! <strong>Your code is unreadable.</strong></p>

<p><em>Bob:</em> Under Debian 12, <strong>your code yields 0.85 nm for me.</strong> That's not your value of 0.9 nm. Nor 1.1 nm as I get using my method. I don't understand why!</p>

<p>That's what real life looks like. Whatever reproducibility policies you advocate, they aren't worth much unless they open a way for Alice and Bob to figure out why they get different numbers. Reproducibility needs to support effective debugging.</p>

<p>In terms of the increasingly consensual <a href="https://www.nationalacademies.org/our-work/reproducibility-and-replicability-in-science" >terminology defined by the US National Academies of Sciences, Engineering, and Medicine</a>, Alice and Bob have two issues to resolve:</p>

<ol>
<li>Bob cannot <em>reproduce</em> Alice's value of 0.9 nm. He finds 0.85 nm instead, in spite of running the same code in what he believes to be the same computational environment: Debian 12.</li>
<li>Neither Alice nor Bob can <em>replicate</em> the other's finding using their own code. Alice finds 0.9 nm, Bob finds 1.1 nm.</li>
</ol>

<p>Now let's consider two popular recipes for reproducibility:</p>

<ol>
<li><p><em>Just use <a href="https://www.docker.com/" >Docker</a>.</em> Both Alice and Bob should package their code plus environment as a container image. They could then exchange their images, and each check that they retrieve the other's value. Problem 1 is solved. However, the investigation of problem 2 becomes nearly impossible. The code inside the container images is a black box. Exploring point 2 would require reading, recompiling, and modifying the code. That's not possible if all you have is a container image.</p></li>
<li><p><em>Just use <a href="https://docs.conda.io/en/latest/" >conda</a>.</em> For some reason, many people seem to believe that conda has some magical capability to solve reproducibility issues - see <a href="https://doi.org/10.1038/d41586-023-01469-0" >this report</a> for example. In reality, it's not fundamentally different from Debian's package manager (and many others), and in practice it is worse because conda users tend to focus on fast-moving bleeding-edge code whereas Debian package maintainers place a high value on stability. If Alice and Bob used conda, they would probably both fail at even compiling the other's code in their own environments, and also fail at trying to reproduce the other's environment.</p></li>
</ol>

<p>Neither simple recipe is really helpful. Both containerization and package managers are useful tools in the quest for reproducibility, but there is no &quot;just use...&quot; that really works. A big reason is that neither container nor packaging tools were developed with reproducibility in mind. They are by design <em>deployment</em> technologies, meant to facilitate the task of installing, updating, and running software on a computer. Which is an important task of course, but it's not reproducibility.</p>

<p>Let's tackle the question from the other end: what would Alice and Bob need for debugging?</p>

<p>To investigate the reproducibility issue, they need to understand why the same code, compiled and run on two different installations of Debian 12, yields different results. Alternatively, to eliminate the reproducibility issue, they would need a more precise way to describe and reconstruct an environment than the label &quot;Debian 12&quot;. And to understand their replicability issue, they need their own code plus as much as possible of its dependencies to be understandable, and easy to modify and run, because exploring replicability invariably leads to tinkering with each other's code.</p>

<p>The fundamental reason for the reproducibility issue is that &quot;Debian 12&quot; is not a precise specification for a computational environment. Packages get constantly updated in between two Debian releases. To make it worse, the order in which packages are installed can also make a difference. If you want to understand why, <a href="https://www.fun-mooc.fr/en/courses/reproducible-research-ii-practices-and-tools-for-managing-comput/" >follow the MOOC</a>, in particular its second module, &quot;Managing software&quot;.</p>

<p>If Alice had provided a precisely specified computational environment, rather than just saying &quot;Debian 12&quot;, then the reproducibility issue in the above story would simply disappear. Bob would have run Alice's environment, confirmed the result of 0.9 nm, and then they would have moved on to their replicability issue, which is scientifically more relevant. Reproducibility issues are nothing more than a nuisance. They are about results being different due to differences in computational environments that most computer users have no control over, and don't really care about until they have to.</p>

<p>So how do you provide a precisely specified computational environment? One possible answer is a container image. But as I explained above, if all you have is a container image, you can't explore replicability issues any more. What you want is a precisely specified <em>and reproducible</em> environment. Meaning that you <em>can</em> run it as-is, and get <em>exactly</em> the same results, bit for bit, but you can <em>also</em> change what's inside and see what difference that makes.</p>

<p>My experience from many talks, courses, and informal discussions I have had over the last years on the topic of reproducibility is that as soon as I say &quot;bit for bit&quot;, a resistance forms in parts of the audience. Just over the last month, I have been called a &quot;reproducibility extremist&quot; twice. The counter-argument I hear is always the same: we don't really <em>need</em> bit-for-bit identical results. The tacit assumption is that going for a less strict goal, somewhere in between &quot;Debian 12&quot; and a precise specification, would be both easier and sufficient. And that's where I disagree. Something less rigorous may or may not be sufficient in a specific situation. But it is definitely not <em>easier</em>.</p>

<p>In fact, I have no idea how Alice could come up with a specification for her computational environment that would promise Bob a result between, say, 0.88 nm and 0.92 nm. I <em>do</em> know, however, how Alice can provide a bit-for-bit reproducible specification. It's actually very easy in theory. Computers are deterministic machines. If you give them the same inputs, they produce the same outputs. Bit-for-bit reproducibility is a matter of bookkeeping: keeping track precisely of all the steps that were performed in constructing a computational environment. And bookkeeping is something that computers are quite good at.  In contrast, good-enough-for-me reproducibility can only be evaluated through case-by-case domain expertise, and cannot be designed at all.</p>
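<p>To make the contrast concrete, here is a small Python sketch. It is not taken from any real project; the function names and the example outputs are assumptions chosen for illustration. Checking bit-for-bit reproducibility needs nothing more than a hash of the output bytes, whereas a good-enough check needs a tolerance that somebody has to choose using domain expertise.</p>

<pre><code class="language-python">import hashlib
import math


def digest(data):
    """SHA-256 digest of raw output bytes: the bit-for-bit fingerprint."""
    return hashlib.sha256(data).hexdigest()


def same_bits(output_a, output_b):
    """Bit-for-bit check: no domain knowledge required."""
    return digest(output_a) == digest(output_b)


def close_enough(value_a, value_b, tolerance):
    """Good-enough check: the tolerance must come from a domain expert."""
    return math.isclose(value_a, value_b, abs_tol=tolerance)


# Hypothetical program outputs, as the bytes written to a result file.
alice_output = b"equilibrium distance: 0.9 nm\n"
bob_output = b"equilibrium distance: 0.85 nm\n"

print(same_bits(alice_output, alice_output))     # identical bytes
print(same_bits(alice_output, bob_output))       # any difference is detected
print(close_enough(0.9, 0.85, tolerance=0.02))   # verdict depends on the tolerance
print(close_enough(0.9, 0.85, tolerance=0.1))    # same values, different verdict
</code></pre>

<p>The last two lines show the problem with the weaker criterion: the same pair of values passes or fails depending on a number that no bookkeeping tool can supply.</p>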

<p>Unfortunately, what is easy in theory turns out to be difficult in practice. The computational infrastructure that we all use, i.e. our operating systems, package managers, compilers and other software build tools, containerization tools, etc., was not designed with reproducibility in mind. Outside of science, approximately nobody cares about reproducibility. The one exception I know of is cybersecurity experts, who require reproducibility (bit-for-bit again) as a guarantee that a compiled program was really derived from the source code that it pretends to be generated from, without the secret addition of malware. Most computer users need no more in terms of software management than installing and updating programs. That's what today's infrastructure makes easy.</p>

<p>Nevertheless, it is possible to make a computational environment bit-for-bit reproducible, at least when run on identical hardware. If you want to learn how to do it, <a href="https://www.fun-mooc.fr/en/courses/reproducible-research-ii-practices-and-tools-for-managing-comput/" >follow the MOOC</a>. We show how to do it using <a href="https://snapshot.debian.org/" >Debian snapshots</a>, staying in the context of a well-known and widely used Linux distribution. We also show how to do it using <a href="https://guix.gnu.org/" >Guix</a>, a next-generation package manager (and Linux distribution) that is one of the very few tools that <em>was</em> designed with reproducibility in mind. Neither path is as simple as I would like it to be, because they both involve tools that lack user-friendly interfaces for now. It's a bit like using <code>git</code> for version control: it does the job, but it can be a pain to use. More work is clearly required. But it will only happen if larger parts of the scientific community agree that it is worth doing, and direct funding towards this task.</p>

<p>My conclusion is that bit-for-bit reproducibility is something that we can solve once and for all and push into the infrastructure, such that Alice and Bob needn't worry about it any more. If instead we go for good-enough-for-me reproducibility, it will remain a nuisance for generations of scientists to come. If that is an extremist opinion, so be it.</p>
 ]]></description> </item><item> <title>Why we should review research software</title> <link>https://blog.khinsen.net/posts/2025/04/11/reviewing-software.html</link> <pubDate>2025-04-11</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2025/04/11/reviewing-software.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>At the recent <a href="https://scicodes.github.io/workshop-2025/#mini-symposium" >SciCodes Symposium</a>, I brought up the question of reviewing research software during the panel discussion. One panelist then raised the question of <em>why</em> we should review research software. I found this question surprising at first, but I do agree that it deserves an answer. Here is mine.</p>

<!-- more -->

<p>My goal with bringing up the question was to find out the state of the art: which institutions, publishers but possibly also others, encourage, enforce, or conduct scientific or technical reviews of research software? The answer turned out to be &quot;very few&quot;. There are evaluations of research software, but they are mostly restricted to checking if it is published, if it can be compiled, if it reproduces published results, or if it follows Open Source best practices. The question of whether it does what it claims to do, and whether what it does is scientifically relevant, is rarely put to reviewers.</p>

<p>Why is this state of the art deplorable? I see two reasons for reviewing research software:</p>

<ol>
<li><p>For the same reason that we review papers: to detect mistakes, biases,
and tacit assumptions that should be made explicit.</p></li>
<li><p>In order to incentivize software authors to write their software in such a way
that it is understandable by an outsider to the development process.</p></li>
</ol>

<p>It's the first reason that I had considered obvious, but apparently it isn't. An important aspect of the scientific method is peer criticism, i.e. the critical inspection of everyone's work by their competent peers. This doesn't strictly require a formal reviewing process. The only strict requirement is to make all material available for inspection. However, formal reviewing processes have proven valuable in many fields of science and engineering. The label &quot;peer reviewed&quot; has justly been considered a reason to trust some publications more than others. Peer review has been criticized a lot recently, and I agree that the <em>specific</em> processes that we have used since the 1950s to review publications are no longer adequate, but that means that we should update those processes, not abolish them.</p>

<p>In today's state of affairs, if I put an obviously unreasonable assumption into a paper, there's a good chance that it will not pass the peer review process of the journal that I submit it to. But if I make that same unreasonable assumption in software source code, it is highly probable that nobody will ever notice it. The discovery of such a case in 1997 was actually one of the major events that got me doubting the level of rigor in computational science. I was re-implementing from scratch a popular model for protein energetics, the <a href="https://ambermd.org/" >Amber</a> force field. In the course of exploring how this model was actually defined in detail, I discovered that the value it assigned to certain atomic interaction energies depended on the order in which the atoms appeared in an input file. That's physically completely unreasonable, and no reviewer would have let it pass in a paper. But because it was expressed in Fortran and well hidden in unpublished source code, its users remained blissfully ignorant of this anti-feature. Yes, in Open Source software there would have been a slightly higher chance of early discovery. But unless an institution explicitly asks some experts to proof-read the source code, it can easily take decades before somebody directs a critical look at the right place. And unless the result of such an expert review is made easily findable, it won't have an impact on how software actually gets used.</p>
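<p>A physical requirement of this kind can also be checked mechanically once a reviewer has spelled it out. The following Python sketch is purely illustrative: it does not reproduce the Amber force field or the defect I found, and both toy energy functions are invented for the example. It evaluates an energy function for every ordering of the atoms and reports whether the result depends on that ordering.</p>

<pre><code class="language-python">import itertools
import math


def pair_energy(atoms):
    """Toy energy: sum of 1/distance over all atom pairs (order-independent)."""
    return math.fsum(
        1.0 / math.dist(a, b) for a, b in itertools.combinations(atoms, 2)
    )


def order_dependent_energy(atoms):
    """Toy energy with a hidden flaw: only neighbours in input order interact."""
    return math.fsum(1.0 / math.dist(a, b) for a, b in zip(atoms, atoms[1:]))


def is_order_independent(energy, atoms):
    """Check that every permutation of the atoms yields the same energy."""
    reference = energy(atoms)
    return all(
        math.isclose(energy(list(perm)), reference, rel_tol=1e-12)
        for perm in itertools.permutations(atoms)
    )


# Four made-up atom positions (arbitrary units).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]

print(is_order_independent(pair_energy, atoms))
print(is_order_independent(order_dependent_energy, atoms))
</code></pre>

<p>Trying all permutations is only feasible for a handful of atoms; for realistic systems, a few random shuffles would be the obvious substitute. Either way, such a test only exists if someone with domain knowledge has looked at the code and asked the question.</p>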

<p>The second reason is more subtle but perhaps even more important in the long run. If research software has a good chance of being reviewed, research software developers have to write their code and documentation in such a way that it is actually reviewable, i.e. understandable by someone else. And that would also make it more understandable to its users. Today's state of the art in software understandability in the small is much better than it was when I started in computational science, 30 years ago. At that time, most code was written by a single person with no training in software engineering. It was still considered good practice to use single-letter variable names in order to save compile time. Today, most software is developed by teams, and that requires a higher level of cross-individual comprehensibility. But software complexity has also increased in those 30 years. My personal feeling is that the net effect is software being <em>less</em> understandable today than it was 30 years ago, but I can't back that up with any hard evidence. In any case, reviewing should lead to a clear improvement.</p>

<p>Are there any reasons <em>not</em> to review research software? I can think of only one: it takes time and effort. A lot of time and effort initially, to come up with suitable reviewing processes and appropriate tooling support. Less but lasting effort thereafter, to apply these processes in practice. But then, doing research with unreliable software tools also consumes time and effort, if quality control is done at a later stage and research projects have to be repeated after fixing software bugs. And since some mistakes will never be identified, research would overall be less reliable. You could expect something like a <a href="https://en.wikipedia.org/wiki/Replication_crisis" >replicability crisis</a>. Is that what we want to happen?</p>
 ]]></description> </item><item> <title>Going for robustness: science</title> <link>https://blog.khinsen.net/posts/2025/03/25/robustness-in-science.html</link> <pubDate>2025-03-27</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2025/03/25/robustness-in-science.html</guid> <category><![CDATA[ polycrisis ]]></category><category><![CDATA[ science ]]></category> <description><![CDATA[ <p>This is a follow-up to my <a href="https://blog.khinsen.net/posts/2025/03/05/going-for-robustness.html" >earlier post entitled &quot;Going for robustness&quot;</a>, focusing on scientific research.</p>

<p>What is &quot;robust science&quot;? I see at least two interpretations, and I am going to discuss both of them: robustness of scientific findings, and robustness of the process of doing science, which includes in particular the robustness of the web of scientific research institutions: first and foremost universities and research labs, but also learned societies, funding agencies, publishers, etc.</p>

<!-- more -->

<h2>Robust knowledge</h2>

<p><a href="https://www.cambridge.org/us/universitypress/subjects/general-science/popular-science/reliable-knowledge-exploration-grounds-belief-science?format=PB&isbn=9780521406703" >&quot;Reliable Knowledge: an exploration of the grounds for belief in science&quot;</a> is a book by physicist-turned-philosopher-of-science <a href="https://en.wikipedia.org/wiki/John_Ziman" >John Ziman</a>. It's a good book, which I recommend to every scientist. Its title is a very good definition of the central goal of science: obtaining knowledge that is reliable. Or robust, which means the same in this context. Reliable or robust knowledge is knowledge that has been subjected to various quality control processes: verification, empirical tests, cross-checks, etc. Such knowledge can be expected to remain valid even if one of its underpinnings is later discovered to be shaky. And that's a highly desirable quality for any knowledge that important decisions are based on.</p>

<p>The quality control processes in science require robust technology for making observations, and for communicating and preserving these observations. Plus robust technology for creating, updating, and applying the scientific models we make to summarize and explain observations. Perhaps less obviously, quality control also requires a shared understanding of all this technology within research communities, in order to ensure that scientists can judge the reliability of a scientific instrument, the domain of applicability of a computational method, the limits of validity of a scientific model, and a myriad of other characteristics of both the subjects and the research techniques of any given domain of scientific enquiry. In particular, such a shared understanding is the basis for a sound judgement of the degree of robustness of scientific findings.</p>

<p>The <a href="https://en.wikipedia.org/wiki/Replication_crisis" >replicability crisis</a> has demonstrated that the quality control processes of science have started to fail. It is often interpreted as a sign of diminishing quality of published work, due to negligence, incompetence, or outright fraud. While all of these certainly happen, it is important not to overlook the most important aspect of the crisis: an overestimation of how replicable published scientific work can be expected to be. It is the <em>unexpected</em> irreplicability of many results that turned its discovery into a crisis. After all, while the ultimate goal of science is robust knowledge, this does not mean that each individual published result must be robust. Much of the quality control happens after publication, on much longer time scales, based on the confrontation of many different findings with overlapping applicability. In my view, the main lesson from the replicability crisis is that many scientific disciplines lack a sufficient shared understanding of their techniques. What supports this view is the notable absence of the oldest and most mature domains of research, experimental and theoretical physics and chemistry, from the replicability crisis. There are of course non-replicable results in these domains, but they are dealt with routinely and hardly ever make the news. Younger disciplines lack this level of maturity, as do the recent computational branches of physics and chemistry.</p>

<p>One of the underappreciated challenges to constructing shared understanding is the use of computers. Like democracy (see my <a href="https://blog.khinsen.net/posts/2025/03/05/going-for-robustness.html" >last post</a>), science has been steamrolled by information technology, which has developed so fast that the quality control mechanisms have not been able to adapt. Computers and software have given us shiny new tools that are relatively easy to apply but very hard to understand. Whereas sophisticated statistical inference methods once required collaborating with a trained statistician, in addition to access to very expensive computers, they are now menu entries in software that runs on everybody's desktop. Should we then be surprised that many of the horror stories of the replicability crisis involve sophisticated statistical methods?</p>

<p>Likewise, the shift in the 1990s from hardware and software custom-made for science to commodity technology primarily designed for commercial applications has seriously reduced scientists' understanding of and agency over computational methods and tools. For an in-depth discussion, see my <a href="https://archive.org/details/onward-redressing-the-balance" >talk at SPLASH'24</a> and the <a href="https://dl.acm.org/doi/10.1145/3689492.3689808" >paper</a> that goes with it. With the technology of the tech industry, science has also adopted beliefs and attitudes from the tech industry that are in blatant opposition to the principles of science. We have accepted as normal that models and methods implemented in software are opaque and exempted from scientific quality control. We have also accepted as normal that a small number of software developers can impose their ideas of methodological progress on a majority of computationally illiterate users through compulsory software updates.</p>

<p>In order to make science robust again, one important goal to aim for is <a href="https://berjon.com/digital-sovereignty/" >digital sovereignty</a> for science. This doesn't mean that we shouldn't adopt commodity technologies, but we must do so critically, and tweak them to our needs. That is something that we can all start to do individually, at a small scale, if we are willing to give up some productivity in exchange. Critically examining technology takes time and effort, as does tweaking or replacing it. If you choose this path, you will probably produce fewer papers. It's therefore not a realistic option for early-career researchers, but established scientists <em>can</em> make this choice. As I said in my last post, the most important step is to start doing <em>something</em>, no matter how small. Replace a piece of proprietary software with an Open Source alternative. Get more familiar with the Open Source software you use. Establish contact with its developers, if only to tell them what support you would need from them in order to become a more responsible user. Consider the technological choices made by the developers, and in particular if they respect the needs of science: transparency and reviewability. Next, learn enough about your software that you can judge its robustness (more on that in the next episode in this series of blog posts). And, more generally, do whatever it takes to improve your computational literacy.</p>

<p>However, individual action will only get us so far. Science is fundamentally a collective process, in which we stand on the shoulders of giants to see further, and look over each other's shoulders to check for mistakes and biases. Information technology has steamrolled these processes as well. It has enabled larger and more complex research projects, gathering ever larger and more diverse teams. That is progress in a way, because it allows us to tackle more difficult questions. However, the quality control processes of the 1950s are not adapted to such projects. A paper written by an interdisciplinary team of 15 scientists is still sent for review by two or three individuals doing their examination in isolation. They have neither the competence nor the time required to perform a thorough review.</p>

<p>To make science robust again, we need to update our quality control processes. As a rule of thumb, reviewing a paper requires the same mix of competences as doing the work described in it. Reviewing interdisciplinary work requires an interdisciplinary team. Yes, that takes more time and effort. It will reduce our productivity. The same can be said for reviewing software and machine-learning techniques, a task for which we first have to develop appropriate processes. I have described a few possible steps in <a href="https://osf.io/preprints/metaarxiv/nt96q_v1" >this preprint</a>. All of them will be easier to implement if more researchers are more familiar with the medium of software, so the individual actions outlined above matter here as well.</p>

<p>An important obstacle to digital sovereignty for science is the near-complete lack of interest in techniques and infrastructure for the digital era demonstrated by today's scientific institutions. It parallels the lack of interest for the digital transformation shown by governments, and that is probably not a coincidence: most of scientific research today is organized and funded by institutions that directly depend on government funding. Governments in the Western world have decided long ago to leave the digital sphere to &quot;the market&quot;, meaning in practice a handful of corporations. We shouldn't expect them to make a different decision for the scientific institutions that they oversee. Which leads me to the second aspect of robust science: the robustness of our institutions and the research processes managed by them.</p>

<h2>Robust processes</h2>

<p>Recent events in the USA have illustrated how dependent today's scientific research is on government decisions. Even though research funding is rather diversified in the USA, compared to more centralized countries such as France, very few institutions can continue business as usual if the federal government decides to withdraw its funding.</p>

<p>This critical dependence on state funding for research is a rather recent phenomenon. In the early days of science, in the 16th century, science was much like art in that its practitioners had to be either wealthy or have wealthy sponsors. In exchange, they had a lot of freedom in their work, being bound only by the rules that were formulated by the emerging scientific community. As science progressed on its own growth path, it attracted the attention, and money, of more and more people interested in what we now call applied science, i.e. research done with the goal of enabling change in the world. Science, capitalism, and industry are in fact closely intertwined, and are all core processes of the era that sociologists call <a href="https://en.wikipedia.org/wiki/Modernity" >modernity</a>.</p>

<p>Science got its growth boost after World War II, when governments started to adopt economic growth as a goal they should actively support in order to make their countries more competitive on the international markets. They began investing heavily in scientific research, both fundamental and applied. The growth of science has thus been intimately related to economic growth for more than half a century. Just like the recent adoption of commercial technology led to a tacit acceptance of the social values of the tech industry, the motivation of doing science for industrial growth in the 20th century led to a tacit acceptance of the values of industry, in particular the values of efficiency and productivity.</p>

<p>Then quantitative management took over industry and science. Just like economic growth was quantified as growth of GDP, scientific productivity was quantified via bibliometry, with similarly perverse consequences. The pressure for productivity and impact has been steadily increasing over recent years. The recent drastic funding cuts in the USA can be seen as a move to weaken the political opposition, but it also makes sense economically: why fund research if you have already decided to ignore its outcomes? In France, we see similar plans to cut down public research and keep only &quot;the best&quot; (see <a href="https://www.timeshighereducation.com/news/cnrs-researchers-want-boss-quit-over-two-tier-keylabs-plan" >this article</a> for a summary in English), even though the political discourse justifying these plans is very different in both style and arguments. In light of these recent developments, the scientific institutions that welcomed the massive post-WWII state investments in research signed a deal with the devil, in that they became very dependent on government politics. That's as fragile as it can get.</p>

<p>On the other hand, the question of which problems society should allocate resources to for scientific investigation is undeniably a political one. One way to make scientific institutions more robust while orienting their work towards questions of societal interest is to anchor science more firmly in the <a href="https://en.wikipedia.org/wiki/Public_sphere" >public sphere</a>. Imagine, for example, universities adding &quot;consulting and counseling&quot; to their missions, along with the traditional missions of teaching, scholarship, and research. Actors from the public sphere, such as associations, municipal councils, or even school classes, could book an appointment to discuss the scientific investigation of a question they care about. Importantly, this is not research as a service. It's the people who care about the question that would also do much, maybe most, of the research work, with professional scientists accompanying and advising them. This isn't actually a revolutionary idea, it's already practiced at very small scales in some <a href="https://en.wikipedia.org/wiki/Citizen_science" >citizen science</a> projects. The major novelty in my proposal is to make this an official and very visible part of a university's missions. In the long run, it would increase awareness of and knowledge about science in the population, but also awareness of and knowledge about societal issues among professional scientists. Funding would still come mostly from public money, but not necessarily from the most centralized level, i.e. national governments. Setting this up is above the means of an individual, but a single university could take such a step towards robustness. Robustness of its own operation in the short term, and robustness of science as a social process in the long run.</p>

<p>Anchoring science more firmly in the population and in all our institutions is not only a path to more robust science, but also to a more robust society. Looking beyond the technicalities of scientific research, the foundation of science is the belief in a shared reality to which nobody has full access, but which we can understand as a collective by maintaining epistemic humility and checking each other's work and affirmations. That's a good antidote to the fake news and tribalist rhetorical warfare that authoritarian regimes thrive on. I believe that it is also a better basis for deliberating on political issues than what we do today in representative democracies: delegating all decisions to a small elite that excels in radiating certainty on topics that they understand at best superficially. And maybe, just maybe, this could also be a step towards dealing with existential societal problems such as the transgression of planetary boundaries.</p>
 ]]></description> </item><item> <title>Going for robustness</title> <link>https://blog.khinsen.net/posts/2025/03/05/going-for-robustness.html</link> <pubDate>2025-03-05</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2025/03/05/going-for-robustness.html</guid> <category><![CDATA[ polycrisis ]]></category> <description><![CDATA[ <p>I suspect that most people in the Western world (at least) are realizing that we are <a href="https://en.wikipedia.org/wiki/May_you_live_in_interesting_times" >living in interesting times</a>. News of floodings, droughts, and wildfires are ever more frequent. We hear that this is due to climate change, which most governments promise to fight, but don't. Our economies keep growing, but our quality of life is not improving. Digital tools are ever more prominent in our lives, but don't make us happy either. The political systems of more and more Western countries are changing, with a clear tendency towards authoritarianism. In daily life, more and more things work less and less well, as products are not available due to problems in a remote country, important but low-status jobs are hard to recruit for, and everyone has to spend more and more time trying to work around the problems caused by all of the above.</p>

<!-- more -->

<p>I see mainly two attitudes that people adopt. The dominant one says: We are facing a difficult period, but we have lived through difficult periods before. Let's all put in a serious effort to fix a few problems, one by one, and life will be better than ever before. This is in particular the attitude of almost everyone holding political or economic power. A growing minority says: the problems we are facing are symptoms of a <a href="https://en.wikipedia.org/wiki/Polycrisis" >polycrisis</a>, they are due to structural issues with our societies, and it will take profound changes to get out of the mess we are in. Let me tell you right away that I am in the second camp.</p>

<p>Before discussing one way for individuals to react pragmatically to the polycrisis, I will give my bird's-eye view of how we got to this point. Be warned that, by focusing on a single aspect of the history of Western civilization, it is inevitably a caricature. My point is to expose a thread running through this history that is easy to miss when submerged by the details. If you do want to dive into the details, the best resource I know of is the podcast <a href="https://www.thegreatsimplification.com/" >The Great Simplification</a>.</p>

<p>Since humanity invented agriculture some 5000 years ago, we have been on a path of growth: human population has grown steadily, and with it our use of resources (food, energy, minerals, ...) and our power over nature, defined as all life other than human. The growth imperative is by now a firmly established foundation of Western culture. For an early example, see the Jewish-Christian Bible which says (Genesis 1:28):</p>

<pre><code>And God said to them, “Be fruitful and multiply and fill the earth and subdue it,
and have dominion over the fish of the sea and over the birds of the heavens and
over every living thing that moves on the earth.”
</code></pre>

<p>It is fashionable, in particular among left-leaning progressives, to blame the problems of our time on capitalism, but capitalism is merely the most efficient implementation we have found so far of this much older divine imperative to grow and dominate. Soviet-style communism, for example, was constructed on the same tacit premise that growth is the only way forward.</p>

<p>Growth got a first turbo boost with the discovery and exploitation of fossil fuels. It opened the way to industrialization, a cascade of ever more innovation and exploitation of resources, supporting continued population growth but also more complex societies, in which people took ever more specialized roles with increasing productivity. In particular, specialization made it possible to have some people focus entirely on science and engineering, creating a feedback loop that led to very rapid technological development.</p>

<p>If you focus on the positive outcomes, this is a phenomenal success story. I don't need to re-tell it here because you have been exposed to it for all your life. The success story is not wrong, but it is grossly incomplete. The resources required to feed the growth machine could only be obtained at a high cost not only to nature, over which we have acquired solid dominion, but also increasingly to humans living in the non-industrialized parts of the world, which colonial empires started exploiting for resources. Moreover, innovation had its downsides for the population of the exploiting countries as well, in the form of undesirable side effects such as environmental pollution and climate change.</p>

<p>A second turbo boost happened in the second half of the 20th century with information technology and in particular its application to finance. It led to a major simplification of economic goals and societal values via the necessity for quantification. Growth thus became growth of GDP (gross domestic product), which is roughly the sum of all monetary transactions. This reformulated objective has perverse consequences, because everything that makes money change hands is good for the economy, even if it is harmful in other respects. Planned obsolescence is perhaps the best-known example, but even the &quot;natural&quot; catastrophes that are augmented by climate change lead to repair work that is paid for - and GDP goes up! Similarly, selling first carcinogens and then expensive medical treatments for cancer is better in terms of GDP than reducing carcinogens. Exploitation thus continues to accelerate, and as economic inequality is increasing rapidly, the borderline between exploiters and exploited is shifting as well. We are moving towards a society in which a handful of billionaires are exploiting most of the human and <a href="https://en.wikipedia.org/wiki/David_Abram#The_more-than-human_world" >more-than-human</a> life on the planet.</p>

<p>Today, the growth imperative is baked into our monetary system, due to money creation operating predominantly via loans that must be paid back with interest. It is so fundamental to our societies that no person or institution, not even a state, can simply decide to cancel or suspend it. That's why well-intentioned policies for protecting the environment end up being abandoned or watered down, as illustrated by the <a href="https://www.politico.eu/article/most-eu-firms-exempted-from-green-reporting-under-proposed-omnibus-bill/" >recent news about the European Union's &quot;Green Deal&quot;</a>.</p>

<p>Since our planet is finite, growth cannot go on forever. While in theory, growth of GDP doesn't strictly require increasing exploitation of resources, in practice the two are strongly correlated. How long can we go on before we reach physical or biological limits? If you ask scientists who have actually done serious research on this question, you get the answer &quot;a few decades at best&quot; (see e.g. research on <a href="https://en.wikipedia.org/wiki/Planetary_boundaries" >planetary boundaries</a>).</p>

<p>As I said earlier, I don't believe that our problems can be fixed one by one, because they are interdependent symptoms of structural issues that have been with us for 5000 years. We need to shift the goals and values of our societies to something sustainable. That cannot happen quickly, nor by decree. It can only happen slowly, with more and more people and institutions consciously changing their habits. The question then is: what can we do to get such a shift started? In particular at a small scale, as individuals, families, associations, small businesses, etc.? One answer is given by the title of this post: we can go for robustness.</p>

<p>Robustness is a property of a system that can continue to function in difficult circumstances: in spite of perturbations, in spite of some degradation of its internal processes. Life as a whole is robust. Most organisms are robust as well: they can survive many hardships, and continue to function at a good-enough level in case of illness to some degree. The same holds for ecosystems: they can survive the loss of a few species, and adapt to changing environmental conditions, but only within limits. In the human sphere, markets are good examples of robust systems, as long as there are many independent buyers and sellers for each type of product.</p>

<p>The main enemy of robustness is the quest for efficiency. A system optimized for efficiency has no reserves left for adapting to perturbations. It becomes fragile. As I have described above, we have been optimizing society and its subsystems (e.g. institutions) for growth, efficiency, and productivity, for 5000 years. And we have accelerated the pace of optimization with the two turbo boosts of fossil fuels and information technology. That's why everything around us has become fragile. A virus outbreak in China can turn into a pandemic, because people can and must travel far and fast. The availability of many goods, including essential medication, can be endangered by a perturbation touching a single place in the supply chain, because having just one supplier for each ingredient is most efficient.</p>

<p>Robustness is not absolute and permanent. A system can be fragilized by perturbations that are too important for it to handle. Formerly robust ecosystems such as rainforests have been fragilized by human exploitation. Another interesting example is the fragilization of state governance. Modern democracy is a governance form that developed along with industrialization. Overall, its mature forms have served the people living in industrialized countries rather well. <a href="https://en.wikipedia.org/wiki/Separation_of_powers" >Separation of powers</a> protected democracies against individuals or small groups grabbing power, providing robustness. Bureaucratic procedures protected citizens from a myriad of problems that these citizens could not have managed on their own, such as impostors claiming to be dentists. These protection mechanisms started to fail a few decades ago. Deregulation has made both governments and the economy more efficient, but at the price of increasing fragility. Corporations have become more powerful than many states, meaning that democratically elected parliaments and governments are no longer in control. Perhaps most importantly, democracies have not been able to keep up with the rapid development of information technologies. Today, a single person can manipulate democratic elections around the world by controlling a social network. That's fragility at its best.</p>

<p>While the efficiency-versus-robustness lens is only one out of many perspectives that can help us understand the world, it has the advantage of providing a guideline for doing better: we need to de-emphasize efficiency and care more about robustness. This is not easy, as I will discuss shortly, but it is something <em>you</em> can do, as an individual or as a member of a group in which you have influence. In contrast, the big-challenges-of-society lens that points us to climate change or biodiversity loss as important problems we need to solve makes us helpless, because appropriate decisions can only be taken at a collective level that isn't yet ready to tackle these challenges. Going for robustness at small scales will help you cope with the already inevitable consequences of the big problems. And if the attitude becomes widespread and bubbles up to the level of large institutions such as states, it will actually solve the big problems. Not as fast as we would like to, but then, we need to undo 5000 years of moving in the wrong direction, and that can't happen overnight.</p>

<p>As I already said, going for robustness is not as straightforward as it may seem at first sight. The three main obstacles are:</p>

<ol>
<li><p>Robustness is contextual and the context of most systems is always
changing. That makes robustness impossible to quantify. Attempting
to come up with a &quot;robustness score&quot; and optimize it will not result
in actual robustness.</p></li>
<li><p>Making the systems that matter for you more robust is often beyond
your reach. It depends on other people, on institutions,
ecosystems, etc.</p></li>
<li><p>Increasing robustness often comes with a price to pay in terms of
money or time. Which is of course why humans have been neglecting
robustness in their rush towards growth.</p></li>
</ol>

<p>If you set out one morning to make, say, your food supply robust, you are likely to give up soon because the task seems impossible. You are dependent on so many people to get your food on the table! You might end up concluding that you should grow your own vegetables and cook all your meals yourself. Which most of us cannot do because we lack the required knowledge and resources. But more importantly, this line of reasoning is fallacious. You yourself aren't fail-safe! If you have an accident, or become ill, you are going to starve if you depend on your own food production. You also depend on the ecosystem of your garden. Especially in the era of climate change, heavy rain or droughts can easily ruin a whole year's harvest in any single place.</p>

<p>A better approach is to tackle the issues slowly, taking small steps. Consider the fragilities in your food supply. Who do you buy your food from? Do you have alternative sources? Where does your source buy the food from? Can you trace the supply chain back to the farmers that produce grains and vegetables? Probably not, unless they are close to you and the supply chain is short. Consider such opacity as equivalent to fragility. This seems to suggest that you should buy from a variety of local producers, via more than one intermediary if you need one. But... droughts and floodings could wipe out your region's production. The world is a horribly risky place. There is no way to avoid starvation!</p>

<p>If you ever start to panic through such reasoning, it's a sign that you have gone too far. A key to robustness is avoiding extreme choices. How about buying <em>half</em> of your food supply locally, and the other half as before? Spread the risk, in particular when you cannot estimate it well. And if the cost is too high for you at this time, in terms of money or inconvenience, make it a quarter rather than half. Better make a small step than no step at all. Then iterate, and don't hesitate to revise your choices in a later iteration, as you refine your understanding of your fragilities. The most important part is keeping your new quest for robustness on your mind. But don't overdo it either: a completely inefficient, even inert, system is robust but that's not where you want to end up.</p>

<p>In upcoming posts, I will discuss how to apply this approach in various settings, in particular in science and information technology. I hope that this will make the very abstract discussion above more concrete and actionable. In the meantime, start to think about the robustness of your personal or professional environments. How robust is your housing? Your sources of income? The communities that you are part of? What kind of event could get you or people close to you in trouble? And inversely, what measures can you imagine to protect yourself against such risks?</p>
 ]]></description> </item><item> <title>Modular malleability</title> <link>https://blog.khinsen.net/posts/2024/10/16/modular-malleability.html</link> <pubDate>2024-10-16</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2024/10/16/modular-malleability.html</guid> <category><![CDATA[ software ]]></category> <description><![CDATA[ <p><em>This is a contribution to the <a href="https://forum.malleable.systems/t/challenge-problem-fearless-extensibility/205" >Challenge problem: Fearless extensibility</a> by the <a href="https://malleable.systems/" >Malleable Systems Collective</a>.</em></p>

<!-- more -->

<p>As superheroes know very well, <a href="https://en.wikipedia.org/wiki/With_great_power_comes_great_responsibility" >with great power comes great responsibility</a>. Malleable systems offer a lot of power to their users. In exchange, users have to take responsibility for the system they have tailored to their needs, since there is nobody else to blame. If you are the only user of your malleable system, the main cause of worry is that your adaptations might have undesirable side effects that you didn't foresee. You could have broken some functionality of the system that you don't understand very well. Or you might have increased the system's attack surface for malware. If you are the administrator of a system used by others, the changes that you see as improvements may also be a source of trouble for other users, for example by breaking compatibility for <em>their</em> extensions.</p>

<p>In the following, I will outline how a malleable system architecture could reduce the risk of such undesirable side effects. The risk will never be zero, of course. As long as you have power over your system, you will also retain responsibility for it. A good system architecture can help you, but not make the system foolproof. I focus on malleable systems that make no fundamental distinction between developers and users, as opposed to two-level systems that offer &quot;plugins&quot; or &quot;extensions&quot; to users but reduce their power relative to the main code of the system. An example for the kind of system I have in mind is <a href="https://www.gnu.org/software/emacs/" >Emacs</a>, whereas a Web browser is a good example for a plugin-based system.</p>

<p>The key concept in my design is <em>modularity</em>. To illustrate it, I will start from a top-notch malleable system that is <em>not</em> modular: Smalltalk. Its early implementations from Xerox PARC, up to <a href="https://archive.org/details/smalltalk80langu00gold" >Smalltalk-80</a>, were small systems intended to be used by a single person and, more importantly, intended to be fully understandable by a single person (see <a href="https://www.cs.virginia.edu/~evans/cs655/readings/smalltalk.html" >Design Principles Behind Smalltalk</a> by Smalltalk co-creator Dan Ingalls). Its later descendants, with the exception of <a href="https://cuis.st/" >Cuis Smalltalk</a>, have abandoned this idea, aiming instead to become full-featured software development environments for use by software professionals. However, they retained the simple architecture of Smalltalk-80: one &quot;sea of objects&quot;, one system dictionary for class names, a single namespace for methods, etc. You can change the behavior of fundamental values such as <code>true</code> just as easily as you can change your own code, and it's in fact quite easy to crash a Smalltalk system with such techniques (though it rarely happens by accident).</p>

<p>Let's look at how this lack of modularity can hurt in practice. All of the following mistakes and accidents have happened to me. Some out of ignorance of best practices, when I was a Smalltalk newbie. Others accidentally, when I wasn't careful enough when making changes to my code. Yet others because I wanted to change parts of the system that I didn't understand well enough to foresee the consequences of my changes.</p>

<p>First example: you have some method in your code that you want to rename. In Smalltalk, you don't do that in a text editor. A Smalltalk system treats code more like a database, and offers specific tools for tasks such as renaming. So you tell such a tool to replace the old method name by the new one. This is a global operation on the system. If someone else has used the same method name in other parts of the system, it becomes part of the operation, with unforeseeable consequences. The renaming tool shows you all the affected locations, and lets you select which ones to change, but it's easy to make a mistake especially if the list is long. A good renaming tool will let you restrict the operation to selected <em>packages</em> (which are groups of classes that have been added to the system in a single operation), but that's not always the restriction you actually need.</p>

<p>Second example: you change your code and break some code elsewhere that depends on yours. This can happen in any programming system, of course, but in Smalltalk it happens more easily because there is no notion of an interface definition between different parts of the code. Everyone's code is just a bunch of classes added to the system. You cannot easily know what features of your code someone else relies on, nor signal clearly to others which features of your code you consider stable and safe for others to use.</p>

<p>Third example: you want to add two packages to your system that depend on the same dependency but require different versions of it. You cannot have multiple classes with the same name in your system, so this is impossible. What actually happens if you try is that the version loaded last replaces earlier versions, breaking the code that depends on that earlier version. This is a classic example of <a href="https://en.wikipedia.org/wiki/Dependency_hell" >dependency hell</a>, a problem shared by most programming systems, malleable or not.</p>
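
<p>For readers who prefer code to prose, here is a minimal Python sketch of this failure mode. It is not Smalltalk and not how any real Smalltalk loads packages: a plain dictionary stands in for the system dictionary, and <code>load_package</code>, <code>JsonParserV1</code>, and <code>JsonParserV2</code> are hypothetical names chosen for illustration.</p>

<pre><code class="language-python"># Hypothetical sketch: one global dictionary of class names,
# loosely modeled on a Smalltalk-style system dictionary.
system_dictionary = {}

def load_package(classes):
    """Install every class of a package into the single global namespace."""
    for name, cls in classes.items():
        # An earlier class with the same name is silently replaced.
        system_dictionary[name] = cls

class JsonParserV1:
    version = 1

class JsonParserV2:
    version = 2

# Package A was written against version 1, package B against version 2.
load_package({"JsonParser": JsonParserV1})
load_package({"JsonParser": JsonParserV2})

# Code in package A that looks up the name now gets version 2.
seen_by_a = system_dictionary["JsonParser"].version
</code></pre>

<p>After both packages are loaded, only one entry named <code>JsonParser</code> remains, and it is the one loaded last. Nothing in the global dictionary records that package A expected something else.</p>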

<p>My proposal for reducing the risk of such events is to introduce a hierarchical module structure for everything: functions, classes, method names, and maybe even objects. A module is roughly a subsystem, a well-defined part of the system that has a well-defined interface. &quot;Hierarchical&quot; means that modules can themselves contain submodules. The submodules of a module would be an implementation detail, i.e. not part of the module's interface.</p>

<p>One safety feature provided by such modules is that each module can be locked or unlocked. Locked modules can be used and inspected, but not changed. You have to unlock the module first, which should make you think if that's really what you want to do. In a multi-user system, unlocking a module would be subject to authorization. Modules could also be locked against read access, for example to be used as permission tokens in an <a href="https://en.wikipedia.org/wiki/Object-capability_model" >object capability system</a>.</p>
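
<p>The locking behavior could look roughly like the following Python sketch. It only covers locking against modification, not read locks or authorization, and the class and method names (<code>Module</code>, <code>define</code>, <code>lookup</code>, <code>LockedModuleError</code>) are assumptions made for this illustration rather than parts of an existing system.</p>

<pre><code class="language-python">class LockedModuleError(Exception):
    """Raised when code tries to change a locked module."""

class Module:
    def __init__(self, name):
        self.name = name
        self.locked = True      # in this sketch, modules start out locked
        self.definitions = {}

    def unlock(self):
        # A multi-user system would check authorization at this point.
        self.locked = False

    def lock(self):
        self.locked = True

    def lookup(self, key):
        # Using and inspecting a module is allowed even when it is locked.
        return self.definitions[key]

    def define(self, key, value):
        if self.locked:
            raise LockedModuleError(self.name + " is locked")
        self.definitions[key] = value

json_parser = Module("json-parser")

try:
    json_parser.define("parse", len)
    outcome = "changed"
except LockedModuleError:
    outcome = "refused"

# Changing the module requires an explicit, deliberate step.
json_parser.unlock()
json_parser.define("parse", len)
json_parser.lock()
</code></pre>

<p>The first attempt to change the module is refused, the second one succeeds because it is preceded by an explicit <code>unlock</code>. The point of the design is that the extra step is visible in the code and can be audited or gated.</p>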

<p>Another safety feature is the isolation of submodules. If both module A and module B use the same submodule X, then the two copies of X are independent. A and B could use different versions of X (a big step out of dependency hell), and X could be locked in A but unlocked in B. If you want to hack on X, you wrap it in a test module, with the rest of the system continuing to use a stable release.</p>
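
<p>As a rough illustration of this isolation, here is a small Python sketch in which every parent module receives its own private copy of a submodule. Copying with <code>copy.deepcopy</code> is merely the simplest way to get independence in a toy example; a real system would more likely share immutable code and copy on modification. The names <code>Module</code> and <code>add_submodule</code> are hypothetical.</p>

<pre><code class="language-python">import copy

class Module:
    def __init__(self, name, version, locked=True):
        self.name = name
        self.version = version
        self.locked = locked
        self.submodules = {}

    def add_submodule(self, module):
        # Each parent gets a private copy, so later changes to one copy
        # cannot leak into another parent or into the library original.
        private = copy.deepcopy(module)
        self.submodules[private.name] = private
        return private

# Two released versions of X in a module library.
x_v1 = Module("X", version=1)
x_v2 = Module("X", version=2)

a = Module("A", version=1)
b = Module("B", version=1)
x_in_a = a.add_submodule(x_v1)
x_in_b = b.add_submodule(x_v2)

# Unlock X for hacking inside B only.
x_in_b.locked = False
</code></pre>

<p>In this sketch, A keeps using a locked version 1 of X while B works with an unlocked version 2, and the library copies remain untouched.</p>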

<p>As an example, suppose I am writing a <a href="https://www.zotero.org/" >Zotero</a> client for my malleable system (which is something <a href="https://github.com/khinsen/GT-Zotero" >I have actually done</a> in Smalltalk). My client lives in a new module, initially without any interface, meaning that no other code can interact with it. It's there only for the &quot;end user&quot;, accessible via a REPL or via GUI elements. The Zotero Web API communicates via JSON data, so I need a JSON parser. That's my first submodule, taken from an existing module library. Normally I'd keep it locked, but I might temporarily unlock it if I suspect a bug in it, or if I want to tweak it for debugging my own code. The HTTP client for accessing the Zotero API is dealt with in the same way. I also want my client to have a GUI, so I add a suitable GUI toolkit as another submodule. My development universe is my code plus a handful of submodules. While I am working on my Zotero client, I don't touch code anywhere else, though I may well <em>look at</em> other parts of the system for inspiration, e.g. other modules that use Web APIs and JSON data. The system's development tools therefore grant read-only access to the whole system.</p>

<p>Some of these ideas have been implemented elsewhere. Common Lisp, for example, has a package system that permits creating namespaces for variables, functions, classes, etc. There is also a non-standard mechanism for <a href="https://lispcookbook.github.io/cl-cookbook/packages.html#package-locks" >package locks</a> that has some of the features of my module locks. Moreover, today's implementations are file-based, with a source-code file acting as a unit of code for editor operations. That provides another level of protection against accidental modification, though not malicious attacks.</p>

<p>However, I am not aware of any malleable system that implements my idea fully, nor any that would allow me to retrofit hierarchical modularity easily enough that I'd be tempted to try. There is of course a good chance that some out-of-mainstream system does provide everything I am asking for. If you know one, please leave a comment!</p>

<p>One obstacle is that most systems have a global namespace. If you want hierarchical modularity, then every name must be resolved in the context of the containing module. Names that resolve identically everywhere are a problem. In Smalltalk, as in Emacs, all namespaces are global. In Common Lisp, it's only the namespaces for packages and systems that are global, but that's enough to make hierarchical modularity difficult to implement.</p>
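
<p>What resolving names in the context of the containing module could look like is sketched below in Python, purely as an illustration. The class <code>Module</code>, its <code>resolve</code> method, and the dotted-path notation are assumptions made for this example. The point is only that there is no global table: the same name resolves differently depending on where you start.</p>

```python
class Module:
    """Hypothetical module with local definitions and local submodules."""

    def __init__(self, name, definitions=None, submodules=None):
        self.name = name
        self.definitions = dict(definitions or {})
        self.submodules = {m.name: m for m in (submodules or [])}

    def resolve(self, path):
        """Resolve a dotted name such as 'json.parse' relative to this module."""
        head, _, rest = path.partition(".")
        if rest:
            # Descend into a submodule of *this* module; no global lookup.
            return self.submodules[head].resolve(rest)
        return self.definitions[head]


# Two independent modules, both named "json", living in different parents.
json_v1 = Module("json", {"parse": "parser, release 1"})
json_v2 = Module("json", {"parse": "parser, release 2"})
zotero = Module("zotero", submodules=[json_v1])
feeds = Module("feeds", submodules=[json_v2])

in_zotero = zotero.resolve("json.parse")
in_feeds = feeds.resolve("json.parse")
```

<p>Here <code>json.parse</code> means release 1 inside <code>zotero</code> and release 2 inside <code>feeds</code>, which is exactly what a global package namespace rules out.</p>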

<p>As an example of a system architecture <em>without</em> a global namespace, consider <a href="https://nixos.org/" >Nix</a> or <a href="https://guix.gnu.org/manual/en/html_node/System-Configuration.html" >Guix</a>, which are Linux systems that abandon the global namespaces of standard distributions, i.e. global directories such as <code>/bin</code> or <code>/lib</code>, for getting out of dependency hell. They let users define any number of software environments, which can be attached to a user account or run as containers. In the latter case, the environments are completely isolated from each other. Guix (and maybe Nix, which I know less well) allows the creation of containers from inside another container, making hierarchically structured containers possible. <a href="https://guix.gnu.org/manual/en/html_node/Invoking-guix-shell.html" >Guix containers</a> thus come very close to my submodules in terms of modularity and safety features. But since they live in the realm of Linux processes, at the level of binary executables, they cannot be considered a programming system, let alone a malleable one. They are meant for deploying software, not for creating or modifying it.</p>

<p>A second obstacle is global data types. In Smalltalk, an object is accessible from everywhere in the system. It carries its class and thus all of its methods with it. In a modular system, if the interface of module A hands out objects defined in submodule X, then it exposes implementation details of its particular incarnation of X. This problem is not limited to object-oriented systems. Any system that attaches names to data types and interprets them at the system level, in whatever way, has to deal with similar issues.</p>

<p>One solution is to restrict data types used at interfaces to a fixed set of foundational data types that are common to all modules. That comes down to something like the JSON data model. That's what Web APIs are based on, so it's certainly doable. And the approach is well aligned with the idea of malleability, because glue code between modules can easily tweak such data items. Any type checking, static or dynamic, must be <a href="https://en.wikipedia.org/wiki/Structural_type_system" >structural</a> rather than <a href="https://en.wikipedia.org/wiki/Nominal_type_system" >nominal</a>. The main disadvantage I see is restrictions on generic programming techniques.</p>
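
<p>As an illustration of structural checking on JSON-like data, here is a small Python sketch. The function <code>conforms</code> and the shape notation (a Python type, a one-element list for homogeneous arrays, or a dictionary of required keys) are conventions invented for this example, not an existing library.</p>

```python
def conforms(value, shape):
    """Return True if a JSON-like value matches a structural shape.

    A shape is a Python type, a one-element list (homogeneous array),
    or a dict mapping required keys to shapes. Extra keys are allowed.
    """
    if isinstance(shape, dict):
        return isinstance(value, dict) and all(
            key in value and conforms(value[key], sub)
            for key, sub in shape.items()
        )
    if isinstance(shape, list):
        return isinstance(value, list) and all(
            conforms(item, shape[0]) for item in value
        )
    return isinstance(value, shape)


# Hypothetical shape of a bibliographic item crossing a module interface.
item_shape = {"key": str, "title": str, "tags": [str]}

good = {"key": "ABCD1234", "title": "A paper", "tags": ["lisp"], "extra": 1}
bad = {"key": 42, "title": "A paper", "tags": ["lisp"]}

good_ok = conforms(good, item_shape)
bad_ok = conforms(bad, item_shape)
```

<p>The check looks only at what the data contains, never at what it is called or which module produced it. That is why glue code can add or drop fields freely, as long as the required structure is preserved.</p>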

<p>Another solution could be to allow modules to adopt the data types of their submodules and label them as their own for the outside world. I haven't seen anything similar implemented anywhere, so it's safe to suppose that there are problems with this idea that I didn't spend enough time searching for.</p>

<p>More generally, the kind of modularization I propose here raises many questions of interface design. Some of today's popular techniques become impossible, and many others may need to be revised for pragmatic reasons, for example because they lose their power or convenience. In exchange, new options that weren't possible before will likely be discovered. It may turn out to be necessary, or at least desirable, to have explicit language elements for interfacing modules (see e.g. <a href="https://www.metaobject.com/Research/connectors-metaclass.pdf" >this paper</a>). In other words, exploring this space looks more like research than like design or development.</p>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>The low-hanging fruit in computational reproducibility</title> <link>https://blog.khinsen.net/posts/2023/11/30/The-low-hanging-fruit-in-computational-reproducibility.html</link> <pubDate>2023-11-30</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2023/11/30/The-low-hanging-fruit-in-computational-reproducibility.html</guid> <category><![CDATA[ reproducible research ]]></category> <description><![CDATA[ <p>Yesterday I participated in the <a href="https://www.ouvrirlascience.fr/international-workshop-software-pillar-of-open-science/" >International workshop “Software, Pillar of Open Science”</a>, organized by the <a href="https://www.ouvrirlascience.fr/home/" >French Committee for Open Science</a>. In the course of the various presentations and discussions (both in public and during coffee breaks), I realized that something has been absent from such events all the time: the vast majority of scientists.</p>

<!-- more -->

<p>What prompted this insight was the juxtaposition of two observations: during the introduction, the importance of software in research (&quot;92% of all researchers say they rely on software&quot;), and during the panel on reproducibility, the difficulties resulting from the complexities of today's software stacks.</p>

<p>Here's a provocative proposition: we can solve computational reproducibility for a big majority of those 92% of researchers by buying them a license for <a href="https://www.wolfram.com/mathematica/" >Mathematica</a>.</p>

<p>It's not Open Source, and that's bad for Open Science. I agree. But it does everything that most of those researchers need, it's very easy to install and run, and it's stable. You can run 20-year-old Mathematica code in today's version, and get the same results in the vast majority of cases. No reproducibility issues.</p>

<p>It's worth asking the question how a commercial company can solve a problem that highly qualified academic researchers have been discussing for a decade and continue to declare difficult. My answer to this question is threefold: (1) commercial licenses provide the resources for ensuring <a href="https://science-in-the-digital-era.khinsen.net/#The%20sustainability%20doughnut%20of%20scientific%20software" >the floor of the sustainability doughnut</a>, (2) the contractual producer-client relation provides the information necessary for ensuring <a href="https://science-in-the-digital-era.khinsen.net/#The%20sustainability%20doughnut%20of%20scientific%20software" >the ceiling of the sustainability doughnut</a>, and (3) their audience is very different from the participants at software-for-open-science events.</p>

<p>The last aspect is my key message here. All the activities around software in Open Science are organized by and for people who work in computational science, meaning that computation is their principal tool of scientific inquiry. A large proportion of them has a degree in computer science. On the other hand, most of the 92% of researchers who depend on software do <a href="https://science-in-the-digital-era.khinsen.net/#Computer-aided%20research" >computer-aided research</a> but <em>not</em> computational science. Their main tools are instruments or mathematical theories. They use computers as auxiliary tools, mostly for routine data analysis tasks.</p>

<p>The people who contribute to Open Source projects for scientific software have overall the same profile as the participants of software-for-open-science events. They develop and document their software for this kind of profile as well. The invisible others can and do use this software as well, but it's a lot too complex for them. It's above the sustainability ceiling. But since the invisible others are invisible to the developers, they have no way to make their needs heard. This is in contrast to a commercial company, which knows all of its clients (they are paying for their license every year), cares about them (they are paying for their license every year), and regularly asks them about their needs and their degree of satisfaction.</p>

<p>I brought up this issue during the panel on sustainability, and discovered that there are others who have been thinking about it, for example panel member <a href="https://sloan.org/about/staff/joshua-m-greenberg" >Josh Greenberg</a> from the Sloan foundation (whom I'd also like to thank for an insightful discussion after the event). That's very promising. And here's my proposal for a first step into this direction: let's work on diversity and inclusion in Open Science. Make sure that all of the 92% of software-using researchers are represented.</p>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>This blog gets a facelift</title> <link>https://blog.khinsen.net/posts/2023/11/16/This-blog-gets-a-facelift.html</link> <pubDate>2023-11-16</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2023/11/16/This-blog-gets-a-facelift.html</guid>  <description><![CDATA[ <p>Regular visitors to my blog have probably noticed that it looks different now. However, the visual changes are only a side effect of a more profound change: I now use a different static site generator, <a href="https://github.com/coleslaw-org/coleslaw" >coleslaw</a>.</p>

<!-- more -->

<p>I had wanted for a while to replace Disqus with a less invasive commenting system, and the recent announcement by Disqus to insert ads into the comments on my blog was what finally motivated me to actually invest some time to get this done.</p>

<p>The first task was to find a replacement for Disqus. One of my criteria was to allow commenting from the Fediverse, to remove the need for creating yet another account on yet another site just in order to be able to comment. The other criterion was not to depend on some third party service that might disappear or turn evil one day. In reply to <a href="https://scholar.social/@khinsen/111345801956217482" >a question on Mastodon</a>, <a href="https://neuromatch.social/@mstimberg/111346021003557962" >Marcel Stimberg</a> pointed me to a <a href="https://carlschwan.eu/2020/12/29/adding-comments-to-your-static-blog-with-mastodon/" >post by Carl Schwan</a> explaining how to use replies to a post-related toot as a channel for commenting. That looked just fine: no need for anyone to set up new accounts, just a one-time investment for updating my blog-generation code.</p>

<p>Next, I explored how to implement this technique in the static site generator I was using before, <a href="https://github.com/greghendershott/frog" >Frog</a>. It turned out to be more complicated than I expected, because Frog allows only a fixed set of metadata fields on a post. Adding a field is certainly not impossible, but I'd have had to make changes to many places in the code to add parsing code for the new field and then pass its optional value around from function to function until its final destination in HTML rendering.</p>

<p>Before attacking such a major code surgery, I checked out other static site generators on a few-hour train ride, looking for one that supports arbitrary metadata or, better yet, is more hackable than Frog. After all, I might want to make other changes in the future, so having a codebase that I feel comfortable hacking on is likely to be valuable. Given my recently renewed interest in Common Lisp (see <a href="/posts/2023/10/09/deconstructing-the-mastodon-client.html" >this post</a> for the reasons), I quickly settled on Coleslaw as a candidate to take a closer look at.</p>

<p>Coleslaw has a fixed set of metadata fields as well, but that set is defined by the slots of a class. Just add a slot, and you have a new metadata field. Very hackable! Moreover, the codebase is reasonably small, and while it's not a model of clarity, the ability to explore the code in a live programming environment makes it rather easy to get into, contrary to the more static and debug-hostile Racket code of Frog.</p>

<p>So that's why you are now looking at a Coleslaw-generated blog. It's my personal modified fork for now. I may look into factoring out my add-ons as plugins and submit them upstream, but this is absolutely not a high-priority project. Many people have their own fork of Coleslaw with similar personalizations, and that looks just fine. The forks are even very discoverable via GitHub. I'd prefer having discoverability <em>beyond</em> a single forge, but I don't think that's doable today.</p>

<p>Even though the blog looks very different, the contents of the posts have not changed, and the URLs remain identical as well. That took another ten minutes of hacking on Coleslaw. The URLs of the RSS and Atom feeds have also remained the same. I have exported the comments from Disqus and added them as static HTML on the posts. You can no longer add comments on the old posts, but at least read the existing ones. As a bonus, I also imported the posts from my very first blog at wordpress.com, because Coleslaw comes with a Wordpress importer that makes this a very straightforward operation.</p>

<p>The visual presentation of the pages isn't really to my taste, but I am not sure I'll be able to come up with something significantly better with my current rudimentary knowledge of CSS. I'll leave that for a future facelift session, which may of course never happen.</p>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>Following branching conversations on Mastodon</title> <link>https://blog.khinsen.net/posts/2023/11/05/following-branching-conversations-on-mastodon.html</link> <pubDate>2023-11-05</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2023/11/05/following-branching-conversations-on-mastodon.html</guid> <category><![CDATA[ social networks ]]></category> <description><![CDATA[ <p>This post is a follow-up to my previous one, <a href="http://blog.khinsen.net/posts/2023/10/09/deconstructing-the-mastodon-client.html" >Deconstructing the Mastodon client</a>. My topic is a scenario that traditional Mastodon clients handle rather badly, whereas my home-grown solution handles it very well: lengthy and branching conversations.</p>

<!-- more -->

<p>Such conversations happen all the time on social networks. Someone posts an interesting question or observation, and many others comment on it. Then comments are added to comments, and soon the replies form a branching tree that grows over a few days, sometimes even weeks. Keeping up to date with such a conversation is not supported by any Mastodon client I know of. Worse, due to the way Mastodon implements federation, some replies may never arrive on your instance.</p>

<p>What I did in the past is put a bookmark on the initial toot, and then check for new replies once per day or so. Once you get to dozens of toots, checking for new ones is already a minor effort. And although I know how to check for replies outside of my own instance, in practice I hardly ever do it because it's too laborious.</p>

<p>A <a href="https://git.sr.ht/~khinsen/malleable-message-manager/tree/main/item/examples/conversations.lisp" >simple script</a> that I run once per day makes this a lot easier. I still mark interesting conversations as bookmarks. But now it's my script that copies the whole tree into a mail folder, skipping toots that are already present. New additions to the tree thus show up as unread mails in my inbox, just like replies in a mailing list. Better yet, my script retrieves the whole tree twice: once from my own instance, and once by retrieving each toot from the instance it was posted to, checking on that instance for replies. Neither approach is sufficient on its own: my instance doesn't see all replies, but the foreign instances from which I retrieve toots won't show me non-public toots.</p>
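
<p>The core logic of combining the two retrievals and skipping what is already stored fits in a few lines. The following Python sketch is not my script (which is written in Common Lisp and linked above). The function names and the sample data are made up, and toots are reduced to dictionaries identified by their <code>uri</code> field, since numerical ids differ from one instance to the next.</p>

```python
def merge_retrievals(from_home, from_origins):
    """Union of two lists of toots, keyed by URI, first occurrence wins."""
    merged = {}
    for toot in list(from_home) + list(from_origins):
        merged.setdefault(toot["uri"], toot)
    return merged


def new_toots(merged, already_stored):
    """Toots from the merged tree that are not yet in the mail folder."""
    return [toot for uri, toot in merged.items() if uri not in already_stored]


# Hypothetical data: my instance misses one reply, the origin instances
# do not show a non-public one.
home = [
    {"uri": "https://a.example/1", "content": "question"},
    {"uri": "https://b.example/7", "content": "non-public reply"},
]
origins = [
    {"uri": "https://a.example/1", "content": "question"},
    {"uri": "https://c.example/3", "content": "reply unseen by my instance"},
]
stored = {"https://a.example/1"}

to_deliver = new_toots(merge_retrievals(home, origins), stored)
```

<p>Only the two replies end up in <code>to_deliver</code>, each one exactly once, and the question that is already in the mail folder is skipped.</p>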

<p>None of this is rocket science, but it's a nice illustration of the possibilities that open up once you take control over your personal information environment. I wish this were easier, and thus accessible to more people. But it won't get easier as long as most computer users find it perfectly normal that a small technophile elite defines what everyone else is able to do in their digital lives. So if you are reading this and think &quot;nice, but that's above my level of competence&quot;, the very least you should do is express your desire to be able to do such things on your own. On Mastodon, for example.</p>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>Deconstructing the Mastodon client</title> <link>https://blog.khinsen.net/posts/2023/10/09/deconstructing-the-mastodon-client.html</link> <pubDate>2023-10-09</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2023/10/09/deconstructing-the-mastodon-client.html</guid> <category><![CDATA[ social networks ]]></category> <description><![CDATA[ <p>Ever since I joined Twitter in 2011, and then moved to Mastodon in 2022, I have been unhappy with the timeline view proposed by both of these communication platforms as their main interface. Now I have finally done something about it: I wrote my own Mastodon client. Or perhaps rather a non-client, because the concept of &quot;the client&quot; is a big part of what I disliked.</p>

<!-- more -->

<p>My use of social networks can be broken down into three categories:</p>

<ol>
<li>conversations, mostly public but sometimes private</li>
<li>keeping up to date with the work of a small number of people or institutions</li>
<li>staying in touch with communities I consider myself a part of, and following topics I find interesting</li>
</ol>

<p>These are not clearly separated categories. It's often messages from category 2 that start conversations, and occasionally messages from category 3. But most of my daily use of Mastodon consists of</p>

<ol>
<li>participating in ongoing conversations</li>
<li>reading the feeds of accounts I care about specifically</li>
<li>scanning all the other news feeds sporadically and often superficially, depending on how much time and interest I have at the moment</li>
</ol>

<p>A timeline view mixing all messages from all accounts I follow is somewhat acceptable for (3), but no good for (1) and (2). Mastodon proposes lists for (2), and notifications to help with (1), but neither mechanism is satisfying for me. Lists in particular suffer from an awkward user interface. Moreover, I do (3) exclusively on mobile devices (on the bus etc.), (1) almost exclusively on the desktop (as I don't like typing on on-screen keyboards), and (2) alternating between multiple devices.</p>

<p>There are, of course, many Mastodon clients, so I tried out a few of them. For a while, I used Fedilab on Android (for me: phone and e-ink tablet) for activity (3), and the default Web client and/or <a href="https://elk.zone/" >Elk</a>, mainly on the desktop, for (1) and (2). It was a workable setup, but not a satisfying one. In addition to the cumbersome list interface, what I found missing was synchronization between my usage of multiple devices. For (2), I'd need to be able to efficiently access all messages I hadn't seen before, on any of my devices (two mobile, two desktop). As a long-time Emacs user, I also tried <a href="https://codeberg.org/martianh/mastodon.el" >mastodon.el</a>, which is nice but, like Emacs, it is desktop only, and thus doesn't help with my multi-device issues.</p>

<p>At some point I realized that what I wanted is not a better Mastodon <em>client</em>, but a better Mastodon <em>workflow</em>. What I care about is a data structure, a stream of toots, that is accessible via an HTTP API. I want to split this stream into several streams according to various criteria. For some substreams, I want to make sure I don't miss any message. For others, I need an interface to scan all messages when I feel like it, or search for specific keywords when I don't have time for scanning everything.</p>

<p>Can I get such interfaces to Mastodon streams without writing my own client? Yes, by repurposing existing software. Small streams of which I don't want to miss anything are much like e-mail (after spam filtering of course!). High-volume streams that I scan or search are much like RSS feeds. There is a lot of good software for managing e-mail and RSS feeds, for all platforms I use and even exotic platforms that I don't use (yet?). There are also good infrastructure tools in this space, in particular for e-mail. <a href="https://isync.sourceforge.io/" >isync</a>, for example, takes care of IMAP(S), letting me work with local files (Maildir) and not worry about networks, certificates, and their various modes of failure.</p>

<p>It actually takes surprisingly little software to transform Mastodon streams into e-mail and RSS feeds, if you can resist temptations of overengineering. A toot is a snippet of HTML with optional attachments (images, video, audio). That's also what a MIME message happens to be. A near-perfect match. RSS items are HTML snippets as well. No attachments, but you can include the same preview images that Mastodon clients display with toots. If you can find support libraries for mail, RSS, and the Mastodon API in a programming language that you know well enough, this becomes a very manageable side project.</p>
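
<p>To show how close the match between a toot and a MIME message is, here is a sketch using the <code>email</code> package from Python's standard library. It is an illustration only, not my actual code. The function <code>toot_to_mail</code>, its parameters, and the sample values are assumptions, and a real toot would carry HTML markup in its content.</p>

```python
from email.message import EmailMessage


def toot_to_mail(author, html, attachments=()):
    """Wrap a toot's HTML snippet and its media in a MIME message.

    attachments: iterable of (data, maintype, subtype, filename) tuples,
    where data is a bytes object.
    """
    msg = EmailMessage()
    msg["From"] = author
    msg["Subject"] = f"Toot by {author}"
    # The toot content becomes the HTML body of the message.
    msg.set_content(html, subtype="html")
    for data, maintype, subtype, filename in attachments:
        # Adding an attachment turns the message into multipart/mixed.
        msg.add_attachment(
            data, maintype=maintype, subtype=subtype, filename=filename
        )
    return msg


plain = toot_to_mail("someone@example.social", "Hello, fediverse!")
with_image = toot_to_mail(
    "someone@example.social",
    "Look at this!",
    attachments=[(b"not really a PNG", "image", "png", "preview.png")],
)
```

<p>The resulting messages can be serialized with <code>as_bytes()</code> and dropped into a Maildir folder, where any mail client will pick them up.</p>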

<p>If your preferences match mine, meaning you are happy to use Common Lisp for such a job, you can use <a href="https://sr.ht/~khinsen/malleable-message-manager/" >my code</a> as a starting point for your own Mastodon experiments. Its main support libraries are <a href="https://github.com/Shinmera/tooter" >tooter</a> for the Mastodon API, and <a href="https://github.com/40ants/mel-base" >mel-base</a> for e-mail. RSS is trivial if you have XML support, for which I use <a href="https://github.com/shinmera/plump" >plump</a>. My RSS aggregator is <a href="https://newsblur.com/" >Newsblur</a>, which has a reasonable Web interface for the desktop and a very nice Android app. For e-mail, I use K9 on Android, and Emacs on the desktop, but I am pretty sure any other e-mail client would work fine as well. The most time-consuming aspect turned out to be mel-base, a library that's insufficiently documented and not quite up to date, lacking support in particular for subject lines and account names containing Unicode characters.</p>

<p>If you have followed so far, you have probably noticed that my non-client supports nothing but reading toots. Each of my transformed toots ends with a link that opens it in the default Web client, where I can reply, boost, or like. The Web client is also what I use for administrative tasks. Bonus: I add another link to each toot that opens it in the instance of its author, where I have access to the full reply chain, of which my own instance often captures only a subset. A very simple solution to one of Mastodon's unfortunate limitations that are due to federation.</p>

<p>The hopefully generalizable lesson from this project is that it is possible to improve one's personal computing environment with reasonable effort, under the condition of accepting an initial learning curve for some technologies. The important question then is how to identify technologies that are worth learning, which I interpret as technologies that are likely to be useful again for other software personalization efforts. A first draft of a list of criteria:</p>

<ol>
<li><strong>Choose <a href="https://boringtechnology.club/" >boring technology</a>.</strong> You want well-known, well-documented, and stable infrastructure to build on. No surprises, no tech churn. Your learning effort should be a good investment.</li>
<li><strong>Choose small-scale rather than enterprise-grade technology.</strong> Your problems and challenges are very different from Microsoft's. Prefer small software stacks.</li>
<li>Corollary 1: <strong>choose carefully who you turn to for advice.</strong> Most conference talks, blog posts, StackOverflow discussions, etc. come from software professionals. Better listen to people like yourself (but no, I have no advice on where to find them, nor how to judge their competence).</li>
<li>Corollary 2: <strong>consider old technology.</strong> Most modern software development tools are designed for software professionals. Tools for small-scale development were common in the 1980s and 1990s, before computers became commodities. Technology from that era that's still supported today may well be your best bet. I am a happy user of <a href="https://www.gnu.org/s/emacs/" >Emacs</a>, Smalltalk (more precisely <a href="https://pharo.org/" >Pharo</a> with <a href="https://gtoolkit.com/" >Glamorous Toolkit</a> as my preferred user interface), and Common Lisp (more precisely <a href="https://www.sbcl.org/" >SBCL</a>). Python is from the 1990s as well, but since it was widely adopted by software professionals in the 2000s, its ecosystem suffers too much from tech churn for my taste.</li>
<li><strong>Build on general protocols and file formats rather than specialized ones.</strong> Hierarchical filesystems rather than the Dropbox API. E-mail rather than Matrix. HTML, XML, and JSON files rather than JavaScript libraries or Web APIs.</li>
<li><strong>Consider debuggability.</strong> Delegate hard-to-debug stuff (e.g. networking, in particular with encryption) to other software. Choose tools that support debuggability. Debugging is a lot easier if you can build your own problem-specific debugging tools, which in turn is best supported by development tools that are extensible and focus on rapid feedback. Smalltalk systems are best in class in this respect, and Glamorous Toolkit even turned this into a design principle, called &quot;Moldable Development&quot;.</li>
</ol>

<p>Unfortunately, there is one more aspect to making good choices that is hard to generalize: you need some expertise in figuring out which problems you can solve yourself with reasonable effort and which are so hard that your efforts are better spent on delegating or circumventing them. Data synchronization is in this second category, but like most people I learned this the hard way (years ago), while trying to do it myself and losing both time and data in the process.</p>

<p><br></p>

<p>After a few weeks of using my setup, I am fully satisfied with it. I also note that my original ideas about defining my personal algorithmic feeds have evolved substantially with practical experience. Once I have taken care of conversations (they go to e-mail) and the small set of accounts I follow closely (a low-volume RSS feed), I ended up splitting the remaining toots (i.e. most of my timeline) by topics in the crudest imaginable way: substring search. It's not perfect but definitely good enough. There's always room for improvement. My main failure so far is in removing all the cat-related toots from my feeds. That may actually require AI-based image recognition. Some problems are hard!</p>
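
<p>For the record, splitting by substring search really is as crude as it sounds. Here is a Python sketch of the idea. The topic table and the function name are invented for this example and are not the ones I actually use.</p>

```python
# Hypothetical topic table: topic name mapped to substrings that trigger it.
TOPICS = {
    "lisp": ["common lisp", "sbcl"],
    "reproducibility": ["reproducib", "guix"],
}


def topics_for(text, topics=TOPICS):
    """Return the topics whose substrings occur in the text, else ['other']."""
    lowered = text.lower()
    found = [
        name
        for name, needles in topics.items()
        if any(needle in lowered for needle in needles)
    ]
    return found or ["other"]


first = topics_for("New SBCL release announced")
second = topics_for("Reproducible builds with Guix")
third = topics_for("Look at my cat")
```

<p>A toot can land in several feeds, and everything that matches nothing goes into a catch-all feed. Note that no substring will ever catch a cat picture posted without a caption.</p>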

<p>I'd love to hear about similar projects in this space (tell me <a href="http://scholar.social/@khinsen" >on Mastodon</a>!). The only one I am aware of is <a href="https://steampipe.io/blog/mastodon" >Jon Udell's Steampipe-based client</a>. Steampipe provides an SQL/database view on many Web services, which is perfect for doing non-trivial queries. That's something my own setup doesn't address at all. It's not something I feel a need for right now, but I may well add Jon's client to my toolbox one day.</p>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>Welcome to my digital garden!</title> <link>https://blog.khinsen.net/posts/2022/08/31/welcome-to-my-digital-garden.html</link> <pubDate>2022-08-31</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2022/08/31/welcome-to-my-digital-garden.html</guid>  <description><![CDATA[ <p>A few years ago, I discovered Mike Caulfield's <a href="https://hapgood.us/2015/10/17/the-garden-and-the-stream-a-technopastoral/amp/" >The Garden and the Stream: A Technopastoral</a> and understood why I wasn't happy with my blog.</p>

<!-- more -->

<p>Blogs are streams, timelines of posts. Each post has a timestamp, and is considered &quot;finished&quot;. Later changes are technically possible, but culturally limited to corrections. A blog post is considered a published essay, and therefore comes with a date of publication. I am much more interested in gardens, which are collections of essays that are revised and improved over long time periods.</p>

<p>It took me a while to actually set up a digital garden and populate it with some content, but I eventually did it. I won't say much about it because it speaks for itself. It's just one click away: <a href="https://science-in-the-digital-era.khinsen.net/" >https://science-in-the-digital-era.khinsen.net/</a></p>

<p>Does this mean the end of this blog? No, but posts will become even rarer. A blog is still the best place to make announcements, or to comment on events. But I am a researcher, not a journalist. The fundamental job of a researcher is to curate and extend knowledge collections. That's what I will do from now on in my own little garden.</p>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>The dependency hubs in Open Source software</title> <link>https://blog.khinsen.net/posts/2021/06/10/the-dependency-hubs-in-open-source-software.html</link> <pubDate>2021-06-10</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2021/06/10/the-dependency-hubs-in-open-source-software.html</guid> <category><![CDATA[ software ]]></category> <description><![CDATA[ <p>A few days ago, Google announced its experimental project <a href="https://deps.dev/" >Open Source Insights</a>, which permits the exploration of the dependency graph of Open Source software. My first look at it ended with a disappointment: in its initial stage, the site considers only the package universes of Java, JavaScript, Go, and Rust. That excludes most of the software I know and use, which tends to be written mainly in C, C++, Fortran, and Python. But I do have a package manager that has all the dependency information for most of the software that I care about: <a href="https://guix.gnu.org/" >Guix</a>. So I set out to do my own exploration of the Guix dependency graph, with a particular focus: identifying the hubs of the Open Source dependency network.</p>

<!-- more -->

<p>This was also a good opportunity to test the practical utility of <a href="https://github.com/khinsen/guix-gtoolkit" >a new GUI for Guix</a> that I have been working on recently as a side project. In fact, I added this dependency hub analysis to that GUI, so now you can access it with a simple click.</p>

<p>Software being the complex beast that it is, I have to start by properly defining the subjects of my inquiry. What exactly do I mean by &quot;package&quot;, &quot;dependency&quot;, and &quot;dependency hub&quot;?</p>

<p>The term <em>package</em> is widely used to describe a unit of development and distribution in software systems, but every package manager has a slightly different notion of what a package actually is. A package could be &quot;Python&quot;, or &quot;Python 3.8.2&quot;, or &quot;Python 3.8.2 built with gcc 7.5, version X of dependency Y, ...&quot;. Guix adopts the last, most fine-grained, definition. This is a good choice when you want to do reproducible software builds, but it is not very useful for analyzing dependency graphs. So I chose the level of name + version number, meaning that I consider &quot;Python 3.8.2&quot; a different package from &quot;Python 3.8.1&quot;. That's of course debatable as well. But in Guix, it is rare to have multiple versions of a piece of software coexist at the same time. When it does happen, there is a good reason, typically a significant evolution in the software that makes different dependents prefer different versions. An example is Python 2 vs. Python 3, or the different major versions of gcc. In those cases, looking at their dependencies and dependents separately does make sense.</p>

<p>The term <em>dependency</em> is also widely used with different meanings. The two most common ones are <em>runtime dependency</em> and <em>build dependency</em>. A runtime dependency of package X is a package that must be installed on the computer to <em>use</em> package X. In contrast, a build dependency is a package that is required in order to <em>build</em> package X, where <em>building</em> means anything required to turn source code into something executable. Think of it as a generalization of <em>compiling</em>. Usually the build dependencies are roughly a superset of the runtime dependencies: there are packages you need to build package X, e.g. a compiler, but which are then no longer required for using package X. It's the build dependencies that matter for the evolution of software systems, so that's the definition I used in my analysis.</p>

<p>Unfortunately, the complexity of defining dependencies doesn't end there. Many packages have <em>optional</em> dependencies. When they are available, some additional functionality is enabled. Do you count them or not? My pragmatic take is that I trust the Guix developers to have made good choices. So for me, a dependency is whatever it takes to build a package in Guix.</p>

<p>This leaves the notion of a <em>dependency hub</em> to be defined. In network science, a hub is a node that has an exceptionally high number of connections to other nodes, such that a large share of the information propagating through the network passes through the hubs. A software dependency graph differs from most networks in that its edges have a direction: A depending on B is not the same as B depending on A. This leads to several <em>a priori</em> reasonable definitions for hubs: 1. packages that have many dependencies, 2. packages that have many dependents, and 3. packages for which the sum of dependencies plus dependents is high. Let's immediately eliminate the last definition, as I see no interest in it. Definition 1 identifies the packages that are particularly <em>vulnerable</em> to <a href="https://hal.archives-ouvertes.fr/hal-02117588/document" >software collapse</a>, definition 2 the packages that can most easily <em>cause</em> software collapse.</p>

<p>The latter characteristic corresponds best to the capture of information flow as the defining feature of network hubs, and it also happens to be what I am most interested in. The information that flows in the network is requests for change. Nodes receive such requests from dependents, who are in fact the software's clients or users. They typically ask for improved or extended functionality. Nodes also receive requests from dependencies, when they implement changes that break backward compatibility and then ask <em>their</em> dependents to adapt to these changes. The nodes that potentially receive and send many requests for change are thus the nodes that have the most dependents. They are the hubs in the dependency network. Note, however, that the asymmetry in the dependency relation still matters. Nodes can ignore requests for change coming from their dependents, but they cannot ignore requests coming from their dependencies. It's called &quot;dependency&quot; for a reason!</p>
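
<p>To make definition 2 concrete, here is a minimal Python sketch that ranks the packages of a toy dependency graph by their number of dependents. The graph, the package edges, and the function names <code>dependents</code> and <code>hubs</code> are invented for illustration; this is not the code of the Guix analysis. The sketch also assumes that indirect dependents count as well as direct ones.</p>

```python
from collections import defaultdict

# Hypothetical toy graph: package -> direct build dependencies.
# Names and edges are invented for illustration, not taken from Guix.
DEPENDENCIES = {
    "app": ["python", "zlib"],
    "python": ["zlib", "libffi"],
    "tool": ["perl"],
    "zlib": [],
    "libffi": [],
    "perl": [],
}

def dependents(graph):
    """Map each package to the set of its direct and indirect dependents."""
    # Invert the edges: dependency -> packages that directly depend on it.
    direct = defaultdict(set)
    for package, deps in graph.items():
        for dep in deps:
            direct[dep].add(package)
    # Follow the inverted edges to collect indirect dependents as well.
    result = {}
    for package in graph:
        seen = set()
        stack = list(direct[package])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(direct[node])
        result[package] = seen
    return result

def hubs(graph):
    """Packages sorted by decreasing number of dependents, ties broken by name."""
    counts = {p: len(d) for p, d in dependents(graph).items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for name, count in hubs(DEPENDENCIES):
    print(name, count)
```

<p>In this toy graph, <code>zlib</code> and <code>libffi</code> come out on top with two dependents each, although nothing depends on them directly except <code>python</code> and <code>app</code>. Counting only direct dependents would be an equally simple variant: skip the traversal and use the inverted edges as they are.</p>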

<p>At this point, I can take a break from theory and show you the results of my analysis. The top twenty hubs in the Guix dependency graph are:
<table>
    <tr>
        <th>Package</th> <th>Number of dependents</th>
    </tr>
    <tr>
        <td>perl 5.30.2</td> <td>7964</td>
    </tr>
    <tr>
        <td>pkg-config 0.29.2</td> <td>7938</td>
    </tr>
    <tr>
        <td>zlib 1.2.11</td> <td>7414</td>
    </tr>
    <tr>
        <td>ncurses 6.2</td> <td>7337</td>
    </tr>
    <tr>
        <td>libffi 3.3</td> <td>6687</td>
    </tr>
    <tr>
        <td>xz 5.2.4</td> <td>6535</td>
    </tr>
    <tr>
        <td>readline 8.0</td> <td>6503</td>
    </tr>
    <tr>
        <td>libxml2 2.9.10</td> <td>6302</td>
    </tr>
    <tr>
        <td>expat 2.2.9</td> <td>6170</td>
    </tr>
    <tr>
        <td>libunistring 0.9.10</td> <td>6150</td>
    </tr>
    <tr>
        <td>bzip2 1.0.8</td> <td>6070</td>
    </tr>
    <tr>
        <td>tzdata 2019c</td> <td>6068</td>
    </tr>
    <tr>
        <td>Python 3.8.2</td> <td>6061</td>
    </tr>
    <tr>
        <td>bash 5.0</td> <td>6042</td>
    </tr>
    <tr>
        <td>gettext 0.20.1</td> <td>5768</td>
    </tr>
    <tr>
        <td>m4 1.4.18</td> <td>5621</td>
    </tr>
    <tr>
        <td>libgpg-error 1.37</td> <td>5518</td>
    </tr>
    <tr>
        <td>libgcrypt 1.8.5</td> <td>5514</td>
    </tr>
    <tr>
        <td>libxslt 1.1.34</td> <td>5479</td>
    </tr>
    <tr>
        <td>gmp 6.2.0</td> <td>5363</td>
    </tr>
</table>
If you want more, <a href="/static/hubs.json" >here</a> is the full list as a JSON file, sorted by decreasing number of dependents.</p>

<p>If you have thought a bit about what to expect before looking at this table, you have probably included programming languages such as <tt>perl</tt> or <tt>python</tt> in this list. But perhaps you did not expect to see utilities such as <tt>pkg-config</tt> or <tt>bzip2</tt>. Remember these are <em>build</em> dependencies. The very first step in building a package, <em>any</em> package, is unpacking its source code. Many of the packages in my top-twenty list represent boring but essential infrastructure software. The software equivalent of the power grid and the road network: stuff that everybody just takes for granted. Such packages rarely get into the news, except when something goes seriously wrong, as in the case of the <a href="https://heartbleed.com/" >Heartbleed bug</a> affecting OpenSSL. Which, by the way, is at position 634 in my list. It would be much higher up in a network defined by different criteria, of course. There's more to software than build dependencies.</p>

<p>One motivation for writing this post was to point out a common fallacy in reasoning about Open Source software. A popular argument is that Open Source gives you the freedom to change software to fit your needs, by creating and maintaining your own fork. Or paying someone else to do it for you, if you are not an accomplished hacker yourself. The source code is there for anyone to grab, after all, and the license allows modification and redistribution.</p>

<p>This argument was valid in the 1980s. There were few packages, few dependencies, and a much higher percentage of computer users had programming experience. Today, you can perhaps maintain your own fork of Perl, but you cannot fork its hub position in the network, nor can you reasonably maintain forks of its 7964 dependents. If the Perl maintainers introduce a breaking change, those 7964 dependents will either adapt or disappear. Hypothetically, a large number of them could together envisage maintaining their own fork. But there are no good coordination mechanisms among developers of unrelated Open Source projects, and therefore this doesn't happen in practice.</p>

<p>In an <a href="https://blog.khinsen.net/posts/2020/02/26/the-rise-of-community-owned-monopolies.html" >earlier post</a>, I have written about community-owned monopolies in the Open Source universe. In that post, I wrote that for software users, there is no practical difference between Microsoft killing Windows 7 and the Python community killing Python 2, even though the former is proprietary and commercial, whereas the latter is Open Source. The reason is that both pieces of software are hubs in dependency networks. Microsoft and the Python developer community are two very different institutions, with very different goals, values, policies, legal status, etc. But that hardly matters for the average software user, whose work depends on a complex web of interacting pieces of software. At the level of that web, it's the information flow patterns that determine evolution. Requests for change, or non-change. Average software users have practically no way to make their needs heard by the people who manage the hubs. Even the best-intentioned altruistic Open Source hub maintainer cannot possibly keep every user's interests in mind, because there is no way to even be aware of them. A web of software is a very different beast than a single project. <a href="http://robotics.cs.tamu.edu/dshell/cs689/papers/anderson72more_is_different.pdf" >More is different.</a></p>

<p>In the almost 40 years since the beginnings of the Open Source movement, the mode of governance of Open Source projects has evolved significantly. Most importantly, all the people involved have realized that governance matters and must be consciously organized, rather than evolve through cumulative random accidents of history, which almost inevitably leads to a <a href="https://en.wikipedia.org/wiki/The_Tyranny_of_Structurelessness" >tyranny of structurelessness</a> in the long run. Now we must develop an awareness of similar issues at the level of the <em>web</em> of Open Source projects, followed by the development and implementation of better information flow and decision structures.</p>

<p>I will conclude this post with a technical remark. I did my dependency hub analysis using a relatively new tool in the software world, called the <a href="https://gtoolkit.com/" >Glamorous Toolkit</a>, to which I added an <a href="https://github.com/khinsen/guix-gtoolkit" >interface to Guix</a>. This toolbox significantly lowers the cost of developing new tools. In the screenshot below, you see on the left the user interface of my analysis. It's an additional view on the Guix package catalog, complementing various other views that are already in place. On the right, you see the complete code for this analysis, including the user interface (which also gives access to the list of dependents, not just the number). In contrast to traditional scripts, there is no overhead for reading data or writing out the results. My code works on data structures that are already in place. What is not obvious from the screenshot is that you get the right-hand panel via alt-click from the left-hand one, meaning that users of my little analysis tool always have direct access to the code. It isn't obvious either that modifying the code on the right will immediately update the view on the left, making development highly interactive. If you think notebooks are great, try Glamorous Toolkit. But be warned that you might then realize that notebooks are no longer the state of the art.</p>

<div class="figure">
<img src="/static/guix-gtoolkit-dependency-hubs.png" alt="" width="100%"/>
<p class="caption"></p>
</div>

 ]]></description> </item><item> <title>The structure and interpretation of scientific models, part 2</title> <link>https://blog.khinsen.net/posts/2021/01/08/the-structure-and-interpretation-of-scientific-models-part-2.html</link> <pubDate>2021-01-08</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2021/01/08/the-structure-and-interpretation-of-scientific-models-part-2.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>In <a href="https://blog.khinsen.net/posts/2020/12/10/the-structure-and-interpretation-of-scientific-models.html" >my last post</a>, I have discussed the two main types of scientific models: empirical models, also called descriptive models, and explanatory models. I have also emphasized the crucial role of equations and specifications in the formulation of explanatory models. But my description of scientific models in that post left aside a very important aspect: on a more fundamental level, all models are stories.</p>

<!-- more -->

<p>To illustrate my point, I will take up my running example from part 1: celestial mechanics. Newton's model for our solar system is, as I said, composed of several equations, the most famous of which, <em>F</em> = <em>m</em> ⋅ <em>a</em>, many readers will probably remember from a high-school physics class. But that equation means nothing on its own. It just says that there are three quantities, one of which is the product of the other two.</p>

<p>The minimal story required to make sense of this equation provides a definition of the three quantities involved. For acceleration (the <em>a</em>), this may look superficially simple: it's the second derivative of an object's position in time. The concepts of position and time are part of our everyday intuition, so that's the easy part. Velocity is an intuitive everyday concept as well, but its precise relation to position as a time derivative is not. For acceleration, nothing short of calculus will do. In fact, Newton invented calculus along with his physical theory! Defining mass (the <em>m</em>) and force (the <em>F</em>) is not a trivial task either. Both concepts are rooted in our everyday intuition about the world, but their role in Newton's law of motion requires a much more precise understanding. If you have doubts about this, try explaining the difference between <em>mass</em> and <em>weight</em> to someone who doesn't have a scientific education.</p>

<p>From this big-picture point of view, equations such as <em>F</em> = <em>m</em> ⋅ <em>a</em> are tiny pieces of our scientific models. They are the tips of icebergs whose massive underwater parts are the stories defining the underlying concepts and linking them to our intuition about the world, often through multiple and increasingly abstract layers. We tend to forget about these stories, because once we have understood them well enough, what we actually work with are the equations. But this works only for the well-established models whose stories are now found in textbooks. New research continuously introduces new models, often as small variants or extensions of existing ones. Their stories are told in scientific publications.</p>

<p>Historically, <a href="https://en.wikipedia.org/wiki/History_of_mathematical_notation" >mathematical notation</a> was introduced as a convenient shorthand for use in plain-language stories. The lengthy phrase &quot;force equals mass times acceleration&quot; thus became  <em>F</em> = <em>m</em> ⋅ <em>a</em>. The transition to symbolic equations encouraged the development of formal methods in mathematics, starting with algebraic transformations of simple equations. This approach was so successful that equations became the main focus of interest in science. Later, other formal representations were added for the non-numerical aspects of models, graphs being the prime example. The most recent addition to the collection of formal notations for scientific models is software. Today, scientists spend most of their time working with the formalized parts of scientific models, such as equations or algorithms, to the point of neglecting the stories that give them meaning.</p>

<p>What happens when people use the equations of scientific models without a proper understanding of their stories is nicely illustrated by the joke about the physics student who combines Einstein's <em>E</em> = <em>m</em> ⋅ <em>c²</em> with Pythagoras' <em>a²</em> + <em>b²</em> = <em>c²</em> to deduce <em>E</em> = <em>m</em> ⋅ (<em>a²</em> + <em>b²</em>). It works as a joke among physicists because in their community, everybody knows the two inputs and the contexts from which they are taken. For other people, there is nothing funny about this reasoning, and it can even look convincing. Such superficial use of scientific models without understanding their context is actually quite common in today's research: the inappropriate use of statistical inference methods is a major cause of the <a href="https://en.wikipedia.org/wiki/Replication_crisis" >reproducibility crisis</a>.</p>

<p>Computing technology has played a big role in alienating scientists from their models. Most obviously, computers have made it possible to apply scientific models and methods as black-box tools: in an automated fashion, without understanding them. But the attitudes of the software industry, whose development tools computational science has inherited, have also contributed to this tendency. The focus of the software industry is on professional developers making tools for others that almost magically solve some of their problems. Users then get a manual, or hands-on training, for learning how to use the tool, but the inner workings of the tool are something they shouldn't even have to think about. A good tool is one that minimizes learning requirements. Applied to science, this implies that users shouldn't have to know the stories behind the models. Everyone with a dataset should be able to do statistical inference with a few mouse clicks and get a nice visualization. But without the stories, we can easily draw wrong conclusions from nice graphics.</p>

<p>After a long period of separation of tools and stories, computational notebooks are now bringing some of the stories back. The enthusiastic adoption of notebooks by computational scientists is perhaps the best evidence for the importance of stories in science. But today's notebooks capture only the surface stories of a research project. It's tips of icebergs again. The typical notebook makes use of a large number of code libraries that are based on non-trivial scientific models, but the reader of the notebook remains completely unaware of them. Ideally, these models, with their stories, should be only a few clicks away.</p>

<p>So what would an electronic representation of scientific models look like, ideally? It's a collection of cross-referencing stories. In the celestial mechanics example, there's a story about positions, velocities, and accelerations, which refers to a story about time and to a story about derivatives. There is another story that explains mass. The story of Newton's law of motion, which also introduces the concept of force, can then refer to these more fundamental stories. If this description reminds you of Wikipedia, or in fact of any Wiki, you are right. Wikis are also collections of cross-referencing stories. What is missing in Wikis is a machine-readable version of the formalized parts of our models. Which, as I explained in <a href="https://blog.khinsen.net/posts/2020/12/10/the-structure-and-interpretation-of-scientific-models.html" >part 1</a>, needs to allow at least equations, specifications, and algorithms for its ingredients. Another feature that is missing in today's Wikis, although some people are working on it, is the possibility to integrate computational tools in the form of code snippets. Their role would be to give access to visualizations, simulations, and other exploration tools.</p>

<p>My own experiments in this domain are <a href="https://github.com/khinsen/leibniz/" >Leibniz</a>, a digital scientific notation for embedding machine-readable formal models into human-readable stories, and the <a href="https://github.com/activepapers/activepapers-pharo" >Pharo edition of ActivePapers</a>, which integrates datasets and computational tools into a Wiki-like collection of stories. Both ingredients require more work, and then need to be combined. There remains a lot of work to do.</p>

 ]]></description> </item><item> <title>The structure and interpretation of scientific models</title> <link>https://blog.khinsen.net/posts/2020/12/10/the-structure-and-interpretation-of-scientific-models.html</link> <pubDate>2020-12-10</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2020/12/10/the-structure-and-interpretation-of-scientific-models.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>It is often said that science rests on two pillars, experiment and theory. This has led some to propose <a href="https://physicsworld.com/a/the-third-pillar-of-science/" >one</a> or <a href="https://www.hpcwire.com/2019/04/18/is-data-science-the-fourth-pillar-of-the-scientific-method/" >two</a> additional pillars for the computing age: simulation and data analysis. However, the <em>real</em> two pillars of science are observations and models. Observations are the input to science, in the form of numerous but incomplete and imperfect views on reality. Models are the inner state of science. They represent our current understanding of reality, which is necessarily incomplete and imperfect, but understandable and applicable. Simulation and data analysis are tools for interfacing and thus comparing observations and models. They don't add new pillars, but transform both of them. In the following, I will look at how computing is transforming scientific models.</p>

<!-- more -->

<h2>Empirical models</h2>

<p>The first type of scientific model that people construct when figuring out a new phenomenon is the <em>empirical</em> or <em>descriptive</em> model. Its role is to capture observed regularities, and to separate them from noise, the latter being small deviations from the regular behavior that are, at least provisionally, attributed to imprecisions in the observations, or to perturbations to be left for later study. Whenever you fit a straight line to a set of points, for example, you are constructing an empirical model that captures the linear relation between two observables. Empirical models almost always have parameters that must be fitted to observations. Once the parameters have been fitted, the model can be used to <em>predict</em> future observations, which is a great way to test its generality. Usually, empirical models are constructed from generic building blocks: polynomials and sine waves for constructing mathematical functions, circles, spheres, and triangles for geometric figures, etc.</p>

<p>The use of empirical models goes back a few thousand years. As I have described in <a href="https://blog.khinsen.net/posts/2017/12/19/data-science-in-ancient-greece.html" >an earlier post</a>, the astronomers of antiquity who constructed a model for the observed motion of the Sun and the planets used the same principles that we still use today. Their generic building blocks were circles, combined in the form of epicycles. The very latest variant of empirical models is machine learning models, where the generic building blocks are, for example, artificial neurons. Impressive success stories of machine learning models have led some enthusiasts to proclaim  <a href="https://www.wired.com/2008/06/pb-theory/" >the end of theory</a>, but I hope to be able to convince you in the following that empirical models of any kind are the beginning, not the end, of constructing scientific theories.</p>

<p>The main problem with empirical models is that they are not that powerful. They can predict future observations from past observations, but that's all. In particular, they cannot answer what-if questions, i.e. make predictions for systems that have never been observed in the past. The epicycles of Ptolemy's model describing the motion of celestial bodies cannot answer the question of how the orbit of Mars would be changed by the impact of a huge asteroid, for example. Today's machine learning models are no better. Their latest major success story as I am writing this is <a href="https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology" >AlphaFold predicting protein structures from their sequences</a>. This is indeed a huge step forward, as it opens the door to completely new ways of studying the folding mechanisms of proteins. It is also likely to become a powerful tool in structural biology, if it is actually made available to biologists. But it is not, as DeepMind's blog post claims, &quot;a solution to a 50-year-old grand challenge in biology&quot;. We still do not know what the fundamental mechanisms of protein folding are, nor how they play together for each specific protein structure. And that means that we cannot answer what-if questions such as &quot;How do changes in a protein's environment influence its fold?&quot;</p>

<h2>Explanatory models</h2>

<p>The really big success stories of science are models of a very different kind. <em>Explanatory</em> models describe the underlying mechanisms that determine the values of observed quantities, rather than extrapolating the quantities themselves. They describe the systems being studied at a more fundamental level, allowing for a wide range of generalizations.</p>

<p>A simple explanatory model is given by the <a href="https://en.wikipedia.org/wiki/Lotka%E2%80%93Volterra_equations" >Lotka-Volterra equations</a>, also called predator-prey equations. This is a model for the time evolution of the populations of two species in a predator-prey relation. An example is shown in this plot (Lamiot, CC BY-SA 4.0 <a href="https://creativecommons.org/licenses/by-sa/4.0">https://creativecommons.org/licenses/by-sa/4.0</a>, via Wikimedia Commons):</p>

<p><img src="https://upload.wikimedia.org/wikipedia/commons/5/5b/Milliers_fourrures_vendues_en_environ_90_ans_odum_1953_en.jpg" alt="predator-prey" width="600"/></p>

<p>An empirical model would capture the oscillations of the two curves and their correlations, for example by describing the populations as superpositions of sine waves. The Lotka-Volterra equations instead describe the interactions between the population numbers: predators and prey are born and die, but in addition predators eat prey, which reduces the number of prey in proportion to the number of predators, and contributes to a future increase in the number of predators because they can better feed their young. With that type of description, one can ask what-if questions: What if hunters shoot lots of predators? What if prey are hit by a famine, i.e. a decrease in their own source of food? In fact, the significant deviations from regular periodic change in the above plot suggest that such &quot;outside&quot; events are quite important in practice.</p>
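
<p>As a rough illustration of such a what-if question, the following Python sketch integrates the Lotka-Volterra equations in their usual textbook form, with prey growth rate <code>alpha</code>, predation rate <code>beta</code>, predator growth per prey eaten <code>delta</code>, and predator death rate <code>gamma</code>. All parameter values and initial populations are arbitrary assumptions, the midpoint integrator is just one simple choice, and modeling hunting as a higher predator death rate is my simplification for this example.</p>

```python
def lotka_volterra(prey, predators, alpha, beta, delta, gamma):
    """Right-hand side of the Lotka-Volterra equations."""
    d_prey = alpha * prey - beta * prey * predators
    d_predators = delta * prey * predators - gamma * predators
    return d_prey, d_predators

def simulate(prey, predators, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5,
             dt=0.001, steps=20000):
    """Integrate with the midpoint method; return the list of (prey, predators) states."""
    params = (alpha, beta, delta, gamma)
    trajectory = [(prey, predators)]
    for _ in range(steps):
        # Evaluate the rates at the start of the step, then at its midpoint.
        k1x, k1y = lotka_volterra(prey, predators, *params)
        k2x, k2y = lotka_volterra(prey + 0.5 * dt * k1x,
                                  predators + 0.5 * dt * k1y, *params)
        prey += dt * k2x
        predators += dt * k2y
        trajectory.append((prey, predators))
    return trajectory

# Baseline scenario with the (assumed) default parameters.
baseline = simulate(10.0, 5.0)

# What if hunters shoot lots of predators?
# Assumption: hunting is represented by a higher predator death rate.
hunted = simulate(10.0, 5.0, gamma=3.0)

print("largest prey population, baseline:", max(x for x, _ in baseline))
print("largest prey population, hunted:  ", max(x for x, _ in hunted))
```

<p>The point is not the numbers but the fact that the second scenario requires no new observations: changing one parameter of the mechanism is enough. A superposition of sine waves fitted to the baseline curves offers no such handle.</p>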

<p>Back to celestial mechanics. The decisive step towards an explanatory model was made by Isaac Newton, after two important preparatory steps by Copernicus and Kepler, who put the Sun at the center, removing the need for epicycles, and described the planets' orbits more accurately as ellipses. Newton's laws of motion and gravitation fully explained these elliptical orbits and improved on them. More importantly, they showed that the fundamental laws of physics are the same on Earth and in space, a fact that may seem obvious to us today but wasn't in the 17th century. Finally, Newton's laws have permitted the elaboration of a rich theory, today called &quot;classical mechanics&quot;, that provides several alternative forms of the basic equations (in particular <a href="https://en.wikipedia.org/wiki/Lagrangian_mechanics" >Lagrangian</a> and <a href="https://en.wikipedia.org/wiki/Hamiltonian_mechanics" >Hamiltonian</a> mechanics), plus derived principles such as the conservation of energy. As for what-if questions, Newton's laws have made it possible to send artefacts to the moon and to the other planets of the solar system, something which would have been unimaginable on the basis of Ptolemy's epicycles.</p>

<p>So far I have cited two explanatory models that take the form of differential equations, but that is not a requirement. An example from the digital age is given by <a href="https://en.wikipedia.org/wiki/Agent-based_model" >agent-based models</a>. There is, however, a formal characteristic that is shared by all explanatory models that I know, and that distinguishes them from empirical models: they take the form of specifications.</p>

<h2>Specifications and equations vs. algorithms and functions</h2>

<p>Let's look at a simple problem for illustration: sorting a list of numbers (or anything else with a well-defined order). I have a list <code>L</code>, with elements <code>L[i]</code>, <code>i=1..N</code> where <code>N</code> is the length of the list <code>L</code>. What I want is a sorted version which I will call <code>sorted(L)</code>. The <em>specification</em> for <code>sorted(L)</code> is quite simple:</p>

<ol>
<li><code>sorted(L)</code> is a list of length <code>N</code>.</li>
<li>For all elements of <code>L</code>, their multiplicities in <code>L</code> and <code>sorted(L)</code> are the same.</li>
<li>For all <code>i=1..N-1</code>, <code>sorted(L)[i] ≤ sorted(L)[i+1]</code>.</li>
</ol>

<p>Less formally: <code>sorted(L)</code> is a list with the same elements as <code>L</code>, but in the right order.</p>

<p>This specification of <code>sorted(L)</code> is complete in that there is one unique list that satisfies it. However, it does not provide much help for actually constructing that list. That is what a sorting <em>algorithm</em> provides. There are many known algorithms for sorting, and you can learn about them from <a href="https://en.wikipedia.org/wiki/Sorting_algorithm" >Wikipedia</a>, for example. What matters for my point is that (1) given the specification, it is not a trivial task to construct an algorithm, (2) given a few algorithms, it is not a trivial task to write down a common specification that they satisfy (assuming of course that it exists). And that means that specifications and algorithms provide complementary pieces of knowledge about the problem.</p>
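
<p>The complementarity can be sketched in a few lines of Python. The function <code>satisfies_specification</code> checks the three conditions above for a candidate list, without saying anything about how to find one, whereas <code>insertion_sort</code> constructs a list without mentioning the conditions it ends up satisfying. Both function names are mine, insertion sort is merely one arbitrary choice among the known algorithms, and the indices start at 0 as is customary in Python, rather than at 1 as in the text.</p>

```python
from collections import Counter

def satisfies_specification(original, candidate):
    """Check the three conditions of the specification for sorted(L)."""
    # 1. Same length as the original list.
    same_length = len(candidate) == len(original)
    # 2. Every element has the same multiplicity in both lists.
    same_multiplicities = Counter(candidate) == Counter(original)
    # 3. Each element is less than or equal to its successor.
    ordered = all(candidate[i] <= candidate[i + 1]
                  for i in range(len(candidate) - 1))
    return same_length and same_multiplicities and ordered

def insertion_sort(items):
    """One of many algorithms that construct a list satisfying the specification."""
    result = []
    for item in items:
        # Walk back from the end to find the insertion point.
        position = len(result)
        while position > 0 and result[position - 1] > item:
            position -= 1
        result.insert(position, item)
    return result

numbers = [3, 1, 2, 1]
print(insertion_sort(numbers))
print(satisfies_specification(numbers, insertion_sort(numbers)))
```

<p>Note that the specification check works for the output of any sorting algorithm, which is what makes it a common specification.</p>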

<p>In terms of levels of abstraction, specifications are more abstract than algorithms, which in turn are more abstract than implementations. In the example of sorting, the move from specification to algorithm requires technical details to be filled in, in particular the choice of a sorting algorithm. Moving on from the algorithm to a concrete implementation involves even more technical details: the choice of a programming language, the data structures for the list and its elements, etc.</p>

<p>In the universe of continuous mathematics, the relation between equations (e.g. differential equations) and the functions that satisfy them is exactly the same as the relation between specifications and algorithms in computation. Newton's equations can thus be seen as a specification for the elliptical orbits that Kepler had described a bit earlier. As in the case of sorting, it is not a trivial task to derive Kepler's elliptical orbits from Newton's equations, nor is it a trivial task to write down Newton's equations as the common specification of all the (approximately) elliptical orbits in the solar system. The two views of the problem are complementary, one being closer to the observations, the other providing more insight.</p>

<p>One reason why specifications and equations are more powerful is that they are modular. Two specifications combined make up another, more detailed, specification. Two equations make up a system of equations. An example is given by Newton's very general law of motion, which is extended by his law of gravitation to make a model for celestial mechanics. The same law of motion can be combined with different laws defining forces for different situations, for example the motion of an airplane. In contrast, there is no way to deduce anything about airplanes from Kepler's elliptical planetary orbits. Functions and algorithms satisfy <em>complete</em> specifications, and conserve little information about the <em>components</em> from which this complete specification was constructed.</p>
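
<p>A small Python sketch may help to see this modularity at work. The function <code>simulate</code> encodes only the law of motion, in one dimension and with a simple semi-implicit Euler integrator, while the force laws are supplied separately and can be combined. The force laws chosen here (uniform gravity near the Earth's surface and an ideal spring) and all names and numerical values are assumptions made for illustration; they stand in for the law of gravitation or the aerodynamic forces on an airplane, which would need more than one dimension.</p>

```python
def simulate(force, mass, position, velocity, dt=0.001, steps=1000):
    """Apply the law of motion a = F / m, whatever the force law is."""
    for _ in range(steps):
        acceleration = force(position, velocity) / mass
        velocity += acceleration * dt   # semi-implicit Euler: velocity first,
        position += velocity * dt       # then position with the new velocity
    return position, velocity

def gravity(mass, g=9.81):
    """Force law: uniform gravity near the Earth's surface (assumed setting)."""
    return lambda position, velocity: -mass * g

def spring(stiffness):
    """Force law: ideal spring pulling back towards position 0."""
    return lambda position, velocity: -stiffness * position

def combine(*forces):
    """Several force laws together make up another force law."""
    return lambda position, velocity: sum(f(position, velocity) for f in forces)

# The same law of motion combined with three different force specifications.
print(simulate(gravity(mass=1.0), 1.0, 0.0, 0.0))
print(simulate(spring(stiffness=1.0), 1.0, 1.0, 0.0))
print(simulate(combine(gravity(mass=1.0), spring(stiffness=9.81)), 1.0, 0.0, 0.0))
```

<p>The trajectories that come out of <code>simulate</code> play the role of Kepler's ellipses: each one answers a single complete specification, and none of them reveals that a reusable law of motion was one of the ingredients.</p>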

<h2>A challenge for computational science</h2>

<p>Computational science initially used computers as a tool for applying structurally simple but laborious computational algorithms. The focus was on efficient implementations of known algorithms, later also on developing efficient algorithms for solving well-understood equations. The steps from specification to algorithm to implementation were done by hand, with little use of computational tools.</p>

<p>That was 60 years ago. Today, we have computational models that are completely unrelated to the mathematical models that go back to the 19th century. And when we do use the foundational mathematical models of physics and chemistry, we combine them with concrete systems specifications whose size and complexity require the use of computational tools. And yet, we still focus on implementations and to a lesser degree on algorithms, neglecting specifications almost completely. For many routinely used computational tools, the implementation is the only publicly accessible artefact. The algorithms they implement are often undocumented or not referenced, and the specifications from which the algorithms were derived are not written down at all. Given how crucial the specification level of scientific models has been in the past, we can expect to gain a lot by introducing it into computational science as well.</p>

<p>To do so, we first need to develop a new appreciation for <a href="https://f1000research.com/articles/3-101/v2" >scientific models as distinct from the computational tools that implement them</a>. We then need to think about how we can actually <a href="https://peerj.com/articles/cs-158/" >introduce specification-based models into the workflows of computational science</a>. This requires designing computational tools that let us move freely between the three levels of specification, algorithm, and implementation. This is in my opinion the main challenge for computational science in the 21st century.</p>

<h2>Finally...</h2>

<p>Some readers may have recognized that the title of this post is a reference to two books, <a href="https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book.html" >Structure and Interpretation of Computer Programs</a> (with a <a href="https://sarabander.github.io/sicp/html/index.xhtml" >nice though unofficial online version</a>) and <a href="https://mitpress.mit.edu/books/structure-and-interpretation-classical-mechanics" >Structure and Interpretation of Classical Mechanics</a> (also <a href="https://tgvaughan.github.io/sicm/toc.html" >online</a>). The second one is actually somewhat related to the topic of this post: it is a textbook on classical mechanics that uses computational techniques for clarity of exposition. More importantly, both books focus on inducing a deep understanding of their topics, rather than on teaching superficial technical details. This humble blog post cannot pretend to reach that level, of course, but its goal is to spark developments that will culminate in textbooks of the same quality as its two inspirations.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Konrad Hinsen:</i><p>A recommended follow-up read: <a href="https://www.quora.com/What-is-declarative-programming/answer/Alan-Kay-11" rel="nofollow noopener" title="https://www.quora.com/What-is-declarative-programming/answer/Alan-Kay-11">What is declarative programming?<br></a> by Alan Kay. His "what" and "how" is almost the same distinction as "specification" vs "algorithm".</p></li>
</ul>

 ]]></description> </item><item> <title>Some comments on AlphaFold</title> <link>https://blog.khinsen.net/posts/2020/12/02/some-comments-on-alphafold.html</link> <pubDate>2020-12-02</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2020/12/02/some-comments-on-alphafold.html</guid> <category><![CDATA[ science ]]></category><category><![CDATA[ proteins ]]></category> <description><![CDATA[ <p>Many people are asking for my opinion on the recent <a href="https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology" >impressive success of AlphaFold at CASP14</a>, perhaps incorrectly assuming that I am an expert on protein folding. I have actually never done any research in that field, but it's close enough to my research interests that I have closely followed the progress that has been made over the years. Rather than reply to everyone individually, here is a public version of my comments. They are based on the limited information on AlphaFold that is available today. I may come back to this post later and expand it.</p>

<!-- more -->

<p>First of all, the GDT scores obtained by AlphaFold are impressive, which is of course the reason for all the buzz at the moment. The GDT score measures how close a predicted structure is to the experimentally determined one. It is defined on a scale from 0 to 100 and can roughly be interpreted as the percentage of amino acid residues that were placed correctly. For about 2/3 of the proteins in this year's competition, AlphaFold achieved a GDT score in the 90s, whereas in the not so distant past, a score in the 70s was already considered very good. Which exact techniques were used to obtain the predicted structures is not something I can comment on: as far as I know, no technical details have been made public so far. Nor is AlphaFold a publicly available program or service that scientists could explore or apply to their own work. So all we know for now is that DeepMind, the company behind AlphaFold, has figured out a way to obtain good scores at CASP14. In the following I will assume that this is not just good luck, and that the method is applicable to a much larger class of proteins than the CASP candidates.</p>

<p>The scores obtained by AlphaFold are clearly a sign of significant progress. But does it mean that we have &quot;a solution to a 50-year-old grand challenge in biology&quot;, as the press release claims? That depends on what exactly one considers that challenge to be.</p>

<p>If the challenge of protein folding is taken to be a purely pragmatic one, i.e. being able to predict structure from sequence, then AlphaFold is a candidate for a solution. How much of a solution will depend on further evaluations that remain to be done, on a larger range of proteins. CASP is limited to proteins for which experimental structures are (just) available. But some proteins resist experimental structure determination, for example because they have no well-defined structure at all. A robust structure prediction tool would have to identify such cases, rather than predict bogus structures. Allosteric proteins, which are proteins that can take more than one stable structure, provide another set of interesting test cases. A third case of interest is protein pairs that differ minimally in their sequence but importantly in structure. The goal of evaluating the robustness of a tool is to understand how it behaves at best, at worst, and for important edge cases, such that its users can judge the trustworthiness of its results.</p>

<p>For many scientists, including myself, having a black-box structure prediction tool is not sufficient to declare the protein folding problem solved. A solution requires an in-depth understanding of the mechanisms that determine protein structure. Whether or not AlphaFold can contribute to identifying these mechanisms is a question that scientists can only start to examine, and only if AlphaFold becomes sufficiently accessible and inspectable for critical examination by outside experts. I hope this will happen, and in fact I am optimistic that it will happen: the problem is important enough to deserve a serious effort by everyone involved. AlphaFold is not the end of the quest for a solution of the protein folding problem, but it could well turn out to be the beginning of a new chapter in the story.</p>

 ]]></description> </item><item> <title>The four possibilities of reproducible scientific computations</title> <link>https://blog.khinsen.net/posts/2020/11/20/the-four-possibilities-of-reproducible-scientific-computations.html</link> <pubDate>2020-11-20</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2020/11/20/the-four-possibilities-of-reproducible-scientific-computations.html</guid> <category><![CDATA[ reproducible research ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ <p>Computational reproducibility has become a topic of much debate in recent years. Often that debate is fueled by misunderstandings between scientists from different disciplines, each having different needs and priorities. Moreover, the debate is often framed in terms of specific tools and techniques, in spite of the fact that tools and techniques in computing are often short-lived. In the following, I propose to approach the question from the scientists' point of view rather than from the engineering point of view. My hope is that this point of view will lead to a more constructive discussion, and ultimately to better computational reproducibility.</p>

<!-- more -->

<p>The format of my proposal is inspired by the well-known <a href="https://www.gnu.org/philosophy/free-sw.en.html" >&quot;four freedoms&quot; that define Free Software</a>. The focus of reproducibility is not on legal aspects, but on technical ones, and therefore my proposal is framed in terms of <em>possibilities</em> rather than freedoms.</p>

<h2>The four essential possibilities</h2>

<p>A computation is reproducible if it offers the four essential possibilities:</p>

<ol>
<li>The possibility to inspect all the input data and all the source code that can possibly have an impact on the results.</li>
<li>The possibility to run the code on a suitable computer of one's own choice in order to verify that it indeed produces the claimed results.</li>
<li>The possibility to explore the behavior of the code, by inspecting intermediate results, by running the code with small modifications, or by subjecting it to code analysis tools.</li>
<li>The possibility to verify that published executable versions of the computation, proposed as binary files or as services, do indeed correspond to the available source code.</li>
</ol>

<p>All of these possibilities come in degrees, measured in terms of the effort required to actually do what is supposed to be possible. For example, inspecting the source code of a computation is much easier for a notebook containing the top-level code, with links to repositories of all dependencies, than for a script available from the authors on request. Moreover, the degree to which each possibility exists can strongly vary over time. A piece of software made available on an institutional Web site is easily inspectable while that site exists, but inspectability drops to zero if the Web site closes down.</p>

<p>The reproducibility profile of a computation therefore consists of four time series, each representing one of the possibilities expressed on a suitable scale with its estimated time evolution. The minimum requirement for the label &quot;reproducible&quot; is a non-zero degree for all four possibilities for an estimated duration of a few months, the time it takes for new work to be carefully examined by peers.</p>
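<p>As a rough illustration, such a profile could be represented as follows. Everything in this sketch is an assumption made for the sake of the example: the names, the monthly sampling, the 0-to-1 scale for the degrees, and the choice of three months as a stand-in for &quot;a few months&quot;.</p>

```python
from dataclasses import dataclass

# Hypothetical short names for the four essential possibilities.
POSSIBILITIES = ("inspect", "run", "explore", "verify_binaries")


@dataclass
class ReproducibilityProfile:
    # For each possibility, the estimated degree for each month after
    # publication, on an assumed scale from 0.0 (impossible) to 1.0 (effortless).
    degrees: dict

    def is_reproducible(self, months=3):
        """Minimum requirement: a non-zero degree for all four possibilities
        during each of the first `months` months."""
        return all(
            len(self.degrees[p]) >= months and min(self.degrees[p][:months]) > 0
            for p in POSSIBILITIES
        )


profile = ReproducibilityProfile({
    "inspect": [1.0, 1.0, 0.8, 0.0],          # the hosting Web site closes in month 4
    "run": [0.9, 0.7, 0.5, 0.2],              # executability decays with time
    "explore": [0.5, 0.5, 0.5, 0.5],
    "verify_binaries": [0.1, 0.1, 0.1, 0.1],  # possible, but with a lot of effort
})
print(profile.is_reproducible())           # True
print(profile.is_reproducible(months=4))   # False
```

<p>The hard part is of course not the data structure but estimating the numbers that go into it.</p>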

<h2>Rationale</h2>

<p>The possibility to inspect all the source code is required to allow independent verification of the software's correctness, and in particular to check that it does what its documentation claims it does.</p>

<p>The possibility to run the code is required to allow independent verification of the results.</p>

<p>The possibility to explore the behavior of the code is a <em>de facto</em> requirement to fully accomplish the goals of the first possibility. For all but the most trivial pieces of software, inspection of the source code is not enough to convince oneself that it does what it is claimed to do.</p>

<p>The possibility of verifying the correspondence of source code and executable versions is motivated by the complexity of today's software build procedures. Mistakes can as easily be introduced in the build process as in the source code itself. This point is well made by Ken Thompson's Turing Award speech <a href="https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf" >Reflections on Trusting Trust</a>, if you replace mischief by mistake in his arguments.</p>

<h2>Discussion in the context of the state of the art</h2>

<p>The possibility to inspect all the source code is a criterion that is in principle widely accepted, although many people fail to realize its wide-ranging consequences. &quot;All the source code that can possibly have an impact on the results&quot; actually means a <em>lot</em> of software. It includes many libraries, but also language implementations such as compilers and interpreters. Moreover, inspecting a dependency first of all requires precisely identifying it. This remains a difficult task today, and therefore most published computations today do not offer the first essential possibility, no matter how much effort a reader is willing to invest.</p>

<p>It is tempting to introduce another degree of compliance by requiring that only the most relevant parts of the total source code be inspectable. However, that defies the whole purpose of independent verification. Who decides what is relevant? Usually the author of the computation. But if the code declared to be irrelevant by the author is not inspectable, we have to take the author's word for its irrelevance.</p>

<p>The possibility to run the code is also a widely accepted criterion, though not everyone accepts the additional requirement of executability &quot;on a suitable computer of one's own choice&quot;. Software made available as a service (e.g. in the cloud) is considered sufficient for reproducibility by some researchers. Executability is much more susceptible to decay over time than inspectability of the source code, and this is one of the main topics of debate today. Is long-term reproducibility needed? Is it achievable? The answers vary across disciplines. There is unfortunately a strong tendency towards self-censorship here: many scientists believe that long-term reproducibility is not realistic and <em>therefore</em> should not be asked for. This is definitely not true, and it is better to frame the question as a trade-off: what is a reasonable price to pay for long-term reproducibility, in a given discipline?</p>

<p>The possibility to explore the behavior of the code is rarely mentioned in discussions of reproducibility. And in fact, exploring the behavior of non-trivial code written by someone else is such a difficult task that many scientists prefer not to require anyone to do it. I am not aware of any scientific journal that expects reviewers of submitted work to check the code of any computation for correctness or at least plausible correctness, which in practice requires examining its behavior. And yet, the scientific method requires <em>everything</em> to be inquirable. It may not be a realistic expectation today, but it should at least be a goal for the future.</p>

<p>Since code explorability is rarely required or even discussed, there is no clear profile of practical implementations either. It's a criterion that requires expert judgement, the expert being a fellow researcher from the same discipline as the author of a computation. It is the software analog of a &quot;well-written&quot; paper, which is a paper that a reader can easily &quot;get into&quot;.</p>

<p>The possibility of verifying the correspondence of source code and executable versions is also rarely mentioned. It is also the least fundamental one of the four essential possibilities, because in principle it can be abandoned if a computation is fully reproducible from source code. In practice, however, that is rarely a realistic option. The size and complexity of today's software assemblies make it impractical to re-build everything from source code, a process that can take many hours. Nearly all software assemblies we run in scientific computing contain some components obtained in pre-built binary form. While it is perfectly OK for most people, most of the time, to use such pre-built binaries, inquirability requires the possibility to check that these binaries really correspond to the source code that the authors of a computation claim to have used. This is a possibility where a low degree can be quite acceptable.</p>
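<p>One conceivable way to perform such a check is to rebuild a component from its source code and compare the result with the published binary. The sketch below does only the comparison step, using checksums. It rests on a strong assumption that many build procedures do not satisfy, namely that rebuilding from the same source yields a bit-for-bit identical file. The function names and file paths are hypothetical.</p>

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """SHA-256 checksum of a file, read in chunks to handle large binaries."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_match(published_path, rebuilt_path):
    """True if the published binary is identical to the one rebuilt from source.

    Only meaningful if the build is deterministic (assumed here).
    """
    return sha256_of(published_path) == sha256_of(rebuilt_path)
```

<p>A call such as <code>binaries_match("published/tool", "rebuilt/tool")</code>, with both paths being placeholders, then answers the question for a single component. A mismatch does not prove that something is wrong, only that the correspondence could not be established this way.</p>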

<h2>Please comment!</h2>

<p>As I said, the goal of this blog post is to start a discussion. Your comments are valuable, possibly more so than the post itself. How important are the four possibilities in your own discipline? How well can they be realized within the current state of the art? Are there additional possibilities you consider important for reproducibility?</p>

<p>Check also the comments on Twitter by exploring the replies to <a href="https://twitter.com/khinsen/status/1329832546474061824" >this tweet</a>.</p>

<h2>Notes added after publication</h2>

<h3>2020-11-22</h3>

<p><a href="https://twitter.com/jermdemo/status/1329866889867059200" >Jeremy Leipzig</a> points to <a href="https://icerm.brown.edu/topical_workshops/tw12-5-rcem/icerm_report.pdf" >the 2012 ICERM workshop document</a>, whose appendix A discusses several levels of reproducibility. Its last level (&quot;open or reproducible research&quot;) covers in a general way the four possibilities I discuss above. The lower levels describe research output in which at least one of the four possibilities is not provided.</p>

<h3>2020-11-23</h3>

<p><a href="https://twitter.com/ivotron/status/1329873600472621057" >Ivo Jimenez</a> refers to <a href="https://www.niso.org/standards-committees/reproducibility-badging" >ongoing work</a> at NISO (National Information Standards Organization, USA) to define recommended practices, and <a href="https://twitter.com/npch/status/1330453823568171008" >Neil Chue Hong</a> says they will be out soon.</p>

<p><a href="https://twitter.com/ivotron/status/1330612647763570690" >Ivo Jimenez</a> also mentions an interesting collection of <a href="https://sysartifacts.github.io/" >resources on artifact evaluation for computer systems conferences</a>.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Roberto Di Cosmo:</i><p>Thanks for this nice post: I like the classification, and I love the acknowledgment of the difficulty to have a "one size fits all" solution when it comes to reproducibility, as the dimension of the problem and the resources available to address it really vary a lot across disciplines, and even inside a discipline. A nice example of a <i>"scientific journal that expects reviewers of submitted work to check the code of any computation for correctness or at least plausible correctness, which in practice requires examining its behavior"</i> is Image Processing OnLine (<a href="https://ipol.im" rel="nofollow noopener" title="https://ipol.im">https://ipol.im</a>) that goes a long way along the road to reproducibility.</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for mentioning IPOL! I haven't been able to find reviewing guidelines on their Web site, but I will contact the team to find out what exactly their reviewing process evaluates.</p></li>
</ul>
</li>
<li><i>Nicolas Rougier:</i><p>In terms of code interactivity, I find the <a href="https://distill.pub/" rel="nofollow noopener" title="https://distill.pub/">https://distill.pub/</a> journal to be really good even though I imagine it's a lot of work for authors. But it's really nice to be able to play with the model. In my own domain (computational neuroscience) I dream of having really interactive model where you can test what happens if you modify this or that parameter or simply change the random seed. I suspect this won't come anytime soon since most journals do not even really care about the code, but who knows.</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for that nice example, which illustrates possibility #3: the possibility to explore how a computation works. Much of the work by Bret Victor (<a href="http://worrydream.com/" rel="nofollow noopener" title="http://worrydream.com/">http://worrydream.com/</a>) is similar to the <a href="http://distill.pub" rel="nofollow noopener" title="distill.pub">distill.pub</a> you cite. But as you say, these are very much examples of hand-crafted presentation software, and thus require a huge investment by the authors. Making such presentations more accessible should be one priority in method and tool development. Jupyter widgets are one step in that direction.</p></li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>The landscapes of digital scientific knowledge</title> <link>https://blog.khinsen.net/posts/2020/07/08/the-landscapes-of-digital-scientific-knowledge.html</link> <pubDate>2020-07-08</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2020/07/08/the-landscapes-of-digital-scientific-knowledge.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ <p>Over the last few years, an interesting metaphor for information and knowledge curation has begun to take root. It compares knowledge to a landscape in which it identifies in particular two key elements: streams and gardens. The first use of this metaphor that I am aware of is <a href="https://hapgood.us/2015/10/17/the-garden-and-the-stream-a-technopastoral/" >this essay by Mike Caulfield</a>, which I strongly recommend reading first. In the following, I will apply this metaphor specifically to scientific knowledge and its possible evolution in the digital era.</p>

<!-- more -->

<p>In the landscape metaphor, streams are timelines of information parcels. News, RSS feeds, Twitter, Facebook, but also scientific journals, are stream media. Gardens are continuously evolving information assemblies that are actively curated by their authors. Encyclopedias and dictionaries are perhaps the oldest examples. In the printed paper era, updating an information collection was expensive because everything had to be reprinted and redistributed. As a consequence, garden-type resources were rare. Digital gardens have no such overhead, and almost no cost other than the work of their curators. More and more people are setting up their own digital gardens as an alternative or complement to the personal stream, better known as a blog. Click <a href="https://joelhooks.com/digital-garden" >here</a>, <a href="https://tomcritchlow.com/blogchains/digital-gardens/" >here</a>, and <a href="https://www.christopherbiscardi.com/what-is-a-digital-garden" >here</a> to see a few examples of personal digital gardens. Like blogs, digital gardens can also be collective efforts, run by a company, a research group, or a larger community. The most widespread tool for digital gardening is the Wiki, but there are also more recent developments in this space, such as <a href="https://www.notion.so/" >Notion</a> or <a href="https://roamresearch.com/" >Roam</a>.</p>

<p>One distinction that I haven't seen mentioned yet in this context is the one between a garden and a park. Both are curated and thus continuously evolving. But whereas gardens are set up and maintained for the benefit and enjoyment of their owners, parks are created and maintained for the benefit and enjoyment of the public. The difference can be subtle, as digital gardens are often visible to the public as well. But they are more like the unwalled garden on the roadside that you can admire passing by than like the park in which you can take a walk and sit down reading a book. A good example of a digital park is <a href="https://www.wikipedia.org/" >Wikipedia</a>.</p>

<p>Science is all about acquiring information about our world and distilling it into knowledge, and therefore requires a fair bit of gardening. In its early days, it was managed as a garden by and for a small community of people who were motivated by curiosity and relied on personal wealth or on sponsors for doing their work. Universities employed scientists more for teaching than for doing research. Research was done by individuals or small teams, and presented at conferences or in journal articles, much like today. Unlike today, most scientists were up to date on everything that was happening in their field, and had personal exchanges with almost everyone else, in face-to-face meetings or by correspondence. Conferences were events in which conflicting results and different points of view were actively debated, enabling the formation of consensus. The streams of papers and conference contributions thus watered the garden of scientific knowledge.</p>

<p>All that changed after World War II, when science underwent rapid growth as states injected a lot of money while at the same time expecting the scientific community to cultivate a park rather than a garden, contributing to the common good. Keeping up to date with everybody else's work became more and more difficult, slowly eroding the possibility of consensus formation through live debate at conferences. Productivity metrics focusing on what is easiest to quantify ended up rewarding scientists for contributing to the stream of journal articles, but not for contributing to the cultivation of the park of scientific knowledge. Today, the streams of journal articles have become torrents whose distillation into knowledge is becoming ever more difficult. A good illustration is the (serious) <a href="https://science.sciencemag.org/content/368/6494/924.full" >proposal to use machine learning tools</a> to make sense of the &quot;tsunami&quot; of articles resulting from the intense research on the Covid-19 pandemic.</p>

<p>The design and implementation of new mechanisms for knowledge distillation and consensus formation is thus a major challenge for science today, and even though machine learning techniques may prove to be helpful, I expect this to remain a fundamentally human task for a long time to come. These new mechanisms must combine technological aspects (good tools for working towards these goals) and social aspects (incentives for scientists to participate in this work). As always, the social aspects are the harder problem. As a first step and as a source of inspiration, let's look at similar existing mechanisms in science and elsewhere. Which digital parks exist? How do they work? Can their mechanisms be adapted to other applications?</p>

<p>I have already cited Wikipedia as a prime example of a digital park. I had expected to see Wikis more widely used as a platform for collective information curation in science, be it as gardens or parks, but when I searched for examples I found surprisingly few, e.g. <a href="http://www.tricki.org/" >Tricki</a> (for mathematical problem-solving techniques) or the <a href="https://complexityzoo.uwaterloo.ca/Complexity_Zoo" >Complexity Zoo</a> (on classes of computational complexity). One problematic aspect of Wikis is that they present only a single view to the outside world. They are better suited for presenting an established consensus than for supporting the process of consensus formation in rapidly evolving fields. One of the rare cases of a Wiki used for coordinating collaborative research, rather than for summarizing the state of the art, is the <a href="https://asone.ai/polymath/index.php" >Polymath project</a>. It is probably not a coincidence that this has happened in mathematics, a domain whose working habits remain close to those of the early scientific community, with individuals having more agency than in disciplines that are more dependent on material resources.</p>

<p><a href="http://fed.wiki.org/" >Federated Wiki</a> is an interesting evolution of the Wiki concept (initiated by the original inventor of the Wiki, <a href="https://twitter.com/WardCunningham" >Ward Cunningham</a>) that allows individual contributors to maintain and publish their own view while at the same time encouraging reciprocal borrowing of content. <a href="https://www.youtube.com/watch?time_continue=111&v=2Gi9SRsRrE4" >This video</a> illustrates the process nicely. Whereas federated Wiki looks like a promising approach to consensus formation, the technical obstacles to setting up a federated Wiki are significant (contributors must manage personal Web servers and domains) and make it difficult to evaluate it in practice.</p>

<p>Perhaps the most frequent kind of digital park in science today is the collaborative software development project, hosted on platforms such as <a href="https://github.com/" >GitHub</a>, <a href="https://gitlab.com/" >GitLab</a>, or similar platforms operated by research institutions. Ignoring the differences resulting from the focus on code rather than prose, the main differences between these platforms and Wikis are (1) a stronger emphasis on discussion (&quot;issues&quot;) and (2) the co-existence of multiple branches representing different public or private views of a common project, with one branch (conventionally named &quot;master&quot; or &quot;main&quot;) representing the current consensus.</p>

<p>Collaborative software projects are an interesting case study also for the question of incentives. The lack of recognition of software development as a research activity has been deplored for a long time. It is usually attributed to the relative novelty of software as a form of research output. But I suspect that the park nature of software, as opposed to the stream nature of journals, is also an important factor, because it makes it more difficult to evaluate an individual's contributions based on purely formal (and thus easily measurable) criteria. On the other hand, today's collaborative platforms make such an evaluation technically feasible, by counting for example the number of commits made by an individual, or the number of lines changed by those commits. Everybody involved in software development will probably agree that this is a stupid metric, but it's no more stupid than counting publications weighted by journal impact factor.</p>

<p>Another social aspect that is well illustrated by software is the difficulty of the transition from gardens to parks. Projects usually start out as gardens, with a small team developing software for its own use. Then early users start to join, who by necessity have to figure out for themselves how to adapt the software to their needs, and are thus likely to become contributors. With an increasing user base, developers have an interest in working on more robust code and better documentation, in order to reduce the effort of technical support. At that stage, the software becomes attractive to less technically minded users who see no need to ever get in touch with the development community. These users consider the software a park, even if its developers still consider it a garden, leading to contradictory tacit expectations on both sides about the priorities for future maintenance, which I have described <a href="https://blog.khinsen.net/posts/2020/02/26/the-rise-of-community-owned-monopolies.html" >in an earlier post</a>. Developers tend to contribute to this confusion by advertising their project as a park while maintaining it as a garden.</p>

<p>The above examples illustrate that the technical challenges of digital gardens and parks are somewhat understood and partially solved. Collaborative software development platforms in particular have proven very effective. Adapting their concepts to different use cases and different users definitely looks possible, although the effort required should not be underestimated, in particular for developing appropriate user interfaces. But the real challenge is creating incentives for collaboration, in a universe currently dominated by competition for limited resources.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Ericson2314:</i><p>&gt; One of the rare cases of a Wiki used for coordinating collaborative <br>research, rather than for summarizing the state of the art, is the Polymath project.<br> It is probably not a coincidence that this has happened in mathematics,<br> a domain whose working habits remain close to those of the early <br>scientific community, with individuals having more agency than in <br>disciplines that are more dependent on material resources.</p><p>I think you are on to something, because the only other example I know of is <a href="https://ncatlab.org/nlab/" rel="nofollow noopener" title="https://ncatlab.org/nlab/">https://ncatlab.org/nlab/</a>, which is also Math related. (And by mathematicians with a computer scientist slant that only makes familiarity with the broader world of Wikis beyond Wikipedia more likely.)</p></li>
</ul>

 ]]></description> </item><item> <title>An open letter to software engineers criticizing Neil Ferguson&#039;s epidemics simulation code</title> <link>https://blog.khinsen.net/posts/2020/05/18/an-open-letter-to-software-engineers-criticizing-neil-ferguson-s-epidemics-simulation-code.html</link> <pubDate>2020-05-18</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2020/05/18/an-open-letter-to-software-engineers-criticizing-neil-ferguson-s-epidemics-simulation-code.html</guid> <category><![CDATA[ scientific software ]]></category> <description><![CDATA[ <p>Dear software engineers,</p>

<p><a href="https://lockdownsceptics.org/code-review-of-fergusons-model/" >Many</a> <a href="https://www.telegraph.co.uk/technology/2020/05/16/coding-led-lockdown-totally-unreliable-buggy-mess-say-experts/" >of you</a> <a href="https://chrisvoncsefalvay.com/2020/05/09/imperial-covid-model/" >were</a> <a href="https://github.com/mrc-ide/covid-sim/issues" >horrified</a> at the sight of <a href="https://github.com/mrc-ide/covid-sim" >the C++ code that Neil Ferguson and his team wrote to simulate the spread of epidemics</a>. I sympathize with you. The only reason why I am less horrified than you is that I have seen a lot of similar-looking code before. It is in fact quite common in scientific computing, in particular in research projects that have been running for many years. But like you, I don't have much confidence that this code is a faithful implementation of the epidemiological models that it is supposed to implement, and I don't want to defend bad code in science.</p>

<!-- more -->

<p>However, many of your specific criticisms show a lack of familiarity with today's academic research. This code is not the sole result of 13 years of tax-payer-funded research. The core of that research is building and applying the model implemented by the code; the code itself is merely a means to this end. The scientists who wrote this horrible code most probably had no training in software engineering, and no funding to hire software engineers. And the senior or former scientists who decided to give tax-payer money to this research group are probably even more ignorant of the importance of code for science. Otherwise they would surely have allocated money for software development, and verified the application of best practices.</p>

<p>But the main message of this letter is something different: it's about <em>your</em> role in this story. That's of course a collective you, not you the individual reading this letter. It's you, the software engineering community, that is responsible for tools like C++ that look as if they were designed for shooting yourself in the foot. It's also you, the software engineering community, that has made no effort to warn the non-expert public of the dangers of these tools. Sure, you have been discussing these dangers internally, even a lot. But to outsiders, such as computational scientists looking for implementation tools for their models, these discussions are hard to find and hard to understand. There are lots of tutorials teaching C++ to novices, but I have yet to see a single one that starts with a clear warning about the dangers. You know, the kind of warning that every instruction manual for a microwave oven starts with: don't use this to dry your dog after a bath. A clear message saying &quot;Unless you are willing to train for many years to become a software engineer yourself, this tool is <em>not</em> for you.&quot;</p>

<p>As a prominent member of your community famously said, <a href="https://a16z.com/2011/08/20/why-software-is-eating-the-world/" >software is eating the world</a>. That gives you, dear software engineers, a lot of power in modern society. But power comes with responsibility. If you want scientists to construct reliable implementations of models that matter for public health decisions, the best you can do is make good tools for that task, but the very least you must do is put clear warning signs on tools that you do <em>not</em> want scientists to use - always keeping in mind that scientists are not software engineers, and have neither the time nor the motivation to become software engineers.</p>

<p>Consider what you, as a client, expect from engineers in other domains. You expect cars to be safe to use by anyone with a driver's license. You expect household appliances to be safe to use for anyone after a cursory glance at the instruction manuals. Is it reasonable then to expect <em>your</em> clients to become proficient in <em>your</em> work just to be able to use your products responsibly? Worse, is it reasonable to make that expectation tacitly?</p>

<p>Some of you have helped with a first round of code cleanup, which I think is the most constructive attitude you can adopt in the short term. But this is not a sustainable approach for the future. We can't ask software experts for a code review every time we do something important. We computational scientists need you software engineers to help us build a better future for computer-aided research. Which means pretty much all research, because software has been eating science as well for a while. Can we count on your help?</p>

<hr>

<p><em>PS added 2020-05-19T10:30:</em> This post has provoked a lively discussion not only in the comments below but also <a href="https://twitter.com/khinsen/status/1262307434632282112" >on Twitter</a>. There are way too many comments for me to reply to each one individually, so I decided to address recurrent topics in this follow-up.</p>

<p>Many people seem to have read my post as putting the main responsibility for the problems related to the cited simulation code on software engineers. This was most certainly not my intention. Scientists, policy makers, and journalists have all contributed to a less than satisfactory outcome. My open letter is clearly addressed to a particular group of people (software engineers criticizing the Imperial College Covid-19 simulations on the basis of code quality) and clearly states its focus on the role of software technology, which is what the target audience seems to overlook. A focus is always an arbitrary choice of an author for the sake of brevity or clarity. A glance at the rest of my blog should suffice to show that I do consider computational scientists responsible for their technological choices and their consequences. However, my main intention was not assigning blame for events in the past, but outlining what needs to change to prevent similar events in the future.</p>

<p>The car analogy was another frequent target of critical comments. Cars are a mature technology, in which many professions (engineers, workers, mechanics, driving instructors, drivers, etc.) have well-defined roles and everyone involved has a general understanding of the role of everyone else. Software is an immature technology in which roles remain fuzzy and everyone has an even fuzzier view of which other roles exist and who fills them. The discussion of my open letter has provided ample evidence for this all-encompassing fuzziness. What we collectively need to work on is turning software into a mature technology. That requires all stakeholders to make their own role views explicit and then negotiate shared role definitions with everyone else. Several commenters have pointed out the emergence of research software engineers (RSEs) as a sign of progress, and I completely agree. But even the role of RSEs remains fuzzy at this time. Should they work as collaborators on research projects, with a particular specialization? Or as occasional consultants or service providers to researchers? Their interaction with the software engineering universe is even less clear. For now it is mostly one-way in that RSEs bring software technology from the outside into research labs. What my letter argues for is an action in the opposite direction: make software technology evolve to adapt to the specific needs of scientists. A big problem is culture clash. In academia, scientists are traditionally on top of the power pyramid and are used to everyone else working for them (even though the top position is now held by managers, but that's a different story). In the tech world, it's software engineers who are kings and used to everyone else, including their clients, obeying their directives. In the worst case, RSEs might find themselves trapped in the valley between two power pyramids. In the ideal case (from my point of view), they will be diplomats working towards a merger of the two kingdoms, with a simultaneous transformation into a democracy.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>cd:</i><p>I have been involved in both sides of this. And my code for academic research purposes was shit. It was written to get the job done. I gave no thought to performance, maintainability or anything else for that matter it wasn't even structured.</p><p>When I got a job as a professional I got a real culture shock. The standards that are required are orders of magnitude higher.</p><p>You might say well scientists have to do other bits of research to as well as write the code. And that is true. But it also pains me to say that before becoming a professional software engineer I also worked as scientist in a commercial company. And again the standard of research and development was much, much higher.</p><p>Academia is sloppy and peer review is sloppy.</p></li>
<li><i>Brian L. McMichael:</i><p>This is like trying to build a house without any previous experience and then blaming professional homebuilders for not making it easier for commonfolk to nail 2x4's together.</p></li>
<li><i>David Sarma:</i><p>For the type of software that's under discussion (a concrete realization of a mathematical model), what the scientist cares about is the mathematical model, not the realization of it. This is why "software quality" is shunned as a concern: ideally, it should NOT be something that one has to be concerned about. The ideal scenario would be an algorithmic translation of the mathematical model into computer instructions, with no human there to provide inconsistency and bugs into the process.</p><p>The direction that things are headed are pointed to by projects like CVXPY / CVXR. We want a compiler for mathematical language, whose output we for the most part don't have to look at or care about, in the same sense that programmers do not for the most part inspect the assembly language output of their programs, and criticize them for being poorly organized, verbose and unreadable monstrosities. The *solvers* that the model uses of course should be under the most intense scrutiny by the most skilled software engineers... but this goes beyond the scope of the scientific part of the project, in the same sense that we depend on linear algebra libraries working correctly, but modeling greenhouse gases is NOT linear algebra.</p><p>In other scenarios, flipping to the dual marks the maturation of a field (ex. "classical" renderers transitioning to physically-based rendering), the end of certain classes of conflict and stress (caustic situations and antagonistic relationships), and the ability to focus on content rather than technology (telling good stories vs attaining photorealism). (Other side effects are, deprecations and job loss, industry-wide collapse in some cases, or transition into other business models.) 
The injection of constraint solvers into mainstream software engineering (in the manner that Rust does) will likely lead to similar outcomes: the end of certain classes of free-for-all improvisation, and better ability to focus on the content under discussion.</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for pointing out that there are indeed some developments pointing in the right direction!</p></li>
</ul>
</li>
<li><i>Undercover modeller:</i><p>I think there is a point that is being missed.  This software is essentially repurposed software. Its software built for academic purposes being repurposed as business/nation critical software for making decisions that affect life and death decisions for thousands of people and affect the livelihoods of many millions of people.</p><p>I write business critical models as a living using software engineering processes that I've been taught over the years.  However, if I was asked to write, say, safety critical software for an plane. I would not apply the same processes, nor know what processes should be applied.</p><p>The issue lies in those who commissioned the software, and to a certain and lesser extent, the academics who built it, who should have known that using academic software development techniques was inappropriate for business critical software that might have such a major impact on people's lives.</p><ul>
<li><i>Konrad Hinsen:</i><p>That's an interesting remark. Yes, the software has been repurposed. But: that happens all the time with research software. The small function written for the exploration of a dataset ends up in community-managed software and then maybe in industrial applications. Nobody ever commissions software in academia. It's very much bottom-up.</p><ul>
<li><i>Undercover modeller:</i><p>Alas that is true.  That's why we never incorporate open source components into our models without either rewriting it or subjecting it to our own testing program.</p></li>
</ul>
</li>
</ul>
</li>
<li><i>Colin Gillespie:</i><p>So while I sort of agree with your argument, I do think that academics hold much of the blame.</p><p>An analogous situation is statistics. Ask any statistician that to perform a vaguely complex analysis requires training and experience, yet many scientists are happy to just copy and paste code/analysis from random parts of the internet.</p><p>In building software, the REF (run by academics), actively snubs contributions to software. Instead, they are encouraged to have the "Facebook" type model, publish often and fast. How often are papers retracted if the software is wrong or has a bug?</p></li>
<li><i>Michael Höhle:</i><p>First page of the OpenBugs Manual - <a href="http://www.openbugs.net/Manuals/Manual.html" rel="nofollow noopener" title="http://www.openbugs.net/Manuals/Manual.html">http://www.openbugs.net/Man...</a> <a href="https://uploads.disquscdn.com/images/c076b4ae98f720044a01457d57538d354865aa7a8aadfc80b16227e21821d5c6.png" rel="nofollow noopener" title="https://uploads.disquscdn.com/images/c076b4ae98f720044a01457d57538d354865aa7a8aadfc80b16227e21821d5c6.png">https://uploads.disquscdn.c...</a></p><ul>
<li><i>Konrad Hinsen:</i><p>Excellent - thanks for this example!</p></li>
</ul>
</li>
<li><i>Brian Sides:</i><p>There is a department of Computing at Imperial College London<br><a href="https://www.imperial.ac.uk/computing" rel="nofollow noopener" title="https://www.imperial.ac.uk/computing">https://www.imperial.ac.uk/...</a><br>Where they teach computer programming</p><p>"Welcome to the Department of Computing<br>Computers are the most significant and exciting technological innovations of the last hundred years. In the future, they will play an even more considerable role in medicine, the sciences, industry, communication and the arts. It's safe to say that the science of Computing will remain a vitally important part of modern civilisation and will be responsible for many of the most important changes in the world in which we live.</p><p>Career prospects<br>Our graduates have the highest average salary for a computing degree in the UK and have gone into a range of careers including Media, Software, Finance and Research with employers such as Google, Microsoft, Facebook, Amazon and Bloomberg. A career in Computing opens the door to a wide range of careers."</p><p>Yet over a period of more than 20 years a pandemic computer model was developed.<br>This is the same pandemic model used for Neil Ferguson.s previous predictions<br>That were so far off. 2001 mad cow disease  leading to Six millions cattle and sheep slaughtered.<br>Millions were spent buying vaccines against swine flu in 2009 .</p><p>If this was some internal test program put together quickly . Then you might expect this quality  of code. But even then some bad practices have been employed.</p><p>I have emailed some in charge of the computing department asking for comment on the code. 
But no reply.</p><p>Obviously the code was not developed by the computer programming department.<br>Those developing the pandemic model thought they were so clever they did not need bother with things like documentation and testing or checking with those who know how to programme.</p><p>The crude method of projecting numbers forward is as questionable as the code.<br>If I write a computer program to calculate how many chip shops will be in my small town. If like Neil Ferguson report 9 says It is Exponential doubling every 5 days.<br>The program will predict in six month there will be over 8 billion chip  shops in my small town.<br>Computers do not have brains, They can not know anything ,that 8 billion chip shops in a small town is impossible,</p></li>
<li><i>elgato:</i><p>As a physicist that development software for my own research and for fun I couldn't agree more with this letter. I code in c++ and HPC mainly gpus. In academia there is some grade of mis appreciation for developing good code. Scientist, that like me, try to make things better we are seen as people that lose time instead of producing results. Arrogance is also a problem. Learning the idiomstics of a given programing language is easy but writing a maintainable code is not. Long time ago after many years of programming I learnt design patterns which changed my way to code. From there I moved to more in depth about software and how to. But the truth is that in school nobody bothered to tell us about it. We learn by simple doing with no formal education whatsoever which is bad, really bad. This piece of code is only an example of many code out in the wild used by people in day to day bases, just this one as it happens it might affect decision of policies makers. Also, for software developers in here, not all the scientist produce the piece of crap that it's being discuss here, please don't put all the people in the same bag.</p></li>
<li><i>Brian Sides:</i><p>The original code was written in 'c' not 'C++' there are some Fortran functions that are supported by the 'c' library (some think the code was written originally in Fortran and ported to 'c' ) The code has now been ported to 'C++'  and split into multiple files but with out using the object orientated features . The code is still mostly just 'c'.  Bugs have been found during the conversion process.<br>The code was written over a period of more than 20 years. Many thousands of man hours.<br>at the end they had produced one single file of 15,000 lines of code<br>that is less that 2 lines of code a day<br>The code is undocumented with a host of single letter variables.<br>Data is read and written with out error checking , data is not verified there is no file signature or checksum.<br>It is very simplistic simulation code.<br>All code needs to be tested. Important code needs to be independently tested.</p><p>These are highly qualified highly paid people . They had a team working on this.<br>There has been a large investment. Where was the management over site.</p><p>It is clear from comments by Neil Ferguson that he thought that it was thousands of lines of undocumented code he kept in his head . Was not a problem , he was kind of proud of it,</p><p>As well as the code the method of taking some data of questionable choice then making a many assumptions and applying these to a limited simulation with a small set of real world statistics . That in no ware takes into account the way these and other factors interact. Is very questionable.</p><p>There is no excuse . Sage have failed to check the model had been properly tested.<br>The predictions from this faulty model have misinformed the Government<br>and led to this ill informed Lock down</p></li>
<li><i>sde-2243:</i><p>I think, it consists of few different topics, hardly mixable.</p><p>1) When I buy a car, I might get (and might not get) some warnings. However nobody expects car manufacturer to teach me how to drive. This is a skill, and I spent literally months and thousands of dollars honing this skill. Still, even I got to the level i can participate in racing events, I would not be arrogant enough to try to drive 18-wheeler. Or bus. And if i try, i would *not* blame others for collision.</p><p>Somehow we think that because we have a computer, we possess skills necessary to develop software. Or, if we learn a language, we learn software engineering. This is wrong: this is an acquired skill. Junior engineers coming from college spent years learning how to develop robust systems quickly. Using analogy, I have a chef knife, so why I cannot cook like a Chef? Oh, and by the way -- there was not a single word of warning on the knife when I bought it. There was no video how to hold it properly, where I should use chef knife, peeling knife, and so on. Not a squeak on how to maintain it, how to wash, how to store, how to sharpen.</p><p>2) The world is changing quickly. What used to be highly-professional activity quickly becoming a side-skill for people professional in different area, being it biologists, physicists, or computational scientists. Apparently, there is *an emerging market* for development tools for this non-specialists.</p><p>However it is hardly reasonable to expect these languages and tools to come from industry *evolution.* [At some point Niklaus Wirth was asked why he does not participate in language standardization. He answered that he is teaching. To teach students, he need a modern language. So he creates one. Standardization is needed by industry -- so let industry do standardization.] Industry does not know what academia needs. And does not care -- justly. 
But the market means that some company, some group of people might start to work on product that is needed for this market, and start to sell it. [Stephen Wolfram's Mathematica is a great example of such product.]</p><p>3) Why this solution cannot emerge from academia? There are computer science / software engineering departments. So, why do you want somebody else to solve your problems, instead of stopping in your colleague's office?</p><p>Again, we have examples. Some quite interesting facts in mathematics are proved by software. It was academia that developed tools for proofing, and created validation means for this tools. In fact, academia is more interested in formal validation of programs than software industry (as whole). So, if it is possible for mathematicians, why not for others?</p></li>
<li><i>David Frenk:</i><p>C++ isn't the problem. Academic researchers write terrible code in every language they use. Python is pretty much the most user-friendly language imaginable, and most academic python code is spaghetti too. Peer review (especially in an open source context wherever possible), and better software engineering training for academics who need to write code are the best solutions here.</p><ul>
<li><i>Jef “Credible Hulk” Spaleta:</i><p>when the peer review of the code itself become as important to career advancement as the scientific results publication...things will get better. Otherwise, it won't. Academic researchers by and large are not incentived to write maintainable code.  For projects with a large enough budget, you start seeing staff engineers hired to maintain critical codebases, but if the researcher is writing it, its really not expected to be maintainable.</p><p>And while there is an effort put into peer review of the published articles that appear in scientific journals the same effort is not usually required for the digital artifacts (the software) that was used to produce the results expressed in the articles.  As it stands career advancement is not predicated on being proficient at producing readable, reusable robust code. Publish or perish doesn't generally apply to the software.. it is what it is.</p><ul>
<li><i>Konrad Hinsen:</i><p>That is indeed an important point, but it's also important to realize that improving the situation is not easy. Reviewing scientific code upon publication requires (1) accepted standards for code quality and (2) reviewers compensated in some way for the significant effort that code review represents. Which is one of the reasons why I ask for better tools: to reduce the effort in code reviews.</p></li>
</ul>
</li>
</ul>
</li>
<li><i>Michael:</i><p>Coming from a non-CS academic background, I disagree with you. This sentence in particular:<br></p><blockquote>It’s also you, the software engineering community, that has made no effort to warn the non-expert public of the dangers of these [C++] tools. </blockquote><p><br>does not match my experience at all. Every conversation with a scientist that touched upon C++ I can remember has amounted to "yeah, but C++ is very hard, let me use MATLAB/Python/R instead."  If you want to convince yourself, go around university departments and poll graduate students on whether or not they think C++ is easy. This will render the fact that you cannot identify "clear warnings" irrelevant, since if everyone agrees that C++ is not easy, that provides evidence those warnings from the software community are coming through to non-experts.</p><p>That does not mean Ferguson's code doesn't shed light on a serious problem. The code would not have looked much different in MATLAB/Python/R. The problem is not the language, but the inherent practices that should be involved in developing scientific code: version control, unit tests, documentation, reproducibility. Most academic groups, however, provide little to no training or emphasis on the importance of  these tools. The value is instead placed on publishing papers. The purpose of most academic code is usually to be just good enough to plot the graphs that are used for a paper. This is the case even in fields that should be CS-minded, such as applied mathematics. You will never get tenure for writing good code, and your graduate students have no incentive to write good code -- unless, of course, they want to solve a real problem. But then they often go to industry.</p><p>If you want to avoid this, educate the older academics. 
<a href="https://dirkgorissen.com/2012/03/26/the-researcher-programmer-a-new-species/" rel="nofollow noopener" title="https://dirkgorissen.com/2012/03/26/the-researcher-programmer-a-new-species/">This is already happening</a>. Software engineering, in my view, is extremely transparent about your main gripe: C++ is not for beginners.</p></li>
<li><i>Ariel Fogel:</i><p>Thanks for writing this. As someone who left software development to go back to academia in a field not directly related to CS, one thing I notice is that there's often not a ton of time to implement best practices unless that's part of the culture of the lab (read: unless your lab is led by a computer scientist who is well trained in software engineering).</p><p>Even if there was enough time, more often than not you're writing one-off scripts or things that are what I referred to as a <i>spike</i> in a developer context. And unfortunately, sometimes those spikes continue to get developed. But a lot of times they aren't b/c research is inherently about taking well-informed stabs at the unknown and seeking to uncover something new. It's hard to know when it's worthwhile to start with best practices or the tech debt is high enough that it necessitates a refactor. And even more difficult if you're trying to get funding for that. I'm not sure my lab would be able to write grants that also ensure we TDD every, or even some, pieces of research we produce.</p><p>And that's with me having been exposed to software development practices <b>and</b> having access to some professional programmers who help with our research, which most of my peers haven't and don't. Speaking of which, I'm going to go back to writing crappy code now :)</p></li>
<li><i>David Hicks:</i><p>I've come here from Hacker News where there's a little outrage going on right now ...</p><p>I think computational scientists are increasingly going to need to get code reviewed by experts, particularly in areas where that code affects public policy. There are a bunch of ways this might be achieved, and publishing of source code openly under a FOSS license could help here. But it may be that you need to pay people to build models (or pay people to design some sort of extensible model framework) for you.</p><p>To look at your analogy here - "You expect cars to be safe to use by anyone with a driver’s license."<br>Yes, I do. But I don't expect to be able to go to the Ford factory, pick up some tools and make a car that meets road use regulations without some training. By using C++ you've wandered in and had a go with an arc welder, and now you're annoyed with us at the result?</p><ul>
<li><i>Konrad Hinsen:</i><p>"I think computational scientists are increasingly going to need to get code reviewed by experts"...</p><p>Let me translate: "We, the software industry, set the rules by which everyone has to play for using computers. If scientists want to do computations, they will have to consult with us and pay us for that."</p><p>That's "software is eating the world" at its best. And that's exactly what my open letter is arguing against. You may of course disagree, as this is a question of policy, but then there is no need for further discussion: you and I have conflicting interests.</p><ul>
<li><i>David Hicks:</i><p>I'd also like to ask, respectfully, if you would resent the suggestion of getting an Architect to look over the plans you drew up for a house you were building? And paying them to do so?</p><p>Software Engineering is a skilled profession, we spend a lifetime learning, practising and perfecting it, but it's somehow wrong to suggest that you might want to consult with someone to help get it right?</p><ul>
<li><i>Konrad Hinsen:</i><p>Houses are like cars: mature technologies where all the roles are well defined. There are no "house planning for dummies" books that lure people into designing their own house without help from an architect.<br>I am perfectly fine with software engineering becoming as mature as architecture, and being left to qualified professionals. But computational scientists need to be able to do <i>their</i> job autonomously, which is not the case as long as badly designed systems programming languages are almost unavoidable for implementing scientific models.</p><ul>
<li><i>David Hicks:</i><p>So now we should set the rules, and be gatekeepers of knowledge? I'm getting very mixed messages here.</p><p>I'm not a C++ coder by trade, very often, so I'll leave them to answer your criticism that it's poorly designed.</p><p>You're asking that computational scientists be able to produce work as well as experienced software engineers can, with no training and with no oversight, without engaging with experienced people to help build out your models, and certainly without paying for any of their insight. Why do you think that should even be possible? Are you haranguing chemical engineers because anyone should be able to build an oil distillation column and it's their fault yours blew up?</p><p>Our discipline is almost uniquely open, you can learn, you can build, we give access to tools and platforms, we share amongst ourselves and with anyone that wants to learn. But that doesn't mean that after reading a couple of intros to C++ you're going to make flawless programs and frankly I find it arrogant that you think you should be able to just bypass the training and achieve comparable results. There's a reason your university has a whole department for computer science.</p><ul>
<li><i>Konrad Hinsen:</i><p>I am not asking that computational scientists should be able to do zero-effort software engineering. They should be able to develop and evaluate scientific models on their own, using tools designed by software engineers. Much like ordinary people write letters using word processing software.</p><p>To give an example for how this could work (I am not saying this particular approach will work, but I think it's worth investigating): design a stack of ever more specialized DSLs, with a general-purpose programming language at the bottom and each successive layer on top of it specializing towards a scientific application domain. Most scientists could then work most of the time at a level they can manage on their own. When they hit the limits of their DSL, they'd work with RSEs on a more appropriate DSL for their specific problems.</p><p>However, what I outlined above is not a technology fix. Those DSLs should each correspond to a role and a competence profile. It's not just a software stack with layers of abstractions introduced to facilitate maintenance by teams of people who have basically all the same profile. Another important point is interoperability. Lots of specialized DSLs can only work in practice if the epidemiology DSL can interoperate with the statistics DSL and the ODE DSL.</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><i>David Hicks:</i><p>I strongly disagree with your (mis)characterisation there, particularly as I suggested publishing open source as a way to get more eyes on the code.</p><p>We don't set the rules, clearly, as you can find and use a good many of our tools for free, in whichever way you want, as demonstrated here. But if you're not trained or experienced you're not always going to get the best results, and perhaps you should be looking for outside help.</p><p>Me, I don't expect to be any good at arc welding without some help and training either.<br>(edit - I don't even expect to be any good at writing software without getting other people to review it!)</p><ul>
<li><i>Ondřej Čertík:</i><p>I think Konrad is arguing for domain scientists to be able to write software by themselves, without needing CS experts (whether paid or open source) to help fix up their code. I agree with that 100%.</p><ul>
<li><i>David Hicks:</i><p>I would argue that anyone producing software that is going to be relied upon for published scientific results, particularly scientific results that are used to inform public policy, should have such software reviewed by peers, and probably a wider audience than that if the peers are similarly non-expert.</p><p>You might not wish to involve CS 'experts', (and this isn't really CS, but Software Eng) but perhaps some of the habits of such people should be explored. I wouldn't dream of deploying something that hadn't had other eyes on it.</p><p>I agree in the abstract that it's a good thing to create tools for scientists to need as little assistance as possible, and it looks like you're working towards that end - good stuff :)</p><p>But I also think that fundamentally, to produce good software, you need more than one person and you need experienced eyes. It's in the nature of the game.</p><ul>
<li><i>Ondřej Čertík:</i><p>David, thanks for the comment -- I agree that one should not work in isolation and the more reviews the better. At the same time I like what Konrad said below that computational scientists need to be able to do their job autonomously. It's not mutually exclusive, we should strive for both.</p></li>
<li><i>Ondřej Čertík:</i><p>Yes I agree that it's always good to have more than just one person to look over any code.</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><i>Ondřej Čertík:</i><p>Thanks for the post Konrad. I have a couple of thoughts on this. One is that Fortran would be a great fit for this kind of code, and one thing I am planning to do with the LFortran (<a href="https://lfortran.org/" rel="nofollow noopener" title="https://lfortran.org/">https://lfortran.org/</a>) compiler once it is more mature is to give "pedantic" (so to say) warnings or even errors on code constructs that should not be used, even though they are perfectly legal Fortran. From little things like enforcing "implicit none" and not allowing "implied save" or not specifying a precision for floating point and other typical pitfalls. And in the long run, I am hoping the compiler can detect a lot more constructs that should be discouraged, such as using pointers instead of allocatable arrays and even things like every time a subroutine has a side effect or when a global variable is declared, the compiler could give a warning, and you must put in some kind of a comment documenting / acknowledging that's what you really want. That way I believe the compiler with excellent warning and error messages can greatly help teach non-expert programmers how to write higher quality code. Part of this is also that in Debug mode, it should check absolutely everything, from integers wrapping around, to any kind of memory issues such as dangling pointers. I think all of this can technically be done.</p><p>However, ultimately this goes much beyond just better compilers, and that is the main point of your blog post I think. I personally like C++ for things like writing compilers, but for scientific computing I think it's not great, because every big C++ code that I have seen requires CS experts on the team to keep fixing up issues that the domain scientists make. 
As you also imply in your post.</p><p>Fortran is much better suited, but currently it is falling short on its mission, it's lacking tooling, the compiler quality is not great, does not run on modern hardware such as GPUs, etc. I am trying to fix all that, see e.g., some of our recent efforts:</p><p><a href="https://fortran-lang.org/" rel="nofollow noopener" title="https://fortran-lang.org/">https://fortran-lang.org/</a></p><p><a href="https://ondrejcertik.com/blog/2020/04/running-for-wg5-convenor-announcement/" rel="nofollow noopener" title="https://ondrejcertik.com/blog/2020/04/running-for-wg5-convenor-announcement/">https://ondrejcertik.com/bl...</a></p><p>But this is something that should have been done 20 years ago, because even if we are 100% successful in our vision, it will still take 5 to 10 years before Fortran achieves it.</p><p>But I think it goes even beyond that. Even with a language that is better suited for numerical programming, and an excellent compiler that can guide the user to write using the "best practices", I think one also needs to adopt "modern social practices", which is to post the code as open source at GitHub or GitLab, and build a community around it.</p><p>Summary: I think there is a huge opportunity to provide high quality tools for domain scientists to use and we have a long way to go.</p><ul>
<li><i>Themos:</i><p>The NAG Fortran compiler can check array bounds, integer overflow, undefined variables, dangling pointers, memory leaks and more. But getting unreliable numbers faster and cheaper has been a siren's call few can resist.</p><p><a href="https://wg5-fortran.org/N1951-N2000/N1965.pdf" rel="nofollow noopener" title="https://wg5-fortran.org/N1951-N2000/N1965.pdf">https://wg5-fortran.org/N19...</a> addresses Fortran vulnerabilities. Documents exist for  other languages.</p><p>In my view, the fundamental problem is that (non-CS) research codes are not derived from specifications. Huge parameter spaces abound and they are not explored adequately.</p><p>Without careful tuning of incentives, I can't see how we will end up in a better place.</p><ul>
<li><i>Ondřej Čertík:</i><p>@disqus_BXzvDTvCEf:disqus thanks for the comment. Indeed we use the NAG compiler, it's great and the number of things it can catch is awesome. In my comment above I suggest we explore ways how to go even beyond what the NAG compiler can currently catch. Thanks for the link to the N1965 document.</p></li>
</ul>
</li>
<li><i>Konrad Hinsen:</i><p>Thanks for your comments Ondřej. Your work on improving Fortran is very much in line with what I think we (computational science) need. And I certainly agree about developing best practices, which is fortunately already going on.</p></li>
</ul>
</li>
<li><i>orca:</i><p>this is quite off the mark. the author sets the bar too low for himself by criticizing the most easily (and to be fair legitimately) dismissed criticisms of the Imperial College model by software engineers. Here's a better laid-out critique that the OP doesn't speak to:</p><blockquote>The Imperial College modelers released the source code a couple of days ago to the model that shut down the world economy. It's not the original model code but was rather original source code turned over to volunteer programmers who re-wrote it so that is more readable. I have done some model review of financial models in the past but without the source code I would not be able to do a full review of the Imperial College model. Now that we have the source code (sort of), I can.<br><br>Any such model ought to have been independently reviewed before it is ever used for real policy decisions. Policy analysis is awash in models but no one ever really checks them. Going forward, health policy makers should ask for and disclose independent validation of any model before using its results to make recommendations of any consequence.<br><br>Normally, model reviews are long technical documents but there would also be a summary section. Here's what I think a summary should have looked like.<br>...<br><br>Overall conclusion: this model cannot be relied on to guide coronavirus policy. Even if the documentation, coding, and testing problems were fixed, the model logic is fatally flawed, which is evidenced by its poor forecasting performance.</blockquote><p><a href="https://www.facebook.com/scarlett.strong.1/posts/25243721950097" rel="nofollow noopener" title="https://www.facebook.com/scarlett.strong.1/posts/25243721950097">https://www.facebook.com/sc...</a></p><ul>
<li><i>Konrad Hinsen:</i><p>This is a very different critique that I actually mostly agree with. Policy decisions should indeed be based not just on "science", but on trustworthy scientific findings. How to do that in an emergency is of course a different question again.</p></li>
</ul>
</li>
<li><i>MaxSchumacher:</i><p>The analogy to cars is flawed, because C++ isn't an end product for untrained users, if you want to stick to the car industry, then C++ is a blowtorch, a tool used by professionals. The scientists shouldn't have used tools they don't understand and base policy recommendations on the output of a blackbox they cannot reason about; admitting ignorance is vastly better than pretending to understand.</p><p>I don't believe in the perfect separation of model and implementation: you learn about the world once the code is running and results are produced. One can argue that if you cannot build it, you don't understand it.</p><ul>
<li><i>Konrad Hinsen:</i><p>We seem to agree that C++ is not an end user product. But show me a single C++ tutorial aimed at novices that clearly says so! How are scientists supposed to realize that they don't understand a product if all the descriptions of that product tell them "don't worry, it's easy"?</p><ul>
<li><i>MaxSchumacher:</i><p>nobody in the history of the world has ever uttered the phrase "don't worry, it's easy" to refer to C++. It is a famously complex and large language.</p><p>Plenty of C++ books talk about how to write good code and how to use the language; violating those recommendations is akin to putting your dog in the microwave.</p><p>The basics of software quality aren't arcane knowledge uniquely accessible to greybeards, you'll find them in countless entry-level books and blog posts:</p><p>- use descriptive names for variables and functions<br>- try to keep functions small<br>- use comments for difficult spots<br>- document your work<br>- test your code vigorously<br>- get at least one review on the code<br>- use a version control system</p><p>I wouldn't conduct brain surgery and, after failing miserably, complain to the people making the scalpel: "Hey! You should have put a warning label on this!"</p><ul>
<li><i>Konrad Hinsen:</i><p>Me neither. The people I'd complain to are the authors of "Brain surgery for dummies", as well as brain surgeons performing live on television, explaining their techniques. The problem is not proposing power tools, but advertising them to non-specialists.</p></li>
<li><i>boromict cumbordor:</i><p>first hit for "c++" "don't worry" "it's easy": <a href="https://books.google.com/books?id=N5otBAAAQBAJ&amp;pg=PA1&amp;lpg=PA1&amp;dq=%22c%2B%2B%22+%22don%27t+worry%22+%22it%27s+easy%22&amp;source=bl&amp;ots=xCFW65GZlH&amp;sig=ACfU3U0AatR6nWgs8Y2V3InNWRFcYhhXIQ&amp;hl=en&amp;sa=X&amp;ved=2ahUKEwjh_tDL8b3pAhV0MX0KHentC4wQ6AEwAHoECAkQAQ#v=onepage&amp;q=%22c%2B%2B%22%20%22don't%20worry%22%20%22it's%20easy%22&amp;f=false" rel="nofollow noopener" title="https://books.google.com/books?id=N5otBAAAQBAJ&amp;pg=PA1&amp;lpg=PA1&amp;dq=%22c%2B%2B%22+%22don%27t+worry%22+%22it%27s+easy%22&amp;source=bl&amp;ots=xCFW65GZlH&amp;sig=ACfU3U0AatR6nWgs8Y2V3InNWRFcYhhXIQ&amp;hl=en&amp;sa=X&amp;ved=2ahUKEwjh_tDL8b3pAhV0MX0KHentC4wQ6AEwAHoECAkQAQ#v=onepage&amp;q=%22c%2B%2B%22%20%22don't%20worry%22%20%22it's%20easy%22&amp;f=false">https://books.google.com/bo...</a></p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>Wanted: a hierarchically modular software architecture</title> <link>https://blog.khinsen.net/posts/2020/05/05/wanted-a-hierarchically-modular-software-architecture.html</link> <pubDate>2020-05-05</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2020/05/05/wanted-a-hierarchically-modular-software-architecture.html</guid> <category><![CDATA[ scientific software ]]></category> <description><![CDATA[ <p>In his 1962 classic <a href="https://www.jstor.org/stable/985254" >&quot;The Architecture of Complexity&quot;</a>, Herbert Simon described the hierarchical structure found in many complex systems, both natural and human-made. But even though complexity is recognized as a major issue in software development today, the architecture described by Simon is not common in software, and in fact seems unsupported by today's software development and deployment tools.</p>

<!-- more -->

<p>The prime characteristic that Simon identifies in most complex systems is a hierarchical structure. Systems consist of subsystems, which consist of sub-sub-systems, etc. Simon describes the subsystems at each level as &quot;nearly decomposable&quot;, meaning that the interactions between subsystems are much less important than the interactions between the parts inside a subsystem. I prefer the shorter term &quot;modular&quot; for this feature, and thus end up with &quot;hierarchically modular&quot; as my label for the architecture that Simon describes in much detail. I won't repeat his arguments for the ubiquity of such systems, so please read the paper - it's definitely worth it, and it's very clearly written.</p>

<p>It may seem as if many of today's programming languages propose exactly this kind of architecture for designing software systems, but a critical inspection shows that they don't. To explain where the problem is, I will use Python as an example because it is widely known, but the arguments apply with some modifications to most other languages as well.</p>

<p>Python's module system is basically a hierarchy of namespaces, with namespaces containing mainly function and class definitions, but also variables referring to arbitrary data objects. Since namespaces are independent, and can contain sub-namespaces, this looks like a perfect match for a hierarchically modular architecture.</p>

<p>One obstacle is that there is no way to combine independently designed modules into a larger hierarchy. Suppose I want to create a software component called <code>ode_solver</code> that uses the popular packages <a href="http://numpy.org/" >NumPy</a>  and <a href="http://scipy.org/" >SciPy</a>. In a hierarchically modular architecture, implementation details of a component, such as the names of the packages it uses, would be hidden from outside view. The packages would become <code>ode_solver.numpy</code> and <code>ode_solver.scipy</code>. In real Python, they can only remain <code>numpy</code> and <code>scipy</code>, as their authors decided to call them. Independently written software components in Python always live in the globally shared top-level namespace. And since developers are free to modify their packages as they like, this makes the top-level namespace an instance of <a href="https://www.qwant.com/?q=shared%20mutable%20state" >shared mutable state</a>, universally recognized as problematic in software engineering.</p>
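
<p>The shared top-level namespace can be observed directly from within Python. The following is a minimal sketch rather than a description of any real package: <code>fake_dep</code> is a hypothetical stand-in for a dependency such as NumPy, and the two functions are hypothetical stand-ins for code living in two independent components. Both obtain the very same module object from the interpreter-wide table <code>sys.modules</code>, so a change made on behalf of one component is visible to the other.</p>

```python
import sys
import types

# Hypothetical stand-in for a third-party package such as NumPy.
fake_dep = types.ModuleType("fake_dep")
fake_dep.__version__ = "1.0"
sys.modules["fake_dep"] = fake_dep  # register it in the global module table


def ode_solver_dependency():
    """What a hypothetical ode_solver component sees when it imports the package."""
    import fake_dep
    return fake_dep


def visualizer_dependency():
    """What a hypothetical visualizer component sees when it imports the package."""
    import fake_dep
    return fake_dep


# Both components receive the same object: there is one namespace for everyone.
same_object = ode_solver_dependency() is visualizer_dependency()

# Shared mutable state: a modification made via one component...
ode_solver_dependency().__version__ = "2.0"
# ...is observed by the other.
seen_by_visualizer = visualizer_dependency().__version__
```

<p>Nothing in this sketch lets <code>ode_solver_dependency</code> keep a private copy of the package under a name of its own choosing, which is the missing feature described above.</p>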

<p>The shared top-level namespace creates a strong interaction between all components at all levels. Suppose I have another component called <code>visualizer</code> that also uses NumPy and SciPy, but requires different versions. That component becomes impossible to combine with my <code>ode_solver</code> because of conflicting version requirements - the well known <a href="https://en.wikipedia.org/wiki/Dependency_hell" >dependency hell</a>. Another way to look at this is to consider each package's detailed dependency list, with version requirements, as part of its interface.</p>
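
<p>To make the version conflict concrete, here is a small sketch under stated assumptions: the version numbers and requirement sets are invented for illustration, and <code>compatible_versions</code> is a hypothetical helper, not the algorithm of any real package manager. Since only one version of a package can occupy the shared namespace, combining components amounts to intersecting their requirements.</p>

```python
# Hypothetical data: the version numbers are made up for illustration.
AVAILABLE_VERSIONS = {"numpy": ["1.16", "1.17", "1.18"]}

REQUIREMENTS = {
    "ode_solver": {"numpy": {"1.16", "1.17"}},
    "visualizer": {"numpy": {"1.18"}},
}


def compatible_versions(package, components):
    """Return the versions of `package` that all `components` accept."""
    acceptable = set(AVAILABLE_VERSIONS[package])
    for component in components:
        wanted = REQUIREMENTS[component].get(package, acceptable)
        acceptable = acceptable.intersection(wanted)
    return sorted(acceptable)


alone = compatible_versions("numpy", ["ode_solver"])
combined = compatible_versions("numpy", ["ode_solver", "visualizer"])

print(alone)     # ['1.16', '1.17']
print(combined)  # []  (no single version satisfies both components)
```

<p>Each component works on its own, but the combination has no solution. In this sense the requirement sets behave like a part of each component's interface.</p>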

<p>The second obstacle is that the full specification of a module's interface (something that's never ever written down in Python) in general includes classes defined by its dependencies. My <code>ode_solver</code> could, for example, return some value as a NumPy array. That would make NumPy not only a run-time dependency of the code, but also a specification dependency for the interface. If <code>visualizer</code> expects a NumPy array as the input to one of its functions, I'd be in trouble again as the class definition in the two different versions of NumPy might not be the same. And that trouble would not go away if I could migrate NumPy and SciPy inside my component's namespace as suggested above.</p>

<p>Some readers' first reaction is likely to be &quot;that's a symptom of bad specifications&quot; or &quot;that's the trouble you deserve for using a dynamically typed language&quot;. However, static typing doesn't solve the problem, it merely shifts it from run time to compile time. It's the types introduced by dependencies that end up in the static interface of a component. The impact on component compatibility is the same. And if that's a symptom of bad design, then good design is not only rare but also actively discouraged by today's software development tools. The only way out I can see is to create wrapper types and wrapper functions in the component that hide the implementation in terms of dependencies. Hands up if you find that idea appealing!</p>
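
<p>For readers who want to see what such wrapping involves, here is a minimal sketch. It assumes a hypothetical <code>Trajectory</code> type and a hypothetical <code>solve_euler</code> function, and it uses the standard library's <code>array</code> module as a stand-in for a dependency such as NumPy, so that the example runs anywhere. The dependency's type is used internally but never crosses the component boundary.</p>

```python
from array import array  # stand-in for a dependency such as NumPy
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Trajectory:
    """Wrapper type owned by the component; it exposes only built-in types."""
    times: Tuple[float, ...]
    values: Tuple[float, ...]


def solve_euler(f: Callable[[float, float], float],
                y0: float, dt: float, steps: int) -> Trajectory:
    """Integrate dy/dt = f(t, y) from t = 0 with the explicit Euler method."""
    # Internally, the dependency's container type is used freely...
    times = array("d", [0.0])
    values = array("d", [y0])
    for _ in range(steps):
        t, y = times[-1], values[-1]
        times.append(t + dt)
        values.append(y + dt * f(t, y))
    # ...but it is converted before leaving the component.
    return Trajectory(tuple(times), tuple(values))


result = solve_euler(lambda t, y: -y, y0=1.0, dt=0.5, steps=2)
print(result.values)  # (1.0, 0.5, 0.25)
```

<p>The price is visible even in this toy case: an extra type to maintain, and a copy of all data at every boundary crossing.</p>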

<p>The only programming language I know of that does not suffer from this problem is <a href="https://www.unisonweb.org/" >Unison</a>, which refers to functions and data types <a href="https://www.unisonweb.org/2020/04/10/reducing-churn/" >via hashes rather than names</a>. It's a very young language, so it's too early to say how this feature will change software architecture on a larger scale.</p>
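
<p>The general idea of referring to code by hash rather than by name can be illustrated with a toy in Python. This sketch is an assumption-laden illustration of content addressing and makes no claim about how Unison is actually implemented: <code>STORE</code>, <code>define</code>, and <code>call</code> are hypothetical names, and definitions are simply stored as source text of lambda expressions.</p>

```python
import hashlib

# A toy content-addressed store: definitions are keyed by a hash of their source.
STORE = {}


def define(source: str) -> str:
    """Store the source of a lambda expression and return its identifying hash."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    STORE[digest] = source
    return digest


def call(digest: str, *args):
    """Look up a definition by hash, evaluate it, and apply it to the arguments."""
    function = eval(STORE[digest])
    return function(*args)


# Two incompatible "versions" of a scaling function coexist without conflict,
# because callers refer to them by hash and there is no shared name to fight over.
scale_v1 = define("lambda x: 2 * x")
scale_v2 = define("lambda x: 3 * x")

print(call(scale_v1, 10))  # 20
print(call(scale_v2, 10))  # 30
```

<p>Identical source text always yields the same hash, so defining the same function twice does not create a second entry, and a component that depends on one hash is unaffected by whatever else is added to the store.</p>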

<p>Programming languages are not the only realm in which we can try to construct hierarchically modular software. It would in fact be preferable to do so at a language-neutral level, to escape from the silos that languages tend to represent. I'd love to be able to combine a component written in Python with a component written in R! So maybe we should try to make hierarchically modular assemblies at the level of compiled binaries.</p>

<p>One candidate would then be Linux's <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format" >Executable and Linkable Format</a> (ELF), which covers several types of binary files: executables, object files, shared libraries, and more. But there is no kind of ELF file that could represent hierarchically composable modules, as far as I can see. There's no way to combine two shared libraries into a bigger shared library, nor two executables into a larger executable, and moreover every executable has a global namespace that would create the same issues that I outlined above for Python. You can't have an executable that includes or refers to two different versions of the <a href="https://zlib.net/" >zlib library</a>, for example.</p>

<p>The only approach that looks doable in the Unix world is working at the process level. A software component is then a process based on an executable, and data between processes is exchanged via files or sockets. Choosing a clever hash-based naming scheme (as done by <a href="https://nixos.org/" >Nix</a> and <a href="https://guix.gnu.org/" >Guix</a>) makes it possible to keep any combination of versions accessible in parallel. Several processes could be managed as child processes by a superprocess, which would thus represent a component one level up in the hierarchy. In the Web world, a very similar setup could be constructed by making each component a Web service. There isn't much tool support for such techniques, but perhaps the most important obstacle is efficiency issues in the communication between components, which would require serialization and either file storage or network communication.</p>
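
<p>A process-level composition of this kind can be sketched in a few lines of Python. Everything here is an illustrative assumption: the two components are trivial one-line programs held in the hypothetical constants <code>SQUARE</code> and <code>TOTAL</code>, both happen to be written in Python, and JSON over pipes stands in for whatever serialization and transport a real system would use.</p>

```python
import json
import subprocess
import sys

# Source code of two hypothetical components, each running as its own process.
# Each reads JSON from standard input and writes a JSON result to standard output.
SQUARE = "import json, sys; print(json.dumps([x * x for x in json.load(sys.stdin)]))"
TOTAL = "import json, sys; print(json.dumps(sum(json.load(sys.stdin))))"


def run_component(source: str, data):
    """Run one component as a child process, exchanging serialized data via pipes."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        input=json.dumps(data),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


def sum_of_squares(data):
    """The 'superprocess': a component one level up, built from two children."""
    return run_component(TOTAL, run_component(SQUARE, data))


print(sum_of_squares([1, 2, 3]))  # 14
```

<p>In a realistic setting each child would be a separate executable, possibly written in a different language and with its own versions of its dependencies. The sketch also shows where the cost lies: every value crossing a component boundary is serialized, sent through a pipe, and parsed again.</p>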

<p>The main merit of the two approaches outlined above, processes and Web services, is that they can accommodate legacy code and systems, unlike the starting-from-scratch approach of Unison. With a bit of luck, improved tooling and optimization could turn the process/service-based approach into a viable technique for some types of real-life application, while Unison and perhaps others introduce the same basic idea at the programming language end of the scale of software component technologies. And then, if the concept turns out to be successful for taming software complexity, it might become the norm after a few decades. So much for my daily dose of wishful thinking!</p>

<p>Finally, let me reveal my motivation for writing this post: I hope that someone will prove me wrong. I'd love to see a comment pointing out that I am simply not aware of the right tools and techniques. And you get bonus points for references to actual hierarchically modular software systems that work!</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Konrad Hinsen:</i><p>A <a href="https://twitter.com/__luthaf__/status/1257668514770554884" rel="nofollow noopener" title="https://twitter.com/__luthaf__/status/1257668514770554884">Twitter comment</a> says that Rust's package management system satisfies my requirements.</p></li>
</ul>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>Emacs as a malleable system</title> <link>https://blog.khinsen.net/posts/2020/04/03/emacs-as-a-malleable-system.html</link> <pubDate>2020-04-03</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2020/04/03/emacs-as-a-malleable-system.html</guid> <category><![CDATA[ scientific software ]]></category> <description><![CDATA[ <p>Malleable systems are software systems that are designed to be modified and extended by their users, eliminating the usually strict borderline between developers and users. Making scientific software more malleable is a goal that I have been pursuing for 25 years, starting with a shift from Fortran to Python as my main programming language, and a simultaneous shift from writing programs to writing toolkits, such as my <a href="http://dirac.cnrs-orleans.fr/MMTK/" >Molecular Modelling Toolkit</a> first published in 1997. Therefore I was pleased to discover the <a href="https://malleable.systems/" >Malleable Systems Collective</a>, which has just published a <a href="https://malleable.systems/blog/2020/04/01/the-most-successful-malleable-system-in-history/" >post</a> in which I examine what is probably the most successful malleable system in the history of software: Emacs. If you care about users having more influence on their software, check out their site!</p>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>The rise of community-owned monopolies</title> <link>https://blog.khinsen.net/posts/2020/02/26/the-rise-of-community-owned-monopolies.html</link> <pubDate>2020-02-26</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2020/02/26/the-rise-of-community-owned-monopolies.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>One question I have been thinking about in the context of reproducible research is this: Why is all stable software technology old, and all recent technology fragile? Why is it easier to run 40-year-old Fortran code than ten-year-old Python code? A hypothesis that comes to mind immediately is growing code complexity, but I'd expect this to be an amplifier rather than a cause. In this post, I will look at another candidate: the dominance of Open Source communities in the development of scientific software.</p>

<!-- more -->

<h2>From markets to monopolies</h2>

<p>In the 1990s, when I was working on my thesis, the world of scientific computing was very different from what it is now. Innovation was driven by hardware. Processor speeds kept increasing, and new processor architectures appeared on the market in rapid succession. In the course of the 1990s, I did most of my work on Unix workstations based on a variety of architectures: <a href="https://en.wikipedia.org/wiki/PA-RISC" >PA-RISC</a>, <a href="https://en.wikipedia.org/wiki/MIPS_architecture" >MIPS</a>, <a href="https://en.wikipedia.org/wiki/PowerPC" >PowerPC</a>, <a href="https://en.wikipedia.org/wiki/DEC_Alpha" >DEC Alpha</a>. I also worked on mainframe computers made by IBM, Fujitsu, and Cray, all using proprietary processors. Each manufacturer sold a package of hardware, operating system, and development tools such as compilers. Compilers implemented standardized programming languages, mainly Fortran and C, with manufacturer-specific extensions that most people stayed away from because they expected to be using different machines a few years later. The computing platforms that everybody was developing for were neither processors nor operating systems, but Fortran 77 and ANSI C, each of which had developed its own ecosystem of scientific libraries. For an interactive development platform, add Unix and X11. Mixing Fortran and C was somewhat platform-specific, but very doable as well. Every time I changed labs and computers during my postdoc years, I had to spend a day or two to reinstall everything I needed, but I never suffered <a href="https://hal.archives-ouvertes.fr/hal-02117588" >software collapse</a>.</p>

<p>Today, hardware innovation in mainstream computing has almost come to a halt. All the processor architectures listed above are gone. The x86 architecture, implemented in chips from Intel and AMD, dominates scientific computing, and in fact all of computing except for mobile devices. Hardware manufacturers therefore no longer supply compilers. For everyday work, most people use the free <a href="https://en.wikipedia.org/wiki/GNU_Compiler_Collection" >GNU Compiler Collection</a> or the equally free <a href="https://en.wikipedia.org/wiki/Clang" >Clang</a> compiler. For performance-critical work, commercial compilers from companies such as <a href="https://www.nag.com/" >NAG</a>, <a href="https://www.pgroup.com/index.htm" >PGI</a>, or <a href="https://software.intel.com/en-us/compilers" >Intel</a> offer better performance and libraries fine-tuned for high-performance computing. The standards defining Fortran and C have evolved, but have maintained strict backwards compatibility.</p>

<p>However, in the everyday life of computational scientists, these traditional platforms have lost importance. A new breed of languages and scientific ecosystems, such as <a href="https://www.python.org/" >Python</a>, <a href="https://www.r-project.org/" >R</a>, and <a href="https://julialang.org/" >Julia</a>, has become the dominant support for scientific software in many (though not all) domains of research. Their rise has gone hand in hand with software collapse becoming so common that many consider it normal or even inevitable. Scientists are starting to adopt heavy technology with large overheads in terms of complexity and invested effort to work around the problem (if you haven't guessed yet, I am referring to containers). I waste a lot more time today on configuration and setup work (including configuration debugging) than I did in the 1990s. How did we get into this sad state of affairs? Is there any hope for getting out of it again?</p>

<p>One reason that immediately comes to mind is increasing software complexity. But that's more of a symptom than a cause. A better explanation would be an increased <em>problem</em> complexity that would then <em>require</em> more complex software. Problem complexity is much harder to measure, but I don't see much evidence supporting this hypothesis. We certainly do bigger computations, on larger datasets, but if I look at today's commonly used models and methods in computational science, they don't look more complex than what I saw in the 1990s. What has increased, however, is variety. Today's science relies on <em>more</em> computational models than it did 30 years ago, and I believe that this contributes to the fragility issue, as I will explain later.</p>

<p>There is another reason that I haven't heard anyone mention so far: the disappearance of technology markets in favor of monopolist players who can count on customer lock-in. This description will probably make you think of Microsoft's grip on the Windows user base, or the &quot;walled gardens&quot; that Google and Apple have created around their mobile platforms. But there is another category of monopoly owner in the tech world that is hardly recognized as such: Open Source communities.</p>

<h2>Open Source monopolists</h2>

<p>Consider two recent events: Microsoft killing Windows 7, and the Python community killing Python 2. The story is essentially the same in both cases: the creator of a piece of infrastructure software ends support for an old but still widely used version, forcing its users to move on to a later but not fully backwards compatible version. In both cases, a significant part of the user community would have preferred to stick to the older version, as has been nicely <a href="https://xkcd.com/2224/" >illustrated by xkcd</a>. In both cases, the end-of-support decision is a rational one for the producer because supporting old versions is costly. And in both cases, the abandoned users have no other supplier they can turn to, because the producer holds a monopoly on the technology.</p>

<p>Compare this to the diverse market of the 1990s. Producers of infrastructure software could add new functionality and try to win new clients with such improvements, but they could not afford to cause damage to their existing user base because users would simply turn to a competitor. There are many sources for standards-conforming Fortran compilers, but there is only one source for Windows or Python.</p>

<p>I suspect some readers will feel anger at this point. How dare you compare a monopolist business to a community of unpaid volunteers offering their work to the world for free? The crucial point is that I am comparing them as seen from the outside. There is a wide gap between the self-image that Open Source communities have of themselves and the image that they present to the outside world, and I believe that this is a big part of the problem.</p>

<p>Open Source communities tend to see themselves as communities of like-minded people that get organized to work together towards shared objectives. They see themselves much like a sports club that organizes practice sessions for its members, or like a village community that collectively plans its road infrastructure. But this is not at all how Open Source communities present themselves to the outside world. The Web site of a sports club says something like &quot;We are a bunch of people enthusiastic about playing football. If you are as well, come and join us.&quot; Now look at the <a href="https://www.python.org/" >Python Web site</a>. Its first statement, in big letters, is &quot;Python is a programming language that lets you work quickly and integrate systems more effectively.&quot; The site is about a product. Its goal is to convince people to use Python, not to join a community. It is more similar to <a href="https://www.microsoft.com/fr-fr/windows/" >Microsoft's Windows site</a> than to the site of a sports club.</p>

<p>&quot;But...&quot; I hear you say. Open Source. Free Software, as in &quot;free beer&quot; <em>and</em> in &quot;free speech&quot;. And everybody can join in, the community is so welcoming! Fine, but that's again the insiders' view, just slightly enlarged to the circle of people whose engagement with the technology is sufficiently deep that they consider joining the community. I suspect that most people who download and install Python the product will never know anything about the community, and many will even use Python without being aware of it at all. What they are aware of is an application or utility written in Python, e.g. <a href="https://calibre-ebook.com/" >Calibre</a> for managing their e-books, or <a href="https://www.offlineimap.org/" >offlineimap</a> for downloading e-mail. In contrast, a true community-oriented piece of software would have a splash screen saying &quot;Welcome to the Python community! Before using this software, please become familiar with how our community works&quot;.</p>

<p>Sports clubs and village communities focus on their members' needs, interacting with the outside world by necessity, but only as a side effect. Most Open Source communities are more like political parties or non-government organizations in that they <em>want</em> to have an impact on the outside world. They care about the popularity of their products, and make efforts to increase their mind share. The reward they get in return is not money, but that's the only difference from how a company works. Both Open Source communities and software companies have an interest in attracting new clients and keeping existing ones. Both can retain clients more efficiently by generating lock-in, and so they do.</p>

<p>Note that I am not saying that either one creates lock-in intentionally. For Open Source communities such as Python, which I know sufficiently well, I am convinced there is no such intention. For companies such as Microsoft or Google, I can't know for sure. But from the clients' perspective, it doesn't matter if lock-in is intentional or a side effect.</p>

<p>One particularity about computing technology is that lock-in happens by default. It takes a conscious effort (and thus an incentive) to <em>avoid</em> lock-in. The reason is the fine-grained complexity of software interfaces coupled with the near-zero cost of modifying them. There are so many details that re-implementing an existing interface exactly requires a precise documentation of that interface, a perfectionist attitude, and a lot of time. The markets of the 1990s were made possible only by lengthy and costly standardization processes. Which in turn the participants accepted only because without the markets defined by those standards, none of them could continue to innovate in the field of processor architectures.</p>

<h2>Lock-in favors software collapse</h2>

<p>So much for communities as monopoly holders. Back to my original question: how did software collapse become normal? I believe that this is a near-automatic consequence of infrastructure software being managed by monopoly holders. The monopoly situation prevents existing users from moving elsewhere, significantly reducing the effort that needs to be made to keep them. All effort can thus be concentrated on gaining new users, which leads to the paradoxical situation that the needs of non-users have a larger weight in strategic decisions than the needs of the user base. With backwards compatibility being costly, boring, and irrelevant to the non-users that matter for the future, why care about it? That is, in my opinion, what happened to the Scientific Python ecosystem starting in the 2010s: adoption by the explosively growing data science community drowned the existing user base. The best strategy for SciPy was then to focus on the needs of the data science people, which also became the primary source for recruiting developers and maintainers.</p>

<p>Which brings me back to what I said earlier: the diversification of techniques in computational science is part of the problem. While the various subdomains of computational science have overlapping requirements, they also have divergent needs. The longevity of code is one aspect whose importance varies a lot, but there are others: the size of a typical computational task, the size of the datasets being processed, the nature of the algorithms being applied, the hardware platforms that matter most, and many more. While in theory Open Source is good for supporting diversity (&quot;just fork the code and adapt it to your needs&quot;), the reality of today's major Open Source communities is exactly the opposite: a focus on &quot;let's all work together&quot;. Combine this with the chronic lack of funding, and thus also a lack of incentives for developing the structured governance that would administer funding and create activity reports, and you end up with a large number of users depending on the work of a small number of inexperienced developers in precarious positions who cannot reasonably be expected to make an effort to even understand the needs of the user base at large. In a way, software collapse is a consequence of <a href="https://en.wikipedia.org/wiki/Conway's_law" >Conway's law</a> applied to Open Source communities.</p>

<h2>Can we do better?</h2>

<p>Given that today's tech world is dominated by software and Open Source communities, rather than by hardware-producing companies, is it possible to return to a market situation with no or weak lock-in? I don't think so. Standards-based markets can only form when there are multiple competing producers right from the start. In contrast, Open Source communities start out small and adventurous, with a few growing big and becoming infrastructure suppliers. In the beginning, they have no competition, and when they are big, new communities cannot possibly start to compete with them in the mindshare market. Which leaves two possibilities: Open Source communities could become more user-oriented, or the maintenance of infrastructure software could be ensured by other types of organizations. Let's start by looking at the first possibility.</p>

<p>An important first step would be Open Source communities recognizing that they are developing and selling products to a user base that extends far beyond the circle of potential community members. A good time for that would be just now. Many Open Source communities have recently realized that the shared idealistic goal of an Open Source world is not sufficient for ensuring respectful collaboration, and have reacted by introducing codes of conduct. What I am suggesting here is a similar approach for making the relation with the user base more explicit. The absence of a legal contract between developers and users is one of the core principles of Open Source, but that doesn't imply the absence of moral obligations. Any organization that wants to have an impact on the outside world must consider how this impact affects the life and work of other people. It should then define moral commitments, in writing, even if the license prevents them from being legally enforced. A nice example is the
<a href="http://big-data-biology.org/software/commitments/" >Big Data Biology Lab Software Tool Commitments</a>.</p>

<p>Open Source communities could also more actively solicit feedback from the outside. Getting useful feedback from low-engagement users is difficult, but there are proxies, for example the people who package software for various distributions.</p>

<p>But perhaps Open Source communities are just not the right form of organization for infrastructure software. There are other entities that create Open Source software, such as the <a href="https://www.mozilla.org/" >Mozilla</a> and <a href="https://apache.org/" >Apache</a> foundations, or hybrids such as the <a href="https://pharo.org/community" >Pharo community</a> with the <a href="https://consortium.pharo.org/" >Pharo consortium</a> and the <a href="https://association.pharo.org/" >Pharo User Association</a> providing channels for users to influence development. It seems probable that more useful organizational forms are waiting to be discovered. In fact, a good guess is that software should best be managed much like other scientific infrastructure: by specific institutions that ensure long-term funding and provide software as a service to research communities.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Konrad Hinsen:</i><p>An interesting related blog post: <a href="http://reasonableapproximation.net/2020/04/13/in-my-culture-responsibility-oss.html" rel="nofollow noopener" title="http://reasonableapproximation.net/2020/04/13/in-my-culture-responsibility-oss.html">In my culture: the responsibilities of open source maintainers</a>.</p></li>
<li><i>Luis Pedro Coelho:</i><p>Thanks for the shout out!</p><p>One factor that has impressed me is how shallow some of these "communities" are. Even Python, there are only a handful of big committers to the core (I think barely more than 20 over the whole lifetime of the project! which is barely more than 1 or 2 at any given time).</p><p>I think the Linux kernel may have some deeper community, but many of these central projects are a handful of individuals. (The Linux kernel is also known for keeping  backwards compatibility, but I think that's Linus' personal values rather than just a function of the size of the community: most of his most famous angry rants are about this very topic: do not break other people's code).</p></li>
</ul>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>Pharo year one</title> <link>https://blog.khinsen.net/posts/2019/12/31/pharo-year-one.html</link> <pubDate>2019-12-31</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2019/12/31/pharo-year-one.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>It's the season when everyone writes about the past year, or even the past decade for a year number ending in 9. I'll make a modest contribution by summarizing my experience with Pharo after one year of using it for projects of my own.</p>

<!-- more -->

<p>My first contact with Pharo happened a bit more than one year ago, when I signed up for the <a href="http://mooc.pharo.org/" >Pharo MOOC</a> in October 2018. But following a MOOC means working on exercise problems defined by someone else. Getting a real feeling for a programming system requires moving on to problems you actually care about. That's why I started three Pharo-based projects in 2019. The main one is the <a href="https://www.activepapers.org/pharo-edition/" >Pharo edition of ActivePapers</a>, the other ones are <a href="https://github.com/khinsen/ipfs-pharo/" >an exploration</a> of the <a href="https://ipfs.io" >Interplanetary File System (IPFS)</a> and a <a href="https://github.com/khinsen/leibniz-pharo/" >second implementation of my digital scientific notation Leibniz</a>. In all these projects, the user interface is an important aspect, because that's one of my major motivations for using Pharo. However, instead of the standard Pharo user interface framework, which is an evolution of the original Smalltalk user interface of the 1980s, I used the <a href="https://gtoolkit.com/" >Glamorous Toolkit</a>, a complete redesign with many interesting new ideas. Perhaps the most significant innovation in the Glamorous Toolkit from my perspective is the introduction of a computational document. It resembles the fashionable computational notebooks in many ways, but differs in being an integral part of a live programming system.</p>

<p>As I wrote in my <a href="https://blog.khinsen.net/posts/2018/12/19/exploring-pharo.html" >initial blog post</a> on Pharo, I started out by exploring the system using the tools it provides for that purpose. In retrospect, this is clearly the strongest aspect of Pharo. The combination of code browsers, code search, object inspection, and execution inspection (via a tool misleadingly called a debugger) is an extremely powerful way to understand complex software systems. The best evidence is that I was able to write useful and non-trivial extensions to the Glamorous Toolkit, which still is rapidly evolving alpha-stage software and, judged by standard metrics such as lines of documentation per line of code, badly documented. But such metrics make no sense in a system in which searching the code base is faster than documentation lookup in standard environments. Going back to such environments after working with Pharo is a very frustrating experience.</p>

<p>Note that I am not saying that the Pharo environment is perfect. For my taste it requires way too much mouse use. I am still much more productive in Emacs than in Pharo for tasks supported by both, mainly because I can keep my hands on the keyboard. I also find the standard code browser in Pharo too limiting in only showing one method at a time. The Glamorous Toolkit is a clear improvement in that respect. But all the criticism I can come up with is about details that can be fixed, whereas the main defect that I now see in almost every other software development environment is much more fundamental: they suffer from a barrier that separates development tools on one side from the code under development on the other side.</p>

<p>Similar remarks apply to the Smalltalk language on which Pharo is built. It's a minimal programming language that puts its object system in center stage and pushes as many features as possible into its libraries. That's certainly an interesting point in design space to explore, but I'd personally prefer to have a couple of important concepts (for example immutable objects) as language features, rather than as  implementation details of class hierarchies. But then, no language is perfect, and Smalltalk is certainly good enough for my needs.</p>

<p>The most serious problem that I have with Pharo is that I don't see how I could use it productively for my own research in computational biophysics in the near future. There is a small computational science community around Pharo (see e.g. <a href="https://github.com/pharo-open-documentation/awesome-pharo#scientific-libraries" >this list</a> of scientific libraries), but most of the infrastructure code that I'd need is missing. Moreover, Pharo evolves too rapidly for the kind of computational research that I do (see <a href="https://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem.html" >my critique of the SciPy ecosystem</a> for some background information). Finally, reproducible computations remain a challenge because there isn't much of a support infrastructure for reproducibility in Pharo so far, although the recent <a href="https://github.com/guillep/PharoBootstrap" >work on bootstrapping</a> is an important first step.</p>

<p>On a longer time scale, I can imagine Pharo replacing Emacs as my main user interface to computing, with the hard-core science written in different languages but interfaced to Pharo. I expect IPFS to play an important role at the cross-language interface, for various reasons that deserve an entire blog post on their own. However, it takes a lot of not-yet-written code to get there. Too much to define this as a realistic goal for myself. This means that my future use of Pharo mainly depends on the directions taken by the Pharo community over the coming years. I am pretty sure that Pharo will remain an important tool in my toolbox, I just don't know what its exact role will be.</p>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>Industrialization of scientific software: a case study</title> <link>https://blog.khinsen.net/posts/2019/11/12/industrialization-of-scientific-software-a-case-study.html</link> <pubDate>2019-11-12</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2019/11/12/industrialization-of-scientific-software-a-case-study.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ <p>A coffee break conversation at a scientific conference last week provided an excellent illustration for the industrialization of scientific research that I wrote about in a <a href="https://blog.khinsen.net/posts/2019/10/29/the-industrialization-of-scientific-research.html" >recent blog post</a>. It has provoked some discussion <a href="https://twitter.com/khinsen/status/1191673813626499072" >on Twitter</a> that deserves to be recorded and commented on in a more permanent medium. Which is here.</p>

<!-- more -->

<p>I was chatting with a colleague whom I have been meeting on such occasions for about 15 years. He asked me if I was still developing my <a href="http://dirac.cnrs-orleans.fr/MMTK" >Molecular Modelling Toolkit</a>. I replied that I had stopped working on it because the <a href="https://www.python.org/doc/sunset-python-2/" >end of support for Python 2 in 2020</a> would quickly make it too hard to use for most of its intended audience, and that I had neither the means nor the motivation to port it to Python 3. He was quite surprised by my explanations, since he had never heard of the end of support for Python 2, though he did know that there was also a version 3 that was a bit different. His own data analysis scripts were still Python 2 because he had never seen a good reason to even look at Python 3 - never break a working system! But he was alarmed by my prediction that Python 2 would soon disappear from Linux distributions, as he relied on Ubuntu (regularly updated by his lab's systems administrator) to provide him with Python 2 and the few libraries he used.</p>

<p>I was not surprised, as I have had similar conversations with various colleagues over the last years, in particular when someone contacts me with a Python question, which happens quite frequently as I have the reputation of being a Python expert in my little corner of science. These people are typically experimentalists who write and use small data analysis scripts, but for whom computation is not the central part of their research. They picked up Python from a colleague or a student, or perhaps through attending a short introductory course (such as a <a href="https://software-carpentry.org" >Software Carpentry</a> workshop). They have a Python installation on their machine, which is managed by someone else. For them, Python is &quot;just there&quot;, exactly like other Unix basics such as <code>sh</code>, or <code>grep</code>. Moreover, Python has been part of their computing life for many years, often for their entire scientific career, and it has never caused them any trouble.</p>

<p>When I mentioned my coffee break conversation <a href="https://twitter.com/khinsen/status/1191673813626499072" >on Twitter</a>, Greg Landrum <a href="https://twitter.com/dr_greg_landrum/status/1192446793612705792" >commented</a> that he would expect every Python user to make an effort to stay informed about important Python news, so everyone should by now have heard of the end-of-life decision for Python 2. This reminded me of an earlier <a href="https://twitter.com/zacchiro/status/1123168548929536000" >Twitter conversation with Stefano Zacchiroli</a>, who expressed similar views. As did other actors of the FOSS universe in various real-life discussions. There seems to be a widely shared expectation among FOSS developers that users should follow news about the software they use and take the required steps to adapt to &quot;mandatory changes&quot;, as Stefano put it. My story illustrates that this is not happening. There is a category of users who (1) don't follow development news and (2) expect the software they use to stay around forever without major breaking changes.</p>

<p>This is exactly the phenomenon that I call the industrialization of scientific software. Some software packages, such as the core of the Scientific Python ecosystem, become so popular beyond their core community that for an important part of their users they are industrial products, something they obtain once and then use without thinking much about its origins or possible evolution. One sign of a piece of software becoming an industrial product is its inclusion in standard Linux distributions, where it is just one package out of many that users can choose from. Linux distributions take the role that department stores have for material goods, providing a platform for window-shopping and acquisition via a standardized procedure. For users who get their software from a Linux distribution, all software looks a bit alike. They have no reason to be more careful about Python than about <code>sh</code> or <code>grep</code>.</p>

<p>Just like material goods industries, the developers of industrial software, FOSS or not, have no easy way to communicate with their clients. If such communication becomes inevitable, as for example in the case of a product recall for safety reasons, an enormous effort must be deployed to ensure that the message reaches most of its audience. <a href="https://twitter.com/pdebuyl/status/1192410647784574976" >Pierre de Buyl made a suggestion along these lines</a>, proposing to put up posters with an explanation of the Python 2-&gt;3 transition in every research lab. Asking research funders to support such an action would be an interesting experiment.</p>

<p>Is there anything that FOSS communities can do to prevent such miscommunication in the future? A look at industrial material goods may provide inspiration. Every non-trivial technical product comes with a user manual, which typically starts with pointing out safety precautions that users are expected to be aware of. Do this, don't do that, watch out for exceptional situations. The documentation of software packages could do the same, and tutorials could then emphasize the message when explaining the product to potential future customers. Here is what such a warning could look like:</p>

<pre><code>This software package is developed for cutting-edge scientific
research. Our priority in development is to improve the software
and to adapt it for the needs of future applications. As a consequence,
we cannot maintain client code compatibility indefinitely.
Users of this package are expected to check the release notes
(available at http://...) at least once per year, and to adapt
their code to changes in the interfaces explained there.
</code></pre>

<p>I would expect such a notice in the <a href="https://scipy-lectures.org/intro/intro.html" >introduction to the SciPy Lecture Notes</a>, for example. It describes the SciPy ecosystem, comparing it to alternative choices, but says not a word about what users need to do to safely use this ecosystem in their research work. As I said in my <a href="https://blog.khinsen.net/posts/2019/10/29/the-industrialization-of-scientific-research.html" >previous post</a>, the FOSS community has largely been blind to the consequences of software industrialization, maintaining the outdated view that developers and users form a single community. It's time for an upgrade.</p>

<p>Note added after the initial publication: Dan Katz commented <a href="https://twitter.com/danielskatz/status/1194203819271491586" >on Twitter</a> with a reference to this very clear <a href="https://collegeville.github.io/CW3S19/WorkshopResources/WhitePapers/quillenCW3S19.pdf" >statement on the development priorities for Matlab</a>. It would be very helpful if FOSS communities published similar statements about their products.</p>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>The industrialization of scientific research</title> <link>https://blog.khinsen.net/posts/2019/10/29/the-industrialization-of-scientific-research.html</link> <pubDate>2019-10-29</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2019/10/29/the-industrialization-of-scientific-research.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ <p>Over the last few years, I have spent a lot of time thinking about, speaking about, and discussing the reproducibility crisis in scientific research. An obvious but hard-to-answer question is: Why has reproducibility become such a major problem, in so many disciplines? And why now? In this post, I will make an attempt at formulating a hypothesis: the underlying cause of the reproducibility crisis is the ongoing industrialization of scientific research.</p>

<!-- more -->

<p>First of all, let me explain what I mean by industrialization. In the production of material goods, this term stands for a transition to high-volume production in large sites (factories), profiting from economies of scale. This doesn't directly carry over to immaterial goods such as information and knowledge, which can be copied at near-zero cost. There are, however, aspects of industrialization that do make sense for immaterial goods. The main one is a clear separation of producers, who design and make products for an anonymous group of potential clients, and consumers who choose from pre-existing products on the market. This stands in contrast to 1) producing for one's own consumption, and 2) commissioning someone else (e.g. a craftsman) to make a personalized product. Both of these approaches lead to products optimized for a specific consumer's need, whereas industrial products are made for a large and anonymous market.</p>

<p>In scientific research, immaterial industrial products are a recent phenomenon. The ones that I will concentrate on are software and datasets that are publicly available and used by scientists outside of any collaboration with their authors. Twenty years ago, this would have been a rare event. Most software was written for in-lab use, and not even made available to others. Only a small number of basic, standardized, and widely used tools, such as compilers, were already industrial products. Most data were likewise not shared outside the research group that collected them. The resulting non-verifiability of scientific findings was an obvious problem, and led ultimately to today's growing Open Science movement. However, the Open Science movement goes well beyond asking for the transparency that is fundamentally required by the scientific method. It wants software and data to be <em>reusable</em> by other scientists and for different purposes. This is stated most explicitly by the <a href="https://www.go-fair.org/fair-principles/" >FAIR data</a> label, in which the R stands for reusability. Open Science thus turns software and datasets into industrial commodities.</p>

<h2>The knowledge gap</h2>

<p>A characteristic feature of industrial products is that consumers know much less about them than producers. Consumers cannot ask for personalized explanations either, unlike in the case of a product tailor-made by a craftsman. For material goods, this has led to a wide range of professions, institutions, and regulations designed to help consumers choose suitable products and to protect them against producers' abuse of their superior knowledge. Examples are consumer protection agencies, independent experts, technical norms, quality labels, etc. For the industrial products in scientific research, we have no established equivalents yet, and it is not even clear if we can ever have them. And that is, in my opinion, a major cause of the reproducibility crisis.</p>

<p>One piece of evidence is the nature of the cases discussed in the context of the crisis. Reproducibility has been an issue with experiments since the dawn of science, and yet experimental non-reproducibility never shows up in the examples cited. This is not because it is unimportant, but because it is well understood. Experimentalists of all disciplines know what ought to be reproducible in their field, and to which degree, and even the most theoretically minded theoreticians understand that experiments necessarily come with uncertainties. The issues that do show up in the catalogs of non-reproducible results are related to two specific research tools: statistics and computers. Both are recent, and both are routinely used by scientists who do not fully understand them. In other words, their users are consumers of industrial products who lack guidance in their choice of tools and methods.</p>

<p>Side note: I can almost hear some readers complain that statistics are nothing recent, going back to Arab mathematicians who lived 1000 years ago. You are right. What is recent is the widespread <em>use</em> of statistics in science. Before computers, statistical methods had to be applied manually, keeping them simple and the datasets small. The kind of statistical inference whose results turn out to be non-reproducible, e.g. in psychology, would not have been possible without computers.</p>

<p>As an illustration, consider the common use of <em>p</em>-value thresholds for deciding on significance. Anyone who understands the statistical framework to which <em>p</em>-values belong (hypothesis testing) agrees that most uses of such thresholds in the scientific literature make no sense. The fact that they are widely used nevertheless thus shows that most people who deal with them, as authors or as reviewers, do not understand statistical hypothesis testing sufficiently well. And since the abuse of <em>p</em>-values has been going on for a while, it has now become a de-facto accepted practice, to the point that the people who do understand its absurdity have a hard time being heard. The same can be said about the abuse of journal impact factors for judging the authors of scientific articles, which is a sign of CVs and publication lists becoming industrial products as well.</p>
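
<p>To make this more concrete, here is a minimal sketch of one well-known way in which a fixed threshold loses its meaning. The scenario is an assumption chosen for illustration, not a case taken from the literature: a study measures 20 independent outcomes, none of which has any real effect, and reports whatever falls below <em>p</em> = 0.05. Each individual test has a 5% false-positive rate, but the chance of at least one "significant" finding is 1 - 0.95^20, roughly 64%. The function names, sample sizes, and the simple z-test are all hypothetical choices.</p>

```python
import random
from statistics import NormalDist


def two_sided_p(sample, sigma=1.0):
    """p-value of a two-sided z-test of 'true mean is 0' with known sigma."""
    sample_mean = sum(sample) / len(sample)
    z = sample_mean / (sigma / len(sample) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def share_of_studies_with_a_hit(studies=2000, outcomes=20, n=30, alpha=0.05, seed=1):
    """Simulate studies in which no effect exists at all, and return the
    fraction that still finds at least one outcome with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        p_values = [
            two_sided_p([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(outcomes)
        ]
        if min(p_values) < alpha:
            hits += 1
    return hits / studies


if __name__ == "__main__":
    alpha, outcomes = 0.05, 20
    # Analytical value for independent tests under the null hypothesis
    print("expected share: ", 1 - (1 - alpha) ** outcomes)
    print("simulated share:", share_of_studies_with_a_hit(outcomes=outcomes, alpha=alpha))
```

<p>The simulation only covers this one mechanism. It says nothing about the other ways in which thresholds get misused, but it shows how little a bare "p &lt; 0.05" tells a reader who does not know how many tests were run.</p>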

<p>The root cause of computational non-reproducibility is an even better illustration of software becoming an industrial product. I have noticed that many scientists who have never experienced reproducibility issues themselves find it hard to imagine that they can exist. After all, 2 + 2 is 4, today and tomorrow. What happens when two people obtain different results from &quot;the same&quot; computation is that they in fact performed different computations (using different software) without being aware of the difference. Software has become ever more complex over the last decades, but software developers have also made an effort to hide this complexity from users, with great success. Most scientists are surprised to learn that when they run that little script sent by a colleague, they are really using hundreds of software packages written (and modified frequently) by hundreds of people over many years with only loose coordination. It's not only those hundreds of packages that are industrial commodities, but even the assembly of all those pieces, for example a Linux distribution.</p>
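<p>A minimal sketch of how &quot;the same&quot; computation can silently become a different one: floating-point addition is not associative, so two libraries (or two versions of one library) that add the same numbers in a different order can return different results. The two functions below are hypothetical stand-ins for such implementations, not code from any real package.</p>

```python
def sum_left_to_right(values):
    """Add the numbers in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_right_to_left(values):
    """Add the same numbers in the opposite order."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


values = [0.1, 0.2, 0.3]
print(sum_left_to_right(values))   # 0.6000000000000001
print(sum_right_to_left(values))   # 0.6
```

<p>Both functions are correct implementations of a sum, and a user who calls either one through several layers of libraries has no obvious way of knowing which order was used.</p>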

<h2>What can we do?</h2>

<p>We can look at the much better understood industrial production of material goods for inspiration for possible solutions. A complex industrial product, such as a car or a television set, comes with a user manual and perhaps an obligation for user training, such as obtaining a driver's license. Moreover, technical norms impose precautions on producers to make their products safe to use by non-experts. Independent experts evaluate products and publish reports that guide consumers in their choice. These approaches can be adapted to scientific software and statistical methods, but that work remains to be done.</p>

<p>I expect reproducibility to play a major role in this, as a quality label. A reproducible result can still be wrong, but nevertheless reproducibility guarantees the absence of some kinds of common problems. We need additional, complementary quality labels of course, and in fact we have a few, such as the presence of test suites for scientific software, or the existence of provenance metadata for datasets. But this is only the beginning. We do not yet know how to make data and code an industrial product that is safe to use by others, nor do we know how to prepare scientists for working in such an ecosystem. Best practices, even good enough practices, remain to be established.</p>

<p>Experts will likely be another ingredient of a solution. I suspect that most statistics-related problems could be solved by requiring that every publication making a claim based on statistical significance be validated by a trained statistician. We will have to figure out how to organize this validation. One possibility is to create independent certification agencies, similar to <a href="https://www.cascad.tech/" >cascad</a> for computational reproducibility, that employ qualified statisticians and deliver validation certificates that will figure prominently in a paper.</p>

<h2>It's not just software and data</h2>

<p>As I said above, I have focused on data and code because the computational aspects of science are what I am most familiar with. But industrialization isn't limited to computing. Even the good old journal article is slowly turning into an industrial product. With approaches such as meta-analyses or content mining, scientific papers are being used by people who are not part of the community that their authors belong to, and who may thus lack the tacit knowledge shared by that community, knowledge that may well be necessary to fully appreciate the published results. Interdisciplinary research is also a source of potential misunderstandings due to unshared tacit knowledge.</p>

<p>We can also see industrialization in the management of science. In fact, the term &quot;management&quot; in itself implies some form of industrialization. Unfortunately, management principles from the material goods and service industries are being applied uncritically to scientific research, leading to phenomena such as the abuse of the journal impact factor to measure an individual's productivity, or the attribution of budgets based on multiple-year predictions of research outcomes (called &quot;grant proposals&quot;) that lack any credibility. This suggests that the people who design these management practices consider science itself a commodity, as an industry that can be run just like any other industry. There is, however, a crucial difference: whereas the production of material goods is by necessity based on well-known technologies and processes (otherwise their deployment at scale would be bound to fail), research is all about the unknown. Scientists can describe directions they want to take, but not promise to reach specific goals in the future. Science is intrinsically a bottom-up process, whereas management is about top-down organization.</p>

<h2>Open Source and Open Science</h2>

<p>Back to software, there is one aspect that deserves further discussion: the role of the FOSS (free/open source software) approach that has been gaining traction in research over the last decade, and that has furthermore inspired much of the Open Science movement. The origin of the FOSS movement can be seen as a rebellion against the industrialization of software, which made it difficult to impossible for users to adapt it to their needs. The widely shared story of Richard Stallman's fight against a proprietary printer driver (see <a href="https://poynder.blogspot.com/2006/03/interview-with-richard-stallman.html" >here</a> for example) is a nice illustration. Initially, the FOSS movement focused on establishing legal means (licenses) to protect software from becoming proprietary. More slowly, and less explicitly, it worked towards a view of software development as something a community does for its own needs, with the ideal that anyone sufficiently motivated should be able to join such a community and participate in the development process. This was a reasonable proposal in the 1980s, when software was simpler and most computer users had by necessity some programming experience.</p>

<p>Today's situation is very different. Most software has the status of an industrial product for most of its users, whether it's FOSS or not. In theory, anyone can learn anything about FOSS and participate in its evolution at all levels. In practice, the effort is prohibitive for most, and nobody today can envisage understanding all the software they depend on, let alone contributing to its development. As I explained above, it has even become close to impossible to just keep track of which software one depends on. From a user's perspective, the development communities of FOSS projects are industrial software producers just like commercial companies. In a way, FOSS users even have less power because the developer communities have no legal or moral obligations toward their users at all. There are a few cases of institutions that permit users to influence and support the development of FOSS, for example the <a href="http://consortium.pharo.org/" >Pharo consortium</a> or the <a href="https://www.fondation-inria.fr/" >Inria foundation</a>, but they are the exception rather than the rule.</p>
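<p>To get a feeling for the scale of the problem, one can at least list the Python packages visible to the current interpreter. The sketch below uses the standard library's <code>importlib.metadata</code>; the function name <code>installed_packages</code> is made up for this example. It covers only Python distributions and says nothing about compilers, system libraries, or the operating system underneath, so it shows a lower bound on what a script depends on.</p>

```python
from importlib.metadata import distributions


def installed_packages():
    """Return sorted (name, version) pairs for every installed distribution."""
    found = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip broken installations that lack a name or a version.
        if name and version:
            found.add((name, version))
    return sorted(found)


if __name__ == "__main__":
    packages = installed_packages()
    print(len(packages), "Python distributions are installed")
    for name, version in packages:
        print(name, version)
```

<p>Saving this list next to a result is a first step towards knowing which computation was actually performed, though it is far from a complete record of the environment.</p>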

<p>In science, the FOSS ideal of communities producing software for their own use works very well for domain-specific software packages, whose developers are a representative subset of a well-defined scientific community. But infrastructure software that is used across many scientific disciplines will invariably end up being an industrial product for most of its users. This is true for most of the Scientific Python ecosystem, for example, and also for the statistical software universe that has grown around the R language. Note that I am not saying that the FOSS approach has no advantages there. Open source code is very important to ensure the transparency required for making science verifiable. What I am saying is that openness is not enough to ensure that software is a safe-to-use industrial product, nor does it provide a mechanism for keeping a product's evolution in sync with the needs of its user base.</p>

<p>Whereas the FOSS community has largely remained blind to this issue, the Open Science movement seems to be more aware of the pitfalls of &quot;just&quot; being open, at least for data. The I and R (interoperability, reusability) in FAIR are the best evidence for this. For now, they remain ideals for which practically usable implementations remain to be defined. Perhaps this will lead to a more careful consideration of reusability for software as well. As with the material goods industries, the key is to recognize users and educators as stakeholders and ensure that their needs are taken into account by producers. Open source communities working on widely used infrastructure software could, for example, adopt a governance model that includes representative non-developing users. Funders of such communities could make such a governance model a condition for funding. But the very first step is creating an awareness of the problem. Development communities should openly state their ambition. It's OK to develop software for use inside a delimited community, but then don't advertise it as easy to use for everyone. It's also OK to aim high and work on general-purpose infrastructure software, but then explain how users can make themselves heard without having to become contributors themselves. Being &quot;open&quot; is not enough.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>asmeurer:</i><p>Software, like all systems, does not just continue to work so long as you don't break it. It only works because people continuously work to keep it from breaking. Imagine if your city builds a bridge. Some years later, there is a bond election to pay for costs for the bridge. Now consider a voter who votes against the bond, saying, "they already built the bridge, why do they need more money? As long as they don't tear it down, it should continue to work." This is of course ridiculous. Bridges and roads require maintenance, or they will degrade. They do not just have a one time cost. Software is the same way. Even though the bits that make up the source code of software are just as immutable as the atoms of concrete in the bridge, it still requires ongoing maintenance or it will rot, just as the bridge will start to develop potholes, and eventually start to crumble if it is not maintained. The ecosystem of software and hardware that a piece of code runs on and alongside must be considered as part of the system, just as the cars should be considered as part of the system of a bridge.</p><p>The other thing to understand is that for open source software, this maintenance is provided almost exclusively by unpaid volunteers. I wonder how much your colleague has given to NumFOCUS, since he expects the software to be supported indefinitely. I would encourage you to show him this <a href="https://www.fordfoundation.org/work/learning/research-reports/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/" rel="nofollow noopener" title="https://www.fordfoundation.org/work/learning/research-reports/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/">https://www.fordfoundation....</a>.</p><p>Maintaining Python 2 support means splitting this development effort away from the development of new features, the fixing of bugs, and so on. 
It also means keeping a large amount of technical debt (I've written about this here <a href="https://www.asmeurer.com/blog/posts/moving-away-from-python-2/" rel="nofollow noopener" title="https://www.asmeurer.com/blog/posts/moving-away-from-python-2/">https://www.asmeurer.com/bl...</a>). Actually, if you want to continue to use Python 2, you can. What you can't do is expect the volunteers who work on CPython to continue to work on it, or the volunteers who work on libraries to continue to support it in addition to Python 3, or the volunteers who work on Linux distributions to continue to support it. These things would all require ongoing development efforts (see my first paragraph). You are of course free to pay a vendor to continue to provide Python 2 support for you (I'm sure some will pop up if the market demand is there), or attempt to fix any holes in the support yourself.</p><ul>
<li><i>Konrad Hinsen:</i><p>Hi Aaron,</p><p>thanks for your comments!</p><p>Before giving my point of view on your first paragraph, let me reply to your second one, which is really the topic of my post. My colleague doesn't expect software to be maintained indefinitely by someone else for free. His expectation of Python being just there forever is nothing but an extrapolation of past experience. He has no idea about how software maintenance works, nor any opinion on how it should work. And even after our coffee break conversation, he probably has no more than a foggy notion of all that. Coffee breaks are way too short, as we all know.</p><p>As for your statement that "software only works because people continuously work to keep it from breaking", that's a self-fulfilling prophecy in my opinion. What breaks software package A is a breaking change in its dependency, software package B. If everybody introduces breaking changes all the time, all software will break all the time, and your statement becomes true. In a world where everyone avoids breaking changes, software can work for a very long time without any maintenance. I have 25-year-old Fortran programs that still work, as do the shell scripts that coordinate them. Software is as stable as its developers want it to be. What you can't have, given today's state of the art, is stable software *and* rapid improvement in functionality. That's a choice that developers must make. And then they should make a clear public statement about their choice.</p><ul>
<li><i>asmeurer:</i><p>Right, I don't think there is any malice. For the most part, it is just ignorance of how open source maintenance works. Usually once you explain this to someone, they get it, but by default people don't think about it and they assume that things that just work will continue to work, and don't really consider that they only work because there are people out there who dedicate time or money to making them work.</p><p>Even for Fortran there is a maintenance cost. Every Fortran compiler has to support multiple versions of the language, and any compiler that works on a modern machine is necessarily being actively developed, because the architectures of 30 years ago aren't the same as the ones today. So ultimately, "breaking changes" will always happen *somewhere* in the stack, unless you are exclusively using 30-year-old software on 30-year-old hardware. Your colleague can continue to use his Python code by not updating Python from Python 2, except it won't be available on the latest Linux distro. He can avoid updating Linux, except old versions of Linux won't work on newer hardware. He can avoid updating his hardware, except hardware eventually dies.</p><ul>
<li><i>Konrad Hinsen:</i><p>Yes, Fortran compilers are being maintained. Fortran (in any of its standardized versions) is what I call a stable platform. Compiler developers work on avoiding collapse from below, in order to ensure that programmers in the software stack above needn't worry about it. And they work on improvements that any particular user might care about or not (speed, new versions of the standard, new hardware...).</p><p>But saying that Fortran requires maintenance hides enormous differences in degree. I am pretty sure that the first release of GNU Fortran for Linux would still work on a modern Linux, though you may have to install support for 32-bit code first. All of the software stack in the PC world has been very stable. People upgrade because they want new features or other improvements, not because they face software collapse.</p><p>An interesting historical side note: all software platforms that go back to the 2000s or earlier are stable. All the ANSI standard languages, but also the JVM or the Linux ecosystem as a whole. Rapidly changing platforms are a recent phenomenon. What happened? One hypothesis: the advertising business, with its extreme short-term focus, became an important driving force for technology.</p><p>What really bothers my experimentalist colleague is the risk of Python 2 dropping out of Linux distributions, because that's what makes Python easily accessible. You can't afford not to update Linux these days, for security reasons. Maybe a conservative distribution such as CentOS will keep Python 2 for some years to come.</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><i>Stephen Kell:</i><p>Thanks Konrad... another very thought-provoking post. I agree with your basic premise that FOSS-style culture of "fix it yourself", or more generally of conveniently identifying users with developers, doesn't match today's large-scale patterns of software distribution and co-evolution.</p><p>However, there's an elephant in the room: who is right? Why *shouldn't* Python 2 be around forever? Why is there any category difference between Python and (say) awk, or sh?</p><p>I lean towards the view that there shouldn't be, that this is another instance of the language-implementer tail wagging the working-programmer dog. (The mess known as "FFIs" is another massive example of this.)</p><p>This is cultural... stereotyping wildly, "PL" people often see dictating change to users as their prerogative; "systems" people often don't share this (e.g. witness Linus Torvalds's strong insistence on backwards compatibility).</p><p>The analogy with instruction manuals is also problematic. My toaster's instruction manual literally says "Caution: do not insert any objects into the toast slot". More generally, these sorts of things often contain advice that is practically unfollowable, but exists to cover the backside of the manufacturer. It may or may not be legally enforceable, but my point is that this isn't necessarily the right culture to be inspired by. Putting signs and disclaimers everywhere seems like an "ambulance at the bottom of the cliff" solution. It's ducking the big question: how can we structure the "material" of software so that the things people quite reasonably want to do are the things that actually work?</p><p>I have some thoughts on that question, but would love to hear yours first. :-)</p><ul>
<li><i>Konrad Hinsen:</i><p>Hi Stephen,</p><p>thanks for your comments!</p><p>I have been thinking about that elephant for a while, but I deliberately left it out of this blog post in order to concentrate on what I hope to be more consensual: that mutual miscomprehension between developers and users is a problem we collectively need to work on. But I'll happily come back to the elephant :-)</p><p>For me, there is no category difference between Python 2 and sh or awk. They are just on almost opposite ends of a spectrum. As you say, the root of the difference is cultural. In the Unix/"system" approach, there is an ideally small set of infrastructure software that defines the rules of the system and which ought to be a stable basis that nobody perturbs without a very good reason. I'd say that sh belongs to this infrastructure, but awk probably not. Another principle more specifically for Unix is the famous "tools that do one thing but do it well". Small tools are easy to keep stable as well. Integration of these tools for solving a specific problem is someone else's job, meaning that Unix is designed for power users.</p><p>Python, on the other hand, started out as a "batteries included" supertool, and needs to evolve constantly in order to remain the top supertool for many tasks. Unlike Unix tools, the different parts of the Python standard library cannot evolve independently at their own rhythm, which encourages an attitude of embracing change and pursuing it as a goal in itself. Compatibility is then not only seen as a waste of effort, but also as a sign of attachment to the past. Moreover, working on a supertool puts developers in a god-like position. They are not creating a humble part of a system, but a system on its own that transcends mere operating systems. And as you suggest, programming languages are probably the extreme case of god-like power.</p><p>Another aspect is the size and structure of the communities. 
Unix is anarchy: everybody does their tool, period. No annual conferences, no governance, no code of conduct, no bureaucracy managing formal enhancement proposals. Python started out the same way, and was very stable in its early years. Today's Python community is big and organized. Such communities require shared beliefs, and "software changes" is one of these beliefs in the Python community. It probably helps that society at large is obsessed by innovation, even without any associated goal of improvement.</p><p>Concerning your comment on instruction manuals, I agree that today's legalistic attitude has pushed alerts and warnings beyond the limit of the reasonable. Maybe I should have written "instruction manuals as they were 30 years ago".</p><p>Finally, the big question. I doubt there is one general answer to it, so I will stick to what I know best: research in the natural sciences. I have tried to be neutral in my description of the ongoing industrialization, but I consider it mostly a bad development, with few but important exceptions. For software, the main exceptions are well-understood compute-intensive procedures in simulation and data analysis. For everything else, I believe we need more anarchy and more control over software in the hands of each individual scientist. Meaning small understandable building blocks, rather than the monolithic libraries of today's SciPy ecosystem, however convenient that may be for getting a job done quickly.</p><p>Am I now entitled to learn about your thoughts on the big question ?  ;-)</p><ul>
<li><i>Stephen Kell:</i><p>Thanks for the thoughtful reply. And sorry for the delay... I thought I had posted this, but I had merely written it.</p><p>Firstly, I agree that keeping to consensus-inducing topics is often a good tactic... apologies for charging off in a revolutionary direction. :-)</p><p>"Batteries included" versus "one thing well" does identify a difference. Then again, Unix itself is in some sense a "batteries included" system and has a community process of sorts in the form of POSIX... its rate of change is tempered by both standardisation (slow) and plurality (many implementations). Perhaps if Python had several widely used implementations, the 2-vs-3 issue would have gone very differently... as you say, culture is a major factor.</p><p>To answer your question... I see the whole "supported versions" issue (not just in Python) and the burden of porting software to "keep up", as most immediately a consequence of two big but very concrete problems that our operating systems and programming tools set us up with. Solving them is already currently "possible" but uneconomical.</p><p>The first problem is that software packaging has no notion of isolation. Without extra effort, I can't have version X of some library/program installed and also version Y, because they collide/interfere with each other (e.g. they may want to install things at the same path, but also more directly that A links with B, say). The notion of "install" doesn't distinguish "coexistence" (both are available to me) and communication (both intentionally interact, including by presence in a shared namespace). This is a fairly direct consequence of Unix-style linking and sharing of the filesystem namespace. The right redesign of those could solve it, and I believe it needn't be very invasive. Some package managers do attempt something like this, but I've yet to see one that really goes deep enough.</p><p>The second problem is that critical fixes are not isolated from general development. 
In order to get implementation fixes for a given piece of software, you have also to get interface "fixes". For example, if a security bug is discovered that dates back to a library version N, it will probably only be fixed in version N+k. The interface of that version is probably different, so you have to port your code. I've not seen much focus on black-box approaches to security defence ("block, not patch"). Again, I'd argue this can be traced to Unix -- if all you have is opaque byte streams, recognising bad input is a tall order because it must be coded from scratch each time -- but by evolving Unix we can fix it. A memory-safe C will also help here (am working on it!).</p><p>Of course the reality of both of these is more complicated than I've made out. But in a world without both of these problems, I think widespread (cultural) expectations around software's "continued workingness" would be very different, because the "support" of large institutions wouldn't be necessary to keep a given codebase running acceptably. (And I did write even more about all this, but I think that rather than rambling away here, I should save the details for a blog post of my own....)</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for taking the time to write and the initiative to actually post this reply :-)</p><p>Python actually has multiple implementations. I don't know how widely used Jython, IronPython, and PyPy are these days, but I'd say it doesn't matter. The consensus in the Python community is that CPython is the reference implementation and everyone else has to follow.</p><p>The first problem you describe looks like an early case of "convention over configuration". Recent package managers (I am thinking of Nix and Guix) are switching to explicit configuration, which does indeed solve most of the problems of what Windowsians call "DLL hell", except for one: the case where A depends on B and C, with B depending on D v1 and C on D v2. The only attempt I am aware of to solve that problem is Unison (<a href="https://www.unisonweb.org/" rel="nofollow noopener" title="https://www.unisonweb.org/">https://www.unisonweb.org/</a>), which refers to dependencies by hash code rather than by name.</p><p>Your second problem looks much harder to me because it involves so many different aspects: culture, economics, power relations, etc. I am not convinced that enough people actually want to solve the problem, whose continued existence provides market dominance to some, and employment to others.</p><p>So... I am looking forward to your blog post!</p></li>
</ul>
</li>
<li><i>asmeurer:</i><p>It's curious that you consider the SciPy ecosystem to be monolithic. It's generally considered to be built out of building blocks. A typical scientific workflow will require several libraries, which work together but are developed separately. If you want to do plots, you will use matplotlib or some other plotting library. If you need basic scientific functions you will use numpy or scipy, and for something more domain specific you will use a domain specific library, and so on. Contrast this to something like MATLAB or Mathematica where there is a single application package that does everything.</p><ul>
<li><i>Konrad Hinsen:</i><p>The SciPy stack is monolithic from the end user's point of view: you can't pick individual versions of each library and expect them to work together. You can only combine versions from close points in time. The developers' perspective is certainly very different. But the requirement of co-evolution in a context of rapid change in interfaces leads to a similar end result as centrally coordinated development.</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
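<p>The &quot;diamond&quot; case raised in the comments above (A depends on B and C, B needs D v1, C needs D v2) can be sketched in a few lines of Python. Everything here is hypothetical: the package table, the function names, and the truncated SHA-256 identifier are assumptions made for illustration and do not describe how Nix, Guix, or Unison are actually implemented. The sketch only shows why a namespace keyed by name runs into a clash while identifiers derived from content do not.</p>

```python
import hashlib

# Hypothetical dependency declarations: (name, version) -> requirements.
REQUIREMENTS = {
    ("A", "1"): [("B", "1"), ("C", "1")],
    ("B", "1"): [("D", "1")],
    ("C", "1"): [("D", "2")],
    ("D", "1"): [],
    ("D", "2"): [],
}


def closure(root, requirements):
    """Collect every (name, version) pair reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        package = stack.pop()
        if package in seen:
            continue
        seen.add(package)
        stack.extend(requirements[package])
    return seen


def name_conflicts(packages):
    """Names required in more than one version, which clash if each
    name may be installed only once."""
    versions = {}
    for name, version in packages:
        versions.setdefault(name, set()).add(version)
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}


def content_id(source_code):
    """Identify a definition by a hash of its content rather than its name."""
    return hashlib.sha256(source_code.encode("utf-8")).hexdigest()[:12]


print(name_conflicts(closure(("A", "1"), REQUIREMENTS)))  # {'D': ['1', '2']}
```

<p>With content-derived identifiers, the two versions of D simply have two different identifiers and can coexist, because nothing refers to them by the shared name &quot;D&quot;.</p>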

 ]]></description> </item><item> <title>The computational notebook of the future (part 2)</title> <link>https://blog.khinsen.net/posts/2019/05/09/the-computational-notebook-of-the-future-part-2.html</link> <pubDate>2019-05-09</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2019/05/09/the-computational-notebook-of-the-future-part-2.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>A while ago I <a href="http://blog.khinsen.net/posts/2019/02/11/the-computational-notebook-of-the-future.html" >wrote about my ideas for a successor of today's
computational
notebooks.</a>
Since then I have made some progress on a prototype implementation,
which is the topic of this post. Again I have made a companion
<a href="https://vimeo.com/339361206" >screencast</a> so that you can get a better idea of how all this works in practice.</p>

<!-- more -->

<p>As a reminder, the two aspects of today's notebooks
(<a href="https://www.wolfram.com/mathematica/" >Mathematica</a>,
<a href="https://jupyter.org/" >Jupyter</a>, <a href="https://rmarkdown.rstudio.com/" >R
markdown</a>,
<a href="https://orgmode.org/worg/org-contrib/babel/" >Emacs/OrgMode</a>) that I
consider harmful for scientific communication are:</p>

<ol>
<li><p>The linear structure of a notebook that forces the narrative to
   follow the order of the computation.</p></li>
<li><p>The impossibility of referring to data and code in a notebook from the
   outside, and in particular from another notebook, making reuse of
   code and data impossible.</p></li>
</ol>

<p>Like the demo that I made last time, which is best described as a
quick hack, the computational document that I am presenting today is
implemented in <a href="https://pharo.org/" >Pharo</a> and builds on the
<a href="https://gtoolkit.com/" >Glamorous Toolkit</a>, which is an innovative
development environment designed around the notion of &quot;moldable
development&quot;, which means that developers should be able to adapt
their tools to their specific needs with little effort. This is
precisely what I have done. The code is <a href="https://github.com/activepapers/activepapers-pharo" >on
GitHub</a> and
includes the example document from the demo.</p>

<p>Contrary to today's notebooks, my computational documents consist of
two distinct layers, which I show for an example in the screencast. A
<em>workflow layer</em> consists of <em>scripts</em> (short pieces of code) that
compute <em>datasets</em> and keep track of the data dependencies. The workflow
layer can be visualized as a graph. Scripts and datasets make up a
standard Pharo object that can be used as a building block in
subsequent work, unlike the code and data in today's notebooks. For
example, the Pharo expression <code>InfluenzaLikeIllnessInFrance data
absoluteIncidence</code> yields one of the data frames from my example
document and can be used in any type of Pharo code, including code in
another document.</p>
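
<p>For readers who do not know Pharo, the idea of a workflow layer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the prototype: the <code>Workflow</code> class, its method names, and the numbers are all hypothetical and chosen for the example.</p>

```python
from typing import Callable, Dict, List, Tuple


class Workflow:
    """Toy workflow layer: named scripts compute named datasets."""

    def __init__(self) -> None:
        # dataset name -> (script function, names of the datasets it reads)
        self._scripts: Dict[str, Tuple[Callable[..., object], List[str]]] = {}
        # datasets that have already been computed
        self._cache: Dict[str, object] = {}

    def script(self, output: str, inputs: Tuple[str, ...] = ()):
        """Register a function as the script that computes `output`."""
        def register(func):
            self._scripts[output] = (func, list(inputs))
            return func
        return register

    def dataset(self, name: str) -> object:
        """Return a dataset, running the scripts it depends on if needed."""
        if name not in self._cache:
            func, inputs = self._scripts[name]
            self._cache[name] = func(*(self.dataset(i) for i in inputs))
        return self._cache[name]

    def dependency_graph(self) -> Dict[str, List[str]]:
        """Map each dataset to the datasets it is computed from."""
        return {name: list(inputs) for name, (_, inputs) in self._scripts.items()}


flu = Workflow()  # hypothetical example workflow


@flu.script("weekly_cases")
def load_weekly_cases():
    return (120, 340, 560)  # made-up numbers, not real incidence data


@flu.script("total_cases", inputs=("weekly_cases",))
def sum_cases(weekly_cases):
    return sum(weekly_cases)


print(flu.dataset("total_cases"))
print(flu.dependency_graph())
```

<p>Because <code>flu</code> is an ordinary object, any other piece of code can call <code>flu.dataset("total_cases")</code>, which is the Python counterpart of the Pharo expression quoted above. The dictionary returned by <code>dependency_graph</code> is the information that a graph view would display.</p>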

<p>On top of that workflow layer, there is a documentation layer
consisting of a Wiki-style multi-page document in which each page can
contain code snippets. These code snippets are intended for data
presentation (plotting etc.) and for demonstrations (examples,
verifications, etc.). They are not accessible from outside their pages,
and they cannot change the datasets computed by the workflow. The
documentation pages can refer to and include the datasets, the
scripts, but also arbitrary other Pharo code. In particular, this
allows including library code used by the workflow scripts in the
documentation layer, as opposed to today's notebooks for which library
code is undocumentable black-box code.</p>

<p>A third essential element is the <em>playground</em> attached to the
workflow. This is where interactive exploration takes place. Code
snippets in the playground can access datasets just like scripts, but
they cannot modify them. The playground is meant both for authors and
for readers. Authors develop code snippets incrementally in the playground,
and turn them into scripts (at the click of a button) when they are
satisfied. Readers can write code snippets for exploring the data in
more detail.</p>

<p>The code is currently &quot;demo quality&quot;, so please don't rely on it for
your own research. Even the underlying GToolkit library is still
advertised as alpha level. There is a reason for calling this the
future rather than the present! However, there are a few conclusions
that I am already willing to draw from this work:</p>

<ol>
<li><p>An authoring environment for computational documents should also
be a more general software development environment.  If you have
to change tools for switching from library code to a computational
document or back, you have a technological barrier to overcome
that creates a mental separation between &quot;inside&quot; and &quot;outside&quot;,
whereas the science that you want to communicate is on both sides
of your barrier.</p></li>
<li><p>The emphasis on making all code and data explorable that has been
part of Smalltalk culture from the start is highly beneficial for
computational science as well.  Notebook environments such as
Jupyter or RStudio feel extremely limited compared to the standard
Pharo environment, let alone the more advanced GToolkit.</p></li>
<li><p>Decomposing the computation into smaller independent scripts
with well-defined interfaces makes it more understandable.
In traditional linear notebooks, you never know how much
further down a temporary variable will be used. You must
read the code from top to bottom to be sure not to miss
something. Likewise, separating &quot;essential&quot; computations
on the data from &quot;superficial&quot; computations such as plotting
makes the overall scientific logic stand out better.</p></li>
<li><p>A good authoring environment must support the full lifecycle of
computer-aided research, starting with interactive exploration and
iterating towards a computational document optimized for the
reader rather than the author. Today's notebooks do not provide
this support because they stick to a linear structure that is
satisfactory only in the initial stages of the lifecycle.</p></li>
</ol>

 ]]></description> </item><item> <title>Is reproducibility good for scientific progress? (a paper review)</title> <link>https://blog.khinsen.net/posts/2019/04/23/is-reproducibility-good-for-scientific-progress.html</link> <pubDate>2019-04-23</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2019/04/23/is-reproducibility-good-for-scientific-progress.html</guid> <category><![CDATA[ reproducible research ]]></category> <description><![CDATA[ <p>A few days ago, a discussion in my Twitter timeline caught my attention. It was about a very high-level model for the process of scientific research whose conclusions included the affirmation that reproducibility does not improve the convergence of the research process towards truth. The Twitter discussion set off some alarm bells for me, in particular the use of the term &quot;reproducibility&quot; in the abstract, without specifying which of its many interpretations and application contexts everybody referred to. But that's just the Twitter discussion; let's turn to the more relevant question of what to think of the paper itself (<a href="https://arxiv.org/abs/1803.10118" >preprint on arXiv</a>).</p>

<!-- more -->

<p>The core of the work presented in that paper is a stochastic model for the process of scientific research. There is some phenomenon described by a &quot;true&quot; mathematical model. Scientists do not know this model, but can obtain data points from it. This is how experiments are described. Scientists do have full access to their own models of reality. At each time step, a scientist generates a new model according to some strategy and evaluates the quality of that model to see if it is &quot;better&quot; (in a well-defined sense) than the current consensus model of the community. One of the strategies is replication of prior work.</p>
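
<p>To make the structure of such a model concrete, here is a deliberately minimal Python sketch. It is not the model from the paper: the one-parameter linear &quot;true&quot; model, the random-perturbation strategy, the least-squares criterion, and all names and parameter values are assumptions made purely for illustration.</p>

```python
import random

TRUE_SLOPE = 2.0  # the "true" model y = TRUE_SLOPE * x, unknown to the scientists


def experiment(rng, n_points=20, noise=0.5):
    """Draw noisy data points from the true model."""
    xs = [rng.uniform(1.0, 2.0) for _ in range(n_points)]
    return [(x, TRUE_SLOPE * x + rng.gauss(0.0, noise)) for x in xs]


def misfit(slope, data):
    """Sum of squared residuals of the candidate model y = slope * x."""
    return sum((y - slope * x) ** 2 for x, y in data)


def simulate(steps=200, noise=0.5, seed=0):
    """Return the consensus slope at the start and after each time step."""
    rng = random.Random(seed)
    consensus = rng.uniform(-5.0, 5.0)
    history = [consensus]
    for _ in range(steps):
        # Assumed strategy: propose a random perturbation of the consensus.
        candidate = consensus + rng.gauss(0.0, 1.0)
        # Both models are judged on the same freshly drawn data.
        data = experiment(rng, noise=noise)
        if misfit(consensus, data) > misfit(candidate, data):
            consensus = candidate
        history.append(consensus)
    return history


history = simulate()
print(f"initial slope {history[0]:.3f}, final slope {history[-1]:.3f}")
```

<p>Note that nobody in this sketch makes mistakes or has biases: every candidate is evaluated honestly against fresh data.</p>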

<p>Such highly simplified high-level models are easy to criticize because of the huge number of simplifying assumptions. And yet, in other branches of science (such as physics), simple toy models have proven to be very useful. In particular, they can help identify mechanisms that are also present in more realistic (and thus more complex) descriptions of the same phenomena. However, toy models require reality checks as well, in the form of validation, even if validation is qualitative rather than quantitative. This is in my opinion one of the weak spots of this paper: validation is limited to a few basic sanity checks. Given the scarcity of empirical data on the scientific process, this isn't really surprising.</p>

<p>As for the specific issue of reproducibility, the model presented in the paper has a major weakness in that it completely ignores the issues that motivate reproducibility checks and replication studies in real life. Scientists, like all humans, are prone to mistakes and biases. The collective process of scientific research therefore includes verification steps that reduce the impact of mistakes and bias. Peer review is probably the best known one, but reproducibility checks and replication studies fall into this category as well. It is then not surprising that a model without mistakes and bias predicts little utility for verification measures.</p>

<p>However, this is merely a criticism of the currently proposed model. It should be possible to include mistakes and bias without profound changes to the basic idea of modelling scientific research by a stochastic process. Confirmation bias is perhaps the simplest case: Let authors of original research overestimate the benefit of their work (as part of the evaluation criterion S in the paper) and replicators underestimate it. As for mistakes, a crude technique would be to let some percentage of scientists generate two new models, evaluate the first one, but report the second one as having been tested. Mistakes detected in a replication study would then lead to erasure of the replicated study from the process of consensus formation.</p>

 ]]></description> </item><item> <title>The computational notebook of the future</title> <link>https://blog.khinsen.net/posts/2019/02/11/the-computational-notebook-of-the-future.html</link> <pubDate>2019-02-11</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2019/02/11/the-computational-notebook-of-the-future.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>Regular readers of this blog may have noticed that I am not very happy with today's state of computational notebooks, such as they were pioneered by Mathematica and popularized by more recent free incarnations such as <a href="https://jupyter.org/" >Jupyter</a>, <a href="https://rmarkdown.rstudio.com/" >R markdown</a>, or <a href="https://orgmode.org/worg/org-contrib/babel/" >Emacs/OrgMode</a>. In this post and the <a href="https://peervideo.net/videos/watch/9ed70819-6271-439f-b392-54f34b73c124" >accompanying screencast</a> (my first one!), I will explain what I dislike about today's notebooks, and how I think we can do better.</p>

<!-- more -->

<p>There are two aspects of notebooks that I consider harmful for scientific communication:</p>

<ol>
<li>The linear structure of a notebook that forces the narrative to follow the order of the computation.</li>
<li>The impossibility of referring to data and code in a notebook from the outside, and in particular from another notebook, making reuse of code and data impossible.</li>
</ol>

<p>If you look at a traditional scientific article, or technical report, you will notice that its narrative is structured according to a high-level view of the work. It starts by describing the context of the work, then its goals and a very brief summary of the methods, and right after that it presents results and discusses them. Technical details are only discussed afterwards, once the reader understands why they actually matter. With today's notebooks, the technical details come first: a typical data analysis starts with cleanup and preprocessing steps, and therefore they also come first in the narrative.</p>

<p>An unpleasant side effect of the &quot;narrative follows computation&quot; principle is that some technical details actually cannot be discussed adequately. Scientific methods implemented in software libraries can be summarized in plain English, but the code is elsewhere, managed by a different toolset, and cannot be shown to the reader.</p>

<p>This makes the transition to the second problematic aspect: there is no way to refer to or reuse any specific part of a notebook. Neither the code nor the computed results are accessible from the outside. And that also makes it impossible to build up useful libraries from notebooks.</p>

<p>So much for the criticism - now let's make it constructive. At this point, you should watch the <a href="https://peervideo.net/videos/watch/9ed70819-6271-439f-b392-54f34b73c124" >screencast</a> before reading on. In the screencast, I show a simple data analysis both as a Jupyter notebook and as a demo prototype for what I consider the notebook of the future. This prototype is built using the <a href="https://gtoolkit.com/" >Glamorous Toolkit</a>, a very innovative software development environment for <a href="https://pharo.org/" >Pharo</a>, which is a modern descendant of <a href="https://en.wikipedia.org/wiki/Smalltalk" >Smalltalk</a>. If you want to play with this yourself, the code is <a href="https://github.com/khinsen/computational-documents-with-gtoolkit/" >on GitHub</a>. It's really just a demo, because the simplistic approach to organizing the computation that I have used there would not scale to real-life computations (it does a lot of needless recomputation). My plan is to implement the <a href="https://www.activepapers.org/" >ActivePapers</a> approach for managing the computations. GToolkit is alpha software as well. So none of this is ready for prime time, but it does show that better notebooks are possible.</p>

<p>Unlike today's notebooks, which are a sequence of code snippets and documentation paragraphs, the computational documents of my demo are <em>objects</em> in the sense of object-oriented programming. Each document contains code, input data, and computed data, which can be accessed from the outside and thus reused in client code. The narrative is merely an additional view into these items, which can present and discuss them in any order that seems suitable for explaining the work. As with scientific articles, the narrative is typically written in the final stages of the work, once the basic code skeleton is working. In the case of my demo, I started out writing the two Pharo classes, before even installing GToolkit, which was a bit unstable at the time.</p>

<p>Note that this &quot;one job, one object, one narrative&quot; approach has a beneficial side effect in encouraging people to do each job well, rather than just well enough for going on with the next job. My Jupyter/Python version of the data analysis only extracts the minimum information required from the input dataset, without even mentioning what else is in there. The GToolkit/Pharo version provides a complete description of the dataset, including the data that is not used at all in the second document that describes the analysis.</p>

<p>Finally, there are other interesting aspects of GToolkit (and Pharo) for computational science, but I will leave them for future posts. I will just mention that the &quot;inspectors&quot; (a term familiar to every Smalltalk developer but probably unknown to anyone else) are easily extensible. Adding a pane that provides yet another view of the document is a matter of writing a couple of lines of Pharo code. It's as if you could implement a new widget for Jupyter in a few lines of Python code right in your notebook.</p>

<p><strong>Update</strong>: There's a workaround for embedding figures (thanks to Tudor Gîrba for the hint!), which you can find in the current code version on GitHub.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Tomas:</i><p>Hi Konrad,<br>Is the screencast still available somewhere? The link won't load for me.</p><ul>
<li><i>Konrad Hinsen:</i><p>Unfortunately I didn't keep a copy, and <a href="http://peervideo.net" rel="nofollow noopener" title="peervideo.net">peervideo.net</a> seems to have disappeared. So much for my very first screencast... I did better the second time, so the screencast for <a href="http://blog.khinsen.net/posts/2019/05/09/the-computational-notebook-of-the-future-part-2.html" rel="nofollow noopener" title="http://blog.khinsen.net/posts/2019/05/09/the-computational-notebook-of-the-future-part-2.html">part 2</a> is still around, and also more interesting in the long run.</p><ul>
<li><i>relbus:</i><p>Getting hit by linkrot really drives home all the points you raise about stability and reproducibility.</p></li>
<li><i>Tomas Fiers:</i><p>Just watched that one. It's fantastic. I have been thinking about a new  interface for computational science/play too, and this demo suddenly connected different loose threads (dependency graph, transclusion, intermediate value inspection)</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for your feedback! Another line of work I recommend in this space is Sam Ritchie's dynamic notebooks: <a href="https://roadtoreality.substack.com/p/the-dynamic-notebook" rel="nofollow noopener" title="https://roadtoreality.substack.com/p/the-dynamic-notebook">https://roadtoreality.subst...</a></p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>Exploring Pharo</title> <link>https://blog.khinsen.net/posts/2018/12/19/exploring-pharo.html</link> <pubDate>2018-12-19</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2018/12/19/exploring-pharo.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>One of the more interesting things I have been playing with recently is <a href="http://www.pharo.org/" >Pharo</a>, a modern descendant of Smalltalk. This is a summary of my first impressions after using it on a <a href="https://github.com/khinsen/leibniz-pharo/" >small (and unfinished) project</a>, for which it might actually turn out to be very helpful.</p>

<!-- more -->

<p>The first time I read about Smalltalk was in the <a href="https://archive.org/details/byte-magazine-1981-08" >August 1981 issue of Byte magazine</a>. Back then, I was a high school student and I had just invested my savings into my first home computer with characteristics  typical for the time: Z80 processor, 16 KB of memory, Microsoft Basic, data storage on cassette tapes. From that perspective, Smalltalk was a utopia. The revolutionary aspect of Smalltalk was its design as an integrated computing system that combined a language, a huge standard library, a development environment, and perhaps most of all a graphical user interface (GUI), which in fact was the ancestor of all of today's desktop-style GUIs. As a consequence, it required a high-quality graphics display, a mouse, and plenty of CPU power. None of that was available in commodity hardware.</p>

<p>In 1995, a friend passed me a floppy disk with Smalltalk-80 for the Atari ST family, and I could finally lay my hands on a working Smalltalk system. By then I had an Atari TT with the awesome big high-resolution black-and-white screen that was available for it. Just perfect for Smalltalk. I was very impressed by the system, which in many respects was superior to the Atari's native TOS/GEM combo, and even to the Unix workstations I had in the lab. But I couldn't actually use it for anything productive, because Smalltalk lived in a separate universe, unable to access any file on my hard disk. It wasn't more than an impressive demo of what computing could be like.</p>

<p>I have faint memories of playing with <a href="http://www.squeak.org/" >Squeak</a> a couple of years later, but I found its flashy colors and toy-inspired aesthetics so unpleasant that I didn't go very far. Pharo is actually a fork of Squeak that evolved in a different direction, with a more sober design that is much more to my liking. More importantly, some of the ongoing developments in the Pharo community (in particular the <a href="https://gtoolkit.com/" >Glamorous Toolkit</a>) are very much in line with my recent interest in the <a href="https://peerj.com/articles/cs-158/" >human-computer interface of computational science</a>. The 2018 session of the <a href="https://mooc.pharo.org/" >Pharo MOOC</a> was thus a good occasion to take a more serious look at this up-to-date incarnation of Smalltalk. The MOOC does a pretty good job of introducing Pharo to people with various interests, and it even includes some explanations of the internal workings of Pharo (look for the &quot;black magic&quot; label).</p>

<p>As a language, Smalltalk was revolutionary in the 1980s, but no longer today because many now better known languages have drawn on it for inspiration. If you know Python, for example, then Pharo won't surprise you much beyond the obvious and important syntactical differences. On the plus side, that means it is not much effort to do a first project in Pharo when coming from a Python background. But it also means that there isn't much to be gained from learning Pharo if you look at it as just another programming language. The really interesting part is not the language, but the user interface of Pharo the computing platform.</p>

<p>Pharo belongs to a rare species of computing environments that I think is best described by the label &quot;explorable&quot;. All of Pharo is implemented in Pharo itself, and all the source code is there for you to inspect and modify. But it's not just the code that is inspectable, it's all the objects that exist in memory. You can, for example, evaluate <code>Array instanceCount</code> to find out how many arrays exist at the moment (213464 when I tried). You can then obtain an arbitrarily chosen instance with <code>Array someInstance</code> and open a graphical inspector using <code>Array someInstance inspect</code>. You can also modify that array, without any idea of where it is used and for what, and thus wreak havoc with your system. For a more thorough approach to breaking Pharo, one of my favorites is <code>true become: false</code>, which replaces <code>true</code> by <code>false</code> and vice versa everywhere in the system. Pharo reacts much like I'd expect a human logician to react: it freezes instantly.</p>
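
<p>For readers who know Python better than Pharo, a rough analogue of <code>Array instanceCount</code> and <code>Array someInstance</code> can be sketched with the standard <code>gc</code> module. The helper names <code>instance_count</code> and <code>some_instance</code> are hypothetical, and the comparison is only approximate: Python's garbage collector tracks container objects rather than every object in memory, and there is nothing comparable to <code>become:</code>.</p>

```python
import gc


def instance_count(cls):
    """Count live objects of exactly type `cls` known to the garbage collector."""
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def some_instance(cls):
    """Return an arbitrary live instance of `cls`, or None if there is none."""
    return next((obj for obj in gc.get_objects() if type(obj) is cls), None)


# A list that is guaranteed to be alive while the helpers run.
marker = ["a", "list", "created", "for", "this", "example"]

print(instance_count(list))
print(type(some_instance(list)).__name__)
```

<p>The count printed depends on the interpreter and on what has been imported, just as the number of arrays in a Pharo image depends on the state of that image.</p>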

<p>The complete state of a Pharo system, including all code and all objects, and thus even GUI elements such as open windows, can be saved with a click in what is called an image. This is obviously very convenient, but should not be used as the only strategy for storing code because images are fragile, as my example above illustrates. Consider an image your development environment rather than your code repository. In fact, Pharo supports and encourages storing code in Git repositories.</p>

<p>It is important to understand that explorability is not an accidental feature of Pharo (and other Smalltalk derivatives), but has been a design goal from the start. Those interested in the history of this idea should look at <a href="https://en.wikipedia.org/wiki/Dynabook" >Alan Kay's Dynabook concept</a> and then take another step back in history to <a href="http://thedemo.org/" >Doug Engelbart's &quot;Mother of all Demos&quot;</a>. The motivation behind all these developments is to make computing a tool not for performing tasks, but for augmenting human intellectual abilities. That goal is, unfortunately, very rare. In fact, the only other system I know of that was designed to be explorable is <a href="http://www.emacs.org/" >Emacs</a>, also with the goal of maximally empowering users. Once you look beyond superficialities, Pharo and Emacs are actually quite similar. Both are built around a high-level programming language with a rich library, a user-interface framework, and development tools with inspection capabilities. Emacs then comes with a text editor as the default application at startup. Pharo has no such default application, meaning that it is pretty useless before you write some code of your own. That is probably the main reason why Emacs became so much more popular - people use it as a text editor and only later, if ever, discover its empowering features.</p>

<p>Explorability is what interests me most in Pharo, because I believe that computational science sorely needs it, and that existing interactive interfaces such as REPLs or notebooks are far from sufficient. They impose a linear thread of exploration, whereas I want to be able to go off on a tangent, dig in deeper into a model, compare two datasets side-by-side, etc. Notebooks are also rigid exploration environments which can be extended only with major effort, if at all. Pharo offers a much richer exploration environment, and makes it easy to adapt to problem-specific needs (another reference to the <a href="https://gtoolkit.com/" >Glamorous Toolkit</a> is compulsory here). The snag is that Pharo doesn't offer much support for working with scientific data or scientific models (though I must admit that I haven't checked out <a href="https://github.com/PolyMathOrg/PolyMath" >PolyMath</a> yet). There are people who use Pharo for computational science (see e.g. <a href="https://github.com/UMMISCO/kendrick" >this epidemiology simulation platform</a>), so I suppose that there are useful tools I simply haven't looked at yet.</p>

<p>One power tool that I have already discovered (and explored interactively in Pharo) is the visualization library <a href="http://agilevisualization.com/" >Roassal</a>. It may superficially resemble various visualization libraries for JavaScript, but the big difference is that it integrates with the Pharo development and exploration tools. It is very easy to add a visualization pane to Pharo's object inspector and get a graphical view on your objects in addition to the standard browser-type interface for accessing an object's internals. And that means that you can easily use visualization as a tool in designing, implementing, and debugging code. It also helps a lot that the visualizations are themselves interactive. You can make them react to clicks, drags, and other events, and thus turn them into a user interface to your classes. For those familiar with Jupyter notebooks, it's as if you could implement interactive widgets in a few lines of Python code stored in your notebook.</p>

<p>I should perhaps say something about Pharo as a software development environment, but that aspect has been covered before by others in much more depth than I would do it myself. The demos in the Pharo MOOC are a good introduction, but for an overview of the possibilities, nothing beats <a href="https://www.youtube.com/watch?v=baxtyeFVn3w" >Aditya Siram's recent demo</a> aimed at adepts of functional programming languages.</p>

<p>After all that praise, I have to add some caveats. First of all, the Pharo community is tiny compared to, say, Python's, and therefore the choice in domain-specific libraries is rather small. Next, Pharo development moves on at a rapid pace, with the main consequence that nearly all available documentation is outdated, and what's left is often an update for insiders rather than an introduction for newcomers. No matter how explorable a system is, you need some higher-level information to use it productively, if only to know the jargon that permits you to start searching for stuff. As an example, when I tried to figure out how package dependency management works, I had to ask on the <a href="http://lists.pharo.org/mailman/listinfo/pharo-users_lists.pharo.org" >Pharo user mailing list</a> to learn that the keyword to look for is &quot;baseline&quot;. The three books <a href="http://books.pharo.org/updated-pharo-by-example/" >Pharo by Example</a>, <a href="http://deepintopharo.com/" >Deep into Pharo</a>, and <a href="http://books.pharo.org/enterprise-pharo/" >Enterprise Pharo</a>
are probably the best place to start looking for introductory essays, but even they are two versions behind the current one.</p>

<p>Finally, let me anticipate a reaction that I expect regular readers of this blog to have. How is it possible for someone who underlines the importance of reproducibility in every second post to say something positive about a system that relies on persistent state to the point that it cannot even be bootstrapped from its own source code? There are a couple of replies. Most importantly, reproducibility is not what I am looking for in Pharo. Every system has its good and bad sides, and I am turning to Pharo for its good sides, explorability and user interfaces. Second, the Pharo developers are working on this. And finally, decades of dealing with persistent yet fragile system images have led the Smalltalk community to figure out ways to cope with the resulting problems (e.g. changesets) that may be worth studying for inspiration. Computational science suffers from a fundamental tension between the short-term need for interactivity and the long-term need for reproducibility. So far, no one has found a satisfying answer, so it's worth looking for inspiration in unusual places.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Richard Eng:</i><p>&gt; But it also means that there isn’t much to be gained from learning Pharo if you look at it as just another programming language.</p><p>I disagree. Syntactically, Pharo is a much nicer language to use than Python, for example. It's incredibly elegant. Conceptually, there's hardly anything to Pharo's syntax. By comparison, Python has much more syntax, and some of it is decidedly unnatural or unintuitive.</p><p>Python's OOP feels "bolted on," like an afterthought. It hides instance variables and methods "in plain sight" by prefixing their names with underscores. Yuck!</p><p>Instance method definitions must include "self" as the first argument. Their excuse? "Explicit is better than implicit." Give me a f*cking break.</p><p>Python's lambdas can only accept single expressions. What other language does this???</p><p>Python prefers half-open intervals. For example, <i>range(1,6)</i> gives you 1, 2, 3, 4, 5. This just doesn't feel right.</p><p>Python's local variable scoping rules are peculiar.</p><p>Python's Off-side rule syntax makes many developers uncomfortable, myself included.</p><p>This is all to say that using Python imposes a greater cognitive load on the programmer. With Pharo, there is no such load. Pharo is simplicity incarnate.</p><ul>
<li><i>Konrad Hinsen:</i><p>I agree with your criticisms of Python syntax, but in my personal experience of 25 years of Python coding, most of these are not serious issues in practice.</p><p>Everything can be improved, but only problems that its users perceive as serious have a chance of actually being addressed, and syntax is overall not perceived as a serious problem in the Python community. The only point you raise that I have seen discussed in the Python community is the limitations of lambda. Also the local variable scoping rules, but that's not really syntax.</p><p>There is of course the big issue of indentation that you mention, which probably deters some people to the point that they never become Python programmers. But that's in the realm of personal preferences, as many others just love it.</p></li>
</ul>
</li>
<li><i>Ben Coman:</i><p>btw, Pharo's version of Jupyter is Grafoscopio<br><a href="http://mutabit.com/grafoscopio/" rel="nofollow noopener" title="http://mutabit.com/grafoscopio/">http://mutabit.com/grafosco...</a></p></li>
<li><i>Torsten Bergmann:</i><p>You seem to have missed one primary point: Pharo since Pharo 7 IS NOW bootstrapped - so we can build a image from our own source code.<br>Even a more minmal one than the default download.</p><p>Check the folder "bootstrap" in <a href="https://github.com/pharo-project/pharo" rel="nofollow noopener" title="https://github.com/pharo-project/pharo">https://github.com/pharo-pr...</a><br>and also checkout <a href="https://github.com/guillep/PharoBootstrap" rel="nofollow noopener" title="https://github.com/guillep/PharoBootstrap">https://github.com/guillep/...</a></p><p>This is possible since 2016 already - see <a href="https://pharoweekly.wordpress.com/2016/07/24/a-taste-of-bootstrap/" rel="nofollow noopener" title="https://pharoweekly.wordpress.com/2016/07/24/a-taste-of-bootstrap/">https://pharoweekly.wordpre...</a></p><p>I followup on this on <a href="https://astares.blogspot.com/2018/12/exploring-pharo.html" rel="nofollow noopener" title="https://astares.blogspot.com/2018/12/exploring-pharo.html">https://astares.blogspot.co...</a></p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for pointing this out. I had at some time seen a comment about bootstrapping being planned as a feature for Pharo 7, but never an announcement of it being done. Since Pharo 7 hasn't been officially released yet, I had assumed it was still on the todo list.</p><ul>
<li><i>Ben Coman:</i><p>Pharo 7 is now released...<br><a href="http://forum.world.st/ANN-Pharo-7-0-released-td5093960.html" rel="nofollow noopener" title="http://forum.world.st/ANN-Pharo-7-0-released-td5093960.html">http://forum.world.st/ANN-P...</a></p></li>
</ul>
</li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>Knowledge distillation in computer-aided research</title> <link>https://blog.khinsen.net/posts/2018/10/21/knowledge-distillation-in-computer-aided-research.html</link> <pubDate>2018-10-21</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2018/10/21/knowledge-distillation-in-computer-aided-research.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ computer-aided research ]]></category><category><![CDATA[ scientific software ]]></category> <description><![CDATA[ <p>There is an important and ubiquitous process in scientific research that scientists never seem to talk about. There isn't even a word for it, as far as I know, so I'll introduce my own: I'll call it <em>knowledge distillation</em>.</p>

<p>In today's scientific practice, there are two main variants of this process, one for individual research studies and one for managing the collective knowledge of a discipline. I'll briefly present both of them, before coming to the main point of this post, which is the integration of <em>digital</em> knowledge, and in particular software, into the knowledge distillation process.</p>

<!-- more -->

<p>The first variant is performed by individual researchers or closely collaborating teams who, starting from the raw information of their lab notebooks, describing methods applied and results obtained, write a journal article summarizing all of this information into an illustrated narrative that is much easier to digest for their fellow scientists. This narrative contains what the authors consider the essence of their work, leaving out what they consider technical details. Moreover, the narrative places the work into its wider scientific context. In a second step, the authors condense the article into an even smaller abstract, supposed to tell readers at a glance if the article is of interest to them without going into any details. This process can be illustrated as a pyramid:</p>

<p><img src="/static/knowledge-pyramid-1.svg" alt="" /></p>

<p>At the bottom we have all the gory details, one level up the distilled version for communication, and at the top the minimal summary for first contact with a potential reader. It is not uncommon to have an additional layer between the bottom two, often published as &quot;supplementary material&quot;.</p>

<p>Whereas authors work from the bottom to the top of this pyramid, readers work down from the top, gaining a more detailed understanding at each step. Until not so long ago, this was a two-step process: after the abstract, they could move on to the paper, but after that they had to contact the authors to obtain more details, and the authors might well not care to reply. The Open Science movement has made some progress in pushing for more transparency by making deeper information layers available for critical inspection, in particular raw datasets and the source code for the software used to process them. The situation is very much in flux as various scientific disciplines are working out which information can and should be shared, and how. The maximal level of openness is known as <a href="https://en.wikipedia.org/wiki/Open-notebook_science" >Open Notebook science</a>, which basically means making the whole pyramid public. Note, however, that giving access to the base of the pyramid does not make the knowledge distillation steps superfluous. Readers would succumb to information overload if exposed to all the details without a proper introduction in the form of distilled knowledge. In fact, <em>most</em> readers don't want anything other than the distilled version.</p>

<p>The second variant of knowledge distillation is performed collectively by domain experts who summarize the literature of their field into review articles and then into monographs or textbooks for students. The pyramid diagram is very similar to the first variant's:</p>

<p><img src="/static/knowledge-pyramid-2.svg" alt="" /></p>

<p>It's really just the same process at another scale: knowledge transfer about a discipline, rather than about a specific study.</p>

<p>So much for good old science - let's move to the digital age. The base of our first pyramid now contains code and digital datasets. Some of the code was written by the authors of the study for this specific project and typically takes the form of scripts, workflows, or notebooks. This is complemented by the dependencies of this project-specific code - see my <a href="http://blog.khinsen.net/posts/2017/01/13/sustainable-software-and-reproducible-research-dealing-with-software-collapse.html" >post on software collapse</a> for an analysis of the full software stack. Full openness requires making all of this public, with computational reproducibility serving as a success indicator. If other researchers can re-run the software and get the same results, they possess all the information one could possibly ask for, from a computational point of view.</p>

<p>But as with Open Notebook science, making all the details open is not sufficient. Readers will again succumb to information overload when exposed to a complex software stack and digital datasets whose precise role in the study is not clear. Information overload is an even more serious problem with software because the amount of detail that software source code contains is orders of magnitude bigger than what can be written down in a lab notebook.</p>

<p>So how do we distill the scientific knowledge embedded in software? The bad news is that we don't yet have any good techniques. What we find in journal articles when it comes to describing computational methods is very brief summaries in plain English, closer to the abstract level than to the journal article level. As a consequence, computational methods remain impenetrable to the reader who does not have prior experience with the software that has been applied. There is no way to work down the pyramid; readers have to acquire the base-level skills on their own. Worse, there is no way to stop at the middle level of the pyramid and yet have a clear understanding of what is going on.</p>

<p>Recent years have seen a flurry of research and development concerning the publication of software and computations. One main focus has been the reproducibility of results, another the sustainability of scientific software development, and a third one the readability of computational analyses. This last focus has most notably led to the development of computational notebooks (such as Jupyter, Rmarkdown, Emacs/Org-mode and many more), which embed code and results in a narrative providing context and explanations. Notebooks are occasionally put forward as &quot;the paper of the future&quot;, but in view of the knowledge pyramid, that's not what they are. They are closer to the digital age equivalent of lab notebooks, especially when combined with version control to capture the time evolution of their contents. The real paper of the future must contain a <em>distilled</em> version of the source code.</p>

<p>It is interesting to examine why notebooks have been so successful in some scientific domains. First of all, they are a much better human-readable presentation of source code than anything we had before, with the exception of the related idea of literate programming, which I expect to make a comeback as well. Next, in domains where computational studies tend to be linear sequences of well-known standard operations, such as statistical analyses, the notebook is very similar to a distilled computational protocol, because the technical details are mostly hidden in libraries. These libraries also contain significant scientific knowledge, but because these methods are well-known, they have in a way been distilled in the form of textbooks.</p>

<p>More generally, though, notebooks contain both too little and too much information to qualify as distilled descriptions of computational studies. Too little because much scientific knowledge is hidden in the notebook's dependencies, which are not documented at the same level of readability (which is why I believe that literate programming has a future). Too much because they still expose technical details to the reader that are more of a hindrance than a help for understanding.</p>

<p>How, then, should the paper of the future present distilled computational knowledge? I see three main requirements:</p>

<ol>
<li>It must be possible to explain and discuss individual models, approximations, or algorithms without the constraints of an efficient working implementation.</li>
<li>These models, approximations, and algorithms must be presented in a sufficiently precise form that automatic verification procedures can ensure that the source code at the base level of the pyramid actually implements them.</li>
<li>Suitable user interfaces must allow a reader to explore these models, approximations, and algorithms through concrete examples.</li>
</ol>

<p>The first requirement says that clarity of exposition must take absolute precedence over any technical considerations of software technology. The intrinsic complexity of computational methods makes understanding hard enough, so everything possible must be done to keep accidental complexity out of the way.</p>

<p>The second requirement ensures that the conformity between the distilled and the detailed representations of a computational protocol can be verified by computers rather than by humans. Humans aren't very good at checking that two complex artifacts are equivalent.</p>

<p>The third requirement is motivated by the observation that a real understanding of a computational method, which is usually too lengthy to be actually performed manually, requires both reading code and observing how it processes simple test cases. Observation is not limited to the final outcome; it may well be necessary to provide access to intermediate results.</p>

<p>To get an idea of what &quot;suitable user interfaces&quot; might look like, it's worth looking at the <a href="https://explorabl.es/" >explorable explanations</a> and the <a href="http://www.complexity-explorables.org/" >Complexity Explorables</a> Web sites. Note, however, that none of these exploration user interfaces provides easy access to a precise formulation of the underlying models or algorithms. Such formulations exist only in the form of JavaScript source code embedded in the Web site, but that's not exactly a reader-friendly medium of expression. Another interesting line of development is happening in the <a href="https://pharo.org/" >Pharo</a> community (Pharo being a modern descendant of Smalltalk), e.g. the idea of <a href="http://scg.unibe.ch/research/moldableinspector" >moldable inspectors</a>, which are user interfaces specifically designed to explore a particular kind of object, which in the O-O tradition combines code and data.</p>

<p>Back to requirements 1 and 2: we want a precise and easily inspectable description that can be embedded into an explanatory narrative. We also want to be sure that it actually corresponds to what the user interface lets us explore, and to what the software implementation applies efficiently to real-world problems. I am not aware of any existing technology that can fulfill this role, although there are many that were designed with somewhat different goals in mind and that can serve as guidelines, in particular the various <a href="https://en.wikipedia.org/wiki/Modeling_language" >modeling</a> and
<a href="https://en.wikipedia.org/wiki/Specification_language" >specification languages</a>.</p>

<p><img src="/static/knowledge-pyramid-3.svg" alt="" /></p>

<p>My own research into this problem has led to the concept of <a href="http://sjscience.org/article?id=527" >digital scientific notations</a>, and I am currently designing such a notation for physics and chemistry, called <a href="https://github.com/khinsen/leibniz" >Leibniz</a>. A <a href="https://peerj.com/articles/cs-158/" >first report</a> on this research was published earlier this year. Leibniz is mainly inspired by traditional mathematical notation concerning the way it is embedded into a narrative, and by specification languages in terms of semantics. Some relevant features of Leibniz for expressing distilled knowledge are</p>

<ul>
<li><p>Its highly declarative nature. Leibniz code consists of short declarations that can be written down in (nearly) arbitrary order, making them easy to embed into a narrative, much like mathematical expressions and equations.</p></li>
<li><p>Its foundation in term rewriting (the same foundation adopted by most computer algebra systems). Among other advantages, this allows Leibniz code to concentrate on one aspect of a model or algorithm while leaving other aspects unspecified.</p></li>
<li><p>Its restriction to a single universal (but often inefficient) data structure.</p></li>
</ul>

<p>These features mainly address requirement 1. As for requirement 2, Leibniz uses XML for its syntax and has very simple semantics, making it easy to write libraries that read and execute Leibniz code, which in turn make it easy to integrate Leibniz into scientific software of all kinds. Only Leibniz development environments have to deal with the more complex user-facing syntax requiring a specific parser.</p>

<p>Leibniz does not try to address requirement 3, but since it meets requirement 2, it doesn't get in the way of people wishing to build exploration and inspection user interfaces for Leibniz-based models and algorithms.</p>

<p>Leibniz is still very much experimental, and I am not at all sure that it will turn out to be useful in its current form. In fact, I am almost certain that it will require modification to be of practical use. If that doesn't scare you off, have a look at the <a href="http://khinsen.net/leibniz-examples/" >example collection</a> to get an idea of what Leibniz can do and what it looks like. Feedback of any kind is more than welcome!</p>

 ]]></description> </item><item> <title>Literate computational science</title> <link>https://blog.khinsen.net/posts/2018/07/26/literate-computational-science.html</link> <pubDate>2018-07-26</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2018/07/26/literate-computational-science.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ computer-aided research ]]></category><category><![CDATA[ scientific software ]]></category> <description><![CDATA[ <p>Since the dawn of computer programming, software developers have been aware of the rapidly growing complexity of code as its size increases. Keeping in mind all the details in a few hundred lines of code is not trivial, and understanding someone else's code is even more difficult because many higher-level decisions about algorithms and data structures are not visible unless the authors have carefully documented them and keep those comments up to date.</p>

<!-- more -->

<p>The main angle of attack to keep software source code manageable has been the development of ever more sophisticated programming languages and development paradigms, but it is not the only one. Another approach was initiated by Donald Knuth's invention of <a href="http://literateprogramming.com/" >literate programming</a>. Its basic idea is to invert the roles of code and documentation. Rather than adding documentation as annotations to the code, literate programming puts an explanatory narrative about the software at the center of the software author's attention. Code snippets are embedded into this narrative, much like mathematical formulas are embedded into scientific articles and textbooks.</p>

<p>Literate programming never gained much popularity, for reasons that, to the best of my knowledge, have never been explored systematically. Insufficient tool support is often cited as an obstacle, but I suspect that the mismatch between the structure of the narrative and the language-imposed structure of the code is equally problematic. Programmers need to name code blocks and then assemble them into valid source code by hand. My own experience is that it's usually easier to write and test the code first and then re-create it as a literate program, but this doesn't lead to code that naturally fits the narrative.</p>

<p>The main argument in support of this suspicion is the much higher popularity of a variant of literate programming that both adds and removes features compared to Knuth's original system. Computational notebooks (implemented e.g. by <a href="https://jupyter.org/" >Jupyter</a>) document a computation rather than a piece of software. In addition to code, they embed input data and results into the narrative, but they also restrict code to a linear assembly of code cells executed in sequence. This limitation removes the need to name and assemble code blocks.</p>

<p>An idea I have been exploring recently is to take another step towards letting the explanatory narrative take center stage, by designing a formal language specifically for embedding into such a narrative. However, my language called <a href="https://github.com/khinsen/leibniz" >Leibniz</a> is not a programming language. I call it a digital scientific notation to emphasize its intended use in the documentation of scientific models and methods, but in terms of computer science terminology it is a <a href="https://en.wikipedia.org/wiki/Specification_language" >specification language</a> designed for models expressed in terms of equations and algorithms. Leibniz code <em>must</em> be embedded into a narrative, although the Leibniz authoring environment also extracts a machine-readable version as an XML file for easy processing by scientific software.</p>

<p>To get an overview of Leibniz, I suggest looking first at a <a href="http://khinsen.net/leibniz-examples/examples/leibniz-by-example.html" >simple example</a>, and then reading my <a href="https://peerj.com/articles/cs-158/" >paper</a> describing Leibniz and the problems it is designed to solve, which just appeared in PeerJ CompSci (Open Access like all of PeerJ). The explanations in the paper should prepare you for a look at the currently <a href="http://khinsen.net/leibniz-examples/examples/mass-on-a-spring.html" >most extensive example</a>, which documents, for a toy problem, the full path of assumptions and approximations that lead from a theoretical framework (Newton's equations of motion) to a numerical algorithm, with all models along the way being machine-readable.</p>

<p>As the paper explains, Leibniz is best described as a research prototype at the current stage. It has known limitations that make its application to complex real-world problems a bit challenging. However, I am confident that these limitations can be overcome, and that Leibniz will be suitable for a wide range of scientific models and methods, starting with mathematical equations and ending with literate workflows. As Silicon Valley startups would say, make sure you won't be left behind by the Leibniz revolution!</p>

 ]]></description> </item><item> <title>Scientific software is different from lab equipment</title> <link>https://blog.khinsen.net/posts/2018/05/07/scientific-software-is-different-from-lab-equipment.html</link> <pubDate>2018-05-07</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2018/05/07/scientific-software-is-different-from-lab-equipment.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ computer-aided research ]]></category><category><![CDATA[ scientific software ]]></category> <description><![CDATA[ <p>My most recent paper submission (<a href="https://peerj.com/preprints/26633/" >preprint</a> available) is about improving the verifiability of computer-aided research, and contains many references to the related subject of reproducibility. A reviewer asked the same question about all these references: isn't this the same as for experiments done with lab equipment? Is software worse? I think the answers are of general interest, so here they are.</p>

<!-- more -->

<p>First of all, an inevitable remark about terminology, which is still far from standardized (see <a href="https://arxiv.org/abs/1802.03311" >this preprint</a> and <a href="https://doi.org/10.3389%2Ffninf.2017.00076" >this article</a> for two recent contributions to the controversy). I will use the term &quot;computational reproducibility&quot; in its historically first sense introduced by Claerbout in 1992, because it seems to me that this is currently the dominant usage. <em>Reproducing</em> a computation thus means running the same software on the same data, though it's usually done by a different person using a different computer. In contrast, <em>replication</em> refers to solving the same problem using different software. This terminological subtlety matters for the following discussion, because experimental reproducibility is actually more similar to replicability than to reproducibility in the computational case.</p>

<p>There are two aspects in which I think scientific software differs significantly from lab equipment:</p>

<ol>
<li>Its characteristics as a human-made artifact.</li>
<li>Its role in the process of doing science.</li>
</ol>

<h2>Software is more complex and less robust than lab equipment</h2>

<p>The first point I raised in my paper is the epistemic opacity of automated computation. Quote:</p>

<blockquote>
<p>The overarching issue is that performing a computation by hand, step
by step, on concrete data, yields a level of understanding and
awareness of potential pitfalls that cannot be achieved by reasoning
more abstractly about algorithms. As one moves up the ladder of
abstraction from manual computation via writing code from scratch,
writing code that relies on libraries, and running code written by
others, to having code run by a graduate student, more and more
aspects of the computation fade from a researcher's attention. While
a certain level of epistemic opacity is inevitable if we want to
delegate computations to a machine, there are also many sources of
accidental epistemic opacity that can and should be eliminated in
order to make scientific results as understandable as possible.</p>
</blockquote>

<p>The reviewer asks: isn't this the same as when doing experiments using lab equipment constructed by somebody else? My answer is no.</p>

<p>Let's do a little thought experiment, introducing Alice and Bob as virtual guinea pigs. Alice is an experienced microscopist, Bob is an experienced computational scientist. We give Alice a microscope she hasn't seen before, and ask her to evaluate if it is suitable for her research. We give Bob a simulation program (with source code and documentation) that he hasn't seen before, and ask him the same question.</p>

<p>My expectation is that Alice will go off and do some tests with samples that she knows well, and perhaps do some measurements on the microscope. After that, she will tell us for which aspects of her work she can use this new microscope. Meanwhile, Bob will be scratching his head while trying to figure out how to deal with our question.</p>

<p>One reason for the difference is that a microscope is a much simpler artifact than a simulation program. While it is certainly difficult to design and produce a good microscope, from a user's perspective its characteristics can be described by a handful of parameters, and its quality can be evaluated by a series of test observations. Software, on the contrary, can do almost anything. A typical simulation program has lots of options, whose precise meaning isn't always obvious from its documentation. More importantly, no two simulation programs have identical options. Even the most experienced user of simulation software A falls back to near-novice status when given simulation software B.</p>

<p>A more subtle difference is that microscopes, and lab equipment in general, are designed to be robust against small production defects and small variations of environmental conditions. Such small variations cause only small changes in the generated images. With software, on the other hand, all bets are off. A one-character mistake in the source code can cause the program to crash, but also to produce arbitrarily different numbers. In fact, there is no notion of similarity and thus of small variations for software. For a more detailed discussion, see my <a href="http://doi.ieeecomputersociety.org/10.1109/MCSE.2016.67" >CiSE article</a> on this topic. This is why you can evaluate the quality of a microscope using a few judiciously chosen samples, whereas no amount of test runs can assure you that a piece of software is free of bugs. Unless you can afford to test <em>all possible</em> inputs, of course, but then you don't really need the software.</p>
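
<p>To make the point about one-character mistakes concrete, here is a minimal sketch in Python. It is not taken from any real simulation program: the function <code>simulate</code>, its parameters, and the toy model (a unit mass on a unit spring, integrated with the semi-implicit Euler method) are assumptions chosen purely for illustration. Flipping a single sign character in the force turns a bounded oscillation into exponential growth, with no crash and no warning.</p>

```python
def simulate(force_sign, steps=1000, dt=0.01):
    """Integrate a unit mass on a unit spring with semi-implicit Euler.

    force_sign = -1.0 is the correct restoring force, F = -x.
    force_sign = +1.0 mimics a one-character typo in the source, F = +x.
    (Hypothetical toy model, for illustration only.)
    """
    x, v = 1.0, 0.0
    for _ in range(steps):
        v += force_sign * x * dt  # update the velocity from the force
        x += v * dt               # update the position from the new velocity
    return x


correct = simulate(-1.0)  # stays within the initial amplitude
typo = simulate(+1.0)     # grows by orders of magnitude
print(correct, typo)
```

<p>Both runs complete normally. Only someone who already knows what the result should look like can tell which one is wrong.</p>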

<p>These two differences explain why Alice knows how to evaluate the microscope, whereas Bob doesn't know where to start. He might look at the documentation and the test cases to see if the program is meant to be used for the kind of work he does. But the documentation almost certainly lacks some important details of the approximations that are made in the code and that matter for Bob's work. Moreover, he would still have to check that the software has no serious bugs related to the functionality he plans to use. Without knowing the implemented algorithms in detail, he cannot even anticipate what bugs to watch out for.</p>

<p>Bob could also choose a very different approach and judge the software by quality standards from software engineering. Is the code well structured? Does it have unit and integration tests? These are the criteria that software journals ask their reviewers to evaluate (e.g. the <a href="http://dx.doi.org/10.6084/m9.figshare.795303" >Journal of Open Research Software</a> or the <a href="http://joss.theoj.org/about#reviewer_guidelines" >Journal of Open Source Software</a>). Statistically, they are probably related to the risk of encountering bugs (if anyone knows about research into this question, please leave a comment!). But even the most meticulous developers make mistakes, and, more importantly, may have different applications in mind than those that Bob cares about.</p>

<p>Finally, Bob could do what in my experience (and also according to <a href="https://arxiv.org/abs/1605.02265v1" >this study</a>) most scientists do in choosing research software: use what their colleagues use. Bob would then send a few emails asking if anyone he knows uses this software and is happy with it. This is a reasonable approach if you can assume that your colleagues, or at least a sizable fraction of them, are in a better position to judge the suitability of a piece of software than yourself. But if everyone adopts this approach, it becomes a popularity contest with little intrinsic value (see <a href="https://doi.org/10.1126%2Fscience.1231535" >this paper</a> for a detailed example). In any case, it is not a way to actually answer our question.</p>

<p>In the end, if you really want to know if your software does what you expect it to do, you have to go through every line of the source code until you understand what it does. You are then at the minimal level of epistemic opacity that you can attain without actually doing the computations by hand. Unfortunately, in the case of complex wide-spectrum software, this is likely to be much more effort than writing your own special-purpose software.</p>

<p>The solution I propose in my paper is to use human-readable formal specifications as a form of documentation that is rigorous and complete, and can be used as a reference to verify the software against. The idea is to have a statement of the implemented algorithms that is precise and complete but as simple as possible, without being encumbered by considerations such as performance. Note that I don't know if this will turn out to be possible - my work is merely a first step in a direction that, to the best of my knowledge, has not been explored until now.</p>

<h2>Software is about models, lab equipment is about observations</h2>

<p>A popular meme in explaining science describes it as founded on two pillars, experiment and theory. Some people propose to add computation and/or simulation as a third pillar, and data mining as a fourth, although these additions remain controversial. In my opinion, they are misguided by a bad identification of the initial pillars. They are not experiment and theory, but observations and models. We often speak of computational experiments when doing simulations, and there are good reasons for the analogy, but it is important to keep in mind that these are experiments on models, not on natural phenomena.</p>

<p>Observations provide us with information about nature, and models allow us to organize and generalize this information. In this picture, computation has two roles: evaluating the consequences of a model, and comparing them to observations. Simulation is an example of the first role, data mining of the second. Both of these roles predate electronic computers; they simply received more modest labels such as &quot;solving differential equations&quot; or &quot;fitting parameters&quot; in the past.</p>

<p>In the context of reproducibility and verifiability, it is important to realize that there is no symmetry between these two pillars. Nature is the big unknown that we probe through observations. To do this, we use lab equipment that can never be perfect, for two reasons: first, it is constructed on the basis of our imperfect understanding of nature, and second, our control of matter is limited, so we cannot produce equipment that behaves precisely as we imagine it. Models, on the other hand, are symbolic artifacts that are under our precise control. We can formulate and communicate them without any ambiguity, if only we are careful enough.</p>

<p>Because of these very different roles of observations and models, computational reproducibility has no analogue in the universe of observations. It is almost exclusively a communication issue, the one exception being the non-determinism in parallel computing that we accept in exchange for getting results faster. Non-determinism aside, if Alice cannot reproduce Bob's computations, that simply means that Bob has not been able or willing to describe his work in enough detail for Alice to re-do it identically. There is no fundamental obstacle to such a description, because models and software are symbolic artifacts. We actually know how to achieve computational reproducibility, but we still need to make it straightforward in practice.</p>
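
<p>A common source of the non-determinism mentioned above is that floating-point addition is not associative, so the result of a parallel sum can depend on the order in which partial results arrive. The following Python sketch only mimics this on a single processor: the function <code>combine</code> and the three "partial sums" are hypothetical stand-ins for what parallel workers might return.</p>

```python
def combine(partial_sums):
    """Add up partial sums in the order in which they arrive."""
    total = 0.0
    for p in partial_sums:
        total += p
    return total


# The same three partial sums, arriving in two different orders,
# as could happen when worker scheduling differs between two runs.
run_a = combine([0.1, 0.2, 0.3])
run_b = combine([0.3, 0.2, 0.1])

print(run_a == run_b)  # the two totals differ in the last digit
```

<p>The difference is tiny here, but it means that two runs of the same program on the same data need not be bit-for-bit identical.</p>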

<p>Similarly, if Alice cannot verify that Bob's computation solves the problem he claims it solves, this means that Bob has not succeeded in explaining his work clearly enough for Alice to understand what is going on. An unverifiable computation is thus very similar to a badly written article. The big difference in practice is that centuries of experience with writing have led to accepted and documented standards of good writing style, whereas after a few decades of scientific computing, we still do not know how to expose complex algorithms to human readers in the most understandable way. My paper is a first small step towards developing appropriate techniques.</p>

<p>Experimental reproducibility, on the other hand, is an ideal that can never be achieved perfectly, because no two setups are strictly the same. Verifiability is equally limited because observations can never be repeated identically, even when done with the same equipment. Reproducibility is a quality attribute much like accuracy, precision, or cost. Tradeoffs between these attributes are inevitable, and have to be made by each scientific discipline as a function of what its main obstacles to progress are.</p>

<p>Science has been adjusting to the inevitable limits of observations since its beginnings, whereas the issue of incomplete model descriptions has come up only with the introduction of computers, which made it possible to work with complex models. We don't know yet if non-verifiable models are a real problem or not. However, as a theoretician I am not comfortable with the current situation. Models can be simple or complex, good or bad, grounded in solid theory or ad-hoc, but they should not be fuzzy. This holds in particular for complex systems, where it is very hard to foresee the consequences of minor changes.</p>

 ]]></description> </item><item> <title>Scientific communication is a research problem</title> <link>https://blog.khinsen.net/posts/2018/04/09/scientific-communication-is-a-research-problem.html</link> <pubDate>2018-04-09</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2018/04/09/scientific-communication-is-a-research-problem.html</guid> <category><![CDATA[ computer-aided research ]]></category> <description><![CDATA[ <p>A <a href="https://www.theatlantic.com/amp/article/556676/" >recent article in &quot;The Atlantic&quot;</a> has been the subject of many comments in my Twittersphere. It's about scientific communication in the age of computer-aided research, which requires communicating computations (i.e. code, data, and results) in addition to the traditional narrative of a paper. The article focuses on computational notebooks, a technology introduced in the late 1980s by <a href="https://www.wolfram.com/mathematica/" >Mathematica</a> but which has become accessible to most researchers only since <a href="http://jupyter.org/" >Project Jupyter</a> (formerly known as the IPython notebook) started to offer an open-source implementation supporting a wide range of programming languages. The gist of the article is that today's practice of publishing science in PDF files is obsolete, and that notebooks are the future.</p>

<!-- more -->

<p>One <a href="https://twitter.com/khinsen/status/982339472036593672" >interesting follow-up thread on Twitter</a> explored whether any scientific papers had actually been published in the form of Jupyter notebooks. It seems that the answer is no. Notebooks are published as supplementary material to standard papers, or as informal communication outside of the official scientific record, in particular for teaching purposes, but no one could point to a paper indexed in any article database that was written as a Jupyter notebook. As to the question of <em>why</em> this hasn't happened, all answers remain speculative in the absence of research into the subject. Publishers' format requirements are certainly a part of the problem, but limitations of today's notebook format also matter. In particular, notebooks lack support for bibliographies and for cross referencing.</p>

<p>Another interesting follow-up is <a href="https://metarabbit.wordpress.com/2018/04/08/the-scientific-paper-of-the-future-is-probably-a-pdf/" >a blog post by Luis Pedro Coelho</a> who predicts that PDFs will stay with us for many years to come, because none of the proposed successors is actually mature enough for use in real life. In particular, he points out the complexity and lack of longevity and stability of most of today's computational tools. My personal experience is very similar to his. He also asks the very relevant question if a notebook-style presentation of results and computations is actually a good idea in the context of a scientific paper. I suspect nobody can provide an evidence-based answer at this time.</p>

<p>As these discussions illustrate, scientific communication about computer-aided research remains a research problem. As a community, we do not know how to explain, share, or review computer-aided research in a satisfactory way. Most of us agree that PDFs are no longer sufficient, and that we need to share code and data. However, we do not yet have good enough practices for doing so, at least not for all practically relevant situations. Nor do we know if sharing code and data will actually be sufficient to enable effective communication. It is quite possible that we will also need to develop practices for better <em>explaining</em> computations to each other, and have them peer reviewed in some form.</p>

<p>From this point of view, all of today's technology, be it Jupyter, <a href="https://orgmode.org/worg/org-contrib/babel/" >Org mode</a>, <a href="https://yihui.name/knitr/" >knitr</a> or similar tools, should best be seen as support tools for performing experiments in scientific communication. What is still largely missing is systematic research that evaluates these experiments with the goal of summarizing the collective experience and drawing conclusions. There are promising starts, such as  <a href="https://hal.archives-ouvertes.fr/hal-01676633" >this study on the actual use of Jupyter notebooks</a>, but their number is negligible compared to the number of articles proclaiming that this or that technology is going to revolutionize scientific communication without providing any tangible evidence.</p>

<p>I think it is time for the scientific community to acknowledge that it doesn't really know how to communicate computer-aided research effectively, and encourage research into the question. Experimenting with the various proposed approaches is essential, but analyzing the outcomes of these experiments is essential as well. In my opinion, we currently over-emphasize tool development, community building, and teaching, which are all directed at <em>implementing</em> new practices, but neglect research into what these practices actually should <em>be</em>. Future generations of scientists may well remember today's hot developments as sources of technical debt.</p>

<p>A personal anecdote provides an illustration of the dominating attitude. My <a href="http://www.activepapers.org/" >ActivePapers</a> project is clearly labeled as research. Its goal is to explore how non-trivial computations (long run times, big data sets) can be performed, archived, and published reproducibly. For first results, see <a href="https://f1000research.com/articles/3-289/v3" >this paper</a>. Whenever I present this project, I know there is one question someone in the audience will ask: What are your plans for increasing your user base? I answer that I am doing research and not product development, and that I am not recruiting users but at best collaborators. This always causes surprise and sometimes animated discussions. It almost seems that doing research on doing research is a strange idea for professional scientists. On the other hand, my other research project on scientific communication, the digital scientific notation <a href="https://github.com/khinsen/leibniz" >Leibniz</a>, does not generate this kind of reaction, but then it hasn't seen that much exposure yet. It explores the question of how we can explain a complex computation in a way that allows readers to verify its scientific assumptions. For a first account, see <a href="https://peerj.com/preprints/26633/" >this preprint</a>.</p>

<p>Finally, readers might be interested in two of my earlier blog posts that are related to notebooks:</p>

<ul>
<li><p><a href="https://khinsen.wordpress.com/2015/09/03/beyond-jupyter-whats-in-a-notebook/" >&quot;Beyond Jupyter: what’s in a notebook?&quot;</a> looks at notebooks as digital documents, focusing on the information content rather than on the tool for doing computations.</p></li>
<li><p><a href="http://blog.khinsen.net/posts/2015/12/08/from-facts-to-narratives.html" >&quot;From facts to narratives&quot;</a> explores various approaches, one of them being notebooks, to combining formal elements of a computation (code, data) with an explanatory narrative.</p></li>
</ul>

 ]]></description> </item><item> <title>What can we do to check scientific computation more effectively?</title> <link>https://blog.khinsen.net/posts/2018/03/07/what-can-we-do-to-check-scientific-computation-more-effectively.html</link> <pubDate>2018-03-07</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2018/03/07/what-can-we-do-to-check-scientific-computation-more-effectively.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ computer-aided research ]]></category><category><![CDATA[ scientific software ]]></category> <description><![CDATA[ <p>It is widely recognized by now that software is an important ingredient to modern scientific research. If we want to check that published results are valid, and if we want to build on our colleagues' published work, we must have access to the software and data that were used in the computations. The latest high-impact statement along these lines is a <a href="https://www.nature.com/articles/d41586-018-02741-4" >Nature editorial</a> that argues that with any manuscript submission, authors should also submit the data <em>and</em> the software for review. I am all for that, and I hope that more journals will follow.</p>

<!-- more -->

<p>However, we must also be aware of the inherent limitations of simply including software in peer review. With the exception of small and focused software, of the kind we typically have in replications submitted to <a href="http://rescience.github.io/" >ReScience</a> (one of the very few scientific journals that actually does code review), the task of evaluating scientific software is so enormous that asking a single person to do it within two weeks is simply unreasonable. For that reason, journals specialized in software papers, such as the <a href="https://openresearchsoftware.metajnl.com/" >Journal of Open Research Software</a> or the
<a href="https://joss.theoj.org/" >Journal of Open Source Software</a>, limit the reviewing process to more easily verifiable formal aspects, such as the presence of documentation and the use of appropriate software engineering techniques. Which is, of course, much better than nothing, but it isn't enough.</p>

<p>A few months ago I wrote about the <a href="http://blog.khinsen.net/posts/2017/05/04/which-mistakes-do-we-actually-make-in-scientific-code.html" >kinds of mistakes that we tend to make in scientific computing</a>. In my experience (I'd love to see a systematic study on this), most mistakes are due to discrepancies between what a paper describes and what is actually computed. This covers simple mistakes such as a wrong sign in a computed formula (such as in the widely publicized case of <a href="http://doi.org/10.1126/science.314.5807.1856" >protein structure retractions</a>), or a typo in the input parameter file for a simulation program, but also more complex situations such as the <a href="http://doi.org/10.1073/pnas.1602413113" >inflated false-positive rates in fMRI studies</a> that also made it into the headlines of science news. In this case, the fundamental issue was a mismatch between the methods implemented in the software and the methods that would have been appropriate for many typical use cases of the software. Put differently, the users of the software did not fully understand what exactly the software did. They trusted the software authors blindly to do &quot;the right thing&quot;, whatever that was. And they were probably reinforced in their blind trust by the fact that many of their colleagues used the same software. It's the research version of &quot;nobody ever got fired for buying IBM equipment&quot;.</p>

<p>Code review is an important step to a better verification of scientific computations, but in the cases I just described its utility is very limited. Neither the wrong sign in the protein crystallography code nor the not-quite-universally-applicable statistical analysis method used by the fMRI software would be detectable by software engineering methods. In the first case, the code would have to be compared to the set of mathematical formulas on which it was based, a task requiring expert knowledge in both crystallography and programming, plus a lot of time - much more than what a reviewer can typically invest. In the second case, code review cannot do anything at all. Only the reviewers of the application papers could have spotted the inappropriateness of the methods - but why should they be expected to be more knowledgeable about the pitfalls than the authors?</p>

<p>An important but not yet widely recognized aspect of these situations is that today's scientific software incorporates a significant amount of scientific knowledge that is very difficult to access and verify by users and reviewers. The translation of mathematical equations in a paper into efficient computer code is almost a form of encryption from the point of view of scientific knowledge transformation. Extracting equations from software source code is not much easier than extracting source code from compiled binaries.</p>

<p>But can we do anything about this? I believe we can, but it will require a serious rethinking of the way we use computers to do research. My first explorations in this direction are described in a paper that is now available as a <a href="https://peerj.com/preprints/26633/?td=bl" >PeerJ preprint</a>. Please have a look, and don't hesitate to ask a question or leave other feedback of any kind!</p>

 ]]></description> </item><item> <title>Data science in ancient Greece</title> <link>https://blog.khinsen.net/posts/2017/12/19/data-science-in-ancient-greece.html</link> <pubDate>2017-12-19</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2017/12/19/data-science-in-ancient-greece.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ <p>Data science is usually considered a very recent invention, made possible by electronic computing and communication technologies. Some consider it the  <a href="https://www.microsoft.com/en-us/research/publication/fourth-paradigm-data-intensive-scientific-discovery/" >fourth paradigm</a> of science, suggesting that it came after three other paradigms, though the whole idea of distinct paradigms remains controversial. What I want to point out in this post is that the principles of data science are much older than most of today's practitioners imagine. Let me introduce you to <a href="https://en.wikipedia.org/wiki/Apollonius_of_Perga" >Apollonius</a>, <a href="https://en.wikipedia.org/wiki/Hipparchus" >Hipparchus</a>, and <a href="https://en.wikipedia.org/wiki/Ptolemy" >Ptolemy</a>, who applied these principles about 2000 years ago.</p>

<!-- more -->

<p>The focus of interest of these early researchers was a topic that had kept humanity busy for quite a while already, all over the world: the motion of heavenly bodies. The main motivation was making predictions for the near future. The configuration of the stars and planets was widely believed to have an impact on human affairs (a belief we call astrology today), so knowing them in advance was of obvious interest. They had astronomical observations at their disposal, but numbers alone are not sufficient to make predictions. You also need a model for extrapolating the numbers to the future.</p>

<p>The tool that Apollonius, Hipparchus, Ptolemy, and probably many others, developed and improved to near perfection was <a href="https://en.wikipedia.org/wiki/Deferent_and_epicycle" >epicycles</a>: a model for the orbit of a heavenly body consisting of a superposition of circles, with each circle's center moving along a bigger circle's circumference. Epicycles are similar in spirit to Fourier series. Any periodic orbit can be described as a superposition of circular motions. Given enough data, one can fit an epicycle model and make predictions. But since the epicycle model does not contain any physics, it doesn't come with any safeguards against mistakes. Epicycles can equally well describe real and completely unrealistic orbits, and therefore the quality of the data is very important.</p>
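<p>The analogy with Fourier series can be made concrete. The following is a minimal sketch in modern terms, certainly not the procedure used by the ancient astronomers: it assumes that positions are sampled at equal time intervals over one full period and represented as complex numbers, and it uses a plain discrete Fourier transform as the fitting step. The function names and the synthetic orbit are made up for this illustration.</p>

```python
import cmath
import math

def fit_epicycles(points):
    """Fit circles to complex positions z = x + iy sampled over one period.

    Coefficient number k describes a circle of radius abs(c_k) that is
    traversed k times per period (a discrete Fourier transform).
    """
    n = len(points)
    coeffs = []
    for k in range(n):
        total = sum(z * cmath.exp(-2j * math.pi * k * m / n)
                    for m, z in enumerate(points))
        coeffs.append(total / n)
    return coeffs

def position(coeffs, t):
    """Evaluate the fitted model at time t, in units of the sampling interval."""
    n = len(coeffs)
    total = 0j
    for k, c in enumerate(coeffs):
        # Indices in the upper half stand for circles traversed clockwise.
        turns = k - n if 2 * k > n else k
        total += c * cmath.exp(2j * math.pi * turns * t / n)
    return total

# Synthetic "observations": a deferent of radius 10 carrying an epicycle of
# radius 2 that turns five times per period. These numbers are invented.
n_samples = 64
observed = [10 * cmath.exp(2j * math.pi * m / n_samples)
            + 2 * cmath.exp(2j * math.pi * 5 * m / n_samples)
            for m in range(n_samples)]

coeffs = fit_epicycles(observed)
# The two largest circles have radii close to 10 and 2, at k = 1 and k = 5.
radii = sorted(((abs(c), k) for k, c in enumerate(coeffs)), reverse=True)[:2]
print(radii)

# "Prediction": the position three sampling intervals into the next period.
predicted = position(coeffs, n_samples + 3)
print(abs(predicted - observed[3]))  # tiny: the model is periodic by construction
```

<p>Nothing in this fit knows about gravity. Fed with the samples of any closed curve, however unphysical, it would reproduce that curve just as faithfully.</p>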

<p>Today's data science works much the same. Very general models, such as neural networks, are fitted to large datasets and then used to make predictions. Again the models contain very few assumptions about underlying laws of nature. They are by design very general (see e.g. this <a href="http://neuralnetworksanddeeplearning.com/chap4.html" >visual proof</a> that neural networks can compute any function) in order to capture any kind of regularity in the input datasets. As for epicycles, data quality is important, which is why data scientists invest a significant effort into cleaning up the raw data they work on.</p>

<p>Aside from the obvious technological aspects and the associated change of scale in the size of datasets, the main improvement of today's data science on epicycle models for orbits is even more generality. Early astronomers had periodicity baked into their models from the start. Neural networks (and other models used in data science) could predict the motion of heavenly bodies with even less theoretical input. However, it is important to realize that every model imposes <em>some</em> a priori assumptions, even if, as in the case of neural networks, these assumptions are not fully understood and therefore not formalized. Seen in this light, the improvement of modern data science over epicycles is gradual rather than fundamental.</p>

<p>Adopting an historical perspective, data science turns out to mark the <em>beginning</em> of scientific disciplines rather than their refinement. It permits the very first step from raw observations to a description of regularities. Connecting these regularities to known more fundamental principles, or even discovering <em>new</em> fundamental principles as in the case of Newton's laws for celestial mechanics, can only happen afterwards.</p>

<p>Perhaps a more fundamental distinction than the one between experiment and theory (plus, according to some, simulation and data science) is the one between <em>data-driven</em> and <em>model-driven</em> science. Data-driven science starts from observations and searches for regularities using generic models. Model-driven science takes more advanced problem-specific models and aims at evaluating and improving their quality on one hand, and at exploring their consequences on the other hand. In terms of day-to-day research activities, data-driven science collects observations that promise to be interesting and uses statistical methods to interpret them. Model-driven science has theoreticians exploring models and experimentalists asking Nature specific questions arising from this exploration. The oldest and best-known scientific disciplines, i.e. physics and chemistry, are primarily model-driven today, which may contribute to the impression that data-driven science is new. As the epicycle example shows, this is really just a lack of historical perspective.</p>

 ]]></description> </item><item> <title>Stability in the SciPy ecosystem: a summary of the discussion</title> <link>https://blog.khinsen.net/posts/2017/11/22/stability-in-the-scipy-ecosystem-a-summary-of-the-discussion.html</link> <pubDate>2017-11-22</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2017/11/22/stability-in-the-scipy-ecosystem-a-summary-of-the-discussion.html</guid> <category><![CDATA[ python ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ <p>The <a href="http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem.html" >plea for stability in the SciPy ecosystem</a> that I posted last week on this blog has generated a lot of feedback, both as comments and in a lengthy <a href="https://twitter.com/khinsen/status/931192953636315137" >Twitter thread</a>. For the benefit of people discovering it late, here is a summary of the main arguments and my reply to them.</p>

<!-- more -->

<h2>Just freeze your code and it will be reproducible forever</h2>

<p>By far the most frequent argument against my claim that we need more stability in the SciPy ecosystem was that people can simply archive their code with all the dependencies (down to the Python language itself) in a way that lets others re-run it later for reproducibility. The most frequently proposed technical approaches were the <a href="https://conda.io/docs/" >conda</a> package manager and <a href="https://www.docker.com/" >Docker</a> containers.</p>

<p>There are three main reasons why this is not a sufficient solution:</p>

<ol>
<li><p>Freezing code is fine for archival reproducibility, as I mentioned in my original post. It is not sufficient for living code bases that people work on over decades. Computational biologist Luis Pedro Coelho has <a href="https://metarabbit.wordpress.com/2017/11/18/numpy-scipy-backwards-stability-debate-and-why-freezing-versions-is-not-the-solution/" >explained</a> this very well and I recommend everyone to read his short writeup. My situation is very much the same as his. On Twitter, astronomer Tuan Do has chimed in with a <a href="https://twitter.com/quantumpenguin/status/933123060822978560" >similar comment</a>.</p></li>
<li><p>The technical solutions proposed all depend on yet more infrastructure whose longevity is uncertain. For how long will a Docker container image produced in 2017 remain usable? For how long will conda and its repositories be supported, and for how long will the binaries in these repositories function on current platforms?</p></li>
<li><p>None of today's code freezing approaches comes with easy-to-use tooling and clear documentation that make it accessible to the average computational scientist. The technologies are today in a &quot;good for early adopters&quot; state. This means we cannot rely on them to preserve <em>today's</em> research even though they may well take on this role in the future.</p></li>
</ol>
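<p>To make the discussion a bit more concrete, here is a minimal sketch of the very first step of any freezing approach: recording which interpreter and which package versions a computation actually used. It relies only on the Python 3 standard library, and the function name <code>environment_snapshot</code> is made up for this illustration. Note that such a record is far from a frozen environment. It does not archive the packages themselves, and it says nothing about whether they can still be installed or run years later, which is the concern of point 2.</p>

```python
import json
import platform
import sys
from importlib import metadata

def environment_snapshot():
    """Collect interpreter, platform, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip damaged installations that lack a name
            packages[name] = dist.version
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }

snapshot = environment_snapshot()
# Store this text next to the results it belongs to.
print(json.dumps(snapshot, indent=2))
```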

<p>To illustrate point 3, let me introduce Alice and Bob, who are real scientists I know, except that I have changed the names. Alice is a chemist with a decent knowledge of Python and basic software engineering techniques (Software Carpentry level), which she eagerly applies because she cares about the quality of her work. Alice considers herself an experimentalist. She develops and maintains a Python codebase for interpreting certain types of experimental data, but software development is not the focus of her work. The code she writes is not public, because her boss doesn't want it to be. Worse, her code depends on a small library developed by a collaborator who doesn't even hand out source code. What Alice gets is pre-compiled shared libraries for the three platforms that matter to herself and to her users.</p>

<p>Bob is an experimental biologist who uses the same instruments as Alice and is happy that Alice has written nice software for interpreting the results. He gets that software, including the binary-only dependencies, by personal arrangements with the various people involved. Bob doesn't know much about Python, nor does he care. His software installation was mostly done by Alice during a one-afternoon meeting in which they worked together to reach a state he could work with. Ideally, he would like to never touch it again, but he also wants the new features that Alice adds from time to time.</p>

<p>To all those who replied &quot;just use conda&quot; or &quot;just use Docker&quot;, I recommend considering the situation of Alice and Bob. Do you really believe that conda or Docker are the right solution for them today? Could you point them to suitable documentation written at the right level? Both for building and for re-using frozen environments?</p>

<p>To prevent another round of misunderstanding, I am not saying that the situation of Alice and Bob would be perfect if only they could have a stable Python infrastructure. Research code should be open, for example, for many reasons including the possibility to upload it to various repositories. Fortunately, the attitudes towards software use in science are changing in the right direction, but this will take a lot of time, like all social change.</p>

<p>I also fully understand the point of view that the SciPy ecosystem is for advanced users who value methodological innovation, and that it cannot cater for the needs of Alice and Bob because of conflicting requirements and insufficient resources to deal with them. But then, as I said in my original post, please have the courage to say so openly and clearly. Every beginner-level tutorial for scientists should state during the first five minutes that you cannot expect stability and that you should either use Python only for throw-away code or else be sure you can assume maintenance. In other words, make sure that people like Alice have no false expectations. They can then look for other technology, or team up with like-minded people to maintain long-time-stable branches of SciPy, or try whatever else.</p>

<h2>Stability is an unrealistic expectation</h2>

<p>Another frequently expressed opinion was that it is unrealistic to expect the kind of stability I advocated in a modern software environment. This is a self-fulfilling prophecy: if you consider the goal impossible, you won't even try to achieve it. As I have pointed out, long-time stability is a reality in other ecosystems, built around languages such as Fortran or Java. A few people said that Fortran or Java are unfair comparisons, because they encourage very different approaches to dependency management. This is actually my point: you can have stability, but only if it's an explicit goal and if some effort is made to reach that goal. This includes finding suitable approaches to dependency management.</p>

<p>David Cournapeau made the <a href="https://twitter.com/cournape/status/849918989165842434" >interesting observation</a> that no technology less than 20 years old is better than Python in terms of stability. That rings true, in the sense that I cannot find a counterexample. But I see this as a statement about dominant attitudes in software engineering (way beyond scientific computing), not as a statement about technological constraints that would make stability fundamentally incompatible with other requirements. Software development today is dominated by short-lived technologies but also by short-lived applications. The application domains where stability is valued probably represent a much smaller part of the pie than 20 years ago. But then, this is just another illustration of what I wrote about recently: <a href="http://blog.khinsen.net/posts/2017/11/09/there-is-no-such-thing-as-software-development.html" >There is no such thing as software development</a> in the abstract, there is only domain-specific software development. The needs of scientific computing are clearly different from the needs of Silicon Valley startups. The conclusion is that the software development tools and practices should be different as well.</p>

<p>Finally, even within the somewhat tumultuous SciPy ecosystem, stability is not impossible. My own <a href="http://dirac.cnrs-orleans.fr/MMTK/" >MMTK</a> library has been around for 20 years, and in spite of continuous extensions and one API redesign (from version 1.x to version 2.x), I have never knowingly broken anyone's application code. With the end of support for Python 2, I can unfortunately no longer maintain that policy.</p>

<h2>Everybody lacks resources for maintenance</h2>

<p>Many comments addressed the lack of human resources for developing and maintaining scientific software, and in particular infrastructure software like the core of the SciPy ecosystem. In combination with the fact that new developments are more attractive to most people than boring maintenance, and also more valued by the community, this leads to a culture favoring innovation over stability when most of the work is done by volunteers. This was best expressed by Peter Wang in a <a href="https://twitter.com/pwang/status/931386237193211904" >short sequence of tweets</a>.</p>

<p>This is indeed an important factor, and one whose importance transcends scientific computing and even science itself. If you look back at the history of civilization, or even at the history of life on earth, you can't fail to notice that all living organisms have invested the lion's share of their efforts into maintaining the <em>status quo</em>: staying alive, staying safe, maintaining an environment that ensures a certain quality of life, etc. In modern societies whose very survival depends on technology, infrastructure maintenance (roads, power grid, ...) has always been a priority of state administrations - until recently, that is. Today, we hear politicians and even intellectuals proclaim the importance of innovation and disruption, while basic infrastructure starts to rot for lack of maintenance.</p>

<p>I can only hope that the innovation and disruption fashion will die out before the societies that have fallen victim to this fashion will do so by natural selection. In the meantime, I propose that scientists try to resist as best as possible. The fact that infrastructure software such as NumPy does get funding is a good sign in my opinion. I believe we can also get funding for stability, if only we clearly state that we need it.</p>

<h2>Data supremacy</h2>

<p>Pierre de Buyl <a href="http://disq.us/p/1nveqdc" >reminded me</a> of an <a href="http://ieeexplore.ieee.org/abstract/document/6341744/" >article</a> I wrote five years ago, in which I proposed that data rather than software tools should be the focus of scientific computing because data is of longer-lasting scientific importance. As I <a href="https://f1000research.com/articles/3-101/v2" >pointed out two years later</a>, that data includes scientific models (equations etc.), even though for technical reasons they are mostly embedded into software tools today (see <a href="http://sjscience.org/article?id=527" >here</a> for an idea for doing things differently).</p>

<p>In a world where all scientifically relevant information is stored in stable and well-defined open file formats, software tools can evolve much more freely without disturbing ongoing work or harming reproducibility. New versions of software tools would merely have to maintain the functionality of their predecessors, but not their implementation details. However, this is at best a promise for the future. We don't even have the basic technology to make this happen, nor a consensus that it would be a good idea, which would open up the possibility of getting funding towards that goal. We will therefore need stable software environments for many more years to come.</p>

 ]]></description> </item><item> <title>A plea for stability in the SciPy ecosystem</title> <link>https://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem.html</link> <pubDate>2017-11-16</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem.html</guid> <category><![CDATA[ python ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ <p>Two NumPy-related news items appeared on my Twitter feed yesterday, just a few days after I had accidentally started a <a href="https://twitter.com/khinsen/status/929014170749632513" >somewhat heated debate</a> myself concerning the poor reproducibility of Python-based computer-aided research. The first was the announcement <a href="https://github.com/numpy/numpy/blob/master/doc/neps/dropping-python2.7-proposal.rst" >of a plan for dropping support for Python 2</a>. The second was a pointer to a <a href="https://www.youtube.com/watch?v=fowHwlpGb34" >recent presentation by Nathaniel Smith</a> entitled &quot;Inside NumPy&quot; and dealing mainly with the NumPy team's plans for the near future. Lots of material to think about... and comment on.</p>

<!-- more -->

<p>The end of Python 2 support for NumPy didn't come as a surprise to anyone in the Python community. With Python 2 itself not being supported after 2020, it doesn't make any sense for Python-dependent software to continue support beyond that date. The detailed plan for the transition of NumPy to a Python-3-only package looks quite reasonable. Which doesn't mean that everything is fine. The disappearance of Python 2 will leave much scientific software orphaned, and many published results irreproducible. Yes, the big well-known packages of the SciPy ecosystem all work with Python 3 by now, but the same cannot be said for many domain-specific libraries that have a much smaller user and developer base, and much more limited resources. As an example, my own <a href="http://dirac.cnrs-orleans.fr/MMTK/" >Molecular Modelling Toolkit</a> (MMTK), which might well be the oldest domain-specific library of the SciPy ecosystem, will probably go away after 2020. Porting it to Python 3 is possible, of course, but it would require an enormous effort (some details are in this <a href="https://twitter.com/khinsen/status/930749714567434240" >Twitter thread</a>) for which resources (funding plus competent staff) are very difficult to find.</p>

<p>Speaking purely from a computational science point of view, the Python 2-&gt;3 transition was a big mistake. While Python 3 does have some interesting new features for scientists, most of them could have been implemented in Python 2 as well, without breaking backward compatibility. There are, of course, good reasons for the modernization of the language. I am not saying that Guido van Rossum is an idiot - far from it. As popular as Python may be in today's scientific research, scientific users make up a very small part of the total Python user base. Unfortunately, the need for long-term stability is rather specific to scientific users, and not even all of them require it (see e.g. <a href="https://twitter.com/ctitusbrown/status/929044554598137856" >these</a> <a href="https://twitter.com/ctitusbrown/status/929044751633936384" >two</a> tweets by Titus Brown). So while Python 3 is probably a step forward for most Python users, it's mostly a calamity for computational science.</p>

<p>Apart from the major earthquake caused by this change in the Python language itself, whose victims we will be able to count starting from 2020, the SciPy ecosystem has been subject to regular minor seismic activity caused by breaking changes in its foundational libraries, such as NumPy or matplotlib. I am not aware of any systematic study of their impact, but my personal anecdotal evidence (see e.g. this <a href="http://blog.khinsen.net/posts/2017/04/06/reproducible-research-in-the-python-ecosystem-a-reality-check.html" >report</a>) suggests that a Python script can be expected to work for two to three years, but not for five or more. Older scripts will either crash, which is a nuisance, or produce different results, which is much worse because the problem may well go unnoticed.</p>

<p>In my corner of science, biomolecular simulation, the time scale of methodological progress is decades. This doesn't mean that nothing exciting happens in shorter time spans. It just means that methods and techniques, including software, remain relevant for one to three decades. It isn't even uncommon for a single research project to extend over several years. As an example, I just edited a script whose last modification date was December 2015. It's part of a collaborative project involving methodological development and application work in both experiment and theory. The back-and-forth exchanges between experimentalists and theoreticians take a lot of time. In the course of such projects, I update software and even change computers. If infrastructure updates break my code in progress, that's a major productivity loss.</p>

<p>Beyond personal productivity considerations, breaking changes are a threat to the reproducibility of scientific studies, an aspect that has been gaining more and more attention recently because so many published results were found to be non-reproducible or erroneous (note that these are very different things, but that's not my topic for today), with software taking a big share of the responsibility. The two main issues are: (1) non-reproducible results cannot be trusted, because nobody really knows how they were obtained and (2) code whose results are non-reproducible is not a reliable basis for further work (Newton's famous &quot;standing on the shoulders of giants&quot;). Many researchers, myself included, are advocating better practices to ensure computational reproducibility. In view of the seismic activities outlined above, I have been  wondering for a while whether I should add &quot;don't use Python&quot; to my list of recommendations. What's holding me back is mainly the lack of any decent alternative to today's SciPy ecosystem.</p>

<p>Watching <a href="https://www.youtube.com/watch?v=fowHwlpGb34" >Nathaniel's BIDS talk</a>, I was rather disappointed that these issues were not treated at all. There is a general discussion of &quot;change&quot;, including a short reference to breaking changes and their impact on downstream projects, which suggests that there has been some debate of these questions in the NumPy community (note that I am no longer following the <a href="https://mail.scipy.org/mailman/listinfo/numpy-discussion" >NumPy discussion</a> mailing list for lack of time). However, assuming that Nathaniel's summary is representative of that debate, neither reproducibility nor the requirements of the different software layers in scientific computing seem to have received the attention they deserve.</p>

<p>I have written before about <a href="http://blog.khinsen.net/posts/2017/01/13/sustainable-software-and-reproducible-research-dealing-with-software-collapse.html" >software layers</a> and the <a href="http://blog.khinsen.net/posts/2015/11/09/the-lifecycle-of-digital-scientific-knowledge.html" >lifecycle of digital scientific knowledge</a>, so I will just give a summary here. A scientific software stack looks like this:</p>

<ul>
<li>Layer 4: project-specific code</li>
<li>Layer 3: domain-specific libraries</li>
<li>Layer 2: scientific infrastructure</li>
<li>Layer 1: non-scientific infrastructure</li>
</ul>

<p>In the SciPy universe, we have Python in layer 1, NumPy and friends in layer 2, lots of lesser-known libraries (including my <a href="http://dirac.cnrs-orleans.fr/MMTK/" >MMTK</a> mentioned above) in layer 3, and application scripts and notebooks in layer 4.</p>

<p>A breaking change in any layer affects everything in the layers above. The authors of the affected higher-level code have three options:</p>

<ol>
<li>adapt their code (maintenance)</li>
<li>freeze their code (describe the stack they actually used)</li>
<li>do nothing</li>
</ol>

<p>The first choice is of course the ideal case, but it requires serious development resources. With the second one, archival reproducibility is guaranteed, i.e. a reader knows under which conditions the code can be used and trusted, and how these conditions can be recreated. But frozen code is not a good basis for further work. Using it requires much work for re-creating an outdated environment. Worse, using two or more such packages together is in general impossible because each one has different dependency version requirements. Finally, the third option leaves the code in a limbo state where it isn't even clear under which conditions it can be expected to work. In a research context, this ought to be considered unacceptable.</p>
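
<p>As a rough illustration of what "describing the stack actually used" could look like at the Python level, here is a minimal sketch. It assumes Python 3.8 or later (for <code>importlib.metadata</code>), and the function names <code>describe_stack</code> and <code>changed_packages</code> as well as the example package list are hypothetical, chosen only for this illustration. It records Python-level versions only; compiled libraries below Python are not covered, so it falls well short of a complete freeze.</p>

```python
import json
import platform
from importlib import metadata


def describe_stack(package_names):
    """Record the Python version and the installed versions of the named packages.

    Packages that are not installed are recorded as None rather than skipped,
    so that the description states explicitly what was absent.
    """
    packages = {}
    for name in package_names:
        try:
            packages[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            packages[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
    }


def changed_packages(recorded, current):
    """Return {name: (recorded_version, current_version)} for every mismatch."""
    changes = {}
    for name, old_version in recorded["packages"].items():
        new_version = current["packages"].get(name)
        if new_version != old_version:
            changes[name] = (old_version, new_version)
    return changes


if __name__ == "__main__":
    # The package list is an example; a real script would list what it imports.
    stack = describe_stack(["numpy", "scipy", "matplotlib"])
    print(json.dumps(stack, indent=2))
```

<p>A description stored next to the published results could later be compared with the output of <code>describe_stack</code> on another machine; a non-empty result from <code>changed_packages</code> would then be a hint, though not a proof, that the results might differ.</p>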

<p>Let's consider now how these three choices are applied in practice, for each layer in the software stack. Software in layers 1 and 2 must obviously be maintained, otherwise people would quickly abandon it. Fortunately these layers also suffer the least from collapse, because there is less code below them. Layer 3 code gets more or less well maintained, depending on the size of the communities supporting it, and on the development resources available. Quite often, maintenance is sub-optimal for lack of resources, with the maintainers aware of the problem but unable to do a better job. That's my situation with MMTK.</p>

<p>Layer 4 code is the focus of the reproducible research movement. Today, most of this code is still not published, and of the small part that does get out, a large part is neither maintained nor frozen but simply dumped to a repository. In fact, the best practices recommended for reproducible research can be summarized as &quot;freeze and publish layer 4 code&quot;. Maintaining layer 4 code has been proposed (see e.g. <a href="https://www.biorxiv.org/content/early/2016/08/11/056473" >continuous analysis</a>), but it is unclear if the idea will find acceptance. The obvious open question is who should do the maintenance. Considering that most research is done by people who spend a few years in a lab and then move on, it's difficult to assign the responsibility for maintenance to the original authors of the code. But anyone else is less competent, less motivated, and would likely expect to be paid for doing a service job.</p>

<p>An argument I hear frequently in the SciPy community (and elsewhere) is that scientific code that is not actively used and maintained isn't worth bothering with (see e.g. <a href="https://twitter.com/ctitusbrown/status/929045580789161984" >this tweet by Titus Brown</a>). The implication is that breaking changes in the infrastructure layers are OK and must be absorbed by the maintainers of layers 3 and 4. In view of what I just said about layer 4, it should be obvious that I don't agree at all with this point of view. But even concerning layer 3, I find it a bit arrogant. The message to research communities with weaker code development traditions, and thus fewer resources, is that their work doesn't matter.</p>

<p>I would like to see the SciPy community define its point of view on these issues openly and clearly. We all know that development resources are scarce, that not everything that's desirable can be done. The real world requires compromises and priorities. But these compromises and priorities need to be discussed and communicated openly. It's OK to say that the community's priority is developing new features and that this leaves no resources for considering stability. But then please say openly and clearly that SciPy is a community for coding-intensive research and that people who don't have the resources to adapt to breaking changes should look elsewhere. Say openly and clearly that reproducibility beyond a two-year timescale is not the SciPy community's business, and that those who have such needs should look elsewhere. Or else, decide that SciPy is inclusive and caters for all computer-aided research - and draw the conclusion that stability must take a larger weight in future development decisions.</p>

<p>What is not OK is what I perceive as the dominant attitude today: sell SciPy as a great easy-to-use tool for all scientists, and then, when people get bitten by breaking changes, tell them that it's their fault for not having a solid maintenance plan for their code.</p>

<p>Finally, in anticipation of an argument that I expect to see, let me stress that this is not a technical issue. Computing technology moves at a fast pace, but that doesn't mean that lack of stability is inevitable. My <a href="https://github.com/khinsen/hydrolib" >last Fortran code</a>, published in 1994, still works without changing a single line. Banks have been running COBOL code unchanged for decades. Today's Java implementations will run the very first Java code from 1995 without changes, and even much faster thanks to JIT technology. This last example also shows that stability is not in contradiction with progress. You can have both if that's a design goal. It's all a matter of policy, not technology.</p>

<p><strong>Note added 2017-11-22:</strong> see also my <a href="http://blog.khinsen.net/posts/2017/11/22/stability-in-the-scipy-ecosystem-a-summary-of-the-discussion.html" >summary of the discussion</a> in reaction to this post.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>xoviat:</i><p>Honestly if you actually want MMTK to be ported to Python 3, the least you can do is sign up for a GitHub account and upload the code to a repository. Right now, it's definitely not going to be ported because no one can look at the code.</p><ul>
<li><i>Konrad Hinsen:</i><p>It has been on Bitbucket for a couple of years:</p><p>  <a href="https://bitbucket.org/khinsen/mmtk" rel="nofollow noopener" title="https://bitbucket.org/khinsen/mmtk">https://bitbucket.org/khins...</a></p><p>Releases have been on SourceSup, where they have always been among the top-ten downloads:</p><p>  <a href="https://sourcesup.renater.fr/projects/mmtk/" rel="nofollow noopener" title="https://sourcesup.renater.fr/projects/mmtk/">https://sourcesup.renater.f...</a></p><p></p></li>
</ul>
</li>
<li><i>Luis Pedro Coelho:</i><p>Long form follow-up: <a href="https://metarabbit.wordpress.com/2017/11/18/numpy-scipy-backwards-stability-debate-and-why-freezing-versions-is-not-the-solution/" rel="nofollow noopener" title="https://metarabbit.wordpress.com/2017/11/18/numpy-scipy-backwards-stability-debate-and-why-freezing-versions-is-not-the-solution/">https://metarabbit.wordpres...</a></p></li>
<li><i>bastibe:</i><p>You can always install old versions of Python and packages using "pip install scipy==0.9.0". Old versions are not going away. If you need stability, this seems to be an easy option. Am I missing something?</p><ul>
<li><i>Konrad Hinsen:</i><p>Many people have made this suggestion. In theory it works, as long as all dependencies are in PyPI. C library dependencies are often a problem. But the main issue is that you cannot suppose that everyone (all program authors and users) know exactly what to do and do it correctly. In practice, the approach you describe almost never works because some information is missing. To make it practical, we'd need easy-to-use tooling for all phases: producing a complete list of versioned dependencies (including C libraries), verifying the completeness of this list, and restoring the environment on a different machine. All that with simple tools that everybody can figure out how to use on all platforms.</p><p>People are working on this, and I am optimistic that we will get there, but for a few more years we will have to live with the current state. Which is why stability still matters for reproducibility.</p><p>In addition, stability will always matter for slow-moving science, where you need to combine ten-year-old and two-year-old libraries in a single program.<br></p><ul>
<li><i>Syndafloden:</i><p>If you want a completely reproduceable case, you'll likely need to package it with the specific runtimes or dependencies -- Which shouldn't be very hard at all, with, say, a Nanobox solution or something similar.</p><p>You usually want that either way, regardless of use-case, language or environment.</p><ul>
<li><i>Robert Jamie Munro:</i><p>Python is really terrible here compared to, for example, node/npm or even Java / Maven. There's even an XKCD comic about it: <a href="https://m.xkcd.com/1987/" rel="nofollow noopener" title="https://m.xkcd.com/1987/">https://m.xkcd.com/1987/</a></p><ul>
<li><i>Justin Black:</i><p>So this is an operating system specific solution, but one could use a docker image with versioned binaries, and pinned python packages using requirements.txt<br>That way, the image has everything you need in it.</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><i>NPoisson:</i><p>Hmm. I will tend to consider that a numerical work should be distributed as a git repository with freezed source code. Even better, new tech allow to freeze the software stack if it's not too much hardware dependent.</p><p>Aka, a pip requirements with proper versioning + a Dockerfile should be able to provide a freezed ecosystem and allow good reproducibility. Of course, these tech are new and not known for their stability... for now. But I think that it will be an important part of the scentific stack : you define well your OS and software needs, you provide your source in a well documented way and you distribute both with your publication.</p><ul>
<li><i>Konrad Hinsen:</i><p>Many people are working on various solutions for freezing, mostly at a level below Python/SciPy and thus generic. I am rather optimistic that this will work out fine ultimately, although my personal bet is not specifically on Docker. However, it will take a long time to come up with a reliable and stable solution and then develop good tooling to make it easy to use.</p><p>This is in fact what I call "archival reproducibility" in my post. It's an important step, but not a replacement for stable infrastructure.</p></li>
</ul>
</li>
<li><i>gerritholl:</i><p>Scipy moved to version 1.0 three weeks ago (<a href="https://github.com/scipy/scipy/tree/v1.0.0" rel="nofollow noopener" title="https://github.com/scipy/scipy/tree/v1.0.0">https://github.com/scipy/sc...</a> ), after 16 years of development.  Within those 16 years, many what you call layer-3 and layer-4 code has been built on top of scipy, in the full knowledge the API was not stable yet, as the 0.x version number indicated.  The bump to version 1.0 suggests the API should be more stable from now on, which hopefully will be the case.</p><p>I agree that communication is key.  If you want to build code that will run unchanged for 20 years, relying on a library that is in version `0.x` is probably not a good idea, unless you freeze the version and bundle it along.  When scientific software is in beta, as scipy effectively was until three weeks ago, the API *should* be able to change.  But 16 years to go from initial release to initial stable release, as scipy did, is very long.</p><ul>
<li><i>Konrad Hinsen:</i><p>I fully agree, though I'd recommend more explicit communication than just a version number. Non-developers are often not familiar with version number conventions.</p><p>I have no personal experience with scipy stability because I have always avoided scipy except for ephemeral experimentations. The reason is the difficult installation procedure, for which I didn't want to do technical support to the users of my own code.</p><ul>
<li><i>stefanvdwalt:</i><p>With the arrival of binary wheels, hopefully this is now a non-issue.</p><ul>
<li><i>Konrad Hinsen:</i><p>It's indeed much less of an issue. The remaining difficult situation is HPC systems (clusters, supercomputers) with severe Internet access restrictions that render pip non-operational. While downloading wheels on a different machine is possible in principle, few people know it's possible and fewer know how to do it. In practice, people install from source code on those machines.</p><ul>
<li><i>Nathaniel J. Smith:</i><p>Surely if you can get the source code onto the machine, then you can also get a wheel onto it? It's literally exactly the same process, except you click on the '.whl' link instead of the '.tar.gz' link. Actually, downloading wheels is easier, because you can type 'pip wheel &lt;package name or source tree&gt;' and it will automatically download the whole transitive dependency tree as wheels, which you can then rsync over or copy onto a USB stick or whatever the magic transfer system is.</p><p>I understand that not everyone may not realize this, but rewriting every scipy feature inside every package seems like a lot more work than explaining how to download wheels :-).</p><ul>
<li><i>Konrad Hinsen:</i><p>You are right that all the technology is there. As so often, the remaining big issue is making sure that everybody who has the problem can find the solution in a reasonable amount of time.</p><p>BTW, the alternative to using scipy is not rewriting all its features, but rewriting, or finding in a smaller dependency, the one or two features that a given application needs. And that is sometimes easier than dealing with your users' installation questions, in my experience.</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><i>Luis Pedro Coelho:</i><p>+1 on this.</p><p>I find that often this discussion devolves into a binary "let's be like the kernel: stable APIs forever" vs "let's move fast and break things", but I would be happy with "let's break thing if we must, but try hard to avoid breaking other people's code when there is an obvious alternative".</p><p>The python2/3 transition is annoying (and py3 was an avoidable mistake), but I think that numpy/scipy changing their interfaces without any regard for backwards compatibility is much worse. For example, scipy.stats.mannwhitneyu has had at least 3 different behaviours in as many years without a lot of discussion of the possible effects on people's code. I almost published wrong results because of this particular change.</p><p>Histogram() changes has also caused me problems (for a while, people would email me every few months about not being able to reproduce my paper because numpy broke the code [<a href="https://metarabbit.wordpress.com/2013/09/23/to-reproduce-the-paper-you-cannot-use-the-code-we-used-for-the-paper/" rel="nofollow noopener" title="https://metarabbit.wordpress.com/2013/09/23/to-reproduce-the-paper-you-cannot-use-the-code-we-used-for-the-paper/">https://metarabbit.wordpres...</a>].</p><p>I once filed what I thought was an obvious bugfix (make the code follow the documented API instead of changing it for one high profile project) and had to argue for it: <a href="https://github.com/numpy/numpy/pull/2780" rel="nofollow noopener" title="https://github.com/numpy/numpy/pull/2780">https://github.com/numpy/nu...</a> Again, they broke my code for absolutely no good reason.</p><ul>
<li><i>Nathaniel J. Smith:</i><p>I clicked through your links because I sympathize with your frustration, and wanted to see what we did wrong in case it's something we can handle better in the future. I'm still not sure what your issue with histogram was -- the link in that paragraph leads to a blog post that doesn't have any more details either. But I did read through PR #2780, which is linked both from that blog post and the bottom of your comment.</p><p>I have to say, I found this extremely frustrating. The change that broke your code wasn't for "no good reason" or "aesthetic grounds" (as you describe it in the linked blog post) -- it was made because the 1.7 release broke Theano, and they submitted a fix to un-break it. I.e., your evidence that we don't care about backwards compatibility is that we  *made a backwards compatibility fix*. In the process, we did accidentally break your code -- sorry about that. The patch was reviewed, but at the time no-one realized that it could cause compatibility breakage. (I'm still not entirely clear on why that happened -- I think it has to do with ways in which C++ is stricter than C? Nonetheless, it obviously did. Again, I apologize for this part.) Once you submitted your PR and alerted us of the problem, we confirmed with Theano that your fix wasn't going to break their code again, and then we merged it and backported it to the stable release branch. This all happened within 12 hours, and I posted the first reply – which linked to the previous context explaining why the change was made, and started the process of checking with Theano – 6 minutes after your original submission, at 3am my time.</p><p>It's true that everyone mostly ignored your argument about the documentation. This is for two reasons: first,  when documentation and code disagree, the default is to change the documentation. 
This is mandatory if you care about backwards compatibility -- in fact it follows directly from the rule that you cited at the beginning of your post. Changing the code might break users, and changing the documentation is an "obvious alternative" that doesn't risk breaking users. So everyone was focused on the breakage, not the documentation. And second, it didn't even matter anyway – we were already in the process of fixing the problem, so we focused on that instead of getting into a tangential discussion about engineering principles.</p><p>All in all, I'm shocked that *this* is your example you use to go around sneering about how we're a bunch of terrible engineers who don't care about our users. You should feel ashamed of yourself.</p><p>We've certainly made mistakes, and doubtless will continue to do so in the future. NumPy's a complex  project, maintained by a small handful of volunteers, who are trying to support millions of users with contradictory requirements –  inevitably we do mess up. When we do, we know it causes real harm to our users, and we try to do better. But at least acknowledge that we're trying. Geez.</p></li>
<li><i>stefanvdwalt:</i><p>While there may be isolated cases that have been badly handled, the general approach is to be conservative with API changes unless there is a significant benefit (e.g., clarity, or additional usage possibilities).  Many libraries in the SciPy ecosystem follow a three-release deprecation cycle, which means in practice that if you run your code once a year, you will at least see warnings that indicate what needs to be changed.  The expectation that libraries should *never* change APIs is unreasonable; for papers you should consider either specifying the version or NumPy, or publish the code in a location where you have the ability to change it later.  Your comment seems to suggest that the NumPy and SciPy developers do not care about backward compatibility, which I don't think is an accurate reflection.</p><ul>
<li><i>Luis Pedro Coelho:</i><p>As I wrote, I don't think that the choice is a binary one between "never change the API" like the kernel and changing it at will.</p><p>"in practice that if you run your code once a year, you will at least see warnings that indicate what needs to be changed"</p><p>This is only true if I run my code once a year with the (at the time) most up to date version; not true otherwise. Also, sometimes I want to retrieve code that I used 2 years ago in another project and I would rather have an expectation that it works.</p><p>"While there may be isolated cases that have been badly handled, the general approach is to be conservative with API changes unless there is a significant benefit (e.g., clarity, or additional usage possibilities)."</p><p>This is exactly our disagreement. I don't think that "clarity or additional usage possibilities" is anywhere close to something that would justify breaking backwards compatibility for a foundational project like numpy or scipy.</p><p>Add new functions while deprecating the older ones. Most new functionality can be done with new functions or even just new arguments. This way, you improve the API and evolve it. After a few years, remove old functions. But changing the behaviour of working code in 3 release cycles (18 months) is not what I'd consider conservative, it's rather on the "move fast and break things" side of the scale. For more cutting edge projects, that could be OK, even expected, but numpy/scipy should be more like infrastructure.</p><p>I won't even ask for something like semantic versioning (where there would be a commitment to supporting the APIs for duration of a major release), but 18 months is way too short for a project like numpy, especially for changes that silently change results. And if I report a change to a documented API that caused code to stop compiling, it should be treated it as a bona fides bug (and not a discussion of which API is best).</p></li>
</ul>
</li>
</ul>
</li>
<li><i>Pierre de Buyl:</i><p>Hi Konrad, interesting read!</p><p>In the direction of "mitigation" of these issues, your other idea that data is more important than code (Hinsen 2012, CISE). Whether you maintain, freeze, or ignore, the availability of reference data allows future "you" or future "someone else" to perform at least a comparison test.</p><ul>
<li><i>Konrad Hinsen:</i><p>Yes, data in open, documented, and software-independent formats is a big plus for longevity. My own MMTK is a bad example there, because it uses a trajectory format that includes executable Python code, making it very hard to process from other languages.  I have repented and defined a more open and language-neutral format (MOSAIC, <a href="https://mosaic-data-model.github.io/" rel="nofollow noopener" title="https://mosaic-data-model.github.io/">https://mosaic-data-model.g...</a>).</p><p>Unfortunately, data supremacy is almost as hard to sell as stable software!  </p></li>
</ul>
</li>
<li><i>Nathan Goldbaum:</i><p>NumPy LTS will continue to be available on Python2 and MMTK will continue to be able to be built with it.</p><ul>
<li><i>Konrad Hinsen:</i><p>Indeed, but very soon Python 2 will have to be banned into some sandbox because security bugs are no longer fixed. It's good that NumPy LTS will remain available for frozen code, but it's not sufficient to keep code alive and useful.</p></li>
</ul>
</li>
<li><i>jsierles:</i><p>Freezing the stack may end up being the only real solution as dependencies trees grow in complexity. If this were easy to do, and long term reproducibility could be guaranteed, would you accept it as a solution?</p><ul>
<li><i>Konrad Hinsen:</i><p>As I wrote in my post, it's a partial solution, OK from a reproducibility point of view, but insufficient for long-running projects, or for taking up old projects again. For that, I need to be able to use ten-year-old and two-year-old libraries together from the same script.</p><ul>
<li><i>jsierles:</i><p>I won't argue that long term support is important at a library level. However, it seems unrealistic in modern software environments to expect it. Rather, I think we need to look towards new ways of WRITING code, and of defining dependencies. For example, if you could split your script into sections, each using a different dependency tree for each, but passing values between them outside the runtime, you could avoid a lot of typical problems with dependency hell. Also, tools like Guix (which you've written about) help solve the underlying dependency graph problem in a manageable way. I've seen some success with this approach.</p><p>I agree this is not a 'technical' issue, but also think there are more solutions available than are made obvious at this level of discussion. Would love to see some actual code and see how we could specific problems!</p><ul>
<li><i>Konrad Hinsen:</i><p>There are various *possible* technical solutions, and more are being worked on. But today, we have no solution that works in practice, meaning that it is sufficiently simple on all major platforms that the majority of scientists can work with it. Which is why for now, and a few more years to come, breaking changes in infrastructure are a danger for reproducibility.</p><p>BTW, labelling a potential solution as "unrealistic" is a major contribution to the problem itself. As I pointed out with the examples of the Fortran, COBOL, and Java ecosystems, stability is possible not only in theory but also in practice, under the condition that everyone keeps it in mind during design and development. In a community where most people consider stability unrealistic, it cannot happen.</p><ul>
<li><i>jsierles:</i><p>I completely agree that the label contributes to the problem.  And that some call for stability is justified in any heavily used software project. However, in the case of Python, and other languages like Javascript, the issues run deeper than the label. Down to how packaging systems work and the language designers goals when making changes. Stability? Simplicity? Programmer happiness? It's truly hard to reconcile these, and less and less so as more languages enter the space. So I don't see adopting stability as something necessarily easier or faster to do than exploring other solutions that can apply to a wider range of problems.</p><p>Furthermore, I see that technical solutions are equally unfairly labeled as unrealistic because of an unquantifiable cost of adoption. The result is that we see a lot of talk about reproducibility that boil down to a lengthy laundry list of best practices, i.e. (<a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285" rel="nofollow noopener" title="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285">http://journals.plos.org/pl...</a>).</p><p>Instead, as technologists, I think we are responsible to build better tools and more creative solutions to the problem.</p><ul>
<li><i>Konrad Hinsen:</i><p>I pretty much agree with all that. And I would definitely encourage technologists to continue looking for better solutions. The one mistake not to make is to declare victory when a proof of concept has been achieved. That's just the beginning of the next episode: convince enough early adopters that communities like Software Carpentry will add the new technology to their courses.</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>There is no such thing as software development</title> <link>https://blog.khinsen.net/posts/2017/11/09/there-is-no-such-thing-as-software-development.html</link> <pubDate>2017-11-09</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2017/11/09/there-is-no-such-thing-as-software-development.html</guid> <category><![CDATA[ scientific software ]]></category> <description><![CDATA[ <p>It's hard to find an aspect of modern life that is not influenced in some way by software. Some of it is very visible, for example the Web browser I start on my computer. Other software is completely invisible, such as the software controlling my car's diesel engine. Some software is safety critical, for example flight control software in airplanes. Other software is used in a much more futile way, such as playing games. I could go on listing characteristics in which different software packages differ, but I will leave it at that - I don't really expect anyone to disagree about the ubiquity and diversity of software in our increasingly digital world.</p>

<!-- more -->

<p>Given this diversity, it is surprising how many seem to consider &quot;software development&quot;, and related terms such as &quot;software engineering&quot;, as general concepts requiring no further qualification. In particular, plenty of people are happy to discuss in an abstract way how software should best be developed, without any reference to a concrete application domain, project size, expected longevity, etc. Imagine we did the same for the world of atoms, lumping together activities as distinct as chemical synthesis, carpentry, and dental surgery under the label &quot;matter manipulation&quot;, and starting a discussion about best practices for matter manipulation. I doubt anyone would take such a debate seriously.</p>

<p>A good example of such an overly abstract discussion is the one about the benefits of static typing. There is a large camp of static typing enthusiasts who claim that static typing is Right with a capital R. They argue that it's always better to have correctness guarantees than not to have them. The implicit assumption is that static typing comes at no cost, which is manifestly false. The main contributions to this cost are 1) additional cognitive load, 2) the need to work around the limitations of a type checker, and 3) additional <a href="http://blog.khinsen.net/posts/2016/03/04/composition-is-the-root-of-all-evil.html" >barriers</a> to the combination of independently developed libraries. As soon as one admits the necessity of a cost-benefit analysis for static typing, it quickly becomes obvious that this can only be done for 1) some specific category of software and 2) a specific type system. The question then becomes: is type system A useful for improving the quality of software in application domain X? A nice example of this point of view is given by Rich Hickey in his <a href="https://www.youtube.com/watch?v=2V1FtfBDsLU" >keynote on &quot;Effective Programs&quot;</a>, where he explains why none of the well-known type systems are useful for the kind of software he writes, leading to his decision to design <a href="http://clojure.org/" >Clojure</a> as a dynamically typed language.</p>

<p>Focusing software development questions on specific software categories has many potential benefits. Perhaps most importantly, it permits formulating questions in a precise enough way to make them amenable to empirical verification (aka &quot;the scientific method&quot;), acting at the same time as a safeguard against overly generalizing the conclusions from empirical studies. Moreover, the study of specific use cases is likely to lead to improvements in the methodology. In my example of static typing, it can be expected that once type system designers adopt the habit of thinking about specific software categories, they will design and evaluate type systems for various important application domains, taking into account both the kind of data being processed and the kinds of mistakes one would like to protect oneself against. Even better, once type system designers recognize that there is no single type system to rule them all, they might start to think about how to combine pieces of software written using different type systems. In the end, the three cost factors I mentioned might all end up heavily reduced.</p>

<p>Since there is a chance that some type system designers are reading this, I'll take advantage of their attention and suggest developing a type system for numerical computations, which by some strange coincidence is what I do in my own work. In this application domain, most data represents physical quantities and its low-level representation is &quot;float&quot; or &quot;array of floats&quot;. Properties that one could usefully monitor in the course of type checking are dimensions and units, but also positivity or non-zeroness. For array operations, the compatibility of array dimensions is worth a check as well. A static proof of complete absence of such mistakes is probably not doable, but detecting as many mistakes as possible while inserting run-time checks for the rest is probably a very useful compromise. It is also worth considering some important sub-categories of numerical software, in particular the different layers of the scientific software stack that I have <a href="http://blog.khinsen.net/posts/2017/01/13/sustainable-software-and-reproducible-research-dealing-with-software-collapse.html" >described before</a>. The required guarantees are much higher for infrastructure software (layer 2) than for scripts and workflows (layer 4), and infrastructure developers can be expected to invest more effort to ensure correctness. However, this does raise the question of type-checking at the interface between layers, a possible solution being <a href="https://en.wikipedia.org/wiki/Gradual_typing" >gradual typing</a>.</p>
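
<p>To make the kind of check concrete, here is a minimal sketch of the run-time half of such a compromise, written in Python. It is an illustration only, not an existing library: the names Quantity, meters and seconds are hypothetical, and dimensions are represented simply as a tuple of exponents for length, mass, and time.</p>

```python
from dataclasses import dataclass


# Hypothetical names throughout. A dimension is a tuple of exponents
# for (length, mass, time).
@dataclass(frozen=True)
class Quantity:
    value: float
    dim: tuple

    def __add__(self, other):
        # Addition only makes sense for identical dimensions.
        if self.dim != other.dim:
            raise TypeError(f"cannot add dimensions {self.dim} and {other.dim}")
        return Quantity(self.value + other.value, self.dim)

    def __mul__(self, other):
        # Multiplication adds the exponents.
        dim = tuple(a + b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value * other.value, dim)

    def __truediv__(self, other):
        # Non-zeroness is checked at run time in this sketch.
        if other.value == 0:
            raise ZeroDivisionError("division by a zero quantity")
        dim = tuple(a - b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value / other.value, dim)


def meters(x):
    return Quantity(x, (1, 0, 0))


def seconds(x):
    return Quantity(x, (0, 0, 1))


speed = meters(10.0) / seconds(2.0)   # dimension (1, 0, -1), i.e. length/time
```

<p>With these definitions, adding meters(1.0) to seconds(1.0) raises a TypeError when the expression is evaluated. A static type system for this domain would aim to report the same mistake before the program runs, and without the run-time overhead.</p>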

<p>Static typing is merely one example of the importance of looking at specific software application domains; there are many others. The utility of paradigms such as object-oriented or functional programming is also mostly discussed in the abstract, as are the relative merits of development strategies like test-driven or agile development. Finally, some less discussed but practically important questions could get more exposure if formulated more concretely in the context of specific applications. I am thinking for example of the choice between using external libraries and writing one's own code, involving the trade-off between development effort and the long-term risk of uncontrollable dependencies.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Thomas Arildsen:</i><p>I think you raise some very important points here. It is similar in spirit to what I usually spend a substantial amount of time trying to convince students of in my courses: in the choice between fast, compiled, "low-level" languages (such as C) and slower, interpreted, "high-level" languages (such as Python), there is no one language to rule them all. It depends highly on how much time/cost you are willing to spend on developing the program vs how much it is actually going to be used after completion. In the case of custom scientific computing software, I find Python or similar languages are what makes sense.<br>Also, I find it very relevant that you point out how data types in numerical computing applications are not simply a case of int vs float. In fact, this is what two PhD students in a recent research project I was involved in tried to address: <a href="http://vbn.aau.dk/en/publications/validating-function-arguments-in-python-signal-processing-applications(de3f2d32-2305-4b77-88ab-0f004cbdf613).html" rel="nofollow noopener" title="http://vbn.aau.dk/en/publications/validating-function-arguments-in-python-signal-processing-applications(de3f2d32-2305-4b77-88ab-0f004cbdf613).html">http://vbn.aau.dk/en/public...</a> &amp; <a href="http://magni.readthedocs.io/en/latest/magni.utils.validation.html" rel="nofollow noopener" title="http://magni.readthedocs.io/en/latest/magni.utils.validation.html">http://magni.readthedocs.io...</a>. The idea is to do detailed run-time numerical type-checking of function arguments using decorators in Python.</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for your comment!</p><p>You point out another tradeoff, language choice, that very much depends on what your software is actually supposed to do. I didn't mention this example because I rarely see language choice discussed abstractly, although it certainly happens.</p><p>It's good to see we agree on the importance of unit checking :-) If it's so rarely done in practice, that's because it is not well supported. For Python, your approach of run-time checking is very appropriate, but people who turn to Fortran or C for speed would expect compile-time checks with no run-time overhead. There is actually a tool (not so well known for now) that does static unit checking for Fortran (<a href="https://camfort.github.io/" rel="nofollow noopener" title="https://camfort.github.io/">https://camfort.github.io/</a>) and for C++ it can be done via template metaprogramming (<a href="http://www.boost.org/doc/libs/1_65_0/doc/html/boost_units.html" rel="nofollow noopener" title="http://www.boost.org/doc/libs/1_65_0/doc/html/boost_units.html">http://www.boost.org/doc/li...</a>). Microsoft's F# language has dimensional analysis as a built-in feature, as does Frink (<a href="https://frinklang.org/" rel="nofollow noopener" title="https://frinklang.org/">https://frinklang.org/</a>). But I am not aware of any language with a general-purpose type system that would allow the implementation of dimensional analysis. If anyone knows of one, I'd appreciate a pointer.</p><ul>
<li><i>Franklin Chen:</i><p>General-purpose languages like Haskell have type systems that enable building your own dimensional analysis system if you want. One example mature library contributed to the community is <a href="https://hackage.haskell.org/package/dimensional" rel="nofollow noopener" title="https://hackage.haskell.org/package/dimensional">https://hackage.haskell.org...</a></p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for the pointer! That library looks interesting, though I don't see how exactly it works, given that I have never heard of data kinds and type families before. But I can see from the source code that it does standard dimensional analysis and that it does the checking at compile time, which covers the basic list of requirements. What I don't see is how it handles the well-known tricky cases such as making both Hz and Bq compatible with 1/s but not with each other.</p><ul>
<li><i>Franklin Chen:</i><p>Unfortunately, in `dimensional`, currently Hz and Bq are not kept different at all, actually. I see that although the types look different</p><p>```<br>hertz :: Num a =&gt; Unit Metric DFrequency a<br>becquerel :: Num a =&gt; Unit Metric DActivity a<br>```<br>in fact</p><p>DActivity is just an alias to DFrequency rather than a different type. I've submitted an issue at <a href="https://github.com/bjornbm/dimensional/issues/188" rel="nofollow noopener" title="https://github.com/bjornbm/dimensional/issues/188">https://github.com/bjornbm/...</a></p><ul>
<li><i>Konrad Hinsen:</i><p>And I have added a comment to prevent the authors from believing that there is a simple fix. Doing this correctly is probably a research project. But I hope somebody will go for it!</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>Why Python does so well in scientific computing</title> <link>https://blog.khinsen.net/posts/2017/09/12/why-python-does-so-well-in-scientific-computing.html</link> <pubDate>2017-09-12</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2017/09/12/why-python-does-so-well-in-scientific-computing.html</guid> <category><![CDATA[ python ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ <p>A few days ago, I noticed this tweet in my timeline:</p>

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">I &#39;still&#39; program in C. Why? Hint: it&#39;s not about performance. I wrote an essay to elaborate... appearing at Onward! <a href="https://t.co/pzxjfvUs5B">https://t.co/pzxjfvUs5B</a></p>&mdash; Stephen Kell (@stephenrkell) <a href="https://twitter.com/stephenrkell/status/905126286762356736">September 5, 2017</a></blockquote>

<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

<p>That sounded like a good read for the weekend, which it was. The main argument the author makes is that C remains unsurpassed as a system integration language, because it permits interfacing with &quot;alien&quot; code, i.e. code written independently and perhaps even in different languages, down to assembly. In fact, C is one of the few programming languages that lets you deal with arbitrary data at the byte level. Most of the more &quot;modern&quot; languages prohibit such interfacing in the name of safety - the only memory you can access is memory allocated through your safe language's runtime system. As a consequence, you are stuck in the closed universe of your language.</p>

<!-- more -->

<p>System integration is indeed an important and often overlooked aspect of working with software. And this is particularly true for scientific computing, where application software with a fixed set of functionality is rare. Solving a scientific problem typically involves combining many pieces of software into a very problem-specific whole, which may well be run only a few times (see also my <a href="http://blog.khinsen.net/posts/2017/01/13/sustainable-software-and-reproducible-research-dealing-with-software-collapse.html" >earlier post</a> on this topic). This is exactly the task of system integration: assembling pieces into a whole using glue code where necessary. In computational science, this glue code takes the form of scripts, workflows, or more recently notebooks. This is technically quite different from the OS-level system integration that Stephen Kell refers to, but functionally it is the same.</p>

<p>Stephen's post reminded me of my long-standing plan to write a blog post about why Python has been so successful in scientific computing, in spite of having a reputation for bad performance. So... here it is.</p>

<p>There are of course many reasons for Python's success, but one of them is that it does a pretty good job at system integration. There are two Python features that I consider important for this, which are not shared by many other languages. One is data types explicitly designed for interfacing, the other is <a href="https://en.wikipedia.org/wiki/Duck_typing" >duck typing</a> in combination with a small but versatile set of standard interfaces.</p>

<p>The first Python data type designed for interfacing in a scientific computing context is the good old <a href="http://www.numpy.org/" >NumPy</a> array - which is in fact older than NumPy, having been introduced in 1995 by NumPy's predecessor, Numeric. Arrays are one of the bread-and-butter data types in scientific computing, to the point of being the only one available in languages like Fortran 77 or APL. The implementation of arrays in Numeric was designed to use the same data layout as Fortran and C, in order to allow interfacing to the Fortran and C libraries that dominated scientific computing in 1995 (and still do, though to a somewhat lesser extent). The idea behind Numeric and later NumPy was always to use Python as a glue language for Fortran and C libraries, and achieve speed by delegating time-critical operations to code written in these languages.</p>

<p>The second Python data type designed for interfacing is <a href="https://docs.python.org/3/library/stdtypes.html#memoryview" >memoryview</a>, related to the <a href="https://docs.python.org/3/c-api/buffer.html" >buffer protocol</a>. This is as close as Python gets to C-style memory access. The buffer protocol lets different Python data types access each other's internal memory at the byte level. A typical use case would be an image data type (e.g. from <a href="https://python-pillow.org/" >Pillow</a>) allowing access to the in-memory representation of an image through an array type (e.g. from NumPy), permitting the implementation of image manipulation algorithms in terms of array operations.</p>
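
<p>As a small illustration of this byte-level sharing, the following sketch uses only the standard library. The array of doubles stands in for data owned by some other library (an image or a NumPy array in the use case above); the variable names are made up for the example.</p>

```python
from array import array

# A flat block of six C doubles, standing in for pixel or sample data
# owned by some other library.
samples = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# memoryview exposes the same memory through the buffer protocol;
# no copy is made.
view = memoryview(samples)
view[0] = 10.0                  # writes straight into the array's storage

# Reinterpret the same bytes as a 2 x 3 grid. A cast that changes the
# shape has to go through a byte format first.
grid = view.cast("B").cast("d", shape=[2, 3])
corner = grid[1, 2]             # last element of the second row

print(samples[0], corner)
```

<p>The write through view is visible in samples, and grid reads the very same memory under a different shape. This is the mechanism that lets an array type operate on an image's pixels without copying them.</p>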

<p>The third and least known Python data type for interfacing is the <a href="https://docs.python.org/3/c-api/capsule.html" >capsule</a> that replaces the earlier <a href="https://docs.python.org/2/c-api/cobject.html" >CObject</a>. Capsules exist solely for the benefit of Python modules written in C, which can exchange opaque data with one another via glue code written in Python, even though the glue code itself cannot inspect or manipulate the data in any way. A typical use is to wrap C function pointers in a Python object such that Python glue code, e.g. a script, can pass a C function from one module to C code from another module.</p>
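
<p>Capsules are normally created and consumed in C, so any pure-Python demonstration is somewhat artificial. The sketch below, which assumes CPython, uses ctypes to play the role of both C modules: it calls the real C API functions PyCapsule_New and PyCapsule_GetPointer, while the capsule name demo.square, the callback, and the function consume are invented for the illustration.</p>

```python
import ctypes

# Stand-in for a C function exported by "module A": a ctypes callback
# provides a real C function pointer without compiling anything.
UNARY = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


def _square(x):
    return x * x


c_square = UNARY(_square)       # keep a reference so the pointer stays valid
address = ctypes.cast(c_square, ctypes.c_void_p).value

# The capsule functions of the CPython C API, declared for ctypes.
api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

CAPSULE_NAME = b"demo.square"   # hypothetical name; must outlive the capsule
capsule = api.PyCapsule_New(address, CAPSULE_NAME, None)


def consume(capsule, x):
    """Stand-in for C code in "module B" that unwraps and calls the pointer."""
    pointer = api.PyCapsule_GetPointer(capsule, CAPSULE_NAME)
    return UNARY(pointer)(x)


# The Python glue code only passes the opaque capsule along.
result = consume(capsule, 3.0)
print(type(capsule).__name__, result)
```

<p>From the point of view of the glue code, the capsule has no attributes or methods worth mentioning. Only code that knows the capsule's name and the signature of the wrapped function can do anything useful with it.</p>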

<p>All these interfacing data types mediate between Python and C code, although quite often the Python system integrator is hardly aware of using C code at all. The other Python feature for system integration, duck typing with standard interfaces, is what facilitates glueing together independently written Python modules. By &quot;standard interfaces&quot;, I mean the sequence and dictionary interfaces, but also the standard method names for operator overloading.</p>

<p>To see why this is an important feature, let us look at statically typed languages that by design do not have it. As a concrete example, consider multidimensional arrays in Java. They are not part of the language or its standard library, but they can be implemented on top of it with reasonable effort. In fact, there are several Java implementations you can choose from. And that's the problem. Suppose you want to use an FFT library based on array implementation A together with a linear algebra library based on array implementation B. Bad luck - the arrays from A and B have different types, so you cannot use the output of an FFT as the input to a linear equation solver. It doesn't matter that the underlying abstraction is the same, and that even the implementations have much in common. For a Java compiler, the types don't match, period.</p>

<p>Python is not completely immune to this problem. It is perfectly possible to write Python code, or C code in a C module, that expects a precise type of data as input, and will raise an exception otherwise. But in Python code that would be considered bad style, and in C modules for Python as well except where required for performance or for compatibility with the C code. Wherever possible, Python programmers are expected to use the standard interfaces for working with data. Iteration and indexing work the same way for arrays as for the built-in lists, for example. For operations that are not covered by the standard interfaces, Python programmers are supposed to use Python methods, which are subject to duck typing as well. In practice, independently implemented Python types are much more interoperable than independently implemented Java types. For the specific case of n-dimensional arrays, Python has had the good fortune of overwhelming acceptance of a single implementation, which is due more to social and historical than to technical reasons.</p>
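
<p>A short sketch may help to show what relying on the standard interfaces means in practice. The function mean below asks only for len() and iteration, so it accepts a built-in list, an array from the standard library, and an independently written class alike. The class Linspace is a hypothetical example type, not part of any library.</p>

```python
from array import array


def mean(values):
    """Average of anything that supports len() and iteration."""
    return sum(values) / len(values)


class Linspace:
    """Hypothetical lazily computed sequence of n evenly spaced numbers.

    It implements only the standard sequence interface.
    """

    def __init__(self, start, stop, n):
        self.start, self.stop, self.n = start, stop, n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        # Raising IndexError past the end is what makes iteration stop.
        if i not in range(self.n):
            raise IndexError(i)
        return self.start + i * (self.stop - self.start) / (self.n - 1)


print(mean([1, 2, 3]))
print(mean(array("d", [1.5, 2.5])))
print(mean(Linspace(0.0, 1.0, 5)))
```

<p>None of the three types knows about the others, and mean knows about none of them. In the Java scenario described above, the equivalent function would have to be written against one specific array type.</p>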

<p>Finally, even though Python is a pretty good choice for system integration in scientific computing, there are of course limits, which are exactly of the kind that Stephen Kell explains in his essay: combining Python code with code in other managed languages, say R or Julia, requires a lot of work and even then is fragile, because the required hacks depend on undocumented implementation details. I suspect that the only solution would be to have language-neutral garbage-collected data objects offered as an OS-level service that maintains an option for non-managed byte-level access à la C. The closest existing technology I am aware of is Microsoft's <a href="https://en.wikipedia.org/wiki/Common_Language_Runtime" >CLR</a>, better known by its commercial name .NET. Its implementation is now Open Source and runs on multiple platforms, but its Windows-only origins and strong ties to a huge Microsoft-y library have been an obstacle to adoption by the traditionally Unix-centric scientific computing community.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Chris Barker:</i><p>You write:</p><p>"The idea behind Numeric and later NumPy was always to use Python as a glue language for Fortran and C libraries"</p><p>I have often wondered about this -- I started using Numeric in 1999, and followed the development through numarray, and then numpy, and onward :-)</p><p>I've often said that an ndarray is two things:<br>1) a nice featureful n-dimensional array object for Python, and<br>2) a wrapper around a C array (or really, a pointer to a data block).</p><p>(2) allows enormous power in communicating with Fortran and C codes -- as you mention.</p><p>The question is -- was this an intentional design decision? or a happy accident?</p><p>Has anyone ever asked Jim Hugunin or David Ascher?</p><p>(though I see your name on my historical copy of the docs from 2000 -- so maybe you were in on that decision at the time!)</p><ul>
<li><i>Konrad Hinsen:</i><p>Yes, I was part of the initial Numerical Python development team, so I can confirm that interfacing to C and Fortran code was an important goal at the time. There is actually some evidence for this in the code and the API. For example, the separation of the array object storage into a data space and a small Python object with just the bookkeeping information. Plus the possibility to create an array using an externally allocated and managed data space.</p><ul>
<li><i>Chris Barker:</i><p>Thanks! good to know it wasn't just a happy accident.</p><p>I haven't followed it recently, but at one point the folks working on the NumPyPy project really didn't "get" the importance of this aspect of numpy.</p></li>
</ul>
</li>
</ul>
</li>
<li><i>Stephen Kell:</i><p>Hi Konrad. Thanks for the "citation" and the kind words. :-)</p><p>One question: would you say it's the design of CPython and/or the Python language that have enabled this, or just the happenstance that somebody wrote those modules (NumPy, memoryview, capsule) and got them adopted? Could it have happened as easily in another dynamic language, say? I'm not familiar enough with the Python library ecosystem to distinguish these cases.</p><p>Your closing paragraph's idea of "language-neutral garbage-collected data objects proposed as an OS-level service" is very close to what I've been working on with liballocs (<a href="https://github.com/stephenrkell/liballocs" rel="nofollow noopener" title="https://github.com/stephenrkell/liballocs">https://github.com/stephenr...</a>). I believe the trick is to tolerate as much diversity as possible, rather than fixing "one true way" to implement higher-level languages and cutting loose the non-conformers (the CLR approach, more-or-less). In particular, I'm (slowly) working towards a treatment of garbage collection that allows a considerable degree of pluralism -- think multiple somewhat-cooperating allocators/collectors, rather than a single shared one.</p><ul>
<li><i>Konrad Hinsen:</i><p>Hi Stephen,</p><p>You will probably find Rich Hickey's talk on the design of Clojure interesting: <a href="https://www.youtube.com/watch?v=2V1FtfBDsLU&amp;app=desktop" rel="nofollow noopener" title="https://www.youtube.com/watch?v=2V1FtfBDsLU&amp;app=desktop">https://www.youtube.com/wat...</a></p><p>He insists very much on a systems point of view and points out the dangers of language lock-in. His context is very different from yours and mine, but the overall message is the same.</p></li>
<li><i>Konrad Hinsen:</i><p>Hi Stephen,</p><p>thanks for your comments!</p><p>To answer your question, I'd say it is a bit of both. CPython had C modules right from the start, in fact it used them in its own implementation. Those C modules are a bit more than the FFI that any modern language has. The interface is bidirectional in that it gives C modules access to Python data types, and lets them define new ones. That was a perfect basis for the later developments (NumPy, memoryview, capsule), which wouldn't have made much sense otherwise. It didn't have to happen, but it definitely wouldn't have happened without the existing support.</p><p>Your liballocs looks interesting, although the list of build dependencies is a bit discouraging. I'll start by reading the paper :-) The idea of a lightweight and minimalistic storage management, not tied to a language or even to a bytecode interpreter/compiler, looks very useful. In scientific computing, it could solve many problems of interfacing languages operating at different levels of storage abstraction, e.g. C, C++, Fortran 90, and dynamic languages such as Python.</p><ul>
<li><i>Stephen Kell:</i><p>Thanks for the clarification!</p><p>Certainly I am interested in finding users of my work within scientific computing... I'm currently scratching my head about how best to achieve this. One possible blocker is that even small run-time overheads are often considered intolerable.</p><p>In case I can offer encouragement, most of the build dependencies are standard (and do not transfer to runtime)... the build instructions "should" "just work", at least on Debian-based machines (and with close equivalents on RPM distros). If not, do file a GitHub issue...  but yes, I am working on packaging the library and tools more nicely in various ways. :-) Some of the dependencies will be eliminated once I have integrated more closely with gcc/clang... again, some work is ongoing, though not going as fast as I'd like.</p><ul>
<li><i>Konrad Hinsen:</i><p>After a quick look at your 2015 paper, I confirm that this looks very interesting. But it seems that all language implementers must build on liballocs for this to work. This might take some time to happen.</p><p>As for run-time overheads, it all depends on where they occur. Much scientific code works on large uniform datasets, typically arrays. An overhead for the first access to an array is usually not a problem. An overhead for every element access would be prohibitive.</p><ul>
<li><i>Stephen Kell:</i><p>My hypothesis is that existing implementations can be retrofitted, rather than building new ones from scratch. But yes, this work needs to be done. And I admit the hypothesis is not tested yet, but is rather a case of "seems to be true" based on my current knowledge of the internals of various language implementations. The V8 modification mentioned in the paper started in this direction... but V8 is a particularly complex case. I hope to do some more work on this fairly soon, using some simpler VM.</p><p>Yes, I try to confine overheads to rare operations, such as malloc-style allocation. So I think the core run-time services should be supportable on scientific code... just not every possible use of them (e.g. bounds-checking array accesses may have to be skipped).</p><ul>
<li><i>Konrad Hinsen:</i><p>The main problem I see is not so much the amount of work that must be done but the number of people that need to contribute to make it happen. That's perhaps more of a marketing question than a technical one.</p><p>Are people in your corner of computing at least aware of the importance of the problem you are trying to solve? In my corner (computational science), they are not, although in my opinion it's one of our biggest problems in daily life. Most people don't see it as a problem because they don't envisage a solution. Languages being isolated universes is just normal, there is nothing to be done about this. I tried to explain the issues in an earlier blog post (<a href="http://blog.khinsen.net/posts/2016/03/04/composition-is-the-root-of-all-evil.html" rel="nofollow noopener" title="http://blog.khinsen.net/posts/2016/03/04/composition-is-the-root-of-all-evil.html">http://blog.khinsen.net/pos...</a>), but apparently with little success.</p><p>As for array bounds checking, I am still hoping some PL designer will come up with a good solution. Mistakes in array index expressions are very frequent, but everyone turns off array bounds checking at compile time because of the huge runtime cost. Static array access validation would be very nice to have.</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>Which mistakes do we actually make in scientific code?</title> <link>https://blog.khinsen.net/posts/2017/05/04/which-mistakes-do-we-actually-make-in-scientific-code.html</link> <pubDate>2017-05-04</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2017/05/04/which-mistakes-do-we-actually-make-in-scientific-code.html</guid> <category><![CDATA[ computer-aided research ]]></category><category><![CDATA[ scientific software ]]></category> <description><![CDATA[ <p>Over the last few years, I have repeated a little experiment: Have two scientists, or two teams of scientists, write code for the same task, described in plain English as it would appear in a paper, and then compare the results produced by the two programs. Each person/team was asked to do a maximum amount of verification and testing before comparing to the other person's/team's work.</p>

<!-- more -->

<p>Let me state the most disturbing outcome of this experiment first: we never found complete agreement between the two programs. Not once. And when we explored to find the cause of the discrepancies, we most often found bugs in <em>both</em> programs, plus missing details in the description written initially for human readers.</p>

<p>The two most practically significant experiments of this kind were actual research projects that have since been published:</p>

<ul>
<li><p><a href="http://dx.doi.org/10.1063/1.4821598" >A comparison of reduced coordinate sets for describing protein structure</a>. For this work, Shuangwei Hu wrote Matlab code, and I wrote the Python code that was <a href="https://doi.org/10.6084/m9.figshare.798825.v1" >ultimately published</a>.</p></li>
<li><p><a href="http://dx.doi.org/10.1063/1.4823996" >Model-free simulation approach to molecular diffusion tensors</a>. In this case, Gerald Kneller wrote Mathematica code, and I wrote the <a href="https://doi.org/10.6084/m9.figshare.808594.v1" >Python version</a> again.</p></li>
</ul>

<p>Later on, I did a series of similar experiments with PhD students participating in what can be summarized as advanced Python programming courses. PhD students with limited programming experience are exactly the kind of scientists who write much of the software for research projects. But the setting was &quot;exercises in a course&quot;, with programming tasks being much simpler, and much better specified, than what the typical research project requires.</p>

<p>The results of these experiments that I will summarize here are no more than anecdotal evidence. In fact, the initial goal was not to perform an experiment in scientific computing, but to perform better checks on the code for a research project. It would be interesting to do a larger-scale proper study, but that's beyond my means and competence.</p>

<p>As I already mentioned, there was never complete agreement between the two programs supposed to solve the same problem. In many cases the differences were small, and I suspect many would have brushed them away as caused by uncontrollable round-off, given that all problems were numerical in nature. But upon closer scrutiny, we always found different issues, and got much better agreement after fixing them. This is why I still believe that <a href="https://khinsen.wordpress.com/2015/01/07/why-bitwise-reproducibility-matters/" >bitwise reproducibility matters</a>. When small numerical differences are inevitable, as they are with today's scientific programming languages, it becomes much more difficult to search for and eliminate mistakes.</p>
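
<p>To make the distinction between bitwise agreement and "close enough" concrete, here is a minimal Python sketch. It is an illustration only, not code from the experiments described here; the helper names <code>bit_pattern</code> and <code>compare</code> and the tolerance of <code>1e-12</code> are assumptions chosen for the example. It shows that merely changing the order of a floating-point sum already breaks bitwise identity, so a tolerance-based comparison cannot tell harmless round-off from a small mistake.</p>

<pre><code>import math
import struct


def bit_pattern(value: float) -> str:
    """Return the 64-bit IEEE 754 representation of a float as a hex string."""
    return struct.pack(">d", value).hex()


def compare(a: float, b: float, rel_tol: float = 1e-12) -> str:
    """Classify the agreement between two results for the same quantity."""
    if bit_pattern(a) == bit_pattern(b):
        return "bitwise identical"
    if math.isclose(a, b, rel_tol=rel_tol):
        return "close, but not identical"
    return "different"


# The same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(compare(left_to_right, right_to_left))  # close, but not identical</code></pre>

<p>Any difference that falls into the middle category needs a human decision: is it round-off, or is it a bug whose effect happens to be small?</p>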

<p>So which are the mistakes that were uncovered by comparing two independent implementations of the same method?</p>

<p>Number one, by far, is discrepancies between the informal description for human readers and the executable implementation. Put simply, the programs did not compute what the informal description said they should compute, or the informal description was incomplete, admitting more than one interpretation.</p>

<p>Number two is typos in numerical constants and in variable names. Since I can almost hear proponents of static typing saying &quot;that's what you deserve for using Python&quot;, let me add that most typos in variable names would <em>not</em> have been caught by static type checking. If you have two integer loop indices <code>i</code> and <code>j</code>, no type checker will complain when you interchange them by mistake.</p>
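
<p>A minimal sketch of the index-swap case, using Python type annotations. The function names are hypothetical and the code is not taken from any of the programs in the experiments. Both versions carry the same annotations and both index expressions use integers, so there is no type error to report for the second one.</p>

<pre><code>from typing import List

Matrix = List[List[float]]


def row_sums_correct(a: Matrix) -> List[float]:
    """Sum each row: result[i] is the sum over j of a[i][j]."""
    n_rows, n_cols = len(a), len(a[0])
    return [sum(a[i][j] for j in range(n_cols)) for i in range(n_rows)]


def row_sums_swapped(a: Matrix) -> List[float]:
    """Same intent, but i and j are interchanged in the index expression."""
    n_rows, n_cols = len(a), len(a[0])
    # Mistake: a[j][i] instead of a[i][j]. Both indices are ints, so the
    # annotations are satisfied and a type checker has nothing to flag.
    return [sum(a[j][i] for j in range(n_cols)) for i in range(n_rows)]


square = [[1.0, 2.0], [3.0, 4.0]]
print(row_sums_correct(square))  # [3.0, 7.0], the row sums
print(row_sums_swapped(square))  # [4.0, 6.0], silently the column sums</code></pre>

<p>For a square matrix the swapped version does not even raise an index error; it simply computes something else.</p>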

<p>Number three is off-by-one-or-two errors in loops and in array indices. If you have a complex formula involving lots of <code>x[i]</code>, <code>x[i+1]</code>, and <code>x[i-1]</code>, it's hard to avoid getting an index wrong occasionally. Unfortunately, array bounds checking does not catch all of these mistakes. Another interesting observation is that this type of mistake is just as likely in the informal description as in the code. Humans are apparently not very good at handling this kind of &quot;detail&quot;.</p>

<p>Is there anything we can do to reduce the risk of these types of mistakes? I'd say yes, but it's not going to be easy.</p>

<p>Let's start with what software engineering techniques could do to improve the situation. The main opportunity I see is for mistakes of the third kind. Index arithmetic could be eliminated altogether by abstracting it away. Most situations correspond to one of a handful of patterns, often called <a href="https://en.wikipedia.org/wiki/Stencil_(numerical_analysis)" >stencils</a>, which could become functions or macros in a suitable domain-specific language. Another idea, applicable to legacy code, is to have code checking tools recognize stencils and small deviations from common stencils and point out potential mistakes - see <a href="https://camfort.github.io/tvcs2017/#contrastin" >this presentation</a> at the recent
<a href="https://camfort.github.io/tvcs2017/" >2nd Meeting on Testing and Verification for Computational Science</a>.</p>
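
<p>As a rough sketch of what abstracting index arithmetic away could look like, the following Python function applies a one-dimensional stencil given as a mapping from offsets to weights. The name <code>apply_stencil</code> and the dictionary representation are assumptions made for this illustration, not an existing library API. The point is that the range of valid indices is derived from the offsets, so nobody writes <code>x[i-1]</code> or <code>x[i+1]</code> by hand.</p>

<pre><code>from typing import Dict, List, Sequence


def apply_stencil(x: Sequence[float], weights: Dict[int, float]) -> List[float]:
    """Apply a 1D stencil {offset: weight} at every point where it fits.

    The loop bounds are computed from the offsets, so the result covers
    exactly the interior points for which all neighbours exist.
    """
    left = max(0, -min(weights))   # number of points needed to the left
    right = max(0, max(weights))   # number of points needed to the right
    return [
        sum(w * x[i + k] for k, w in weights.items())
        for i in range(left, len(x) - right)
    ]


# Central second difference: x[i-1] - 2*x[i] + x[i+1]
SECOND_DIFFERENCE = {-1: 1.0, 0: -2.0, 1: 1.0}

squares = [float(i * i) for i in range(6)]
print(apply_stencil(squares, SECOND_DIFFERENCE))  # [2.0, 2.0, 2.0, 2.0]</code></pre>

<p>A stencil defined once as data can be reviewed against the formula in the paper in one place, instead of being re-derived in every loop.</p>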

<p>Similar heuristic searches for potential mistakes could be applied to typos in variable names, though it is not clear that such reports would ultimately be useful. The real issue is the widespread use of short and similar variable names. A radical approach would be to ban them as part of a programming style guide, and have source code checkers flag violations of such a rule.</p>

<p>For the main source of mistakes, discrepancies between informal specification and implementation, software engineering approaches are totally hopeless in my opinion. After all, the programs are perfectly reasonable and consistent; they merely solve a problem that is different from the one they were written to solve. Given the current state of technology, the comparison between the two problem descriptions can only be done by human proofreading, as long as at least one problem description is informal. I suspect the best approach we have today is exactly what I described above - develop two independent implementations and compare.</p>

<p>In the long run, we can work on reducing the gap between informal descriptions (papers, software documentation) and executable implementations. I vaguely remember hearing about people exploring the possibility of turning informal descriptions into formal specifications by natural language processing - if anyone has a reference, please leave a comment! But I am rather skeptical of this approach, and therefore I prefer to let humans make the move towards formal specifications. The human-computer interface for such specifications is what I call <a href="http://sjscience.org/article?id=527" >digital scientific notations</a>, and I am currently working on <a href="https://github.com/khinsen/leibniz" >developing such a notation</a> for my corner of science, which is computational physics and chemistry.</p>

<p>Finally, let me point out that my experiments and their conclusion apply only to research code in the strict sense, i.e. code that was written to compute a result that is <em>a priori</em> unknown. Referring to my <a href="http://blog.khinsen.net/posts/2017/01/13/sustainable-software-and-reproducible-research-dealing-with-software-collapse.html" >earlier post on software collapse</a>, this is the fourth and project-specific layer of scientific software. When writing libraries and software tools that implement established methods for wider use, the situation is different because testing can be used much more effectively.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>alqualond:</i><p>Interesting post, thank you. Do you think that software engineering practices for extracting requirements and describing systems (eg UML diagrams) could help with the first problem (mismatch between specification and implementation), or is research software too different than "production code" for them to be useful?</p><ul>
<li><i>Konrad Hinsen:</i><p>The particularity of project-level research code is that its specification evolves as the research is done. In the beginning, the specification is no more than a list of ideas to explore, with references to earlier, more mature work. I have never seen anything like this done with UML or other software engineering notations.</p></li>
</ul>
</li>
<li><i>asmeurer:</i><p>Do you think 0-based indexing vs. 1-based indexing makes any difference for the index related bugs?</p><ul>
<li><i>Konrad Hinsen:</i><p>All my experiments were done in Python and therefore using 0-based indexing, so they cannot help answer this question. There wasn't any complex index arithmetic in any of the programs, as far as I remember, so I wouldn't expect 1-based indexing to make any difference, but that's theory, not practice.</p></li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>Reproducible research in the Python ecosystem: a reality check</title> <link>https://blog.khinsen.net/posts/2017/04/06/reproducible-research-in-the-python-ecosystem-a-reality-check.html</link> <pubDate>2017-04-06</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2017/04/06/reproducible-research-in-the-python-ecosystem-a-reality-check.html</guid> <category><![CDATA[ python ]]></category><category><![CDATA[ reproducible research ]]></category> <description><![CDATA[ <p>A few years ago, I decided to adopt the practices of reproducible research as far as possible within the technical and social constraints I have to live with. So how reproducible is my published code over time?</p>

<!-- more -->

<p>The example I have chosen for this reproducibility study is a 2013 paper about computing diffusion coefficients from molecular simulations. <a href="https://doi.org/10.6084/m9.figshare.808594.v1" >All code and data</a> have been published as an <a href="http://www.activepapers.org/" >ActivePaper</a> on <a href="https://figshare.com/" >figshare</a>. To save space, intermediate results had been removed from the published archive. This makes my reproducibility check very straightforward: a simple <code>aptool update</code> will recompute everything starting from these intermediate results up to the plots that went into <a href="http://dx.doi.org/10.1063/1.4823996" >the paper</a>.</p>

<p>One nice aspect of ActivePapers is that it stores the version numbers of all dependencies, so I can quickly verify that in 2013, I had used Python 2.7.3, NumPy 1.6.2, h5py 2.1.3, and matplotlib 1.2.x (yes, the x is part of the reported version number).</p>
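
<p>For readers who do not use ActivePapers, a similar record can be approximated with a few lines of Python. This is only a sketch under stated assumptions: it relies on the standard library's <code>importlib.metadata</code> (Python 3.8 or later), the function name <code>record_versions</code> is hypothetical, and it is not a description of how ActivePapers itself stores version numbers.</p>

<pre><code>import platform
from importlib import metadata
from typing import Dict, Iterable


def record_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map 'python' and each requested package to its installed version."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly rather than skipping the entry.
            versions[name] = "not installed"
    return versions


print(record_versions(["numpy", "h5py", "matplotlib"]))</code></pre>

<p>Storing the resulting dictionary alongside the results gives a later reader at least the version numbers needed to attempt a reconstruction of the environment.</p>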

<h2>First try: use my current Python environment</h2>

<p>The environment in which I do most of my current research has Python 3.5.2, NumPy 1.11.1, h5py 2.6, and Matplotlib 1.5.1. I set it up about a year ago when I got a new laptop, and haven't had a good reason to update it since then. I had made some effort back in 2013 to make my code compatible with Python 3, so why not check now whether this was a worthwhile investment?</p>

<p>Outcome: running the computations works just fine, with results that are not identical at the bit level but close enough for my application. However, I get some warnings from matplotlib when generating the plots. Here is the first one, the others are similar:</p>

<pre><code>UserWarning: Legend does not support 'x' instances.
A proxy artist may be used instead.
See: http://matplotlib.org/users/legend_guide.html#using-proxy-artist
  "#using-proxy-artist".format(orig_handle)</code></pre>

<p>A quick inspection of the plots shows that the legends have almost disappeared; all that's left is a small white box. That makes many of the plots unintelligible.</p>

<p>Just out of curiosity, I made a quick attempt to figure out the error message. What's that 'x' instance? The following messages also refer to 'yz' instances and a few others. A look at my script reveals that 'x', 'yz' etc. are in fact the strings that I supplied as legends. Sounds strange to call them 'x' instances, as if 'x' were a class. And what's that cryptic reference to a proxy artist?</p>

<p>Better stop here: my goal was to see if I can reproduce my data and figures from 2013 in a Python environment from 2016, and the answer is no. The plots are mutilated to the point of no longer being useful.</p>

<h2>Second try: use my current Python 2.7 environment</h2>

<p>Some of my research code still lives in the Python 2.7 universe, so I also have a Python environment based on Python 2.7.11 on my laptop, with NumPy 1.8.2, h5py 2.5, and matplotlib 1.4.3. That's much closer to the original one, so let's see how well it does in my reproducibility evaluation.</p>

<p>Outcome: Much better. The computations work fine as before, and the plots generate a single warning:</p>

<pre><code>MatplotlibDeprecationWarning: The "loc" positional argument to legend is deprecated. Please use the "loc" keyword instead.</code></pre>

<p>The legends still look OK, so the warning is just a minor nuisance, as one would expect from a deprecation-related message. Interestingly, this warning is also about legends, so it looks like there was a serious backwards-incompatible change in matplotlib's <code>legend</code> function between 1.2 and 1.5, which was prepared by a deprecation warning in 1.4.</p>

<h2>Third try: reconstructing the original environment</h2>

<p>Since I have the version numbers of everything, why not try to reconstruct the original environment exactly? Let's go for the same major and minor version numbers, which should be sufficient. That's a job for Anaconda:</p>

<pre><code>conda create -n python2013 python=2.7 numpy=1.6 h5py=2.1 matplotlib=1.2 anaconda
source activate python2013
pip install tempdir
pip install ActivePapers.Py</code></pre>

<p>Outcome: no warnings, no errors. Identical results. Reproducibility bliss at its best.</p>

<h2>Conclusions</h2>

<p>In summary, my little experiment has shown that reproducibility of Python scripts requires preserving the original environment, which fortunately is not so difficult over a time span of four years, at least if everything you need is part of the Anaconda distribution. I am not sure I would have had the patience to reinstall everything from source, given <a href="http://blog.khinsen.net/posts/2015/11/06/a-rant-about-software-deployment-in-2015.html" >an earlier bad experience</a>.</p>

<p>The purely computational part of my code was even surprisingly robust under updates in its
dependencies. But the plotting code wasn't, as matplotlib has introduced backwards-incompatible changes in a widely used function. Clearly the matplotlib team prepared this carefully, introducing a deprecation warning before introducing the breaking change. For properly maintained client code, this can probably be dealt with.</p>

<p>The problem is that I do not intend to maintain the plotting scripts for all the papers I publish. And that's not only out of laziness, but because doing so would violate the spirit of reproducible research. The code I publish is exactly the code that I used for the original work, without any modification. If I started maintaining it, I could easily change the results by accident. I'd thus have to introduce regression tests as a safeguard against such changes. But... how do I test for visual equivalence of plots?  Bitwise reproducibility is about as realistic to expect for image files as for floating-point numbers: I don't even get bitwise identical image files running the same Python code with identical matplotlib versions on different machines.</p>
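
<p>A partial workaround, which does not address visual equivalence at all, is to regression-test the numbers that go into a plot rather than the image file. The sketch below assumes that the plotted series can be collected into a dictionary of lists keyed by legend label; the function names, the JSON file format, and the tolerances are all choices made for this illustration.</p>

<pre><code>import json
import math
from typing import Dict, List

Series = Dict[str, List[float]]


def save_reference(path: str, series: Series) -> None:
    """Store the plotted numbers as a reference for later comparison."""
    with open(path, "w") as f:
        json.dump(series, f)


def matches_reference(path: str, series: Series, rel_tol: float = 1e-9) -> bool:
    """Check recomputed series against the stored reference within a tolerance."""
    with open(path) as f:
        reference = json.load(f)
    # The set of curves (legend labels) must be the same.
    if reference.keys() != series.keys():
        return False
    for label, ref_values in reference.items():
        values = series[label]
        if len(values) != len(ref_values):
            return False
        if not all(
            math.isclose(v, r, rel_tol=rel_tol, abs_tol=1e-12)
            for v, r in zip(values, ref_values)
        ):
            return False
    return True</code></pre>

<p>A typical use would call <code>save_reference</code> once when the paper is published and <code>matches_reference</code> after every later change to the scripts or the environment. Such a check would pass even when the legends have disappeared, so it complements visual inspection without replacing it.</p>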

<p>For my next paper, I will look for alternatives to matplotlib. My plotting needs are rather basic, so perhaps there is some other library with a more stable API that is good enough for me. Suggestions are welcome!</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Vicky Steeves:</i><p>Hi! I'd also recommend checking out ReproZip, which is designed to capture the computational environment of research for reproducibility: <a href="https://reprozip.org" rel="nofollow noopener" title="https://reprozip.org">https://reprozip.org</a> &amp;&amp; <a href="https://examples.reprozip.org" rel="nofollow noopener" title="https://examples.reprozip.org">https://examples.reprozip.org</a></p><p>To create a completely reproducible package (a .rpz file), you just prepend "reprozip trace" to your current command -- so it would look like "reprozip trace python <a href="http://funScript.py" rel="nofollow noopener" title="funScript.py">funScript.py</a>" Then to create the package, you just type "reprozip pack &lt;package-name&gt;"</p><p>You can send that to someone else (a reviewer, a collaborator, yourself in 5 years) who can then reproduce your work across different operating systems/configs using our unpacker plugins. You can use a graphical interface or the command line to unpack.</p><p>The point of ReproZip is to create computationally reproducible work -- that is, capture research at the environment level, like your blog post captured so accurately, as easily as possible. 2 commands to pack, 2 to unpack (unless you use the GUI, then it's only a few clicks).</p><p>This is a low-barrier way to create nice little reproducible packages of your research, to either share with others or share with yourself. Anyway, you should check it out!</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for mentioning ReproZip! It does sound familiar - I discovered this a few months ago but at least back then it was Linux only, so I couldn't use it for my own work. Judging from a quick look at the documentation, it seems there is now support for re-executing archives under MacOS and Windows, but making archives still requires Linux.</p><p>Not that this is meant as a criticism - a good tool for Linux is significant progress. And I understand that capturing environments at the executable level requires low-level systems operations that are hardly portable.</p></li>
</ul>
</li>
<li><i>Peter Amstutz:</i><p>Docker helps a lot.  Best practice seems to be to store the Dockerfile and everything that goes into it (including sw packages) in git so you can rebuild the same environment later.  But even then, I've seen changes in kernel version or VM configuration break even Dockerized workflows.</p><ul>
<li><i>Konrad Hinsen:</i><p>In the spirit of my reality check, I would appreciate hearing of experiments with the long-term stability of Docker images. Did anyone try to revive a four-year-old Docker image? Yes, I know that means going back to the very first Docker release, so it's perhaps asking for too much. But two years should be reasonable. Any takers?</p><ul>
<li><i>F. Pina Martins:</i><p>I think there's a paper somewhere in this idea. =-)<br>That being said, I'm genuinely interested in how well this works, since I'm currently using docker containers to make my own research reproducible.</p></li>
</ul>
</li>
</ul>
</li>
<li><i>Damien Irving:</i><p>Have you seen this post from Titus Brown?<br><a href="http://ivory.idyll.org/blog/2017-pof-software-archivability.html" rel="nofollow noopener" title="http://ivory.idyll.org/blog/2017-pof-software-archivability.html">http://ivory.idyll.org/blog...</a></p><p>It explores the concept of a half life for the repeatability of your research:</p><p>"... it is at least plausible to argue that we don't really care about our ability<br>to exactly re-run a decade old computational analysis.  What we do<br>care about is our ability to figure out what was run and what the<br>important decisions were -- something that Yolanda Gil refers to as<br>"inspectability."  But exact repeatability has a short shelf-life."</p><ul>
<li><i>Konrad Hinsen:</i><p>I do remember that post, since I participated quite a bit in the discussion about it. And I think that discussion deserves to continue.</p><p>Most of the current reflections about reproducibility, including my post here, start from the technical end: What is the state of the art? What is feasible in principle, what is feasible with reasonable effort? How can we do a bit better than the current state of the art? An aspect that has been neglected in comparison is the scientific end: What do we need to be able to do with published computational work in order to consider it a part of the scientific record?</p><p>Inspectability is an interesting concept in this context, but it remains vague for now. What makes a work inspectable, what makes it verifiable? How can we ensure/check inspectability and/or verifiability at publication time? What are the time scales over which we need to ensure them?</p><p>One key problem with the inspectability concept is that it is not obvious that reading program source code without being able to run it is useful in real life. Once a program reaches a modest level of complexity, looking at the source code is not sufficient to understand what it does, in my experience. A related issue is potential bugs in dependencies, which can only be detected if the precise versions of all dependencies are there - which you can only be sure of in practice if you can actually run the code.</p></li>
</ul>
</li>
<li><i>F. Pina Martins:</i><p>Great Post!<br>I'd recommend keep using matplotlib, regardless of how the package evolves. Focus instead on what you have shown here - reproducing the environment.<br>Keep in mind that *for now* matplotlib broke, but in 5 years, other components may break, since software is constantly evolving.<br>Having a way to "fixate" the environment seems to me like the way to go.</p><p>Regarding the plot "comparison", I wouldn't worry too much about image comparisons, as long as the data used to generate it can be regression tested.</p><ul>
<li><i>Konrad Hinsen:</i><p>Focusing on reproducing the environment is fine with me in principle, but we don't have any approach to this that has been around for long enough to be considered reliable. So for now, I prefer to do my best at both ends - preserving the environment AND avoiding unstable dependencies.</p></li>
</ul>
</li>
<li><i>Pierre de Buyl:</i><p>Looking at old plotting packages then? Would gnuplot somehow fit your needs?</p><ul>
<li><i>Konrad Hinsen:</i><p>I'd prefer something Python-based for two reasons:<br>- avoid the configuration/installation/version-checking issues related to external commands<br>- integration with ActivePapers<br></p></li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>Reproducibility does not imply reproduction</title> <link>https://blog.khinsen.net/posts/2017/01/24/reproducibility-does-not-imply-reproduction.html</link> <pubDate>2017-01-24</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2017/01/24/reproducibility-does-not-imply-reproduction.html</guid> <category><![CDATA[ computer-aided research ]]></category><category><![CDATA[ reproducible research ]]></category> <description><![CDATA[ <p>In discussions about computational reproducibility (or replicability, or repeatability, according to the preference of each author), I often see the argument that reproducing computations may not be worth the investment in terms of human effort and computational resources. I think this argument misses the point of computational reproducibility.</p>

<!-- more -->

<p>Obviously, there is no point in repeating a computation identically. The results will be the same. So the only reason to re-run a computation is when there are doubts about the exact software or data that were used in the original work, or doubts about the reliability of the hardware.</p>

<p>The point of computational reproducibility is to dispel those doubts. The holy grail of computational reproducibility is not a world in which every computation is run five times, but a world in which a straightforward and cheap analysis of the published material verifies that it is reproducible, so that there is no need to run it again. Actual reproduction attempts would be rare and reserved for situations such as suspicion of hardware failure or suspicion of fraud.</p>

<p>So how can we make reproducibility credible without actually doing reproduction? By using toolchains that have been proven in practice to make computations reproducible. Of course we do need to attempt <em>some</em> reproductions in order to validate these toolchains, but it's sufficient to do this for short computations. And if the toolchain is any good, the human effort should be close to zero as well.</p>

<p>The mere fact that we discuss computational reproducibility at all shows that we do have doubts. Most of us doing computational science have at some point had doubts about our own work. How did I make this figure? Was it made with the latest version of this script, or an earlier one? Did I run that simulation before or after installing the recent important bug fix? And when it comes to examining work by others described in a journal article, our ignorance usually reaches a level that the word &quot;doubt&quot; cannot convey - we don't really know anything. All we have is someone else's incomplete story. If we have doubts about our own work whose full story we know, why should we trust someone else's story blindly?</p>

<p>So the question about &quot;how much&quot; reproducibility we need comes down to a more basic question: What would it take to make you trust a computational result beyond a reasonable doubt? Here is my personal list of acceptable evidence as of today:</p>

<ul>
<li>I can repeat the computation on my computer and get close enough results.</li>
<li>The results are published as an <a href="http://www.activepapers.org/" >ActivePaper</a>.</li>
<li>The results come with a <a href="https://nixos.org/" >Nix</a> or <a href="http://guixsd.org/" >Guix</a> recipe for reproducing them.</li>
</ul>

<p>The last two cases point to toolchains that I personally consider trustworthy, given the experience I have with them. Both toolchains generate a detailed trace of what happened, with references to all the software and data. And both toolchains make mistakes improbable enough that the remaining risk is acceptable for me. Neither toolchain provides protection from fraud, so if I had a reason to suspect fraud, I'd still attempt a reproduction.</p>

<p>Note that I am not saying that everybody should use one of those toolchains. In their current state, they are neither universal nor sufficiently easy to use. But they do show the toolchain approach to reproducibility is viable.</p>

 ]]></description> </item><item> <title>Sustainable software and reproducible research: dealing with software collapse</title> <link>https://blog.khinsen.net/posts/2017/01/13/sustainable-software-and-reproducible-research-dealing-with-software-collapse.html</link> <pubDate>2017-01-13</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2017/01/13/sustainable-software-and-reproducible-research-dealing-with-software-collapse.html</guid> <category><![CDATA[ computer-aided research ]]></category><category><![CDATA[ reproducible research ]]></category><category><![CDATA[ sustainable software ]]></category> <description><![CDATA[ <p>Two currently much discussed issues in scientific computing are the sustainability of research software and the reproducibility of computer-aided research. I believe that the communities behind these two ideals should work together on taming their common enemy: software collapse. As a starting point, I propose an analysis of how the risk of collapse affects sustainability and reproducibility.</p>

<!-- more -->

<p>What I call software <em>collapse</em> is what is more commonly referred to as software <em>rot</em>: the fact that software stops working eventually if it is not actively maintained. The rot/maintenance metaphor is not appropriate in my opinion because it blames the phenomenon on the wrong part. Software does not disintegrate with time. It stops working because the foundations on which it was built start to move. This is more like an earthquake destroying a house than like fungi or bacteria transforming food, which is why I am trying out the term <em>collapse</em>.</p>

<p>The software stacks used in computational science have a multi-layer structure that seems to be nearly universal. At the bottom, there is non-scientific infrastructure, such as operating systems, compilers, and support code for I/O, user interfaces, etc. All of this software is used by scientists in the same way as by other computer users. The predominant view is that this software is external to scientific computing, much like computer hardware. One exception is infrastructure software for high-performance computing, which like the hardware it runs on is often designed specifically for use in science and engineering.</p>

<p>The second layer is scientific infrastructure. Here we find libraries and utilities used for research in many different disciplines, such as LAPACK, NumPy, or Gnuplot. The people developing this software tend to be researchers or research software engineers, i.e. people with a scientific background. The methods (algorithms, data structures) implemented in these packages are typically well-known and stable. This does not exclude ongoing research on improving the implementations, but from the users' point of view, the job done by the software remains the same, often for several decades.</p>

<p>The third layer contains discipline-specific research software. These are tools and libraries that implement models and methods which are developed and used by research communities. Often the developers are simply a subset of the user community, but even if they aren't, they work in very close contact with their users, who provide essential feedback not only on the quality of the software, but also on the directions that future development should take.</p>

<p>The fourth and final layer is project-specific software, which is whatever it takes to do a computation using software building blocks from the lower three levels: scripts, workflows, computational notebooks, small special-purpose libraries and utilities. At the end of a project, such software may become the starting point for software specific to another project, but it is rarely reused without modification, and rarely used by anyone except the members of the project that developed it.</p>

<p>Computational models and methods often move down the stack in the course of time. They are developed initially within a specific project, then the more widely useful ones become part of discipline-specific software, and some of them may find adoption in other fields of research and become a part of the scientific infrastructure layer.</p>

<p>Software in each layer builds on and depends on software in all layers below it, meaning that changes in any lower layer can cause it to collapse.</p>

<p>The reproducible research community focuses on the fourth layer, the project-specific software. Traditionally, the main obstacle to reproducibility was that this layer was not published, and sometimes even deleted by its authors at the end of a project. This layer also contains algorithms executed by a human user, e.g. by entering commands one by one into the computer. This ephemeral software is typically not even recorded. Fixing these problems is mainly a matter of creating an awareness of their importance, and much progress has been made in this respect. But the problem of layer-4 software collapsing due to changes in the lower levels remains largely unsolved. Project-specific software is particularly vulnerable to collapse because it is almost never maintained, since its active days are over.</p>

<p>The sustainable software community is mainly interested in layer 3, the discipline-specific community software. Its development is fragile because the importance of this software is not yet recognized by institutions and funders, unlike the scientific infrastructure software one layer below. Moreover, this software is often developed by scientists with insufficient training in software engineering techniques. There are essentially two tasks that need to be organized and financed: preventing collapse due to changes in layers 1 and 2, and implementing new models and methods as the scientific state of the art advances. These two tasks go on in parallel and are often executed by the same people, but in principle they are separate and one could concentrate on just one or the other.</p>

<p>The common problem of both communities is collapse, and the common enemy is changes in the foundations that scientists and developers build on. The options they have for dealing with this are about the same as for house owners facing the risk of earthquakes:</p>

<ol>
<li>Accept that your house or software is short-lived. In case of collapse, start from scratch.</li>
<li>Whenever shaking foundations cause damage, do repair work before more serious collapse happens.</li>
<li>Make your house or software robust against perturbations from below.</li>
<li>Choose stable foundations.</li>
</ol>

<p>House owners generally opt for strategies 3 or 4, or a mixture of them. Strategies 1 and 2 are unattractive because house owners might well be injured or killed during a collapse.</p>

<p>Most software developers, in science or elsewhere, prefer strategies 1 or 2. In many business settings, this makes sense because software is short-lived or rapidly evolving anyway, due to changing requirements and newly appearing possibilities. In science, these motivations exist as well, but must be weighed against the need for preservation of the scientific knowledge embodied by scientific software. You may not care about losing the Web browser you used long ago, given that there's a better one now. But if ten years from now, doubts come up about the analysis of <a href="http://ligo.org/" >LIGO</a> data, you want to be able to go back to the analysis code and check what exactly was done at the time.</p>

<p>A difference between the sustainable software and the reproducible research communities is that the former privileges strategy 2, continuous repair, whereas the latter dreams of strategy 4, stable foundations. Strategy 2 is in fact easier to adopt, given that most of the software industry is applying it. Strategy 4 is seen as unrealistic by many, because stable foundations are hard to find, and the few we have impose unpleasant restrictions. But if developers in layer 3 adopt the continuous-repair strategy, this leaves only one option for the code in layer 4 - accept that it is short-lived. This is more or less what we see happening at the moment. For a recent discussion, see <a href="http://ivory.idyll.org/blog/2017-pof-software-archivability.html" >this blog post</a> by C. Titus Brown and the discussion following it.</p>

<p>In one of the comments there, Daniel S. Katz proposes a cost-benefit analysis, which to the best of my knowledge has not been attempted until now. However, I think it should be done globally, rather than for an individual research project. A move towards stable foundations (strategy 4) is likely to require a large up-front investment, but to lower development costs later on, for scientific code in all layers. It might well be worthwhile for the reduction in global development costs alone, not even counting the hard-to-evaluate benefit of long-term reproducibility.</p>

<p>It's also worth looking at <em>why</em> software foundations are shaking all the time. Why can't we just keep on using the same software forever, if we are happy with the way it works?</p>

<p>One reason is the bottom layer of our software stack, which we share with non-scientific software. There are market incentives for shaking up the foundations of commercial software, which then cause collateral damage elsewhere, such as in science. For example, some markets rely on planned obsolescence and never-ending change to create continuous customer demand. Smartphones are a good example. Also, a company controlling a software platform might benefit from changing it a bit all the time in order to retain control and customer attention. Finally, security problems in systems software are discovered regularly, and their fixes can send ripples up the software stack. All this makes it difficult to find stable foundations to build on. However, it is clearly not impossible. After all, banks have been keeping their COBOL software alive for decades. At worst, we could build our own bottom layer instead of sharing it with other application domains. One advantage of scientific software in that respect is that it has few if any security concerns to deal with.</p>

<p>Unfortunately, we also have home-made quakes in our software stack, due to changes in layers 2 and 3. In the fast-paced development of layer 3, collateral damage sometimes leads to collapse in layer 4. I suspect much of this could be avoided with some more attention to stability, plus extensive testing. What's worse is a widespread attitude that considers stability impossible anyway and concludes that one more breaking change is not such a big problem after all. This is particularly harmful for the scientific infrastructure of layer 2. I'll just mention my two-year-old <a href="https://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/" >rant about NumPy</a> as an example. In view of the systematic non-maintenance of layer-4 software, this attitude is in my opinion inappropriate in the world of scientific computing.</p>

<p>As a final remark, strategy 3 does not seem to exist in the software world. There are no proven techniques for making a program robust against changes in its foundations. Software interfaces are much too rigid for that. I vaguely remember Alan Kay speaking about more lenient interface mechanisms - if anyone has a reference to share, please leave a comment! A recent <a href="https://www.youtube.com/watch?v=oyLBGkS5ICk" >presentation by Rich Hickey</a>, the creator of the Clojure language, also contains useful ideas for dealing with change in interfaces (executive summary: add new features, but don't remove or change existing ones), but it's more of a move towards strategy 4 than strategy 3. More generally, I would like to see more research and development along these lines. Robustness is a major design principle in other engineering domains, and software would benefit from a larger dose as well.</p>
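<p>The rule &quot;add new features, but don't remove or change existing ones&quot; can be illustrated with a minimal Python sketch. The function names and the scenario are hypothetical; they are not taken from the presentation or from any real library. Version 2 of an imagined layer-2 library adds a function under a new name instead of changing the old one, so unmaintained layer-4 code written against version 1 keeps working.</p>

```python
# Hypothetical layer-2 library, version 1: one function with a fixed contract.
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Version 2 adds a feature under a NEW name. The existing function is left
# untouched, so code that calls it cannot break.
def weighted_mean(values, weights):
    """Weighted mean; a new entry point rather than a changed `mean`."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical layer-4 script written against version 1. Nobody maintains it,
# yet it still runs after the library has moved on to version 2.
def old_project_script():
    return mean([1.0, 2.0, 3.0])
```

<p>The alternative, adding a required <code>weights</code> argument to <code>mean</code>, would have been a small change for the library and a collapse for every old script calling it.</p>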

<p><strong>Note added 2019-09-04:</strong> I have written a more detailed article about <a href="https://doi.org/10.1109/MCSE.2019.2900945" >Dealing with Software Collapse</a> for the May 2019 issue of <em>Computing in Science and Engineering</em> magazine. A <a href="https://hal.archives-ouvertes.fr/hal-02117588" >preprint</a> is available as well.</p>

 ]]></description> </item><item> <title>From reproducible to verifiable computer-aided research</title> <link>https://blog.khinsen.net/posts/2016/05/11/from-reproducible-to-verifiable-computer-aided-research.html</link> <pubDate>2016-05-11</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2016/05/11/from-reproducible-to-verifiable-computer-aided-research.html</guid> <category><![CDATA[ computer-aided research ]]></category><category><![CDATA[ reproducible research ]]></category> <description><![CDATA[ <p>The importance of reproducibility in computer-aided research (and elsewhere) is by now widely recognized in the scientific community. Of course, a lot of work remains to be done before reproducibility can be considered the default. Doing computational research reproducibly must become easier, which requires in particular better support in computational tools. Incentives for working and publishing reproducibly must also be improved. But I believe that the Reproducible Research movement has made enough progress that it's worth considering the next step towards doing trustworthy research with the help of computers: verifiable research.</p>

<!-- more -->

<p>Verifiable research is research that you can verify for yourself. Not in the sense of verifying the scientific conclusions, which often can only be done many years later. The more modest goal is to verify that a publication contains no mistakes of the kind that every human being tends to make: mistakes in manual computations, mistakes in transcribing observations from a lab notebook, etc.</p>

<p>Ideally, all research should be verifiable. A paper is supposed to provide sufficient details about the work that was done to enable competent peers to verify the reasoning and repeat any experiments. Peer review is supposed to certify that a paper is verifiable, and reviewers are even encouraged to do the verification if that is possible with reasonable effort.</p>

<p>In the pre-computing era, much published research was indeed verifiable. Given the high cost of verifying experimental work, it is safe to assume that actual verification was the exception. But theoretical work of any importance was commonly verified by many readers who repeated the (manual) computations.</p>

<p>With the increasing use of computers, papers slowly turned into mere summaries of research work. Providing all the details was simply impossible - software was too complex to be fully described in a journal article. It also became common to use software written by other people, and even commercial software whose detailed workings are secret. This development was nicely summarized <a href="http://statweb.stanford.edu/~wavelab/Wavelab_850/wavelab.pdf" >by Buckheit and Donoho</a> in 1995 in what became a famous quote in the Reproducible Research movement:</p>

<blockquote>
<p>An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.</p>
</blockquote>

<p>Today this statement applies not only to computational science, but to all of computer-aided research, as many experimental and theoretical studies involve computers and software as well. The publication of all software and all input datasets in a form that other scientists can actually process on their own computers has become the main objective for making computer-aided research reproducible.</p>

<p>Unfortunately, having all the software and input data that go with a journal article is still not sufficient to make the work verifiable. With the exception of particularly simple computations, it is practically impossible to figure out what the software really computes, and in particular to verify that it computes what the paper claims it computes. Assuming, of course, that the paper actually <em>does</em> provide a detailed description of its claims, which is often not the case. Much computer-aided research is thus <a href="https://en.wikipedia.org/wiki/Not_even_wrong" >&quot;not even wrong&quot;</a>.</p>

<p>It is the complexity of much modern scientific software that makes verification practically impossible, and for that reason software is rarely subjected to peer review. After all, who would accept the Herculean task to verify the correct functioning of a piece of software? Even &quot;software papers&quot;, i.e. papers that merely exist to provide a citable reference for some software, are reviewed without any serious validation of the software itself. At best, reviewers check that best practices of software engineering have been respected, for example by writing a test suite with good code coverage. But no amount of testing can verify that the software computes what it is supposed to compute. If some numerical constant in the source code is off by 10% due to a typo, there's a good chance that nobody will ever notice. Such mistakes have happened (see <a href="http://dx.doi.org/10.1038/467775a" >this article</a> for a few stories), and there are good reasons to believe they are actually frequent (see <a href="http://f1000research.com/articles/3-303/v1" >this article</a> for arguments). The most convincing argument should be our daily experience with computers that crash or ask us to install &quot;critical updates&quot;. If systems software is so clearly full of mistakes, is it reasonable to assume that scientific software has none at all?</p>

<p>The difficulty of verifying computational results in combination with the obvious importance of computational techniques in science has led to a change of attitude that in my opinion is detrimental to science in the long run. Most importantly, the burden of proof has been shifted from the proponents of a new hypothesis to its opponents. If you cannot show that a computational study is wrong, then it is silently assumed correct. If you want to publish results that are contradictory to work published earlier, it's your obligation to explain why, even though you cannot possibly verify the earlier work. This is why protein structures in contradiction with the <a href="http://www.the-scientist.com/news/home/39805/" >later retracted ones from Geoffrey Chang's group</a> were rejected for publication for a long time. Contradictory results should be handled by a critical inspection of all of them, but this is possible only for verifiable research.</p>

<p>Another detrimental change of attitude is that &quot;correct&quot; has been replaced by &quot;community-accepted&quot; as a quality criterion in many fields. Recently, I have started to ask a simple question after seminars on computational work: &quot;Why should I believe your results? What did you do to verify them?&quot; Most often, the answer is &quot;We used software and protocols that are widely applied in our community&quot;. Unfortunately, popularity can be taken as an indicator of correctness only if it is safe to assume that many users have actually verified those tools and methods. Which again assumes verifiability as a minimum criterion.</p>

<h2>So... what can we do?</h2>

<p>Verifiable computer-aided research is a tiny subset of today's published research. It's even a small subset of today's reproducible research. Can we do something about this? I believe we can, and I will summarize some possible approaches.</p>

<p>The most obvious approach to make a computation verifiable is to document all code and data well enough that a competent reader is convinced of its correctness. Literate programming (for algorithms) and computational notebooks (for computations) are good techniques for this. As with any scientific proofreading, verification by inspection requires much care and a critical attitude. People are easily fooled into believing something because it is well presented, for example. But the most important obstacle to this approach is the modularity of much of today's scientific software. If you reuse existing libraries - and there are of course good reasons to do so - then you probably won't rewrite them in literate programming style for explaining their algorithms to your critical reader. A computation is only as verifiable as its least verifiable ingredient.</p>

<p>Another way to make computer-aided research verifiable is to make the computations reimplementable. This means that the published journal article, or some supplementary material to that article, contains a precise enough human-readable description of the algorithms that a scientist competent in the field can write a new implementation from scratch, and verify that it produces the same (or close enough) results. This is not a fool-proof approach, of course, and again modularity is a major risk factor. If the computation uses some complex library and the reimplementor chooses to use the same library, then the library code is not verified by the reimplementation. The more the reimplementation differs from the original authors' code, the better it is as a verification aid. This is by the way also a strong argument for diversity in scientific software. In terms of development efficiency, a single community-supported software package per field is great, but for verifiability, it is better to have multiple packages that can do the same job.</p>

<p>Both approaches I have outlined fail for complex software. A million-line simulation code developed over many years by an entire research group can neither be studied nor reimplemented by a single person wishing to verify it. Even a small team working in close collaboration wouldn't be up to the task. The solution I propose for this situation is to introduce an intermediate layer between the software and the human-readable documents (papers, software documentation) that describe what it computes. A layer that contains all the science but none of the technicalities of the software, such as parallelism, platform-dependence, or resource management. The idea is to &quot;factor out&quot; the <a href="https://en.wikipedia.org/wiki/No_Silver_Bullet" >accidental complexity</a> and retain only the essential complexity, the one due to the complexity of the models and methods that the software implements. This idea is very similar to the use of <a href="https://en.wikipedia.org/wiki/Formal_specification" >formal specifications</a> in software development. The specification would be verified by human scientists, whereas the conformity of the software to the specification would be checked by automated methods, of which <a href="https://en.wikipedia.org/wiki/QuickCheck" >randomized unit testing</a> is probably the most immediately useful one.</p>
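<p>To make the last step concrete, here is a minimal sketch of randomized testing of an implementation against a specification, using only Python's standard library. The example is an assumption made for illustration: the function names are hypothetical, and the variance computation merely stands in for a real scientific model. Dedicated tools in the QuickCheck tradition do considerably more, such as shrinking failing inputs.</p>

```python
import math
import random


def spec_variance(xs):
    """Specification: population variance written directly from its definition."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def fast_variance(xs):
    """Implementation under test: one-pass update (Welford's algorithm)."""
    count = 0
    mean = 0.0
    m2 = 0.0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / count


def conforms(implementation, specification, trials=500, seed=0):
    """Compare both functions on randomly generated inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(-100.0, 100.0) for _ in range(rng.randint(1, 50))]
        expected = specification(xs)
        actual = implementation(xs)
        if not math.isclose(actual, expected, rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True
```

<p>The specification is short enough for a human to check against the textbook formula, while <code>conforms(fast_variance, spec_variance)</code> does the mechanical comparison. A numerical constant that is off by 10% in the implementation would be caught by such a check, provided the specification itself has been verified.</p>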

<p>An intermediate layer that factors out accidental complexity is also of interest for other uses in scientific research. That new layer would be the closest we can get to a digital representation of a model or a method. Rather than use it just in the specification of a single piece of software, we can use it for all kinds of analyses and comparisons, and cite it as the main scientific reference in work based on it, in addition to the citation to the software as the technical tool for doing the computations. For this reason, I call this layer &quot;digital scientific knowledge&quot; and the languages for expressing it &quot;digital scientific notation&quot;. None of this exists today, but many developments in computer science can be used as a basis for its development. For the details, see <a href="http://sjscience.org/article?id=527" >this article</a>.</p>


 ]]></description> </item><item> <title>Composition is the root of all evil</title> <link>https://blog.khinsen.net/posts/2016/03/04/composition-is-the-root-of-all-evil.html</link> <pubDate>2016-03-04</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2016/03/04/composition-is-the-root-of-all-evil.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ <p>Think of all the things you hate about using computers in doing research. Software installation. Getting your colleagues' scripts to work on your machine. System updates that break your computational code. The multitude of file formats and the eternal need for conversion. That great library that's unfortunately written in the wrong language for you. Dependency and provenance tracking. Irreproducible computations. They all have something in common: they are consequences of the difficulty of composing digital information. In the following, I will explain the root causes of these problems. That won't make them go away, but understanding the issues will perhaps help you to deal with them more efficiently, and to avoid them as much as possible in the future.</p>

<!-- more -->

<p>Composing information is something we all do every day, mostly without thinking of it. A shopping list is the composition of names of things you need to buy. An e-mail message is the composition of the recipients' addresses, a subject line, and the body of the message. An address book is a composition of addresses, which in turn are compositions of various pieces of information related to some person.</p>

<p>Science has its own information items and associated compositions. Measurements are composed into tables. Mathematical equations are composed into more complex equations. Datasets are composed to make a database. Hypotheses are composed to make a model.</p>

<p>Writing computer programs means composing expressions and statements into procedures or functions, composing procedures to make modules, and composing modules to make programs. Reading data from a file means composing your algorithms with the data they work on into a complete computation. Configuring a new computer and installing software are about composing an operating system, various libraries, and application software into a functioning whole. </p>

<p>When you look at these examples more closely, you might notice that some of these acts of composition are so trivial that we don't even think about them, whereas others are a real pain. In that second category, we find most of the composition work related to computers. So what is the difference?</p>

<h3>Human and computational information processing</h3>

<p>Humans process information in terms of concepts. We all have accumulated a vast amount of conceptual knowledge over our lifetime, starting with the most basic concepts that we learned in infancy. This knowledge includes the definitions of all the concepts, but also the relations between them. Our knowledge of concepts helps us to &quot;make sense&quot; of information, which includes the detection of probable mistakes and sometimes even their correction. Humans are very tolerant to mistakes and variations in how some piece of information is expressed. We don't care if the items in a shopping list are arranged vertically or horizontally, for example.</p>

<p>When composing information, we read the individual items, translate them into concepts, and then write out the composition. I use the vocabulary of processing written language here, but the same holds for oral or visual communication. Variations in notation may be an inconvenience, but not a real problem. As long as the information refers to familiar concepts, we can deal with it.</p>

<p>Computers process information by applying precise mechanical rules. They don't care about concepts, nor about context. If you ask a computer to do something stupid, it will happily do so. This may look like a criticism of how computers work, but it's also exactly why they are so useful in research: they have different strengths and weaknesses compared to humans, and are therefore complementary &quot;partners&quot; in solving problems.</p>

<h3>Formal languages</h3>

<p>At the hardware level of a digital computer, a computation is a multi-step process that transforms an input bit sequence into an output bit sequence under the control of a program that is stored as a bit sequence as well. Information processing by computers thus requires all data to be expressed as bit sequences. Dealing with bit sequences is, however, very inconvenient for humans. We therefore use data representations that are more suitable for human brains, but still exactly convertible from and to the bit sequences that are stored in a computer's memory. These representations are called <a href="http://en.wikipedia.org/wiki/Template:Formal_languages_and_grammars" ><em>formal languages</em></a>.
The definition of a formal language specifies precisely how some piece of information is encoded as sequences of bits. Many formal languages are defined in terms of sequences of text characters instead of sequences of bits, for another level of human convenience. Since the mapping from text characters to bits is straightforward, this makes little difference in practice. The term &quot;formal language&quot; is commonly used in computer science, but in computational science we usually speak of &quot;data formats&quot;, &quot;file formats&quot;, and &quot;programming languages&quot;, all of which are specific kinds of formal languages. The use of formal languages, rather than the informal languages of human communication, is the defining characteristic of digital information.</p>

<p>The definition of a formal language consists of two parts, syntax and semantics. Syntax defines which bit patterns or text strings are valid data items in the language. Syntax rules can be verified by a suitable program called a parser. Semantics define the <em>meaning</em> of syntactically correct data items. With one important exception, semantics are mere conventions for the interpretation of digital data. Meaning refers to conceptual knowledge that a computer neither has nor needs: all it does is process bit sequences. The exception concerns formal languages for expressing algorithms, i.e. rules for the transformation of data. The semantics of an algorithmic language defines how each operation transforms input data into output data. Writing down such transformation rules obviously requires a notation for the data being worked on. For that reason, a formal language that can express algorithms also defines the syntax and semantics of the input and output data for these algorithms. Your favorite programming language, whichever it is, provides a good illustration.</p>
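<p>A small sketch may help to see that semantics are conventions applied to bits. It uses Python's standard <code>struct</code> module to read the same four bytes under two different conventions. The byte values and the helper function are chosen for illustration only.</p>

```python
import struct

# Four bytes, i.e. a sequence of 32 bits, with no meaning attached yet.
bits = bytes([0x3F, 0x80, 0x00, 0x00])

# Convention 1: IEEE 754 single-precision number, network (big-endian) order.
as_float = struct.unpack("!f", bits)[0]

# Convention 2: unsigned 32-bit integer, same byte order.
as_integer = struct.unpack("!I", bits)[0]


# Syntax check (illustrative helper): this format accepts exactly four bytes,
# so a three-byte input is rejected before any question of meaning comes up.
def is_valid_float32(data):
    return len(data) == struct.calcsize("!f")
```

<p>Here <code>as_float</code> is 1.0 and <code>as_integer</code> is 1065353216. The computer is equally happy with both readings; which one is &quot;right&quot; is decided by the convention the data's producer and consumer agree on.</p>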

<p>There is a huge number of formal languages today, which can be organized into a hierarchy of abstraction layers, such that languages at a higher level incorporate languages from lower levels. As a simple example, a programming language such as Fortran incorporates formal languages defining individual data elements - integers, floating-point numbers, etc. At the lowest level of this hierarchy, close to the bit level at which computing hardware operates, we have formal languages such as <a href="http://unicode.org/" >Unicode</a> for text characters or the floating-point number formats of <a href="http://dx.doi.org/10.1109%2FIEEESTD.2008.4610935" >IEEE standard 754</a>. One level up we find the memory layout of Fortran arrays, the layout of <a href="https://en.wikipedia.org/wiki/UTF-8" >UTF-8</a> encoded text files, and many other basic data structures and file formats. Structured file formats such as XML or HDF5 are defined on the next higher level, as they incorporate basic data structures such as arrays or text strings. Programming languages such as Python or C reside on that level as well.</p>

<p>Different formal languages that encode the same information at the semantic level can be converted into each other. The two best-known translations of this kind in the daily life of a computational scientist are file-format conversion and the compilation of software source code into processor instructions. However, if you take into account that the in-memory data layout of any program is a formal language as well, all I/O operations can be considered conversions between two formal languages.</p>

<h3>Composition of digital information</h3>

<p>Digital information is, by definition, information expressed in a formal language. Composition of digital information produces a new, more complex, digital information item, which is of course expressed in a formal language as well. And since the ingredients remain accessible as parts of the whole, everything must be expressed in one and the same formal language. And that's where all our trouble comes from.</p>

<p>If we start from ingredients expressed in different languages, we have basically two options: translate everything to a common language, or define a new formal superlanguage that incorporates all the languages used for expressing the various ingredients. We can of course choose a mixture of these two extreme approaches. But both of them imply a lot of overhead and add considerable complexity to the composed assembly. Translation requires either tedious and error-prone manual labor, or writing a program to do the job. Defining a superlanguage requires implementing software tools for processing this new superlanguage.</p>

<p>As an illustration, consider a frequent situation in computational science: a data processing program that reads a specific file format, and a dataset stored in a different format. The translation option means writing a file format converter. The superlanguage option means extending the data processing program to read a second file format. In both cases, the use of multiple formal languages adds complexity to the composition that is unrelated to the real problem to be solved, which is the data analysis. In software engineering, this is known as &quot;<a href="https://en.wikipedia.org/wiki/No_Silver_Bullet" >accidental complexity</a>&quot;, as opposed to the &quot;essential complexity&quot; inherent in the problem.</p>
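<p>The translation option can be sketched in a few lines of Python. The scenario is an assumption made for illustration: the dataset is a CSV table with a header row, and the data processing program is imagined to read a JSON list of records. The function name is hypothetical.</p>

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Translate a CSV table with a header row into a JSON list of records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)
```

<p>Even this toy converter has to make a decision that has nothing to do with the data analysis: CSV carries no type information, so every value arrives in the JSON output as a string, and someone must decide where and how numbers are recovered. That is accidental complexity in miniature.</p>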

<p>As a second example, consider writing a program that is supposed to call a procedure written in language A and another procedure written in language B. The translation option means writing a compiler from A to B or vice-versa. The superlanguage option means writing an interpreter or compiler that accepts both languages A and B. A mixed approach could use two compilers, one for A and one for B, that share a common target language. The latter solution seems easy at first sight, because compilers from A and B to processor instructions probably already exist. However, the target language of a compiler is not &quot;processor instruction set&quot; but &quot;processor instruction set plus specific representations of data structures and conventions for memory management&quot;. It is unlikely that two unrelated compilers for A and B are compatible at that level. Practice has shown that combining code written in different programming languages is always a source of trouble, except when using tools that were explicitly designed for implementing the superlanguage from the start.</p>

<p>In the last paragraph, I have adopted a somewhat unusual point of view which I will continue to use in the following. We usually think of a language as something named and documented, such as C or Unicode. The point of view I adopt here is that the language in which a piece of digital information is expressed consists of all the rules and constraints that must be satisfied, <em>including the rules and constraints due to composition</em>. To illustrate the difference, consider the <a href="https://www.python.org/" >Python language</a> and the Python language with the <a href="http://www.numpy.org/" >NumPy extension</a>. According to the standard point of view, Python is the language and NumPy is a library written in Python. In my point of view, Python+NumPy is a language <em>different</em> from plain Python. To see that libraries modify their underlying languages, consider the Python statement <code>import numpy</code>. It fails in plain Python, so it is not a valid statement in the Python language, whereas it is valid in the Python+NumPy language. Moreover, in the Python+NumPy language you are not allowed to write a module called <code>numpy</code>. The addition of NumPy to plain Python makes some formerly invalid programs valid and vice versa, which justifies speaking of different, though certainly similar, languages.</p>
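<p>The claim that a library changes the set of valid programs can be checked experimentally. The sketch below uses a made-up module name, <code>hypothetical_extension</code>, as a stand-in for NumPy so that it does not depend on what happens to be installed. It tests whether the import statement is valid, adds the module to the environment, and tests again.</p>

```python
import importlib
import os
import sys
import tempfile

# 'hypothetical_extension' is a made-up module name standing in for NumPy.
NAME = "hypothetical_extension"


def can_import(name):
    """Return True if 'import name' is valid in the current environment."""
    importlib.invalidate_caches()
    sys.modules.pop(name, None)
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


# In the plain environment, the import statement is not valid.
before = can_import(NAME)

# Adding a library to the environment changes which programs are valid.
with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, NAME + ".py"), "w") as f:
        f.write("answer = 42\n")
    sys.path.insert(0, directory)
    try:
        after = can_import(NAME)
    finally:
        # Restore the original environment.
        sys.path.remove(directory)
        sys.modules.pop(NAME, None)

print(before, after)
```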

<h3>Lots of languages, lots of problems</h3>

<p>The above discussion suggests that to keep our lives simple, we should use as few different formal languages as possible. Unfortunately, an inventory of what we have to deal with shows that we are very far from that optimum.</p>

<p>Data formats are the easiest part. Even the number of &quot;standard&quot; formats is enormous, and many of them aren't that well standardized, leading to different dialects. Worse, many scientific programs make up their own <em>ad-hoc</em> data formats that are scarcely documented. That's why file conversion takes up so much of our time. Moreover, we usually have different on-disk and in-memory data formats for the same data, which is why we need to write I/O routines for our software.</p>

<p>But the complexity of formal languages used to define programs completely dwarfs the complexity of data formats. Let's start at the bottom level: the processor's instruction set. If you write an operating system (OS), that's the level you work at. Otherwise, your program is a plug-in to be composed with an operating system, and the operating system defines the formal language in which you need to provide your program. The &quot;OS language&quot; includes the processor's instruction set, but also adds constraints (memory use, relocatability, ...) and access to OS functions. The OS language can be as simple as the <a href="https://en.wikipedia.org/wiki/COM_file" >COM file format</a> from CP/M and DOS days but also as complex as Linux' <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format" >ELF format</a>.</p>

<p>The ELF format introduces the next level of composition: object files and dynamic libraries, in addition to executable files. In a modern OS, a program is composed from several ingredients immediately before execution. The motivation for introducing this last-minute composition was the possibility to share frequently used program building blocks among the hundreds of processes running in parallel, thus reducing their memory footprint. But this comes at the price of considerable accidental complexity. The OS language that your program must be written in now includes not only the processor instruction set and the ELF format specification, but also conventions about where certain shared libraries are stored in the file system. That's why it is no longer possible to prepare a generic program for the Linux platform. Different Linux distributions have different conventions for arranging the shared libraries in the file system, and moreover these conventions change over time. They have different OS languages.</p>

<p>Upon closer inspection, the situation is actually even worse. The OS language for a given piece of software includes <em>all the software packages that have been installed on the same computer before</em>. Obviously, only one software package can occupy a given filename. Once you have installed a package that uses the file <code>/usr/lib/libm.so</code>, no other package can occupy the same slot. That makes it impossible to wrap up &quot;my program and all the files it requires&quot; for installation on some other machine. If package A contains <code>/usr/lib/libm.so</code> and package B another <code>/usr/lib/libm.so</code>, even if it is only a slightly older version of the same library, the two packages could not coexist. The only solution is to distribute programs and libraries as building blocks to be added to a growing assembly, whose composition - now called &quot;software installation&quot; - is left to the system administrator. Each block comes with a list of &quot;required dependencies&quot;, whose presence the system administrator must ensure. Moreover, each block occupies certain slots that must be available in the system. In the terminology of formal languages, each new block must conform to a language that its author cannot know in advance, and cannot even fully describe. I have described this error-prone approach in an <a href="http://www.activepapers.org/2014/01/31/Installing-Software.html" >earlier blog post</a> as the Tetris model of software installation, because of its obvious similarities with the well-known video game. It's the most widely used model in scientific computing today.</p>
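<p>A toy model makes the slot problem tangible. In the sketch below, the system is simply a table from filenames to the package that owns them. The package names and the <code>install</code> function are hypothetical, and real package managers are of course far more elaborate.</p>

```python
class SlotConflict(Exception):
    """Raised when two packages claim the same filename."""


def install(system, package_name, files):
    """Add a package to 'system', a dict mapping each path to its owning package."""
    # Check all slots first, so that a failed installation changes nothing.
    for path in files:
        owner = system.get(path)
        if owner is not None and owner != package_name:
            raise SlotConflict(f"{path} is already occupied by {owner}")
    for path in files:
        system[path] = package_name


# Package names are hypothetical; the library path is the one used in the text.
system = {}
install(system, "package-A", ["/usr/lib/libm.so", "/usr/bin/a"])
try:
    install(system, "package-B", ["/usr/lib/libm.so", "/usr/bin/b"])
except SlotConflict as exc:
    print("cannot install package-B:", exc)
```

<p>Whether package B can be installed depends on what was installed before it, which is information its author does not have.</p>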

<p>The obvious problems caused by this approach have motivated the development of various tools for the management of software installations. Some are specific to some OS platform (the package managers of Debian, RedHat, BSD, etc.). Others are specific to a programming language, e.g. Python's <code>distutils</code> system and its derivatives. The multitude of software installation managers has created a secondary composition problem: to install a Python package on a Debian system, you must negotiate a compromise between Python's and Debian's views on how software installation should be managed.</p>

<p>Another approach is to give up on sharing common resources, and provide some way to package programs with all the files they need into a single unit, even if this leads to duplication of data on disk and in memory. This is the idea behind MacOS X application bundles (which go back to <a href="https://en.wikipedia.org/wiki/NeXTSTEP" >NeXTSTEP</a>) and also <a href="https://www.docker.com/" >Docker</a> containers. Tools such as Python's <a href="https://pypi.python.org/pypi/virtualenv" ><code>virtualenv</code></a> proceed in a similar way, by isolating a specific composition of building blocks from other potentially conflicting compositions of building blocks on the same computer.</p>

<p>An ingenious construction that combines the best of both worlds is the approach taken by the <a href="http://nixos.org/" >Nix</a> package manager and its offshoot <a href="https://www.gnu.org/software/guix/" >Guix</a>. Instead of having building blocks refer to each other through filenames, they use a hash code computed from the actual contents of the files. This allows the composition of arbitrary building blocks, including pairs that would claim the same filenames in a standard Linux system, but also prevents multiple identical copies of any building block. This idea is known as <a href="https://en.wikipedia.org/wiki/Content-addressable_storage" >content-addressable storage</a>, and is also used in the popular version control system <a href="https://git-scm.com/" >git</a>.</p>
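<p>The principle of content-addressable storage fits into a few lines. The following is a toy sketch, not a description of how Nix, Guix, or git are implemented (Nix and Guix, for instance, compute most of their hashes from a package's build inputs). The class name <code>ContentStore</code> and the use of SHA-256 are choices made for this illustration.</p>

```python
import hashlib


class ContentStore:
    """A toy content-addressable store: blocks are keyed by a hash of their contents."""

    def __init__(self):
        self._blocks = {}

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        # Identical contents map to the same key, so nothing is stored twice.
        self._blocks.setdefault(key, data)
        return key

    def get(self, key):
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)


store = ContentStore()
# Two versions of "the same" library coexist, because their hashes differ.
old = store.put(b"libm version 1")
new = store.put(b"libm version 2")
# A second copy of identical contents does not create a new entry.
again = store.put(b"libm version 1")

print(old != new, again == old, len(store))
```

<p>Since a building block no longer claims a name chosen by a human, there is no slot for two blocks to fight over.</p>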

<p>Up to here, I have described the composition of specific programs with an operating system. But the program that is prepared as a plug-in to an OS is itself already a composition. How it is composed and from which constituents depends on the programming language(s) being used and on the tools that implement them. In Python, for example, a program consists in general of packages which consist of modules which consist of name-value pairs. A C program consists of source code files and header files, which each contain value and function definitions and interact via macro definitions. Like in the case of the &quot;OS language&quot;, the precise formal language in which each piece is written is not just Python or C. It also includes constraints and extensions coming from other building blocks -- libraries -- that the program refers to, as I have illustrated above for the example of Python plus NumPy.</p>

<p>Comparing these two situations, we can identify the common culprit: the use of a global namespace for composing building blocks. In the &quot;OS language&quot; of a typical Linux system, the global namespace is the filesystem. In Python, it's the namespace of top-level module names. In C, it's the namespace of non-static top-level definitions. Composition requires one building block to refer to another building block through a name in that namespace. And that in turn requires each building block to occupy a specific name in that namespace, so that others can refer to it.</p>

<p>One way to alleviate this problem is encouraging the use of very specific names. That's the approach taken by Java, whose global namespace for packages is supposed to contain &quot;reversed domain names&quot; such as <code>org.apache.commons.lang3.math</code>. While such a rule, if respected, indeed reduces the risk of name collisions between unrelated packages to almost zero, the most frequent source of name collisions remains: different versions of a package have the same name and can therefore not be used together in a composition. When composing building blocks into a program, one can argue that mixing different versions is bad practice anyway. But in the Tetris model of a single global software collection per computer, not being able to have several versions of a building block is often a serious restriction.</p>

<p>A final kind of formal language worth mentioning in this context is languages for defining compositions. This category includes Makefiles, Dockerfiles, autoconf configuration files, and of course the package specification files of the various package managers. Their multitude shows the importance of the composition problem, but it also contributes to it. It is not rare to see a specification file for one package manager refer to another package manager. Conversion from one such language to another is nearly impossible, because the precise language for defining a composition depends not only on the package manager, but also on the other existing packages. It's exactly the same situation as with the &quot;OS language&quot; and programming languages extended by libraries.</p>

<h3>Is there a way out?</h3>

<p>I believe that there is, and I have some ideas about this, but I will leave them for another time as this post is already quite long. I hope that the above analysis contributes to a better understanding of the problems that computational scientists are facing in their daily work, which is the prerequisite to improving the situation.</p>

<p>As a first step, I encourage everyone to prefer <em>solutions</em> to <em>workarounds</em> when faced with composition-related issues. Solutions identify a cause and eliminate it, whereas workarounds merely alleviate the impact of the problem, often re-creating the same problem at another level later on. In the approaches I have discussed above, an example of a solution is content-addressable storage, as used in Nix. In contrast, the traditional Linux package managers are workarounds, because they re-create a composition issue at the package level. Linux distribution authors have done a lot of hard and useful work with these package managers, which I don't want to play down in any way. But the fruit of that work can be carried over to better foundations. The Tetris model of software installation is not sustainable in my opinion. We have to move on.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Shalabh:</i><p>I strongly agree with much of what you have written above. The composition problem is widespread and eating most of our resources. Unfortunately it is mostly *invisible* in the sense that there is no deep understanding or study of 'composition models' like there is of 'programming languages'. Bit formats and PLs are also something we keep getting better at, so the tendency is "let's do more of those". We must first put on the 'composition oriented glasses' to look at our systems to even start seeing the problem.</p><p>The solution vs workaround distinction indeed seems useful. Perhaps it also depends on perspective? E.g. we could consider Nix to be a workaround to the root problem that the file system substrate does not provide content addressable storage.</p><p>This problem affects not just scientists, but almost all computer users, and even developers, ironically. Did you write more about the ideas you have around this? I skimmed this blog but didn't find anything specific on this theme.</p><ul>
<li><i>Konrad Hinsen:</i><p>I didn't write much else on this topic, but I have been working a lot on and with various reproducibility tools. One of them is <a href="https://guix.gnu.org/" rel="nofollow noopener" title="https://guix.gnu.org/">Guix</a>, which can be summarized as an alternative implementation of the Nix idea. You can certainly consider it a workaround for the lack of content-addressable storage at the OS level, but then content-addressable storage is only one ingredient to reproducible software systems. Most of the effort of the Guix community goes into the package definitions. The build procedures of many software packages must be heavily modified to replace the standard paths by configurable ones, and then there is the work shared by all packagers of figuring out which versions of packages actually work together.</p><p>One idea I have been pursuing is to start from the integration end of software building. Given Guix as a system integration tool, how should software be written and distributed to fit into a Guix system without prior torture? Some measures would be simple to define and apply, others more complicated, but all would hit the obstacle of rendering the software more difficult to manage without Guix, so all of that is unlikely to happen.</p></li>
</ul>
</li>
<li><i>Stephen Kell:</i><p>Hello again Konrad! Commenting here this time....</p><p>I really enjoyed this article, and it parallels my own thinking to an uncanny degree.</p><p>I particularly appreciate the wide treatment of different kinds of "language", extending to file formats and memory layouts, and seeing the resulting problems as language composition problems. And I completely agree that focus on programming languages, instruction sets and the like, distracts us from the fact that the conventions we layer on top of PLs, or indeed underneath them at the implementation level, are as important to composition as the languages themselves -- often more so, in fact. My own thinking with liballocs has definitely been working towards better support for automating (or even semi-automating) solutions to these problems, roughly limiting my scope to "within one process" for now.</p><p>I believe that the proliferation of "[generalised] languages" is something we need to address primarily by mitigation, and only secondarily by minimisation. Computers can help us mitigate a lot more than at present. Minimisation is to be preferred as far as it is feasible, but diversity of requirements and decentralised working will guarantee the existence, survival and (re-)creation of more languages than are in theory necessary.</p><p>Both translation and supersetting can become tasks that the machine assists us in, even though at present they are mostly manual. We lack the kind of metaprogramming infrastructure for recovering commonality among distinct "languages". In fact, for my PhD (<a href="http://www.cl.cam.ac.uk/~srk31/#phd" rel="nofollow noopener" title="http://www.cl.cam.ac.uk/~srk31/#phd">http://www.cl.cam.ac.uk/~sr...</a>) I more directly attempted a version of this problem, without huge success; in any case, a better infrastructure for doing these things is exactly my medium-term vision for liballocs.
Although I have started with type information for in-memory objects, binary file formats are an obvious next step, and there's nothing to prevent an extended descriptive framework from covering other languages, filesystem objects, etc.. It feels particularly important to capture the layering of encodings (somehow!).</p><p>I agree that the problem extends to namespace management in general and software installation/deployment in particular... I have some ideas in that space too, again having much in common with Nix/Guix... though in general I am not fitting this into liballocs yet... I would rather come at these problems "from the other end" in the hope of converging later.</p><ul>
<li><i>Konrad Hinsen:</i><p>Thanks for your extensive comments! It's good to see that at least one reader has understood this post. Most feedback I got at the time (privately) was along the lines of "I have no idea what you are talking about".</p><p>Mitigation is certainly the right approach for progressing without having to start from scratch. Mediation (as in your Cake language) sounds good as well. As a start, I propose that every PL design or implementation team should include an experienced diplomat. A lot would be gained if people would stop designing closed universes.</p><p>Ubiquitous metadata for introspection, which is my hopefully not too wrong summary of your liballocs project, looks like another approach to mediation. I have some experience with this on the file format level (via HDF5, which stores data structure definitions along with datasets, see <a href="https://support.hdfgroup.org/HDF5/" rel="nofollow noopener" title="https://support.hdfgroup.org/HDF5/">https://support.hdfgroup.or...</a> for details), where it works very well, in part because the HDF5 library includes data conversion utilities for the most frequent situations.</p><p>Now we just have to convince the rest of the world that this is an important problem to solve...</p></li>
</ul>
</li>
</ul>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>On HDF5 and the future of data management</title> <link>https://blog.khinsen.net/posts/2016/01/07/on-hdf5-and-the-future-of-data-management.html</link> <pubDate>2016-01-07</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2016/01/07/on-hdf5-and-the-future-of-data-management.html</guid> <category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ <p>Yesterday a <a href="http://cyrille.rossant.net/moving-away-hdf5/" >blog post</a> by Cyrille Rossant entitled &quot;Moving away from HDF5&quot; caught my eye. My own tendency at the moment is to use HDF5 more and more, so I was interested in why someone else would want to do the opposite. Here is my conclusion after reading his post, plus some ideas about where scientific data management is or should be heading in my opinion.</p>

<!-- more -->

<p>Any evaluation of some technology happens in the context of a specific application's requirements, and this is where Cyrille's and my own experience differ in an important point: I have never run into performance problems with HDF5, probably because my jobs do much more computation (relative to I/O) than his. This also makes parallel access less of a problem for me, although I agree that HDF5's parallel support could be better.</p>

<p>Otherwise, I agree with much of his criticism of HDF5, but I still conclude that its problems are the smallest evil compared to any other technology I know of. The big problem with HDF5 from my point of view is what Cyrille calls &quot;opacity&quot;: the complexity of the file format which in practice means that the only way to use HDF5 files is via the HDF5 library. Which is, indeed, far from perfect. However, given my requirements, there is pretty much no competition to HDF5. The only alternative would be to roll my own system, which isn't a pleasant idea either.</p>

<p>The peculiar combination of requirements that to the best of my knowledge only HDF5 fulfills is:</p>

<ul>
<li>the hierarchical management of multiple datasets with associated metadata as a single unit for archiving and publishing</li>
<li>efficient access to the individual datasets</li>
</ul>

<p>The first requirement rules out the approach of using a directory with a lot of individual files. The second requirement rules out container formats such as zip - having to unpack a dataset for processing is too much overhead.</p>

<p>My first requirement is exactly what Cyrille describes as the &quot;HDF5 philosophy&quot;, so it's no wonder that HDF5 fits my needs rather well. His question &quot;One can wonder why not just use a hierarchy of files within a directory.&quot; thus deserves a few comments. I have done that for a while, and many of my colleagues still do it. My experience is that, after copying the data around between different machines a few times, I always ended up losing files or having mismatched versions. Which, of course, raises the question of why I copy the data around.</p>

<p>Cyrille says that &quot;today's datasets are so big that they don't tend to move a lot.&quot; Well, first of all, mine are not <em>that</em> big. My HDF5 files are a few MB to a few GB in size. Individual datasets range from a few hundred bytes to a few GB, and the number of datasets in a HDF5 file ranges from ten to a few thousand. And I copy them around because I handle different tasks in my workflow on different machines. Most data transfers happen between my desktop/laptop and the computing cluster that I use for number crunching. I couldn't do the number crunching on my desk, nor the data inspection and visualization on the cluster in batch mode. Since the two machines have no shared file storage, I can't avoid copying the data back and forth. Moreover, collaborators' desktop machines participate in the overall workflow as well.</p>

<p>For jobs that handle much bigger datasets, copying is indeed not an option, and the usual way to work is to keep the data on a single server-type machine that also handles the computation. I cannot use that kind of setup because neither my software nor my computers are made for it. All my software was written with local disk storage in mind - just like HDF5.</p>

<p>Taking a step back from the technical details, my analysis of the situation is that we are living in a transition period from local to distributed storage of scientific data. Local storage was the only option in the past, before fast networks came along. Distributed storage is what fits today's working patterns best: large data, geographically widespread collaborations, etc. But distributed storage still lacks good infrastructure, and is therefore badly supported by much scientific software.</p>

<p>The future of scientific data management is, in my opinion, something like <a href="https://ipfs.io/" >IPFS</a>: a single logical view of data spread out over a vast network of machines. Software accesses the data using a mixture of references (like filenames, URLs, etc.) and content-based addressing (e.g. through hashes). If performance demands local storage, the data is cached by the middleware. The middleware also ensures availability with decent performance and redundant storage to prevent data loss. No data would ever be copied explicitly, but simply retrieved &quot;from the cloud&quot;.</p>

<p>In such a world, my HDF5 files would become small datasets containing references to other, potentially big datasets, plus metadata. Content-based addressing plus transparent data movements performed by the middleware would ensure coherence - nothing would be messed up by me shoveling data around with manually typed scp commands. I suspect Cyrille would be happy with this as well. The only problem is that we do not have this infrastructure. Worse, given the cost of building and maintaining such infrastructure, we are not likely to have it for many years to come. So... after this short dream, it's back to HDF5 for me.</p>
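<p>As a rough sketch of what such a small dataset could look like, consider a manifest that holds metadata plus content-based references to the large data. Everything here is hypothetical: a Python dict stands in for the network-wide storage, JSON stands in for the file format, and the dataset names and metadata fields are invented. It does not use IPFS or HDF5.</p>

```python
import hashlib
import json


def address(data):
    """Content-based address of a blob of bytes."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the network-wide storage; in the scenario described above this
# would be provided by middleware, not by a local dict.
remote_storage = {}


def publish(data):
    key = address(data)
    remote_storage[key] = data
    return key


def fetch(key):
    data = remote_storage[key]
    # Because the key is derived from the contents, corruption or a
    # mismatched version is detected on retrieval.
    if address(data) != key:
        raise ValueError("content does not match its address")
    return data


# The small "file" holds only metadata and references to the big datasets.
manifest = json.dumps({
    "metadata": {"title": "example simulation", "units": "nm"},
    "datasets": {
        "trajectory": publish(b"...large trajectory data..."),
        "energies": publish(b"...large energy data..."),
    },
})

loaded = json.loads(manifest)
trajectory = fetch(loaded["datasets"]["trajectory"])
print(len(manifest), len(trajectory))
```

<p>The manifest can be copied around freely, since a mismatch between it and the data it refers to cannot go unnoticed.</p>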

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>Daan van Vugt:</i><p>Hi Konrad,</p><p>For that kind of analysis tasks you might look at sshfs with a generous cache as a kind of distributed file system middleware, I use it very often when analysing data.</p><p>Ipfs looks very interesting and loads better!</p><p>Daan</p></li>
</ul>

<!-- Local Variables: -->

<!-- mode: markdown -->

<!-- End: -->
 ]]></description> </item><item> <title>From facts to narratives</title> <link>https://blog.khinsen.net/posts/2015/12/08/from-facts-to-narratives.html</link> <pubDate>2015-12-08</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/12/08/from-facts-to-narratives.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>A recurrent theme in computational science (and elsewhere) is the need to combine machine-readable information (which in the following I will call &quot;facts&quot; for simplicity) with a narrative for the benefit of human readers. The most obvious situation is a scientific publication, which is essentially a narrative explaining the context and motivation for a study, the work that was undertaken, the results that were observed, and conclusions drawn from these results. For a scientific study that made use of computation (which is almost all of today's research work), the narrative refers to various computational facts, in particular machine-readable input data, program code, and computed results.</p>

<!-- more -->

<p>A computational notebook, as pioneered by <a href="https://www.wolfram.com/technology/nb/" >Mathematica</a> and recently popularized by <a href="https://jupyter.org/" >Jupyter</a> (formerly known as the IPython notebook), is another document that mixes facts and narratives. Compared to a scientific article, program code takes a much more prominent role, and the narrative is focused on the computation. In software development tools, we find the fact-narrative mixture in version control, where the commits are a stream of facts to which the commit messages attach a narrative. At a more basic level, comments in program code can be thought of as narratives embedded into the code. <a href="http://www.literateprogramming.com/" >Literate programming</a> inverts this relation by embedding the code into a narrative.</p>

<p>All these situations share a common problem: the tools we have today force us to choose between treating the facts as first-class while accepting a low-quality narrative, and optimizing the narrative while compromising on the quality of fact management. In the following, I will argue that this is due to a poorly thought-out relation between facts and narratives, and outline possible improvements.</p>

<p>Comments in source code are an example where priority is given to the facts, i.e. the program. The reader is supposed to read the code, the comments are there only to provide non-obvious background information, and sometimes to outline an overall structure. Reading commented code takes a lot of time and effort, because the reader has to deal with all the details of the program code. A pure narrative would explain software at a more abstract level, leaving out details or relegating them to an appendix. As an example of the opposite extreme, a scientific article is primarily a narrative, including only small pieces of the facts for illustration. A complete description of the facts would require all of the program code and input data. This is why replicability and reproducibility are currently big issues in computational science.</p>

<p>Facts and narratives live in two different universes. Facts belong to the computational universe, in which all information is encoded in formal languages with (ideally) well-defined syntax and semantics. Computation processes input data (which includes the program code) and produces output data in a process that is perfectly well-defined and deterministic. A real-life computation depends on a lot of input data due to the many details that matter. That means a lot of facts, but computers are very good at handling a lot of facts.</p>

<p>Narratives belong to the universe of human thought and communication. They rely on a rich context that human readers are expected to have acquired through prior study. This context contains in particular the appropriate abstractions that allow the narrative to remain at a manageable level, because humans can only keep a limited amount of details in their heads. To see the importance of this point, imagine a narrative that explains how to &quot;open a door&quot; in terms of the detailed eye movements and muscle contractions required to perform this task - such a narrative would be completely incomprehensible. On the other hand, narratives do not need to be very precise in many aspects because humans excel at &quot;making sense&quot; of information even if it contains mistakes and incongruences.</p>

<p>Computers are good at handling facts but not narratives. Humans are good at handling narratives but not facts in the quantities that typically define a computation. Letting computers intervene in the processing of narratives leads to funny results - try <a href="https://translate.google.com/" >Google Translate</a> on a non-trivial text for an illustration. Letting humans intervene in the execution of a computation is a major source of mistakes. That is why a key ingredient to improving replicability is the automation of all computational steps. In the ideal world, no part of a computation would be defined by a narrative providing instructions for a human operator. Anyone who has ever had to install software knows that we are still far away from that ideal world.</p>

<p>Note that I only said that humans should not intervene in the <em>execution</em> of a computation. They do of course intervene in its definition. Program source code, after all, is written by humans. More generally, humans intervene quite often in computational science by using interactive tools. In that case, the stream of user interactions becomes part of the definition of a computation. If it is recorded, the computation can later be executed again without human intervention. This is of course well known: replicability requires that all user interaction must be recorded.</p>
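<p>A minimal sketch of this idea: an interactive session that logs every interaction, plus a function that re-executes the log without a human in the loop. The three kinds of interaction (<code>load</code>, <code>scale</code>, <code>drop_below</code>) and the class name <code>Session</code> are invented for the example.</p>

```python
import json


def apply(state, action):
    """Apply one user interaction to the state of the computation."""
    kind, value = action
    if kind == "load":
        return list(value)
    if kind == "scale":
        return [x * value for x in state]
    if kind == "drop_below":
        return [x for x in state if x >= value]
    raise ValueError("unknown action: " + kind)


class Session:
    """An interactive session that logs every interaction it performs."""

    def __init__(self):
        self.state = []
        self.log = []

    def do(self, kind, value):
        self.log.append([kind, value])
        self.state = apply(self.state, (kind, value))


def replay(serialized_log):
    """Re-execute a recorded session without any human intervention."""
    state = []
    for action in json.loads(serialized_log):
        state = apply(state, action)
    return state


# A human explores the data interactively...
session = Session()
session.do("load", [1, 5, 10])
session.do("scale", 2)
session.do("drop_below", 5)

# ...and the recorded log alone suffices to reproduce the result.
recorded = json.dumps(session.log)
print(replay(recorded) == session.state)
```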

<p>Since facts and narratives live in different universes, we should avoid mixing them carelessly. Crossing the boundary between the two universes should always be explicit. A narrative should not include copies of pieces of facts, but references to locations in a fact universe. And facts should not refer to narratives at all. The relation between the two universes is not symmetric: computers are tools made by humans for their benefit, so the computational universe is subordinate to the human universe.</p>

<p>Now let us look at the examples cited in the beginning from this new point of view. In scientific communication, the separation of facts and narratives was actually well respected initially. The lab notebook recorded facts, and the published paper contained a narrative quoting facts from the lab notebook. No scientist would ever have contemplated writing a paper by modifying the contents of his or her lab notebook! Unfortunately, this basic wisdom was lost with the adoption of computers. Computers make it very easy to modify information, to the point that version control had to be invented to prevent massive information loss by careless editing. Moreover, the distinction between a lab notebook and a paper became blurred by both being files processed using a computer. Finally, computational scientists never adopted the habit of keeping lab notebooks until very recently, coming mostly from a theoretical rather than an experimental background.</p>

<p>Today there is a lot of discussion about &quot;electronic lab notebooks&quot;, but the fundamental characteristic of a lab notebook being a record of facts is not often mentioned in this context. Very frequently, computational notebooks as implemented by Jupyter or Mathematica are claimed to be lab notebooks for computational science. It is probably clear at this point that I do not agree. Computational notebooks are designed for writing narratives that include computations and their results. They are best considered specialized word processors that encourage refining a document through many iterations of modification involving the code, its results, and the textual elements. The computational side of notebooks is limited to efficient interactive code evaluation. There is no logging of interactions, and no description of the computational infrastructure (libraries, ...) on which the interactive computations rely. As a consequence, computations in a notebook are in general not replicable. I believe this can be fixed, and I have made <a href="https://github.com/jupyter/enhancement-proposals/pull/4" >a concrete proposal</a> for doing so, but unfortunately I do not have the means to actually implement this idea.</p>

<p>In version control as it was originally designed, a repository is a fact database that contains sequences of versions of file sets. Commit messages, like comments in a program, are small narratives that provide a high-level overview and often a motivation for each change. The role of a repository is similar to the role of a lab notebook: it is a permanent record of what happened, with narratives written close in time to the recorded events. As commits and commit messages accumulate over time, following along becomes an arduous task for a human reader: the narrative contains too much irrelevant detail. This became a serious practical issue as version control was adopted as a tool for collaboration, with members of a team communicating through commit messages. Git therefore introduced the approach of <a href="https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History" >&quot;rewriting history&quot;</a>. The idea is to &quot;clean up&quot; a stream of commits by re-ordering and merging them and by writing new commit messages, with the goal of creating a better narrative. Rewriting history remains a hot topic of debate. Most people realize the utility of cleaning up the narrative, but it also feels wrong to destroy the original historical record in the process. Moreover, there is a clear risk of introducing mistakes when rewriting history. In view of what I said above, the basic mistake is the failure to separate cleanly facts from narratives. The cleaned-up narrative should be separate from the original commented stream of commits and refer to it. In git terminology, rewriting history should create a new branch, and the rebasing operations done in deriving the new branch from the initial one should be recorded. Moreover, the editing tools should ensure that the final file contents are the same in the two branches.</p>

<p>I hope that these two examples have illustrated why it is desirable to keep facts and narratives distinct, with well-defined references from narratives to facts. Unfortunately, today's computational technology doesn't help much with reaching this goal when the facts are parts of a complex computation. We cannot define such a computation while remaining completely in the computational universe. And we cannot define unambiguous references to arbitrary facts inside a computational universe either. Most of the data formats and tools we use for preparing narratives do not even try to respect the separation of universes. Finally, the formal languages we use to encode computational facts (programming languages, file formats, etc.) are mostly not designed for being embedded into narratives. There's still a lot to do.</p>

 ]]></description> </item><item> <title>This blog is moving!</title> <link>https://blog.khinsen.net/posts/2015/11/12/This-blog-is-moving.html</link> <pubDate>2015-11-12</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/11/12/This-blog-is-moving.html</guid>  <description><![CDATA[ Welcome to the last post on this WordPress blog. I have set up a <a href="http://blog.khinsen.net/">new blog</a> for all my future writing.<br><br>The reason for the move is that the user interface at WordPress is changing all the time without ever getting better. I like to write my posts on my own computer using Emacs, rather than typing into a rudimentary editing window on a Web site. This is not completely impossible with WordPress, but more hassle than it's worth.<br><br>My new blog is <a href="https://github.com/khinsen/blog">hosted on GitHub</a> and powered by <a href="https://github.com/greghendershott/frog">Frog</a>, a static Web site generator that mixes my posts written as plain <a href="https://daringfireball.net/projects/markdown/">Markdown</a> files with HTML templates based on the <a href="http://getbootstrap.com/">Bootstrap</a> framework to produce the pages you can read. This setup gives me much more control over my blog, while at the same time making it easier for me to publish new posts.<br><br>The one feature that will disappear is the possibility to subscribe to my blog in order to be informed about new posts by e-mail. If you have a GitHub account, you can get the same effect by following updates to the <a href="https://github.com/khinsen/blog">repository</a> that contains my blog. But the easiest way to learn about new posts is to <a href="http://twitter.com/khinsen">follow me on Twitter</a>.
 ]]></description> </item><item> <title>The lifecycle of digital scientific knowledge</title> <link>https://blog.khinsen.net/posts/2015/11/09/the-lifecycle-of-digital-scientific-knowledge.html</link> <pubDate>2015-11-09</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/11/09/the-lifecycle-of-digital-scientific-knowledge.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p>Like all information with a complex structure, scientific knowledge evolves over time. New ideas turn into validated models, and are ultimately integrated into a coherent body of knowledge defined by the consensus of a scientific community. In this essay, I explore how this process is affected by the ever increasing use of computers in scientific research. More precisely, I look at &quot;digital scientific knowledge&quot;, by which I mean scientific knowledge that is processed using computers. This includes both software and digital datasets. For simplicity, I will concentrate on software, but much of the reasoning applies to datasets as well, if only because the precise meaning of non-trivial datasets is often defined by the software that treats them.</p>

<!-- more -->

<p>Before looking at the &quot;digital&quot; aspects, I will summarize the traditional lifecycle of scientific knowledge from the &quot;printed page&quot; era. It has been going on for centuries and follows well-established procedures and habits. I will then argue that these procedures should serve as a guideline for the management of digital scientific knowledge as well, and that computing technology for science should be designed to support this lifecycle.</p>

<p>New observations, instruments, models, methods, and ideas are first published in journal articles. Such an article explains the background and motivation for the work, summarizes the state of the art, and then exposes the new elements that the authors wish to contribute to the scientific record. Other scientists from the field read the article, and draw conclusions for their own work, which are translated to citations to the article in their own publications. After some time, if the original publication creates enough interest, it will become a subject of discussion in its research community, and it will be mentioned in review articles, which place it in the context of other recent work in the field.</p>

<p>Being cited in review articles is typically the last step in the lifecycle of an individual contribution. Its ideas and conclusions are then merged with related ideas and conclusions and reformulated to become part of the state of the art of the field, recorded in reference works, monographs, and textbooks. These works represent a kind of community consensus. New research, in the same or in other domains, builds on such consensus knowledge, often implicitly by assuming that every reader of a journal article is familiar with the contents of reference works, monographs, and textbooks.</p>

<p>The introduction of computers into scientific research has led to many changes to this process. Some of them, such as the transition from paper to computer files as a support medium for scientific articles and reference works, are relatively minor. The most profound change is that an important part of digital scientific knowledge exists only in the form of software. This is true in particular for complex scientific models, for which we have no other convenient form of representation. An example where this situation is very explicit is the <a href="https://www2.cesm.ucar.edu/" >Community Earth System Model</a> for climate research, which takes the form of a software package. Most often, the status of computational models is more fuzzy. As an example, consider force fields for proteins such as <a href="https://en.wikipedia.org/wiki/AMBER" >AMBER</a> or <a href="https://en.wikipedia.org/wiki/CHARMM" >CHARMM</a>. People refer to these force fields by citing scientific articles, but these articles contain only outlines of the models. Their only complete recorded expressions are implementations as part of simulation software packages, but unlike for the Community Earth System Model, there is no software package designed to function as a reference implementation defining the model.</p>

<p>The fundamental difference between software and other media for storing scientific knowledge is that software has two sides: a human-facing side, and a machine-facing side. As a medium for expressing scientific knowledge, software fulfills the same role as prose or mathematical formulas. But the necessity of specifying a computation so precisely that a machine can execute it imposes severe constraints (software must be expressed using formal languages), and the desire to perform computations efficiently in a world of finite resources adds a different set of priorities in software development that are often in conflict with the criteria attached to the role of a medium for expressing ideas. As an illustration, the source code of a simulation program that has been heavily optimized for parallel execution combines 10% of scientific model with 90% of resource management and bookkeeping, making the scientific model not only hard to understand but even hard to find in the source code. For a more detailed discussion, see <a href="http://f1000research.com/articles/3-101/v2" >my article</a> in F1000Research.</p>

<p>Many of the problems that computational science is facing today (reliability, reproducibility, black-box mentality, etc.) can be traced back to insufficient support for the lifecycle of scientific knowledge by today's software development tools. Practically all of them were developed by and for software development communities outside of scientific research. As a consequence, these tools (programming languages, compilers, packaging and deployment tools, version control systems, etc.) do not take into account the specificities of scientific computing. Worse, computational scientists do nothing to improve the situation. The dominant attitude today is &quot;scientists have to adopt best practices from software engineering and acquire the skills required to apply them&quot;. What I advocate is a somewhat different point of view: scientists should adapt these practices and the tools that implement them to their specific needs.</p>

<p>To see where the problems are, let's look at the lifecycle of scientific knowledge expressed as software. New models and methods are developed by a mixture of thinking, tinkering, and exploring the consequences. This requires a representation that humans can understand and manipulate easily. Executability by a computer is a condition, but other machine-related criteria hardly matter at this stage. Once some useful contribution to the field has been identified, it is communicated to the research community, in a form that is easily understandable, but also easy to deploy on other people's computers. This step is the equivalent of publishing a scientific paper. Next, other scientists start to play with the new stuff. This includes comparisons with other models and methods, analysis of model properties, application to different scenarios, etc. The conclusions from this work should take a form similar to a review article. This would be a toolkit in which different models and methods are made available for execution, with added annotations about their relative strengths and weaknesses. Finally, a synthesis of different ideas leads to a consensus implementation supported and maintained by a wider community of scientists, both as a basis for their own future work and as an infrastructure tool for other communities. This last step corresponds to reference works, and should be accompanied by tutorials that take the role of textbooks. At this stage, usability and performance become major criteria, whereas it is acceptable that not everyone can easily understand the implementation. Those who do wish to understand the method can go back to the &quot;review paper&quot; stage.</p>

<p>Most of the discussion about scientific software today is focused on the last stage. It's about community-supported software packages, whose sustained development requires significant efforts and investments. Most of this effort is required to keep the software useful in a world of rapidly changing computational environments, and to improve its human interfaces. A smaller part is dedicated to implementing new scientific models and methods. This effort has no equally important counterpart in the traditional lifecycle of scientific knowledge, and therefore the people who work on it find it hard to get recognition for their work. It is &quot;not science&quot; by the standards of the generation that occupies most leadership positions in research today. Fortunately, this attitude is starting to change.</p>

<p>This focus on the last stage is perhaps also the reason for the dominating attitude that scientists should simply adopt best practices from software engineering. In fact, the development and maintenance of community software packages implementing consensus models and methods is technically close enough to software development in business and industry that the same tools and procedures can be applied. This is not true, however, for the earlier stages in the lifecycle of digital scientific knowledge. As we will see, they are not well supported by today's software development tools and practices. What's worse is that most computational scientists accept this situation as inevitable.</p>

<p>At the first stage, a scientist's activity is better described by &quot;manipulating and exploring models and methods&quot; than by &quot;software development&quot;. Computational models are of course algorithms, and thus software, but this is almost a technical detail. What is more important is a clear view of the hypotheses and approximations that have led to a specific model, and a trace of the scientific validation that has been performed (comparison with experimental data and with other models). Programming languages are not at all a good match for this kind of work, nor are software engineering approaches such as testing. In terms of software technology, a computational model is much closer to a specification than to a piece of software.</p>

<p>For the next stage, the evaluation of a new idea in a narrow community of specialists, the technical requirements are somewhere in between the two neighboring stages. The manipulation of computational models loses some importance, whereas evaluation and comparison become more relevant. Interoperability matters a lot: even if the authors of two models chose different languages (corresponding to different scientific notations in the traditional scenario), a comparative evaluation should be a straightforward task. With programming languages, it clearly isn't. The technical difficulties of making programs written in different languages talk to each other are effectively discouraging scientists from even trying. We would need tools such as &quot;notational adapters&quot; and, even more importantly, some low-level conventions for code and data that everybody can agree and build on. As a guideline for developing such technology, keep the analogy with review articles in mind. What would an executable review article about similar but independently developed computational methods look like? Which authoring tools are available to support such work?</p>

<p>Finally, the transition from the first two stages to the last one is not as smooth as it ought to be. Quite often, an implementation written for convenient manipulation by humans must be completely rewritten in order to fit into a collection of optimized subroutines. What we should have is compiler-like tools that translate code from the first two stages into standard programming languages, using annotations added by expert programmers for guidance. The idea is to have a toolchain (1) guarantee the equivalence of the initial and the optimized level, and (2) keep track of additional approximations that were made for performance reasons. Moreover, community-supported optimized software libraries should be usable as infrastructure tools in the next level of model and method development, and thus be interoperable with the tools appropriate for the first stage, which are nonexistent for now.</p>

<p>Another way to describe this specificity of scientific computing, compared to other application domains, is the absence of a clear borderline between software developers and software users. Most scientists are users of tried and trusted computational methods while working on the development or validation of methods at another level. The only clear separation we have, conceptually, is the one between scientific models and methods on one hand and computing technology (in particular resource management) on the other hand. Unfortunately, that is exactly the separation that current software technology does not allow us to make.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>:</i><ul>
<li><i>Konrad Hinsen:</i><p>I kind of agree with much of what you say, but it's about publishing, not about knowledge representation. In that respect, the transition to digital has opened up many new options and I am all for exploring these - in fact, I am participating actively in doing so.</p><p>The specific topic of this post is not how information is shared and archived, but how knowledge is encoded in the form of symbols. What I want to preserve from the printed paper era is the flexibility of adapting notation to the task. It is programming languages that are rigid and constraining when seen as a medium for expressing thoughts. There are good (but also bad) reasons for that in the context of software development, but they don't carry over to doing science.</p><p>Finally, the analogy with statically linked executables is not very useful in my opinion: an executable is not at all useful for communicating scientific knowledge.</p></li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>A rant about software deployment in 2015</title> <link>https://blog.khinsen.net/posts/2015/11/06/a-rant-about-software-deployment-in-2015.html</link> <pubDate>2015-11-06</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/11/06/a-rant-about-software-deployment-in-2015.html</guid> <category><![CDATA[ python ]]></category><category><![CDATA[ scientific computing ]]></category><category><![CDATA[ rants ]]></category> <description><![CDATA[ <p>We all know that software deployment in a research environment can be a pain, but knowing this as a fact is not quite the same as experiencing it in reality. Over the last few days, I spent way more time than I would have imagined on what sounds like a simple task: installing a scientific application written in Python on a Linux machine for use by a group of students in a training session. Here is an outline of the difficulties, in the hope that it will (1) help others who face similar problems and (2) contribute a little bit to improving the situation.</p>

<!-- more -->

<p>The software that I installed is <a href="http://dirac.cnrs-orleans.fr/nMOLDYN/" >nMOLDYN</a>, an analysis tool for Molecular Dynamics trajectories. From a software engineering point of view, this is a rather standard Python program building on <a href="http://numpy.scipy.org" >NumPy</a> and <a href="http://dirac.cnrs-orleans.fr/MMTK" >MMTK</a> for its computations and on <a href="https://wiki.python.org/moin/TkInter" >Tkinter</a> and <a href="http://matplotlib.org/" >matplotlib</a> for the graphical user interface. There is no need for anything on the bleeding edge: a decent three-year-old installation of the scientific Python stack would support this perfectly well.</p>

<p>The machine that was set up for the training session is configured much like a typical node in a compute cluster: stable and trusted software installed once and never updated. More specifically, the machine runs <a href="https://www.centos.org/" >CentOS</a> 6.7. Another feature rather typical of compute nodes is the very restricted network connectivity: users can log in via <code>ssh</code>, and copy data in and out using <code>scp</code>. Everything else is blocked, in particular all outgoing network traffic. The idea is that students will work on desktop or laptop machines, from where they have full network access to search for information, and connect to the compute server only for running scientific software. For my own software installation I had to limit myself to a user account, i.e. no administrator rights, although I could ask the systems administrator to install additional RPMs from CentOS.</p>

<p>A first exploration of the system's Python installation showed a collection of oldies: Python 2.6.6, NumPy 1.4.1, matplotlib 0.99.1.1. That's the state of the art five years ago. I quickly decided not to use it at all, for two reasons. First, I wasn't sure how much of what I had to add would still work with such old versions. All the software was already around five years ago, but I would have had to track down the versions that were current back then. Second, adding modules in a user account to a Python installation at the system level can easily lead to a fragile total. Following Murphy's law such problems would show up during the student sessions. So I decided to start with a fresh install of Python 2.7.</p>

<p>First surprise: no C compiler. An e-mail to the administrator, and I had gcc. Trying to install Python showed that the Tcl/Tk setup was incomplete: the header files were missing. Another e-mail asking for tcl-devel and tk-devel, and that was settled as well. Python, NumPy, netCDF, ScientificPython, and MMTK were up and running half an hour later. An attempt to install nMOLDYN resulted in the information that I still needed to install Pyro and matplotlib. That can't be so hard, right?</p>

<p>Pyro was no problem indeed, but matplotlib kept me busy for a few more hours. All I had done in the past was <code>pip install matplotlib</code>, but <code>pip</code> is useless without outgoing network connections. I had to track down source tarballs for matplotlib and all its dependencies. There's a <a href="http://matplotlib.org/users/installing.html" >list</a> of dependencies on the matplotlib Web site, but it's incomplete in two ways: some dependencies are missing (setuptools and six), and others are given by name but without a link. Try googling for &quot;cycler&quot; - you will learn a lot about celestial mechanics before you find a package with this name on <a href="https://pypi.python.org/pypi" >PyPI</a>. Of all the matplotlib dependencies, only freetype was already available on my machine, so I had some searching and downloading to do.</p>

<p>The installation instructions for setuptools clearly do not consider the possibility of not having a network connection. They tell me to download a Python script and execute it to download the real software. Fortunately, there is the &quot;advanced&quot; installation option via a tarball. Which ends rather quickly with an error message complaining about the absence of the <code>zlib</code> module.</p>

<p>That module is part of the Python standard library, but it is compiled only if zlib (the C library) is installed on the machine. It wasn't on mine. This is not particularly difficult to fix, but rather annoying: I had to install zlib, and then run the Python installation once more. Not to forget: I knew <code>zlib</code> was in the standard library, and I immediately saw why it was missing on my machine, because I have been installing Pythons in lots of environments over twenty years. Someone else might well have spent a few hours figuring out what to do about zlib.</p>
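
<p>A quick check right after building Python can reveal this kind of gap before it surfaces as an obscure error in some installer. The following is a minimal sketch, not something I used at the time. The function name <code>missing_modules</code> and the list of modules to probe are illustrative choices. The listed modules are standard-library modules that wrap external C libraries and are therefore skipped by the build when those libraries are absent.</p>

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Illustrative selection of standard-library modules that depend on
# external C libraries (zlib, bzip2, SQLite, OpenSSL).
OPTIONAL_STDLIB = ["zlib", "bz2", "sqlite3", "ssl"]

if __name__ == "__main__":
    for name in missing_modules(OPTIONAL_STDLIB):
        print("not available in this Python build: " + name)
```

<p>Any name printed by this script points to a C library that should be installed before running the Python build once more.</p>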

<p>From then on everything went smoothly, so this is the end of my story. In order to provide something constructive, here is the complete list of matplotlib dependencies with links, and in the order of installation:</p>

<ul>
<li><a href="http://www.zlib.net/" >zlib</a></li>
<li><a href="http://www.python.org/" >Python</a></li>
<li><a href="http://www.numpy.org/" >numpy</a></li>
<li><a href="http://pypi.python.org/pypi/setuptools" >setuptools</a></li>
<li><a href="http://pypi.python.org/pypi/six" >six</a></li>
<li><a href="http://pypi.python.org/pypi/Cycler" >cycler</a></li>
<li><a href="http://pypi.python.org/pypi/pyparsing" >pyparsing</a></li>
<li><a href="http://pypi.python.org/pypi/python-dateutil" >python-dateutil</a></li>
<li><a href="http://pypi.python.org/pypi/pytz" >pytz</a></li>
<li><a href="http://www.libpng.org/pub/png/libpng.html" >libpng</a></li>
</ul>

<p>Finally, I will pass on a hint that came in this morning via Twitter:</p>

<blockquote class="twitter-tweet" lang="de"><p lang="en" dir="ltr"><a href="https://twitter.com/khinsen">@khinsen</a> <a href="https://twitter.com/MrTheodor">@MrTheodor</a> pip install —download . —no-use-wheel &lt;foobar&gt;&#10;&#10;Won’t work if there are Linux specific dependencies though.</p>&mdash; Donald Stufft (@dstufft) <a href="https://twitter.com/dstufft/status/662620534010740736">6. November 2015</a></blockquote>

<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

<p>Using <code>pip install --download . --no-use-wheel matplotlib</code>, run of course on a machine that has a network connection, you get tarballs for matplotlib and all its dependencies that pip knows about. You still have to add setuptools (which pip doesn't download because it depends on it itself), the C libraries libpng and zlib, and of course Python's standard but not-always-there <code>zlib</code> module.</p>
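
<p>For readers with a more recent <code>pip</code>: the <code>--download</code> option shown above was later dropped in favour of a separate <code>pip download</code> subcommand, and <code>pip install</code> can be told to use only a local directory via <code>--no-index</code> and <code>--find-links</code>. The sketch below merely assembles the two command lines. The helper names <code>download_command</code> and <code>offline_install_command</code> and the directory name are made up for this illustration, and I have not tried this on the machine described in this post.</p>

```python
import sys


def download_command(package, dest):
    """Command to run on a machine WITH network access."""
    return [sys.executable, "-m", "pip", "download",
            "--no-binary", ":all:",  # source archives only, no wheels
            "--dest", dest,          # directory that collects the archives
            package]


def offline_install_command(package, source_dir):
    """Command to run on the isolated machine, after copying source_dir."""
    return [sys.executable, "-m", "pip", "install",
            "--no-index",                # never contact PyPI
            "--find-links", source_dir,  # look for archives here instead
            package]


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be inspected.
    print(" ".join(download_command("matplotlib", "matplotlib-sources")))
    print(" ".join(offline_install_command("matplotlib", "matplotlib-sources")))
```

<p>Either list can be passed to <code>subprocess.run(cmd, check=True)</code>. The caveats from the paragraph above still apply: C libraries such as libpng and zlib are outside of pip's reach, and build-time requirements may have to be added to the directory by hand.</p>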

<p>Looking back at my twenty years using Python, I come to the unfortunate conclusion that software installation is much more of a problem today than it was back in 1995. The main reason is of course that Python software has become more feature-rich and complex - in 1995 something like matplotlib was only a dream. But the state of Python packaging tools is also to blame, with three overlapping and partially compatible tools (distutils, setuptools, and distribute) creating a lot of confusion and various distribution formats (tarballs, eggs, wheels) adding another layer of complexity. What is also sorely missing is a straightforward way to package an application program with all its dependencies in such a way that it can be installed with reasonable effort on all common platforms.</p>

<h3>Comments retrieved from Disqus</h3>

<ul>
<li><i>ostrokach:</i><p>Check out the Anaconda python distribution (<a href="https://anaconda.org" rel="nofollow noopener" title="https://anaconda.org">https://anaconda.org</a>). It addresses most of the issues that you list in this blog.</p><ul>
<li><i>Konrad Hinsen:</i><p>Anaconda is great if all the packages you need are in it. If you have to add pure Python packages by hand afterwards, that's OK as well, but if you have to add packages with C extension modules, Anaconda is more of a pain than a help in my experience. And that's what I needed in this specific case (Scientific Python + MMTK + nMOLDYN, all with extension modules).</p><p>Moreover, I am not sure that Anaconda is usable in an environment without Internet connection, though I haven't tried.</p><ul>
<li><i>ostrokach:</i><p>You could set up a local Anaconda repository, and copy into it all the packages (*.tar.bz files) that you require. Anaconda comes with `zlib` and all the other binaries which were giving you a hard time, compiled on old CentOS 5 and thus working on most Linux distros. Furthermore, you could created conda packages for MMTK and nMOLDYN on a local machine, and so you wouldn't even need `gcc` on the server.</p><p>Edit:</p><p>&gt; If you have to add packages with C extension modules, Anaconda is more of a pain than a help in my experience</p><p>If you don't have root access on your machine, you will end up compiling many of the required C libraries in your home directory anyway. You might as well make a conda package out of them so you can use them anywhere. It also helps you want to install several python packages that have different dependencies (NumPy, Boost, etc.).</p><ul>
<li><i>Konrad Hinsen:</i><p>That all sounds nice in theory, but practice is different. I did try to build conda packages for ScientificPython and MMTK but gave up. Linking extension modules to the shared libraries already provided by Anaconda proved to be impossible in a portable way.</p><ul>
<li><i>Chris Barker:</i><p>It's been another year, and Conda / Anaconda has grown more features and support. Particularly with conda-forge</p><p>This would probably be very easy today. Though a non-connected environment is a lot harder.</p><p>You'd need to download all your conda packages by hand, but that's not too hard to do, if annoying.</p><p>And there is the Constructor project:</p><p><a href="https://github.com/conda/constructor" rel="nofollow noopener" title="https://github.com/conda/constructor">https://github.com/conda/co...</a></p><p>that might not even be necessary</p><p>So it HAS gotten better!</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><i>Emmanuel V.:</i><p>Mmm. You're right about Python packaging tools.</p><p>For your setup, an approach could have been:</p><p>1) replicate the "hostile" CentOS environment in a local VM (eg VirtualBox on your laptop, same base OS version, but with Internet connectivity)</p><p>2) Install all needed software as an unprivileged user</p><p>3) copy this user account on the target CentOS</p><p>Emmanuel</p><ul>
<li><i>matthew scholz:</i><p>This is a solution that should not be required.</p><ul>
<li><i>Chris Barker:</i><p>well, it's not, but really, one of the sources of the problem here is a locked down environment -- having a not-isolated "copy" of the exact same environment should be standard part of such a system.</p></li>
</ul>
</li>
<li><i>Konrad Hinsen:</i><p>Yes, that's another possible strategy. But setting up a virtual machine is also a lot of work, so it comes down to estimating how close one will be to the break-even point. I really didn't expect to spend much time on this before I started.</p></li>
</ul>
</li>
</ul>

 ]]></description> </item><item> <title>Beyond Jupyter: what&#039;s in a notebook?</title> <link>https://blog.khinsen.net/posts/2015/09/03/Beyond-Jupyter-whats-in-a-notebook.html</link> <pubDate>2015-09-03</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/09/03/Beyond-Jupyter-whats-in-a-notebook.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ python ]]></category><category><![CDATA[ reproducible research ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ Yesterday I participated (as a visitor) in the kickoff meeting for <a href="http://opendreamkit.org/">OpenDreamKit</a>, where one recurrent topic of discussion was notebooks, both <a href="http://jupyter.org/">Jupyter</a> and <a href="http://www.sagemath.org/">Sage</a>, including the question if they could be brought together. This reminded me of a recent <a href="http://opiateforthemass.es/articles/why-i-dont-like-jupyter-fka-ipython-notebook/">blog post</a> by Kirill Pomogajko entitled "Why I don't like Jupyter". And it reminded me of my own long-term project of integrating Jupyter with my <a href="http://www.activepapers.org/">ActivePapers</a> system for reproducible research. That's three reasons for writing down my thoughts about notebooks and their role(s) in computational research, so here we go.<br><br>One key observation is in <a href="https://disqus.com/home/discussion/opiateforthemasses/why_i_dont_like_jupyter_fka_ipython_notebook/#comment-2222536779">Gaël Varoquaux's comment</a> on Kirill's blog post: using Jupyter for doing science creates a lock-in, because all collaborators on a project must agree on using Jupyter. There is no other tool that can be used productively for working with notebooks. 
It's a case of "wordization": digital content is taken hostage by a tool that defines a storage format for its own convenience without much consideration for other tools, be they competing or complementary. Wordization not only restricts the users' freedom to work with their data, but also creates headaches for the future. A data format defined by a tool can easily become unusable as the tool evolves and introduces incompatibilities, or of course if it disappears. In the case of Jupyter, its developers have always provided upgrade paths for notebooks between versions, but at some time this is bound to create trouble. Bugs are a fact of life, and I don't expect that the version-2-compatibility-feature will get much testing in Jupyter version 23. To make it worse, a Jupyter notebook can depend on third-party code that implements embedded widgets. This is one of the reasons why I don't use Jupyter for my research, although I am a big fan of using it for teaching. The other reason is that I cannot usefully link a notebook to other relevant information, such as code and data dependencies. Jupyter doesn't provide any functionality for this, and they are hard to implement externally exactly because of wordization.<br><br>Wordization is often associated with evil intentions of market dominance, as they are regularly assumed for a company like Microsoft. But I believe that the fundamental cause is the obsession with tools over content that has driven the computing industry for many years. The tool aspects of a piece of software, such as its feature list and its user interface, are immediately visible. On the contrary, its data model attracts attention only by a few specialists, if at all. Users feel the consequences of bad (or absent) data model design through the symptoms of wordization, in particular lock-in, but rarely understand where it comes from. 
Interestingly, this problem was also mentioned yesterday at the OpenDreamKit meeting, by <a href="http://www.jacobs-university.de/directory/mkohlhase">Michael Kohlhase</a> who discussed the digital representation of mathematical knowledge and the difficulty of exchanging it between different software tools. I have written earlier about another aspect, the representation of <a href="http://dx.doi.org/10.12688/f1000research.3978.2">scientific models</a> in computational science, which illustrates the extreme case of tools having absorbed scientific content to the point that its users don't even realize that something is missing.<br><br>Back to notebooks. Let's forget about tools for the moment and consider the question of what a notebook actually <i>is</i>, as a digital document. I think that notebooks are trying to be two different things, and that many of the problems we have with them come from this ambiguity. One role of notebooks is the documentation of computational work as a narrative with direct access to the data. This is why people publish notebooks. The other role is as a protocol of interactive explorative work, i.e. the computational scientist's equivalent of a lab notebook. The two roles are not completely unrelated, but they are still significantly different.<br><br>To see the difference, look at how experimental scientists worked in the good old days of pencil, paper, and the printing press. As experiments were done, all the relevant information (preparation, results, …) was written down, immediately, with a time stamp, in the lab notebook. Like a bank ledger, a lab notebook is an immutable protocol of what happened. You don't go back and change earlier entries; that would even be considered fraud. You just add information at the end. Of course, the resulting protocol is not a good way to communicate one's findings. 
Therefore they are distilled and written up in a separate narrative, which surrounds a description of the work and its most important results by a motivating introduction and summarizing conclusions. This is the classic scientific article.<br><br>Today's computational notebooks are trying to be both protocol and narrative, and pretend that there is a fluent transition between them. One unfortunate consequence is that computational protocols disappear as they are edited to become narratives. This could be alleviated by keeping notebooks under version control, but I have yet to see good versioning support in any notebook-type tool. But, fundamentally, today's notebook tools don't encourage keeping a protocol. They encourage frequent changes to the code and the results, keeping only the latest version. As editors for narratives, notebook tools are also far from ideal because they encourage interactive execution of small code snippets, making it easy to lose track of what was actually executed and in what order. In Jupyter, the only way to ensure a coherent narrative is to (1) restart the kernel and (2) re-execute all cells. There is not even a single menu entry for this operation. Actually, I wonder how many Jupyter users are aware that they must restart the kernel before re-executing all the cells if they want to ensure reproducibility.<br><br>With all that said, here is my current idea of what a notebook should look like at the bit level. A notebook data model should have two distinct entries, one for a protocol and one for a narrative. The protocol entry is a sequence of code cells and results, as they were executed since the start of the computation (for Jupyter, that means the last kernel restart). The narrative is a user-edited sequence of code cells, documentation cells, and results. The actual cell contents could well be shared between the two views: store each cell with a unique ID, and make the protocol and the narrative simple lists of IDs. 
The representation of code and documentation cells in such a data model is straightforward, though there's a huge potential for bikeshedding in defining the details. The representation of results is much more difficult if you want to support more than plain text output. In the long run, it will be inevitable to define clear data models for every type of display widget, which is a lot of work.<br><br>From the tool point of view, the current Jupyter interface could be complemented by a non-editable protocol view. I'd also like to see a single command (menu/keyboard) for the "clean slate" operation: save the current state as a snapshot (or commit it directly to version control), restart the kernel, and re-initialize the protocol to an empty list. But what really matters to me is the data model. Contrary to the current one implemented in Jupyter, the one outlined above could be integrated into workflow management and archivation tools, such as my own <a href="http://www.activepapers.org/">ActivePapers</a>. We'd probably see an Emacs mode for working with it as well. Plus pretty-printing tools, analysis tools, etc. We'd see an ecosystem of tools working with notebooks. A Dream of Openness.
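<br><br>As a rough illustration of the data model outlined above, here is a minimal Python sketch. It is a toy under stated assumptions, not Jupyter's storage format and not an ActivePapers API: the names <code>Cell</code>, <code>Notebook</code>, <code>record_execution</code>, <code>add_documentation</code>, <code>cite</code>, and <code>clean_slate</code> are all hypothetical, and results are reduced to plain text, which sidesteps the hard part (data models for display widgets).

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class Cell:
    cell_id: int
    kind: str      # "code", "doc", or "result" (hypothetical labels)
    content: str   # plain text only in this sketch


@dataclass
class Notebook:
    # Every cell is stored exactly once, under a unique ID.
    cells: Dict[int, Cell] = field(default_factory=dict)
    # The protocol is append-only: IDs in execution order.
    protocol: List[int] = field(default_factory=list)
    # The narrative is user-edited: IDs in reading order.
    narrative: List[int] = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def _store(self, kind: str, content: str) -> int:
        cell = Cell(next(self._ids), kind, content)
        self.cells[cell.cell_id] = cell
        return cell.cell_id

    def record_execution(self, code: str, result: str):
        """Append an executed code cell and its result to the protocol."""
        code_id = self._store("code", code)
        result_id = self._store("result", result)
        self.protocol.extend([code_id, result_id])
        return code_id, result_id

    def add_documentation(self, text: str) -> int:
        """Add a documentation cell at the end of the narrative."""
        doc_id = self._store("doc", text)
        self.narrative.append(doc_id)
        return doc_id

    def cite(self, cell_id: int) -> None:
        """Reference an already stored cell from the narrative."""
        if cell_id not in self.cells:
            raise KeyError(cell_id)
        self.narrative.append(cell_id)

    def clean_slate(self) -> List[int]:
        """Return the old protocol as a snapshot and start an empty one."""
        snapshot = list(self.protocol)
        self.protocol = []
        return snapshot


# Usage: execute something, then build a narrative that refers to it.
nb = Notebook()
code_id, result_id = nb.record_execution("1 + 1", "2")
nb.add_documentation("A first sum.")
nb.cite(code_id)
nb.cite(result_id)
```

Because both views are plain lists of IDs, editing the narrative never touches the protocol, and an external tool only has to understand the cell store and two lists in order to work with the notebook.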
 ]]></description> </item><item> <title>The future of the Scientific Python ecosystem</title> <link>https://blog.khinsen.net/posts/2015/07/16/The-future-of-the-Scientific-Python-ecosystem.html</link> <pubDate>2015-07-16</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/07/16/The-future-of-the-Scientific-Python-ecosystem.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ programming ]]></category><category><![CDATA[ python ]]></category> <description><![CDATA[ <p><br>SciPy 2015 is over, meaning that many non-participants like myself are now busy catching up with what happened by watching <a href="https://www.youtube.com/playlist?list=PLYx7XA2nY5Gcpabmu61kKcToLz0FapmHu">the videos</a>. Today's dose for me was Jake VanderPlas' <a href="https://www.youtube.com/watch?v=5GlNDD7qbP4&amp;list=PLYx7XA2nY5Gcpabmu61kKcToLz0FapmHu&amp;index=3">keynote</a> entitled "State of the Tools". It's about the history, current state, and potential future of what is now generally known as the Scientific Python ecosystem: the large number of libraries and tools written in or for Python that scientists from many disciplines use to get their day-to-day computational work done.<br></p><br><br><p><br>History is done, the present status is a fact, but the future is open to both speculation and planning, so that's what I find most interesting in Jake's keynote. What struck me is that everything he discussed was about paying back technical debt: refactoring the core libraries, fixing compatibility problems, removing technical obstacles to installation and use of various tools. In fact, 20 years after Python showed up in scientific computing, the ecosystem is in a state that is typical for software projects of that age: a bit of a mess. The future work outlined by Jake would help to make it less of a mess, and I hope that something like this will actually happen. 
The big question mark for me is how this can be funded, given that it is "only" maintenance work, producing nothing fundamentally new. Fortunately there are people much better than me at thinking about funding, for example everyone involved in the <a href="http://numfocus.org/">NumFOCUS</a> foundation.<br></p><br><br><p><br>Jake's approach to outlining the future is basically "how can we fix known problems and introduce some obvious improvements" (but please do watch the video to get the full story!). What I'd like to present here is an alternate approach: imagine an ideal scientific computing environment in 2015, and try to approximate it by an evolution of the current SciPy ecosystem while retaining a sane level of backwards compatibility. Think of it as the equivalent of Python 3 at the level of the core of the scientific ecosystem.<br></p><br><br><p><br>One aspect that has changed quite a bit over 20 years is the interaction between Python and low-level code. Back then, Python had an excellent C interface, which also worked well for Fortran 77 code, and the ease of wrapping C and Fortran libraries was one of the major reasons for Python's success in scientific computing. We have seen a few generations of wrapper code generators, starting with <a href="http://www.swig.org/">SWIG</a>, and the idea of a hybrid language called <a href="http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/">Pyrex</a> that was the ancestor of today's <a href="http://cython.org/">Cython</a>. LLVM has been a major game changer, because it permits low-level code to be generated and compiled on-the-fly, without explicitly generating wrappers and compiling code. While wrapping C/C++/Fortran libraries still remains important, the equally important task of writing low-level code for performance can be handled much better with such tools. 
<a href="http://numba.pydata.org/">Numba</a> is perhaps the best-known LLVM-based code generator in the Python world, providing JIT compilation for a language that is very similar to a subset of Python. But Numba is also an example of the mindset that has led to the current mess: take the existing ecosystem as given, and add a piece to it that solves a specific problem.<br></p><br><br><p><br>So how would one approach the high-/low-level interface today, having gained experience with LLVM and PyPy? Some claim that the distinction doesn't make sense any more. The authors of the <a href="http://julialang.org/">Julia</a> language, for example, <a href="https://github.com/stevengj/julia-mit">claim</a> that it "avoids the two-language problem". However, as I have <a href="https://khinsen.wordpress.com/2015/06/18/another-look-at-julia/">pointed out on this blog</a>, Julia is fundamentally a performance-oriented low-level language, in spite of having two features, interactivity and automatic memory management, that are traditionally associated with high-level languages. By the way, I don't believe the idea of a both-high-and-low-level language is worth pursuing for scientific computing. The closest realization of that idea is Common Lisp, which is as high-level as Python, perhaps more so, and also as low-level as Julia, but at the cost of being a very complex language with a very steep learning curve, especially for mastering the low-level aspects. Having two clearly distinct language levels makes it possible to keep both of them manageable, and the separation line serves as a clear warning sign to scientists, who should not attempt to cross it without first acquiring some serious knowledge about software development.<br></p><br><br><p><br>The model to follow, in my opinion, is the one of <a href="http://lush.sourceforge.net/">Lush</a> and <a href="http://terralang.org/">Terra</a>. 
They embed a low-level language into a high-level language in such a way that the low-level code is a data structure at the high level. You can use literals for this data structure and get the equivalent of Numba. But you can also write code generators that specialize low-level code for a given problem. Specialization allows both optimization and simplification, both of which are desirable. The low-level language would have arrays as a primitive data structure, and both NumPy and Pandas, or evolutions such as <a href="https://xray.readthedocs.org/">xray</a>, would become shallow Python APIs to such low-level array functionality. I think this is much more powerful than today's Numba building on NumPy. Moreover, wrapper generators become simple plain Python code, making the construction of interfaces to complex libraries (think of <a href="http://www.h5py.org/">h5py</a>) much easier than it is today. Think of it as <a href="https://docs.python.org/3.5/library/ctypes.html">ctypes</a> on steroids. For more examples of what one could do with such a system, look at <a href="http://docs.julialang.org/en/release-0.3/manual/metaprogramming/">metaprogramming in Julia</a>, which is exactly the same idea.<br></p><br><br><p><br>Another aspect that Jake talks about in some detail is visualization. There again, two decades of code written by people scratching their own itches has led to a mess of different libraries with a lot of overlap and no clear distinctive features. For cleaning it up, I propose the same approach: what are the needs and the available technologies for scientific visualization in 2015? We clearly want to profit from all the Web-based technologies, both for portability (think of mobile platforms) and for integration with <a href="http://jupyter.org/">Jupyter</a> notebooks. But we also need to be able to integrate visualization into GUI applications. 
From the API point of view, we need something simple for simple plots (<a href="https://toyplot.readthedocs.org/">Toyplot</a> looks promising), but also more sophisticated APIs for high-volume data visualization. The main barrier to overcome, in my opinion, is the current dominance of Matplotlib, which isn't particularly good in any of the categories I have outlined. Personally, I don't believe that any evolution of Matplotlib can lead to something pleasant to use, but I'd of course be happy to be proven wrong.<br></p><br><br><p><br>Perhaps the nastiest problem that Jake addresses is packaging. He seems to believe that <a href="http://www.continuum.io/blog/conda">conda</a> is the solution, but I don't quite agree with that. Unless I missed some recent evolutions, a Python package prepared for installation through conda can only be used easily with a Python distribution built on conda as well. And that means Anaconda, because it's the only one. Since Anaconda is not Open Source, there is no way one can build a Python installation from scratch using conda. Of course, Anaconda is perfectly fine for many users. But if you need something that Anaconda does not provide, you may not be able to add it yourself. On the Mac, for example, I cannot compile C extensions compatible with Anaconda, because Mac Anaconda is built for compatibility with ancient OSX versions that are not supported by a standard XCode installation. Presumably that can be fixed, but I suspect that would be a major headache. And then, how about platforms unsupported by Anaconda?<br></p><br><br><p><br>Unfortunately I will have to leave this at the rant level, because I have no better proposition to make. Packaging has always been a mess, and will likely remain a mess, because the underlying platforms on which Python builds are already a mess. Unfortunately, it's becoming more and more of a problem as scientific Python packages grow in size and features. 
It's gotten to the point where I am not motivated to figure out how to install <a href="http://www.ill.eu/fr/instruments-support/computing-for-science/cs-software/all-software/nmoldyn/">the latest version</a> of <a href="http://dirac.cnrs-orleans.fr/plone/software/nmoldyn/nmoldyn-2/">nMOLDYN</a> on my Mac, although I am a co-author of that program. The previous version is good enough for my own needs, and much simpler to install though already a bit tricky. That's how you get to love the command line&#x2026; in 2015.<br></p><br>
 ]]></description> </item><item> <title>Another look at Julia</title> <link>https://blog.khinsen.net/posts/2015/06/18/Another-look-at-Julia.html</link> <pubDate>2015-06-18</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/06/18/Another-look-at-Julia.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ <p><br>Three years ago, I first looked at the then-very-new language <a href="http://julialang.org/">Julia</a>. <a href="http://khinsen.wordpress.com/2012/04/04/julia-a-new-language-for-scientific-computing/">Back then</a>, I concluded that there were many interesting features, but also regretted too much bad Matlab influence in the array handling.<br></p><br><br><p><br>A hands-on <a href="http://calcul.math.cnrs.fr/spip.php?article259">Julia tutorial</a> in my neighborhood was a good occasion to take another look at this language, which has evolved quite a bit since 2012, and continues to evolve rapidly. The tutorial taught by <a href="http://sistemas.fciencias.unam.mx/~dsanders/">David Sanders</a> was an excellent introduction, and his <a href="https://github.com/dpsanders/hands_on_julia">notebooks</a> should even be good for self-teaching. If you already have some experience in computational science, and are interested in trying Julia out on small practical applications, have a look at them.<br></p><br><br><p><br>The good news is that Julia has much improved over the years, not only by being more complete (in particular in terms of libraries), but also through changes in the language itself. More changes are about to happen with version 0.4, which is currently under development. The changes being discussed include the <a href="http://github.com/JuliaLang/julia/issues/7941">array behavior</a> that I criticized three years ago. It's good to see references to APL in this discussion. 
I still believe that when it comes to arrays, APL and its successors are an excellent reference. It's also good to see that the Julia developers take the time to improve their language, rather than rushing towards a 1.0 release. <br></p><br><br><p><br>Due to David's tutorial, this time my contact with Julia was much more practical, working on realistic problems. This was a good occasion to appreciate many nice features of the language. Julia has taken many good features from both Lisp and APL, and combined them seamlessly into a language that, in spite of some warts, is overall a pleasure to use. A major aspect of Julia's Lisp heritage is the built-in metaprogramming support. Metaprogramming has always been difficult to grasp, which was clear as well during the tutorial. It isn't obvious at all what kind of problem it helps to solve. But everyone who has used a language with good metaprogramming support doesn't want to go back.<br></p><br><br><p><br>A distinctive feature of Julia is that it occupies a corner of the programming language universe that was almost empty until now. In scientific computing, we have traditionally had two major categories of languages. "Low-level" languages such as Fortran, C, and C++, are close to the machine level: data types reflect those directly handled by today's processors, memory management is explicit and thus left to the programmer. "High-level" languages such as Python or Mathematica present a more abstract view of computing in which resources are managed automatically and the data types and their operations are as close as possible to the mathematical concepts of arithmetic. 
High-level languages are typically interpreted or JIT-compiled, whereas low-level languages require an explicit compilation step, but this is not so much a feature of the languages as of their age and implementation.<br></p><br><br><p><br>Julia is resolutely modern in opting for modern code transformation techniques, in particular under-the-hood JIT compilation, making it both fully compiled and fully interactive. In terms of the more fundamental differences between "low-level" and "high-level", Julia chooses an unconventional approach: automatic memory management, but data types at the machine level.<br></p><br><br><p><br>As an illustration, consider integer handling. Julia's default integers are the same as C's: optimal machine-size signed integers with no overflow checks on arithmetic. The result of <code>10^50</code> is <code>-5376172055173529600</code>, for example. This is the best choice for performance, but it should be clear that it can easily create bugs. Traditional high-level languages use unlimited integers by default, possibly offering machine-size integers as an optimization option for experienced programmers. Julia does have a <code>BigInt</code> type, but using it requires a careful insertion of <code>big(...)</code> in many places. It's there if you absolutely need it, but you are expected to use machine-sized integers most of the time.<br></p><br><br><p><br>As a consequence, Julia is a power tool for experienced scientific programmers who are aware of the traps and the techniques to avoid falling into them. Julia is not a language suitable for beginners or occasional users of scientific programming, because such inexperienced scientists need more of a safety net than Julia provides. Neither is Julia a prototyping language for trying out new ideas, because when concentrating on the science you also need a safety net that protects you from the traps of machine-level abstractions. 
In Julia, you have to design your own safety net, and you also have to verify that it is strong enough for your needs.<br></p><br><br><p><br>Perhaps the biggest problem with Julia is that this is not obvious at first glance. Julia comes with all the nice interactive tools for rapid development and interactive data analysis, in particular the <a href="https://github.com/JuliaLang/IJulia.jl">IJulia</a> notebook which is basically the same as the now-famous IPython/Jupyter notebook. At a first glance, Julia <b>looks</b> like a traditional high-level language. A strong point of David's Julia tutorial is that it points out right from the start that Julia is different. Whenever a choice must be made between run-time efficiency and simplicity, clarity, or correctness, Julia always chooses efficiency. The least important consequence is surprising error messages that make sense only with a basic understanding of how the compiler works. The worst consequence is that inexperienced users are easily induced to write unsafe code. There are nice testing tools, in particular <a href="https://github.com/JuliaLang/FactCheck.jl">FactCheck</a> which looks very nice, but scientists are notoriously unaware of the need of testing.<br></p><br><br><p><br>The worst design decision I see in Julia is the explicit platform dependence of the language: the default integer size is either 32 or 64 bits, depending on the underlying platform. This default size is used in particular for integer constants. As a consequence, a Julia program does in general not have a single well-defined result, but two distinct results. This means that programs must be tested on two different architectures, which is hard to do even for experienced programmers. Given the ongoing very visible debate about the (non-)reproducibility of computational research, I cannot understand how anyone can make such a decision today. 
Of course I do understand the performance advantage that results from this choice, but this clearly goes too far for my taste. If I ever use Julia for my research, I'll start each source code file with <code>@assert WORD_SIZE==64</code> just to make sure that everyone knows what kind of machine I tested my code on.<br></p><br><br><p><br>As for the surprising but not dangerous features that can probably only be explained by convenience for the compiler, there is first of all the impossibility of redefining a data type without clearing the workspace first - and that means losing your whole session. It's a bit of a pain for interactive development, in particular in <a href="https://github.com/JuliaLang/IJulia.jl">IJulia</a> notebooks. Another oddity is the <code>const</code> declaration, which makes a variable to which you can assign new values as often as you like, as long as the type remains the same. It's more a typed variable declaration than the constant suggested by the name.<br></p><br><br><p><br>Finally, there is another point where I think the design for speed has gone too far. The choice of machine-size integers turns into something completely useless (in my opinion) when it comes to rational arithmetic. Julia lets you create fractions by writing <code>3//2</code> etc., but the result is a fraction whose numerator and denominator are machine-size integers. Rational arithmetic has the well-known performance and memory problem of denominators growing with each additional operation. With machine-size integers, rational arithmetic <a href="https://github.com/JuliaLang/julia/issues/11736">rapidly crashes or returns wrong results</a>. Given that the primary application of rationals is unlimited precision arithmetic, I don't see a practical use for anything but <code>Rational{BigInt}</code>.<br></p><br><br><p><br>In the end, Julia leaves me with a feeling of a lost opportunity. 
My ideal software development environment for computational science would support the whole life cycle of computational methods, starting from prototyping and ending with platform-specific optimizations. As code is progressively optimized based on profiling information, each version would be used as a reference to test the next optimization level. In terms of fundamental language design, Julia seems to have everything required for such an approach. However, the default choice of fast-and-unsafe operations almost forces programmers into premature optimization. Like in the traditional high-/low-level language world, computational science will require two distinct languages, a safe and a fast one.<br></p><br>
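<p><br>To make the integer example from this post concrete, here is a small Python sketch that emulates signed 64-bit wraparound. It is only an emulation under stated assumptions, not Julia's implementation: the helper names <code>wrap_int64</code> and <code>pow_int64</code> are made up for this illustration, and the power is computed by plain repeated multiplication, which yields the same value modulo 2<sup>64</sup> as any other multiplication order.<br></p>

```python
def wrap_int64(n):
    """Reduce n to the signed 64-bit range, as two's-complement hardware does."""
    return (n + 2**63) % 2**64 - 2**63


def pow_int64(base, exponent):
    """Integer power where every intermediate product wraps to 64 bits."""
    result = 1
    for _ in range(exponent):
        result = wrap_int64(result * base)
    return result


# Python integers are unlimited, so this is the exact value of 10^50.
print(10**50)

# With machine-size arithmetic, the same computation silently wraps around.
print(pow_int64(10, 50))  # -5376172055173529600, the value quoted in the post
```

<p><br>The unlimited-integer result is the "safe" one and the wrapped result is the "fast" one; nothing in the second computation signals that the two have diverged.<br></p>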
 ]]></description> </item><item> <title>The compartmentalization of knowledge</title> <link>https://blog.khinsen.net/posts/2015/06/05/The-compartmentalization-of-knowledge.html</link> <pubDate>2015-06-05</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/06/05/The-compartmentalization-of-knowledge.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ <p><br>Now that the birch pollen season is definitely over, I can draw some conclusions from a two-year experiment with the impressive sample size of one - myself. As you will see, my topic is not so much the experiment itself, but the circumstances in which it happened.<br></p><br><br><p><br>I have been allergic to birch pollen for more than thirty years. My allergy is strong enough to make normal life impossible when the birch pollen concentration is high, which happens for about three to four weeks every year. For those who have no experience with allergies, consider how sneezing five times in five minutes a few times per hour would impact your daily activities. Like most victims of pollen allergy, I consulted medical doctors in search for relief. In the course of thirty years spent in various places, even different countries, I have seen many of them, from three categories: general practitioner, otorhinolaryngologists, and allergologists. All these doctors agreed that the only reasonable treatment is <a href="http://en.wikipedia.org/wiki/Histamine_antagonist">antiihistamines</a>, arguing that the only other option, immunosuppressive treatments such as cortisone, has side effects that are too severe compared to the benefit obtained.<br></p><br><br><p><br>Unfortunately, antihistamines also have a frequent side effect: drowsiness. Its degree varies between people and across different antihistamines. But in spite of undeniable progress over the years, I have yet to try an antihistamine that I could live with comfortably. 
I was always faced with the choice of the lesser evil: sneezing or drowsiness. I usually tried to take antihistamines as little as possible, based of birch pollen concentration forecasts, but I found that strategy hard to apply in practice.<br></p><br><br><p><br>So far for the motivation for my recent experiment. Last year I discovered, somewhat by accident, a <a href="http://www.herboristeriedeparis.fr/">herbalist</a> in Paris offering <a href="http://www.harmonisanatura.com/art-allergies-pc-252.htm">a mixture of eight plant extracts</a> for treating allergy symptoms. I asked if they considered their product sufficient as the sole treatment for a rather severe case of birch pollen allergy. They said it's worth a try, though they didn't want to make a clear promise. I tried, and it worked. Perfectly. No sneezing, no side effects. Spring 2014 was the first one I fully enjoyed since ages ago. Spring 2015 was the second. I haven't taken any antihistamines since then, nor any other allergy treatment recognized by official medecine. Of course, my new treatments has its drawbacks as well. First, it's rather expensive, about 40€ for one birch pollen season. Second, you can't take a single daily dose, you have to distribute it over the day. I followed the recommendation to dilute the daily dose in a bottle of water, which I carried with me and drank over the day.<br></p><br><br><p><br>My sample-size-one study doesn't of course permit any conclusions about the efficiency of this treatment for allergies in general, but that's not my point anyway. What I find remarkable about this story is that a small herbalist shop in Paris offers something that according to all medical doctors I ever consulted doesn't exist. Herbal remedies have been used by people all over the world for all of known history. 
All eight plants in my new treatment (<a href="https://en.wikipedia.org/wiki/Plantago_lanceolata">Plantago lanceolata</a>, <a href="https://en.wikipedia.org/wiki/Artichoke">artichoke</a>, <a href="https://en.wikipedia.org/wiki/Arctium">arctium</a>, <a href="https://en.wikipedia.org/wiki/Boldo">boldo</a>, <a href="https://en.wikipedia.org/wiki/Desmodium">desmodium</a>, <a href="https://en.wikipedia.org/wiki/Taraxacum">dandelion</a>, <a href="https://en.wikipedia.org/wiki/Taraxacum">horsetail</a>, <a href="https://en.wikipedia.org/wiki/Thymus_%2528plant%2529">thyme</a>) have been used by herbalists for centuries. Combining them into an effective treatment certainly requires some solid knowledge about medicinal plants, but probably not a stroke of genius. How is it possible then that not even specialized allergologists are aware of such treatments? Even if it works only for 10% of pollen victims (a number I just made up), it's worth knowing about.<br></p><br><br><p><br>This compartmentalization of knowledge between traditional herbalists and 21st century medical doctors, which I suspect to be due to pure snobbery, is also a lost opportunity for medical research. According to the description of my plant mixture on the Web site, its mode of action is completely different from that of antihistamines. Studying these mechanisms might well lead to new insight into the causes of pollen allergies and their treatments. <br></p><br>
 ]]></description> </item><item> <title>Software in scientific research</title> <link>https://blog.khinsen.net/posts/2015/04/23/Software-in-scientific-research.html</link> <pubDate>2015-04-23</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/04/23/Software-in-scientific-research.html</guid> <category><![CDATA[ computational science ]]></category> <description><![CDATA[ <p><br>In a <a href="http://ivory.idyll.org/blog/2015-software-as-a-primary-product-of-science.html">recent blog post</a>, Titus Brown asks if software is a primary product of science, and basically says "no" (but do read the post for the details). A <a href="https://danielskatzblog.wordpress.com/2015/04/22/software-can-be-a-primary-product-of-scientific-research/">blog-post length reply</a> by Daniel Katz comes to the opposite conclusion (again, please read the post before continuing here). I left a short comment on Titus' blog but also felt compelled to expand this into a blog post of its own - so here it is.<br></p><br><br><p><br>Titus introduces a useful criterion for what a "primary product of science" is: could you get a Nobel prize for it? As Dan comments, Nobel prizes in science are awarded for discoveries and inventions. There were no computers when Alfred Nobel set up his foundation, so we have to extrapolate this definition a bit to today's situation. Is software like a discovery? Clearly not. Like an invention? Perhaps, but it doesn't fit very well. Dan makes a comparison with scientific writing, i.e. papers, textbooks, etc. Scientific writing is the traditional way to communicate discoveries and inventions. But what scientists get Nobel prizes for is not the papers, but the work described therein. Papers are not primary products of science either, they are just a means of communication. There is a fairly good analogy between papers and their contents on one hand, and software and algorithms on the other hand. 
And algorithms are quite comparable to discoveries and inventions. Moreover, many of today's scientific models are in fact expressed as algorithms. My conclusion is that algorithms clearly count as a primary product of science, but software doesn't. Software is a means of communication, just like papers or textbooks.<br></p><br><br><p><br>The analogy isn't perfect, however. The big difference between a paper and a piece of software is that you can feed the latter into a computer to make it <b>do</b> something. Software is thus a scientific tool as well as a means of communication. In fact, today's computational science gives more importance to the tool aspect than to the communication aspect. The main questions asked about scientific software are "What does it do?" and "How efficient is it?" When considering software as a means of communication, we would ask questions such as "Is it well-written, clear, elegant?", "How general is the formulation?", or "Can I use it as the basis for developing new science?". These questions are beginning to be heard, in the context of the scientific software crisis and the need for reproducible research. But they are still second thoughts. We actually accept as normal that the scientific contents of software, i.e. the models implemented by it, are understandable only to software specialists, meaning that for the majority of users, the software is just a black box. Could you imagine this for a paper? "This paper is very obscure, but the people who wrote it are very smart, so let's trust them and base our research on their conclusions." Did you ever hear such a claim? Not me.<br></p><br><br><p><br>Scientists haven't yet fully grasped the particular status of software as both an information carrier and a tool. That may be one of the few characteristics they share with lawyers. 
The latter make a difference between "data" (including written text), which is covered by copyright, and "software", which is covered by both copyright and licenses, and in some countries also by patents. Superficially, this makes sense, as it reflects the dual nature of software. It suffers, however, from two problems. First of all, the distinction exists only in the intention of the author, which is hard to pin down. Software is just data that can be interpreted as instructions for a computer. One could conceivably write some interpreter that turns previously generated data into software by executing it. Second, and that's a problem for science, the licensing aspect of software is much more restrictive than the copyright aspect. If you describe an algorithm informally in a paper, you have to deal only with copyright. If you communicate it in executable form, you have to worry about licensing and patents as well, even if your main intention is more precise communication.<br></p><br><br><p><br>I have written <a href="http://dx.doi.org/10.12688/f1000research.3978.2">a detailed article</a> about the problems resulting from the badly understood dual nature of scientific software, which I won't repeat here. I have also <a href="http://beta.briefideas.org/ideas/27deb8bb6d86c83d29f53320f02a22a5">proposed</a> a solution, the development of formal languages for expressing complex scientific models, and I am <a href="https://github.com/khinsen/term-algebra">experimenting</a> with a concrete approach to get there. I mention this here mainly to motivate my conclusion:<br></p><br><br><ul class="org-ul"><br><li>Q: Is software a primary product of science?<br></li><br><li>A: No. But neither is a paper or a textbook.<br></li><br></ul><br><br><ul class="org-ul"><br><li>Q: Is software a means of communication for primary products of science?<br></li><br><li>A: Yes, but it's a bad one. We need something better.<br></li><br></ul><br>
 ]]></description> </item><item> <title>Why bitwise reproducibility matters</title> <link>https://blog.khinsen.net/posts/2015/01/07/Why-bitwise-reproducibility-matters.html</link> <pubDate>2015-01-07</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2015/01/07/Why-bitwise-reproducibility-matters.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ reproducible research ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ <div id="content"><br><br><p><br>While reading the <a href="https://www.xsede.org/documents/659353/d90df1cb-62b5-47c7-9936-2de11113a40f">final report</a> of the <a href="https://www.xsede.org/web/reproducibility">reproducibility workshop</a> at XSEDE14, I noticed a statement that I encounter frequently in discussions about reproducible research:<br></p><br><br><blockquote>"One general consensus was that bitwise reproducibility is often an unrealistic expectation"</blockquote><br><br><p><br>In the interest of clarity, let me start by pointing out that within the systematic terminology that I am trying to adopt (see <a href="http://khinsen.wordpress.com/2014/08/27/reproducibility-replicability-and-the-two-layers-of-computational-science/">this post</a> for an explanation), I will write "bitwise replicability" from now on, as the problem falls into the technical domain (getting the same result from running the same program on the same data) rather than into the scientific one (verifying a result with similar but not identical methods and tools).<br></p><br><br><p><br>The particularity of bitwise replicability is that it is almost always brushed aside as "unrealistic", which prevents any discussion about its possible importance in computational science. 
The main point of this post is to explain why I consider bitwise replicability important, but first of all I need to get the label "unrealistic" out of the way.<br></p><br><br><p><br>"Unrealistic" means more or less "possible in principle but impossible given various real-life constraints", and therefore the term should always be qualified by listing the constraints that make something impossible. In the context of bitwise replicability, which always refers to floating-point computations, the main constraint is that floating-point arithmetic is incompletely specified in most of today's programming languages, and that whatever specification there is is incompletely implemented in many of today's compilers. This is a valid reason for proclaiming bitwise replicability unrealistic for a short-term research project, but it is not an insurmountable barrier on a longer time scale. All we need are tighter specifications and implementations that respect them. That's a lot of work, but not a technical challenge. We know how to do it, but we are not (yet) willing to invest the effort to make it happen.<br></p><br><br><br><p><br>The main reason why I consider bitwise replicability important is software testing. No matter what precise approach is used for testing, it always involves comparing results of computations, either to a known good result, or to the result of another, presumably more reliable, computation. For any application of computing other than number crunching, comparing results means testing for equality, at the bit level. The results are equal or they aren't. If they aren't, there's a reason. You have to figure out what that reason is, and fix the problem.<br></p><br><br><p><br>If you accept the idea that floating-point operations are only approximate, the notion of a computation having one and only one result disappears, and testing becomes impossible. 
If two computations lead to similar but slightly different results, how do you decide if this is due to a bug or to some "inevitable" fuzziness of floating-point arithmetic? The answer is that you can't. If you accept that bitwise replicability is not possible, you also accept that rigorous software testing is not possible. For some illustrations of this problem, and some interesting discussion around them, see <a href="http://software-carpentry.org/blog/2014/10/why-we-dont-teach-testing.html">this post</a> on the Software Carpentry blog.<br></p><br><br><p><br>The most common counterargument is that numerical methods are only approximate, that floating-point arithmetic is approximate as well, and that these two are the main sources of error. That may or may not be true in any specific situation, as it really depends on what you are computing. But my point is that this statement can only be true if you assume that the implementation of your method contains no mistakes. The amount of error introduced by a bug in the code is completely unbounded. And even if it's small for some particular test run, it can be very large elsewhere. There is not much point in worrying about the error in an approximate numerical method unless you have some confidence in your code actually implementing this method correctly.<br></p><br><br><p><br>In fact, the common counterargument discussed above conflates several sources of error, which can and should be discussed and analyzed separately. 
A typical numerical computation is the result of several steps, starting from a mathematical model that takes the form of algebraic or differential equations:<br></p><br><br><ol class="org-ol"><br><li>Construct a computable approximation<sup><a id="fnr.1" name="fnr.1" class="footref" href="#fn.1">1</a></sup> to the original equations, using techniques such as discretization of continuous quantities.<br></li><br><br><li>Replace real numbers by floating-point numbers.<br></li><br><br><li>Implement the floating-point version in software.<br></li><br></ol><br><br><p><br>The errors introduced in the first step are the subject of numerical analysis, a well-established domain of applied mathematics. They are well understood for most commonly employed numerical methods. The errors introduced in the second step are rarely discussed explicitly, outside of a small circle of researchers interested in the peculiarities of floating-point arithmetic. The third step should not introduce any errors, and that should be verified by testing. But uncoupling steps 2 and 3 is possible only if our software tools guarantee bitwise replicability.<br></p><br><br><p><br>So why don't today's tools permit this? The reason is a mixture of widespread ignorance about floating-point arithmetic and the desire to get maximum performance. Both come into play in step 2, which is approximating discrete equations for real numbers by discrete equations for floating-point numbers. Most scientific programmers are unaware that this is an approximation that they should understand and control. They just type their real-number equation into a program and expect the computer to handle it somehow. 
Compiler writers and language specification authors take advantage of this ignorance and declare this step their business, profiting from the many optimization possibilities it offers.<br></p><br><br><p><br>The optimization opportunities come from the fact that a typical real-number equation has a large number of a priori equally plausible floating-point number approximations. Many of the identities for real numbers do not apply to floating-point numbers, for example associativity of addition and multiplication. Where the real-number equation says <i>a</i>+<i>b</i>+<i>c</i>, there are three floating-point approximations: <code>(a+b)+c</code>, <code>a+(b+c)</code>, and <code>(a+c)+b</code>. For more complex equations, the number of variants quickly becomes large. The results of these variants are not the same, but which one to choose? The choice <i>should</i> be made after a careful analysis of the relative precision and performance of each variant. There <i>should</i> be tool support to help with this. But what happens in practice, most of the time, is that the choice is made by the compiler, which goes exclusively for performance. Since every compiler optimizes differently, the same program source code yields different results on different platforms. And that's why we don't have bitwise replicability.<br></p><br><br><p><br>To prevent any misunderstanding: I am <i>not</i> saying that production-level compiled code needs to ensure bitwise replicability across machines. It's OK to have compiler optimization options that introduce platform-specific approximations. But it should be possible to reproduce one unique result identically on all platforms. 
This result is then the reference against which additional "lossy" optimizations can be tested.<br></p><br><br><div id="footnotes"><br><h3 class="footnotes">Footnotes: </h3><br><div id="text-footnotes"><br><br><div class="footdef"><a id="fn.1" name="fn.1" class="footnum" href="#fnr.1">1</a> I am using the term "computable approximation" somewhat vaguely here. While the original continuous-variable equations are almost always non-computable, and the numerical approximations are mostly computable, there are exceptions on both sides. The main focus of numerical analysis is not computability in the strict sense of computability theory, but "practical" computability that has the subsequent transformation to floating-point operations in mind.<br></div><br><br></div><br></div><br></div><br>
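<p>The non-associativity of floating-point addition discussed in this post is easy to observe directly. The following is a minimal sketch in Python, assuming IEEE 754 double-precision floats; the values 0.1, 0.2 and 0.3 and all variable names are chosen only for illustration. It evaluates the three groupings of <i>a</i>+<i>b</i>+<i>c</i> and compares them at the bit level. It also uses <code>math.fsum</code>, which aims to return the correctly rounded sum, as one possible way to obtain a reference value that does not depend on the order of the terms. This is not a proposal for how compilers or languages should solve the problem.</p>

```python
import math

# Illustrative values; none of them is exactly representable in binary.
a, b, c = 0.1, 0.2, 0.3

# The three groupings of a+b+c mentioned in the text.
variants = {
    "(a+b)+c": (a + b) + c,
    "a+(b+c)": a + (b + c),
    "(a+c)+b": (a + c) + b,
}

for label, value in variants.items():
    # float.hex() displays the exact binary value of each result.
    print(f"{label}: {value!r}  {value.hex()}")

# Testing for equality at the bit level reveals that the groupings disagree.
distinct_results = set(variants.values())
print("number of distinct results:", len(distinct_results))

# A correctly rounded sum does not depend on the order of the terms,
# so it can play the role of a unique reference result.
reference = math.fsum([a, b, c])
reordered = math.fsum([c, b, a])
print("reference:", reference.hex(), "| same after reordering:", reference == reordered)
```

<p>A test that compares against such a reference with strict equality fails whenever the grouping changes, which is exactly the information that a tolerance-based comparison would hide.</p>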
 ]]></description> </item><item> <title>Drawing conclusions from empirical science</title> <link>https://blog.khinsen.net/posts/2014/12/29/Drawing-conclusions-from-empirical-science.html</link> <pubDate>2014-12-29</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2014/12/29/Drawing-conclusions-from-empirical-science.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ <p><br>A <a href="http://dx.doi.org/10.1371/journal.pone.0115069">recent paper in PLOS One</a> made some noise in my twittersphere over the Christmas days. It compares the productivity of writing scientific documents using Microsoft Word and using LaTeX, and concludes that Microsoft Word is so clearly superior that, in the interest of saving taxpayers' money, scientific publishers should abandon LaTeX to allow authors to become more productive.<br></p><br><br><p><br>The noise in my twittersphere is about the technical shortcomings of the study, whose findings are in clear contradiction to the personal experience of everyone who has used both LaTeX and Microsoft Word in preparing real-life scientific articles for publication. This is well discussed in the <a href="http://www.plosone.org/article/comments/info%253Adoi%252F10.1371%252Fjournal.pone.0115069">comments</a> on the paper. In short, the situations explored in the study are limited to the reproduction of a given piece of text with some typical "scientific" elements such as tables or formulas, but without the complexity of real-life documents: references, citations, revisions, collaborative editing, etc.<br></p><br><br><p><br>The topic of this post is a more fundamental problem illustrated by the study cited above, and which is shared by a large number of scientific explorations of much more important subjects, in particular concerning health and medicine. 
It is the problem of drawing practical conclusions from the results of a scientific study, such as the conclusion cited above that abandoning LaTeX would lead to significant savings in the field of scientific publishing. In the following, I will concentrate on this issue and leave aside everything else: let's assume for a few minutes that published scientific studies are 100% reliable and described clearly enough that no misunderstandings or erroneous interpretations ever occur.<br></p><br><br><p><br>The feature that the Word vs. LaTeX study shares with much of modern research is that it is purely empirical. It starts from the question of whether science writers are more productive using Word or using LaTeX, taking into account a few obvious parameters such as prior experience with one or the other system. To answer that question, a specific experiment is designed, performed, and analyzed. Importantly, there is no underlying model that is used to interpret the results, which is what makes the study purely empirical.<br></p><br><br><p><br>Empirical studies are characteristic of relatively young domains of scientific exploration. It's what every new field starts out with: the search for systematic relations between observable facts and quantities. As our understanding of some aspect of nature improves, we move on to the next level of scientific inquiry: the construction of models. A model makes assumptions about the mechanisms underlying the observed behavior, and allows the prediction of results that some not-yet-performed experiment <i>should</i> produce. The introduction of models is an enormous boost to the power and efficiency of scientific research. First of all, predictions can be tested, and therefore the models can be tested. Of course, an isolated hypothesis ("Word makes scientists more productive than LaTeX") can also be tested, but a model produces a whole family of related hypotheses that can be tested as a whole. 
In particular, one can search for corner cases that may be untypical from a real-world point of view, but provide a particularly precise way to test a model. Second, a model allows scientists to develop an intuitive understanding of the phenomena they are looking at, which again makes their work more efficient and more reliable. But perhaps most importantly, a model that has been exposed to several rounds of serious testing comes with a list of scenarios in which it works or doesn't work, which is a very important element in generating trust in its predictions.<br></p><br><br><p><br>As an example of a successful model, consider Newtonian mechanics as taught in high-school physics classes. It has been around for a few centuries, and its strengths and limitations are well known. Contrary to what people believed initially, it is not universally true. It breaks down for objects moving at extremely high speed, and for objects of atomic size. But it works very well for many practically relevant situations. Thanks to this and other well-tested models, engineers and architects can design engines and buildings that work as expected.<br></p><br><br><p><br>In contrast, purely empirical science provides only provisional answers to the questions asked, because it is impossible to know, or even test, that all relevant aspects of the situation have been taken into account. In the Word vs. LaTeX study, prior knowledge of either system was taken into account as a parameter, but many other factors weren't. It is conceivable, for example, that a person's native language may make them "better tuned" to one or the other system. Or their work experience, or their education. And why not genetic factors or dietary habits - this sounds far-fetched, but it can't be excluded. 
As long as there is no model explaining where productivity differences come from, it is not even clear what one would have to study in order to improve our understanding of the situation.<br></p><br><br><p><br>This uncertainty stemming from the existence of many unexplored potential factors makes it very risky to draw practical conclusions from purely empirical studies, no matter how well they were designed and executed. And this is a very real problem in many aspects of today's life. Suppose you are determined to adopt the "healthiest" dietary regime possible, and turn to the scientific literature for guidance. You will find a bewildering collection of partially contradictory findings. Does eating eggs expose you to a higher risk of cardiovascular diseases? Do oranges protect you against the flu? You will find studies that claim to provide the answers to such questions, but they are purely empirical and based on a small number of observations. They may even be based on experiments on mice that were extrapolated to humans. And they definitely have not explored all imaginable aspects of the question. What if vitamin C is beneficial to everyone except people with some rare blood group? What if a specific gene variant decides how your body reacts to high sugar intake? Most probably no one has ever looked into these possibilities. Not to mention the much more fundamental question of whether a "healthiest" diet exists at all. Perhaps the best you can do is choose between a higher risk of a stroke and a higher risk of cancer.<br></p><br><br><p><br>To end with some practical advice: the next time you see some recommendation made on a "scientific basis", check what that basis is. If it's a single recent study, it's safe to assume that the recommendation is premature. But even if it's a larger body of scientific evidence, check if there is a model behind it, and if it has been tested. If not, be prepared to get a contradictory recommendation in a few years.<br></p><br>
 ]]></description> </item><item> <title>The state of NumPy</title> <link>https://blog.khinsen.net/posts/2014/09/12/The-state-of-NumPy.html</link> <pubDate>2014-09-12</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2014/09/12/The-state-of-NumPy.html</guid> <category><![CDATA[ python ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ <p><br>The <a href="https://github.com/numpy/numpy/blob/master/doc/release/1.9.0-notes.rst">release of NumPy 1.9</a> a few days ago was a bit of a revelation for me.  For the first time in the combined history of NumPy and its predecessor Numeric, a new release broke my own code so severely that I don't see any obvious way to fix it, given the limited means I can dedicate to software maintenance. And that makes me wonder for which scientific uses today's Python ecosystem can still be recommended, since the lack of means for code maintenance is a chronic and endemic problem in science.<br></p><br><br><p><br>I'll start with a historical review, for which I am particularly well placed as one of the oldtimers in the community: I was a founding member of the <a href="http://mail.python.org/pipermail/matrix-sig/">Matrix-SIG</a>, a small group of scientists who in 1995 set out to use the still young Python language for computational science, starting with the design and implementation of a module called Numeric. Back then Python was a minority language in a field dominated by Fortran. The number of users started to grow seriously from 2000, to the point of now being a well-recognized and respected community that spans all domains of scientific research and holds several conferences per year across the globe. 
The combination of technological change and the needs of new users has caused regular changes in the code base, which has grown as significantly as the user base: the first releases were small packages written and maintained by a single person (Jim Hugunin, who later became famous for Jython and IronPython), whereas today's NumPy is a complex beast maintained by a team.<br></p><br><br><p><br>My oldest published Python packages, <a href="http://dirac.cnrs-orleans.fr/plone/software/scientificpython/">ScientificPython</a> and <a href="http://dirac.cnrs-orleans.fr/MMTK/">MMTK</a>, go back to 1997 and are still widely used. They underwent a single major code reorganization, from module collections to packages when Python 1.5 introduced the package system. Other than that, most of the changes to the code base were implementations of new features and the inevitable bug fixes. The two main dependencies of my code, NumPy and Python itself, did sometimes introduce incompatible changes (by design or as consequences of bug fixes) that required changes to my own code base, but they were surprisingly minor and never required more than about a day of work.<br></p><br><br><p><br>However, I now realize that I have simply been lucky. While Python and its standard library have indeed been very stable (not counting the transition to Python 3), NumPy has introduced incompatible changes with almost every new version over the last years. None of them ever touched functionalities that I was using, so I barely noticed them when looking at each new version's release notes. That changed with release 1.9, which removes the compatibility layer with the old Numeric package, on which all of my code relies because of its early origins.<br></p><br><br><p><br>Backwards-incompatible changes are of course nothing exceptional in the computing world. 
User needs change, new ideas permit improvements, but existing APIs often prevent a clean or efficient implementation of new features or fundamental code redesigns. This is particularly true for APIs that are not the result of careful design, but of organic growth, which is the case for almost all scientific software. As a result, there is always a tension between improving a piece of software and keeping it compatible with code that depends on it. Several strategies have emerged to deal with it, depending on the priorities of each community. The point I want to make in this post is that NumPy has made a bad choice, for several reasons.<br></p><br><br><p><br>The NumPy attitude can be summarized as "introduce incompatible changes slowly but continuously". Every change goes through several stages. First, the intention of an upcoming change is announced. Next, deprecation warnings are added in the code, which are printed when code relying on the soon-to-disappear feature is executed. Finally, the change becomes effective. Sometimes changes are made in several steps to ease the transition. A good example from the 1.9 release notes is this:<br></p><br><blockquote><br>In NumPy 1.8, the diagonal and diag functions returned readonly copies, in NumPy 1.9 they return readonly views, and in 1.10 they<br>will return writeable views.<br></blockquote><br><br><p><br>The idea behind this approach to change is that client code that depends on NumPy is expected to be adapted continuously. The early warnings and the slow but regular rhythm of change help developers of client code to keep up with NumPy.<br></p><br><br><p><br>The main problem with this attitude is that it works only under the assumption that client code is actively maintained. In scientific computing, that's not a reasonable assumption to make. 
Anyone who has followed the discussions about the scientific software crisis and the lack of reproducibility in computational science should be well aware of this frequently made point. Much if not most scientific code is written by individuals or small teams for a specific study and then modified only as much as strictly required. One step up on the maintenance ladder, there is scientific code that is published and maintained by computational scientists as a side activity, without any significant means attributed to software development, usually because the work is not sufficiently valued by funding agencies. This is the category that my own libraries belong to. Of course the most visible software packages are those that are actively maintained by a sufficiently strong community, but I doubt they are representative of computational science as a whole.<br></p><br><br><p><br>A secondary problem with the "slow continuous change" philosophy is that client code becomes hard to read and understand. If you get a Python script, say as a reviewer for a submitted article, and see "import numpy", you don't know which version of numpy the authors had in mind. If that script calls array.diagonal() and modifies the return value, does it expect to modify a copy or a view? The result is very different, but there is no way to tell. It is possible, even quite probable, that the code would execute fine with both NumPy 1.8 and the upcoming NumPy 1.10, but yield different results.<br></p><br><br><p><br>Given the importance of NumPy in the scientific Python ecosystem (the majority of scientific libraries and applications depend on it), I consider its lack of stability alarming. I would much prefer the NumPy developers to adopt the attitude to change taken by the Python language itself: accumulate ideas for incompatible changes, and apply them in a new version that is clearly labelled and announced as incompatible. 
Everyone in the Python community knows that there are important differences between Python 2 and Python 3. There's a good chance that a scientist publishing a Python script will clearly say if it's for Python 2 or Python 3, but even if not, the answer is often evident from looking at the code, because at least some of the many differences will be visible.<br></p><br><br><p><br>As for my initial question for which scientific uses today's Python ecosystem can still be recommended, I hesitate to provide an answer. Today's scientific Python ecosystem is not stable enough for use in small-scale science, in my opinion, although it remains an excellent choice for big communities that can somehow find the resources to maintain their code. What makes me hesitate to recommend not using Python is that there is no better alternative. The only widely used scientific programming language that can be considered stable is Fortran, but anyone who has used Python is unlikely to be willing to switch to an environment with tedious edit-compile-run cycles.<br></p><br><br><p><br>One possible solution would be a long-time-support version of the core libraries of the Python ecosystem, maintained without any functional change by a separate development team. But that development team has to be created and funded. Any volunteers?<br></p>
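<p>The copy-versus-view ambiguity described above can be made explicit in client code. The following is a minimal sketch, assuming NumPy 1.9 or later, where <code>numpy.diagonal</code> returns a read-only view; the array <code>a</code> and all variable names are purely illustrative and not taken from any of the packages mentioned in this post.</p>

```python
import numpy as np

# Illustrative 3x3 array; any two-dimensional array would do.
a = np.arange(9.0).reshape(3, 3)

# Inspect what the installed NumPy version actually returns.
d = np.diagonal(a)
shares = np.shares_memory(a, d)   # True if d is a view of a
writeable = d.flags.writeable     # False for a read-only result
print("view of a:", shares, "| writeable:", writeable)

# Taking an explicit copy states the intent in the code itself:
# modifying 'safe' must never change 'a', whatever the library returns.
safe = np.diagonal(a).copy()
safe[0] = 100.0
print("a[0, 0] is still", a[0, 0], "| safe[0] is", safe[0])
```

<p>With the explicit copy, a reader of the script no longer needs to know which NumPy version the author had in mind in order to tell whether the original array is modified.</p>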
 ]]></description> </item><item> <title>Reproducibility, replicability, and the two layers of computational science</title> <link>https://blog.khinsen.net/posts/2014/08/27/Reproducibility-replicability-and-the-two-layers-of-computational-science.html</link> <pubDate>2014-08-27</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2014/08/27/Reproducibility-replicability-and-the-two-layers-of-computational-science.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ reproducible research ]]></category><category><![CDATA[ science ]]></category> <description><![CDATA[ <p><br>The importance of reproducibility in computational science is being more and more recognized, which I think is a good sign. However, I also notice a lot of confusion about what reproducibility means exactly, and also confusion about the difference (if any) between reproducibility and replicability. I don't see a consensus yet about the exact meaning of these terms, but I would like to give my own definitions and justify them by putting them into the general context of computational science.<br></p><br><br><p><br>I'll start with the concept of reproducibility as it was used in science long before computers even existed. It refers to the reproducibility of the conclusions of a scientific study. These conclusions can take very different forms depending on the question that was being explored. It can be a simple "yes" or "no", e.g. in answering questions such as "Is the gravitational force acting in this stone the same everywhere on the Earth's surface?" or "Does ligand A bind more strongly to protein X than ligand B?" It can also be a number, as in "What is the lattice energy of NaCl?", or a mathematical function, as in "How does a spring's restoring force vary with elongation?" Any such result should come with an estimation of its precision, such as an error bar on numbers, or a reliability estimate for a yes/no answer. 
Reproducing a scientific conclusion means finding a "close enough" answer by performing "similar" experiments and analyses. As the terms "close enough" and "similar" show, reproducibility involves human judgement, which may well evolve over time. Reproducibility is thus not an absolute feature of a specific result, but the evaluation of a result in the context of the current state of knowledge and technology in a scientific domain. Every attempt to reproduce a given result independently (different people, tools, methods, &#x2026;) augments scientific knowledge: If the reproduction leads to a "close enough" result, it provides information about the precision with which the results can be obtained, and if it doesn't, it points to some previously unrecognized crucial difference between the two experiments, which can then be explored.<br></p><br><br><p><br>Replication refers to something much more specific: repeating the exact steps in an experiment using the same (or equivalent) equipment, and comparing the outcomes. Replication is part of testing an experimental setup, or a form of quality assurance. If I measure the same quantity ten times using the same equipment and experimental samples, and get ten slightly different values, then I can use these numbers to estimate the precision of my equipment. If that precision is not sufficient for the purposes of my planned scientific study, then the equipment is not suitable.<br></p><br><br><p><br>It is useful to describe the process of doing research by a two-layer model. The fundamental layer is the technology layer: equipment and procedures that are well understood and whose precision is known from many replication attempts. On top of this, there is the research layer: the well-understood equipment is used in order to obtain new scientific information and draw conclusions from it. Any scientific project aims at improving one or the other layer, but not both at the same time. 
When you want to get new scientific knowledge, you use trusted equipment and procedures. When you want to improve the equipment or the procedures, you do so by doing test measurements on well-known systems. Reproducibility is a concept of the research layer, replicability belongs to the technology layer.<br></p><br><br><p><br>All this carries over identically to computational science, in principle. There is the technology layer, consisting of computers and the software that runs on them, and the research layer, which uses this technology to explore theoretical models or to interpret experimental data. Replicability belongs to the technology level. It increases trust in a computation and thus its components (hardware, software, overall workflow, provenance tracking, &#x2026;). If a computation cannot be replicated, then this points to some kind of problem:<br></p><br><br><ol class="org-ol"><br><li>different input data that was not recorded in the workflow (interactive user input, a random number stream initialized from the current time, &#x2026;)<br></li><br><br><li>a bug in the software (uninitialized variables, compiler bugs, &#x2026;)<br></li><br><br><li>a fault in the hardware (an unreliable memory chip, a design flaw in the processor, &#x2026;)<br></li><br><br><li>an ambiguous specification of the result of the computation<br></li><br></ol><br><br><p><br>Ideally, the non-replicability should be eliminated, but at the very least its cause should be understood. This turns out to be very difficult in practice, in today's computing environments, essentially because case 4 is frequent and hard to avoid (today's popular programming languages are ambiguous), and because case 4 makes it impossible to identify cases 2 and 3 with certainty. I see this as a symptom of the immaturity of today's computing environments, which the computational science community should aim to improve on. The technology for removing case 4 exists. 
The keyword is <a href="https://en.wikipedia.org/wiki/Formal_methods">"formal methods"</a>, and there are <a href="http://arxiv.org/abs/1212.6641">first attempts</a> to apply them to scientific computing, but this remains an exotic approach for now.<br></p><br><br><p><br>As in experimental science, reproducibility belongs to the research layer and cannot be guaranteed or verified by any technology. In fact, the "reproducible research" movement is really about replicability - which is perhaps one reason for the above-mentioned confusion.<br></p><br><br><p><br>There is at the moment significant disagreement about the importance of replicability. At one end of the spectrum, there is for example Ian Gent's <a href="http://recomputation.org/blog/2013/04/12/the-recomputation-manifesto/">recomputation manifesto</a>, which stresses the importance of replicability (which in the context of computational science he calls recomputability) because building on past work is possible only if it can be replicated as a first step. At the other end, Chris Drummond <a href="http://www.csi.uottawa.ca/~cdrummon/pubs/ICMLws09.pdf">argues</a> that replicability is "not worth having" because it doesn't contribute much to the real goal, which is reproducibility. It is worth reading both of these papers, because they both do a very good job of explaining their arguments. There is actually no contradiction between the two lines of argument; the different conclusions are due to different criteria being applied: Chris Drummond sees replicability as valuable only if it improves reproducibility (which indeed it doesn't), whereas Ian Gent sees value in it for a completely different reason: it makes future research more efficient. 
Neither one mentions the main point in favor of replicability that I have made above: that replicability is a form of quality assurance and thus increases trust in published results.<br></p><br><br><p><br>It is probably a coincidence that both of the papers cited above use the term "computational experiment", which I think should best be avoided in this context. In the natural sciences, the term "experiment" traditionally refers to constructing a setup to observe nature, which makes experiments the ultimate source of truth in science. Computations do not have this status at all: they are <a href="http://dx.doi.org/10.12688/f1000research.3978.2">applications of theoretical models</a>, which are always imperfect. In fact, there is an interesting duality between the two: experiments are imperfect observations of the ultimate truth, whereas computations are, in the absence of buggy or ambiguous software, perfect observations of the consequences of imperfect models. Using the same term for these two concepts is a source of confusion, as I have <a href="http://khinsen.wordpress.com/2014/01/21/the-roles-of-computer-programs-in-science-2/">pointed out earlier</a>.<br></p><br><br><p><br>This fundamental difference between experiments and computations also means that replicability has a different status in experimental and computational science. When doing imperfect observations of nature, evaluating replicability is one aspect of evaluating the imperfection of the observation. Perfect observation is impossible, both due to technological limitations and for fundamental reasons (any observation modifies what is being observed). 
On the other hand, when computing the consequences of imperfect models, replicability does not measure the imperfections of the model, but the imperfections of the computation, which <i>can</i> theoretically be eliminated.<br></p><br><br><p><br>The main source of imperfections in computations is the complexity of computer software (considering the whole software stack, from the operating system to the scientific software). At this time, it is not clear if we will ever succeed in taming this complexity. Our current digital computers are chaotic systems, in which even the tiniest change (flipping a bit in memory, or replacing a single character in a program source code file) can change the result of a computation beyond any bounds. Chaotic behavior is clearly an undesirable feature in any scientific equipment (I can't think of any experimental apparatus suffering from it), but for computation we currently have no other choice. This makes quality assurance techniques, including replicability but also more standard software engineering practices such as unit testing, all the more important if we want computational results to be trustworthy.<br></p><br>
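<p>As a small illustration of replication as quality assurance, the following Python sketch addresses the first cause of non-replicability listed above, a random number stream that is not recorded in the workflow. It is only a sketch under simple assumptions: the function <code>estimate_pi</code> and the seed values are invented for this example, and only the standard library's <code>random</code> module is used.</p>

```python
import random

def estimate_pi(n_samples, seed):
    """Monte Carlo estimate of pi from a random stream defined entirely by `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if 1.0 >= x * x + y * y:  # the point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / n_samples

first_run = estimate_pi(10000, seed=2014)
second_run = estimate_pi(10000, seed=2014)  # replication: same recorded input
other_run = estimate_pi(10000, seed=2015)   # a different, but also recorded, stream

# A replication check in the style of a unit test.
replicated = (first_run == second_run)
print(first_run, second_run, other_run, replicated)
```

<p>Because the seed is an explicit, recorded input, repeating the run replicates the result exactly. Comparing runs with different seeds is a different exercise: it says something about the statistical precision of the method, not about the reliability of the computation.</p>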
 ]]></description> </item><item> <title>A first experience with Open Access publishing</title> <link>https://blog.khinsen.net/posts/2014/07/04/A-first-experience-with-Open-Access-publishing.html</link> <pubDate>2014-07-04</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2014/07/04/A-first-experience-with-Open-Access-publishing.html</guid>  <description><![CDATA[ <p>Most scientists have found out by now that a lot has been going wrong with scientific publishing over the years. In many fields, scientific journals are no longer fulfilling what used to be their primary role: disseminating and archiving the results of scientific studies. One of the new approaches that were developed to fix the publishing system is Open Access: the principle that published articles should be freely accessible to everyone (under conditions that vary according to which "dialect" of Open Access is used) and that the cost of the publishing procedure should be paid in some other way than subscription fees. The universe of Open Access publishing has become quite complex in itself. For those who want to know more about it, a good starting point is this <a href="http://cyber.law.harvard.edu/hoap/Open_Access_%2528the_book%2529">book</a>, whose electronic form is, of course, Open Access.</p><p>While I have been following the developments in Open Access publishing for a few years, I had never published any Open Access article myself. I work at the borderline of theoretical physics and biophysics, which sound like closely related fields but nevertheless have very different publishing traditions. In theoretical physics, the most well-known journals are produced by non-commercial publishers, in particular scientific societies. Their prices have not exploded, nor do these publishers put pressure on libraries to subscribe to more than they want to. There is also a strong tradition of making preprints freely available, e.g. 
on <a href="http://arxiv.org/">arXiv.org</a>. This combined model continues to work well for theoretical physics, meaning that there is little incentive to look at Open Access publishing models. However, as soon as the "bio" prefix comes into play, the main journals are commercial. Some offer a per-article Open Access option, in exchange for the authors paying a few hundred to a few thousand dollars per article. There are also pure Open Access journals covering this field (e.g. <a href="http://www.ploscompbiol.org/">PLOS Computational Biology</a>), whose price range is similar. On the scale of the working budget of a theoretician working in France, these publishing fees are way too high, which is why I never considered Open Access for my "applied" research.</p><p>The fact that I have recently published <a href="http://f1000research.com/articles/3-101/v2">my first Open Access article</a>, in the pure Open Access journal <a href="http://f1000research.com/">F1000Research</a>, is almost a bit accidental. The topic of the article is the role of computation in science, with a particular emphasis on the necessity to keep scientific models distinct from software tools. I had been planning to write such an article for a while, but it didn't really fit into any of the journals I knew. The subject is computational science, but more its philosophical foundations than the technicalities that journals on computational science specialize in. The audience is scientists applying computations, which is a much larger group than the methodology specialists who subscribe to and read computational science journals. Even if some computational science journal might have accepted my article, it wouldn't have reached most of its intended audience. A journal on the philosophy of science would have been worse, as almost no practitioner of computational science looks at this literature. 
Since there was no clear venue where the intended audience would have a chance of finding my article, the best option was some Open Access journal where at least the article would be accessible to everyone. Publicity through social networks could then help potentially interested readers discover it. Two obstacles remained: finding an Open Access journal with a suitable subject domain, and getting around the money problem.</p><p>At the <a href="https://etherpad.mozilla.org/sciencelab-calls-jan9-2014">January 2014 Community Call</a> of the <a href="http://mozillascience.org/">Mozilla Science Lab</a>, I learned that F1000Research was starting a new section on "science communication", and was waiving article processing charges for that section in 2014. This was confirmed shortly thereafter on the journal's <a href="http://blog.f1000research.com/2014/01/20/publishing-science-communication-papers-in-f1000research-free-in-2014/">blog</a>. Science communication was in fact a very good label for what I wanted to write about. And F1000Research looked like an interesting journal to test because its attitude to openness goes beyond Open Access: the review process is open as well, meaning that reviews are published with the reviewers' names, and get their own DOI for reference. So there was my opportunity.</p><p>For those new to the Open Access world, I will give a quick overview of the submission and publishing process. Everything is handled online, through the journal's Web site and by e-mail. Since I very much prefer writing LaTeX to using Word, I chose the option of submitting through the <a href="https://www.writelatex.com">writeLaTeX</a> service. The idea of writeLaTeX is that you edit your article using their Web tools, but nothing stops you from downloading the template provided by F1000Research, writing locally, and uploading the final text in the end. I thus wrote my article using my preferred tool (Emacs) and on my laptop even when I didn't have a network connection. 
Once you submit your article, it is revised by the editorial staff (concerning language, style, and layout; they don't touch the contents). Once you approve the revision, the article is published almost instantaneously on the journal Web site. You are then asked to suggest reviewers, and the journal asks some of them (I don't know how they make their choice) to review the article. Reviews are published as they come in, and you get an e-mail alert. In addition to providing detailed comments, reviewers judge the article as "approved", "approved with reservations" or "not approved". As soon as two reviewers "approve", the article status changes to "indexed", meaning that it gets a DOI and it is listed in databases such as PubMed or Scopus. Authors can reply to reviewers (again in public), and they are encouraged to revise their article based on the reviewers' suggestions. All versions of an article remain available indefinitely on the journal's Web site, so the history of the article stays accessible.</p><p>Overall I would judge my experience with F1000Research as very positive. The editorial staff replies rapidly and gets problems solved (in my case, technical problems with the Web site). Open review is much more reasonable than the traditional secret peer review process. No more guessing who the reviewers are in order to please them with citations with the hope of getting your revision accepted rapidly. No more lengthy letters to the editor trying to explain diplomatically that the reviewer is incompetent. With open reviewing, authors and reviewers act as equals, as it should always have been.</p><p>The only criticism I have concerns a technical point that I hope will be improved in the future. Even if you submit your original article through writeLaTeX, you have to prepare revisions using Microsoft Word: you download a Word file for the initially published version, activate "track changes" mode, make your changes, and send the file back. 
For someone who doesn't have Microsoft Word, or is not familiar with its operation, this is an enormous barrier. A journal that encourages authors to revise their articles should also allow them to do so using tools that they have and are familiar with.</p><p>Will I publish in F1000Research again? I don't expect to do so in the near future. With the exception of the science communication section, F1000Research is heavily oriented towards the life sciences, so most of my research doesn't fit in. And then there is the money problem. Without the waiver mentioned above, I'd have had to pay 500 USD for my manuscript classified as an "opinion article". Regular research articles are twice as much. Compared to a theoretician's budget, which needs to cover mostly travel, these amounts are significant. Moreover, in France's heavily bureaucratized public research, every euro comes with strings attached that define when, where, and on what you are allowed to spend it. Project-specific research grants often do allow paying publication costs, but research outside of such projects, which is still common in the theoretical sciences, doesn't have any specific budget to turn to. The idea of the Open Access movement is to re-orient the money currently spent on subscriptions towards paying publishing costs directly, but such decisions are made on a political and administrative level very remote from my daily work. Until they happen, it is rather unlikely that I will publish in Open Access mode again.</p>
 ]]></description> </item><item> <title>Exploring Racket</title> <link>https://blog.khinsen.net/posts/2014/05/10/Exploring-Racket.html</link> <pubDate>2014-05-10</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2014/05/10/Exploring-Racket.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ programming ]]></category> <description><![CDATA[ <p><br>Over the last few months I have been exploring the <a href="http://racket-lang.org/">Racket</a> language for its potential as a language for computational science, and it's time to summarize my first impressions.<br></p><br><br><h3 id="sec-1">Why Racket?</h3><br><p><br>There are essentially two reasons for learning a programming language: (1) getting acquainted with a new tool that promises to get some job done better than with other tools, and (2) learning about other approaches to computing and programming. My interest in Racket was driven by a combination of these two aspects. My background is in computational science (physics, chemistry, and structural biology), so I use computation extensively in my work. Like most computational scientists of my generation, I started working in Fortran, but quickly found this unsatisfactory. Looking for a better way to do computational science, I discovered <a href="http://www.python.org/">Python</a> in 1994 and joined the <a href="http://www.python.org/community/sigs/retired/matrix-sig/">Matrix-SIG</a> that developed what is now known as <a href="http://www.numpy.org/">NumPy</a>. Since then, Python has become my main programming language, and the ecosystem for scientific computing in Python has flourished to a degree unimaginable twenty years ago. For <i>doing</i> computational science, Python is one of the top choices today.<br></p><br><br><p><br>However, we shouldn't forget that we are still living in the stone age of computational science. 
Fortran was the Paleolithic, Python is the Neolithic, but we have to move on. I am convinced that computing will become as much an integral part of doing science as mathematics, but we are not there yet. One important aspect has not evolved since the beginnings of scientific computing in the 1950s: the work of a computational scientist is dominated by the technicalities of computing, rather than by the scientific concerns. We write, debug, optimize, and extend software, port it to new machines and operating systems, install messy software stacks, convert file formats, etc. These technical aspects, which are mostly unrelated to doing science, take so much of our time and attention that we think less and less about why we do a specific computation, how it fits into more general theoretical frameworks, how we can verify its soundness, and how we can improve the scientific models that underlie our computations. Compare this to how theoreticians in a field like physics or chemistry use mathematics: they have acquired most of their knowledge and expertise in mathematics during their studies, and spend much more time applying mathematics to do science than worrying about the intrinsic problems of mathematics. Computing should one day have the same role. For a more detailed description of what I am aiming at, see my recent <a href="http://f1000r.es/3af">article</a>.<br></p><br><br><p><br>This lengthy foreword was necessary to explain what I am looking for in Racket: not so much another language for doing today's computational science (Python is a better choice for that, if only for its well-developed ecosystem), but an environment for developing tomorrow's computational science. The Racket Web site opens with the title "A programmable programming language", and that is exactly the aspect of Racket that I am most interested in.<br></p><br><br><p><br>There are two more features of Racket that I found particularly attractive. 
First, it is one of the few languages that have good support for immutable data structures without being extremist about it. Mutable state is the most important cause of bugs in my experience (see my <a href="http://www.researchgate.net/publication/216857229_Managing_State/file/839957f7f612d58c02a6babe517f86ed.pdf">article on "Managing State"</a> for details), and I fully agree with Clojure's Rich Hickey who says that "immutability is the right default". Racket has all the basic data structures in a mutable and an immutable variant, which provides a nice environment to try "going immutable" in practice. Second, there is a statically typed dialect called <a href="http://docs.racket-lang.org/ts-guide/index.html">Typed Racket</a> which promises a straightforward transition from fast prototyping in plain Racket to type-safe and more efficient production code in Typed Racket. I haven't looked at this yet, so I won't say any more about it.<br></p><br><br><h3 id="sec-2">Racket characteristics</h3><br><p><br>For readers unfamiliar with Racket, I'll give a quick overview of the language. It's part of the <a href="http://en.wikipedia.org/wiki/Lisp_%2528programming_language%2529">Lisp</a> family, more precisely a derivative of <a href="http://en.wikipedia.org/wiki/Scheme_%2528programming_language%2529">Scheme</a>. In fact, Racket was formerly known as "PLT Scheme", but its authors decided that it had diverged sufficiently from Scheme to give it a different name. People familiar with Scheme will still recognize much of the language, but some changes are quite profound, such as the fact that lists are immutable. There are also many extensions not found in standard Scheme implementations.<br></p><br><br><p><br>The hallmark of the Lisp family is that programs are defined in terms of data structures rather than in terms of a text-based syntax. The most visible consequence is a rather peculiar visual aspect, which is dominated by parentheses. 
The more profound implication, and in fact the motivation for this uncommon choice, is the equivalence of code and data. Program execution in Lisp is nothing but interpretation of a data structure. It is possible, and common practice, to construct data structures programmatically and then evaluate them. The most frequent use of this characteristic is writing <i>macros</i> (which can be seen as code preprocessors) to effectively extend the language with new features. In that sense, all members of the Lisp family are "programmable programming languages".<br></p><br><br><p><br>However, Racket takes this approach to another level. Whereas traditional Lisp macros are small code preprocessors, Racket's macro system feels more like a programming API for the compiler. In fact, much of Racket is implemented in terms of Racket macros. Racket also provides a way to define a complete new language in terms of existing bits and pieces (see the paper "<a href="http://www.ccs.neu.edu/racket/pubs/pldi11-thacff.pdf">Languages as libraries</a>" for an in-depth discussion of this philosophy). Racket can be seen as a construction kit for languages that are by design interoperable, making it feasible to define highly specific languages for some application domain and yet use them in combination with a general-purpose language.<br></p><br><br><p><br>Another particularity of Racket is its origin: it is developed by a network of academic research groups, who use it as a tool for their own research (much of which is related to programming languages), and as a medium for teaching. However, contrary to most programming languages developed in the academic world, Racket is developed for use in the "real world" as well. There are documentation, learning aids, and development tools, and the members of the core development team are always ready to answer questions on the Racket user mailing list. 
This mixed academic-application strategy is of interest for both sides: researchers get feedback on the utility of their ideas and developments, and application programmers get quick access to new technology. I am aware of only three other languages developed in a similar context: <a href="http://caml.inria.fr/ocaml/">OCaml</a>, <a href="http://www.haskell.org/haskellwiki/Haskell">Haskell</a>, and <a href="http://www.scala-lang.org/">Scala</a>.<br></p><br><br><h3 id="sec-3">Learning and using Racket</h3><br><p><br>A first look at the <a href="http://docs.racket-lang.org/guide/index.html">Racket Guide</a> (an extended tutorial) and the <a href="http://docs.racket-lang.org/reference/index.html">Racket Reference</a> shows that Racket is not a small language: there is a bewildering variety of data types, control structures, abstraction techniques, program structuring methods, and so on. Racket is a very comprehensive language that allows both fine-tuning and large-scale composition. It definitely doesn't fit into the popular "low-level" vs. "high-level" dichotomy. For the experienced programmer, this is good news: whatever technique you know to be good for the task at hand is probably supported by Racket. For students of software development, it's probably easy to get lost. Racket comes with several subsets developed for pedagogical purposes, which are used in courses and textbooks, but I didn't look at those. What I describe here is the "standard" Racket language.<br></p><br><br><p><br>Racket comes with its own development environment called "DrRacket". It looks quite powerful, but I won't say more about it because I haven't used it much. I use too many languages to be interested in any language-specific environment. 
Instead, I use <a href="https://www.gnu.org/software/emacs/">Emacs</a> for everything, with <a href="http://www.nongnu.org/geiser/">Geiser</a> for Racket development.<br></p><br><br><p><br>The documentation is complete, precise, and well presented, including a pleasant visual layout. But it is not always an easy read. Be prepared to read through some background material before understanding all the details in the reference documentation of some function you are interested in. It can be frustrating sometimes, but I have never been disappointed: you do find everything you need to know if you just keep on following links.<br></p><br><br><p><br>My personal project for learning Racket is an <a href="https://github.com/mosaic-data-model/mosaic-racket">implementation</a> of the <a href="https://mosaic-data-model.github.io/">MOSAIC</a> data model for molecular simulations. While my implementation is not yet complete (it supports only two kinds of data items, universes and configurations), it has data structure definitions, I/O to and from XML, data validation code, and contains a test suite for everything. It uses some advanced Racket features such as generators and interfaces, not so much out of necessity but because I wanted to play with them.<br></p><br><br><p><br>Overall I had few surprises during my first Racket project. As I already said, finding what you need in the documentation takes a lot of time initially, mostly because there is so much to look at. But once you find the construct you are looking for, it does what you expect and often more. I remember only one ongoing source of frustration: the multitude of specialized data structures, which force you to make choices you often don't really care about, and to insert conversion functions when function A returns a data structure that isn't exactly the one that function B expects to get. 
As an illustration, consider the Racket equivalent of Python dictionaries, <a href="http://docs.racket-lang.org/guide/hash-tables.html">hash tables</a>. They come in a mutable and an immutable variant, each of which can use one of three different equality tests. It's certainly nice to have that flexibility when you need it, but when you don't, you don't want to have to read about all those details either.<br></p><br><br><p><br>As for Racket's warts, I ran into two of them. First, the worst supported data structure in Racket must be the <a href="http://docs.racket-lang.org/reference/vectors.html">immutable vector</a>, which is so frustrating to work with (every operation on an immutable vector returns a mutable vector, which has to be manually converted back to an immutable vector) that I ended up switching to lists instead, which are immutable by default. Second, the distinction (and obligatory conversion) between lists, streams, generators and a somewhat unclear sequence abstraction makes you long for the simplicity of a single sequence interface as found in Python or Clojure. In Racket, you can decompose a list into head and tail using <code>first</code> and <code>rest</code>. The same operations on a stream are <code>stream-first</code> and <code>stream-rest</code>. The sequence abstraction, which covers both lists and streams and more, has <code>sequence-tail</code> for the tail, but to the best of my knowledge nothing for getting the first element, other than the somewhat heavy <code>(for/first ([element sequence]) element)</code>.<br></p><br><br><p><br>The macro requirements of my first project were modest, not exceeding what any competent Lisp programmer would easily do using <code>defmacro</code> (which, BTW, exists in Racket for compatibility even though its use is discouraged). 
Nevertheless, in the spirit of my exploration, I tried all three levels of Racket's hygienic macro definitions: <code>syntax-rules</code>, <code>syntax-case</code>, and <code>syntax-parse</code>, in order of increasing power and complexity. The first, <code>syntax-rules</code>, is straightforward but limited. The last one, <code>syntax-parse</code>, is the one you want for implementing industrial-strength compiler extensions. I don't quite see the need for the middle one, <code>syntax-case</code>, so I suppose it's there for historical reasons, being older than <code>syntax-parse</code>. Macros are the one aspect of Racket for which I recommend starting with something other than the Racket documentation: Greg Hendershott's <a href="http://www.greghendershott.com/fear-of-macros/">Fear of Macros</a> is a much more accessible introduction.<br></p><br><br><h3 id="sec-4">Scientific computing</h3><br><p><br>As I said at the beginning of this post, my goal in exploring Racket was not to use it for my day-to-day work in computational science, but nevertheless I had a look at the support for scientific computing that Racket offers. In summary, there isn't much, but what there is looks very good.<br></p><br><br><p><br>The basic Racket language has good support for numerical computation, much of which is inherited from Scheme. There are integers of arbitrary size, rational numbers, and floating-point numbers (single and double precision), all with the usual operations. There are also complex numbers whose real/imaginary parts can be exact (integer or rational) or inexact (floats). 
Unlimited-precision floats are provided by an interface to <a href="http://www.mpfr.org/">MPFR</a> in the Racket math library.<br></p><br><br><p><br>The <a href="http://docs.racket-lang.org/math/index.html">math library</a> (which is part of every standard Racket installation) offers many more goodies: multidimensional arrays, linear algebra, Fourier transforms, special functions, probability distributions, statistics, etc. The <a href="http://docs.racket-lang.org/plot/index.html">plot library</a>, also in the standard Racket installation, adds one of the nicest collections of plotting and visualization routines that I have seen in any language. If you use DrRacket, you can even rotate 3D scenes interactively, a feature that I found quite useful when I used (abused?) plots for molecular visualization.<br></p><br><br><p><br>Outside of the Racket distribution, the only library I could find for scientific applications is Doug Williams' "<a href="http://planet.racket-lang.org/display.ss?package%3Dscience.plt&amp;owner%3Dwilliams">science collection</a>", which predates the Racket math library. It looks quite good as well, but I didn't find an occasion yet for using it.<br></p><br><br><p><br>Could I do my current day-to-day computations with Racket? A better way to put it is, how much support code would I have to write that is readily available for more mature scientific languages such as Python? What I miss most is access to my data in HDF5 and netCDF formats. And the domain-specific code for molecular simulation, i.e. the equivalent of my own <a href="http://dirac.cnrs-orleans.fr/MMTK/">Molecular Modeling Toolkit</a>. Porting the latter to Racket would be doable (I wrote it myself, so I am familiar with all the algorithms and its pitfalls), and would in fact be an opportunity to improve many details. 
But interfacing HDF5 or netCDF sounds like a lot of work with no intrinsic interest, at least to me.<br></p><br><br><h3 id="sec-5">The community</h3><br><p><br>Racket has an apparently small but active, competent, and friendly community. I say "apparently" because all I have to base my judgement on is the Racket user mailing list. Given Racket's academic and teaching background, it is quite possible that there are lots of students using Racket who find enough support locally that they never manifest themselves on the mailing list. Asking a question on the mailing list almost certainly leads to a competent answer, sometimes from one of the core developers, many of whom are very present. There are clearly many Racket beginners (and also programming newbies) on the list, but compared to other programming language users' lists, there are very few naive questions and comments. It seems like people who get into Racket are serious about programming and are aware that problems they encounter are most probably due to their lack of experience rather than caused by bugs or bad design in Racket.<br></p><br><br><p><br>I also noticed that the Racket community is mostly localized in North America, judging from the peak posting times on the mailing list. This looks strange in today's Internet-dominated world, but perhaps real-life ties still matter more than we think.<br></p><br><br><p><br>Even though the Racket community looks small compared to other languages I have used, it is big and healthy enough to ensure its existence for many years to come. Racket is not the kind of experimental language that is likely to disappear when its inventor moves on to the next project.<br></p><br><br><h3 id="sec-6">Conclusion</h3><br><p><br>Overall I am quite happy with Racket as a development language, though I have to add that I haven't used it for anything mission-critical yet. 
I plan to continue improving and completing my Racket implementation of Mosaic, and move it to Typed Racket as much as possible. But I am not ready to abandon Python as my workhorse for computational science; there are simply too many good libraries in the scientific Python ecosystem that are important for working efficiently.<br></p><br>
 ]]></description> </item><item> <title>The roles of computer programs in science</title> <link>https://blog.khinsen.net/posts/2014/01/21/The-roles-of-computer-programs-in-science.html</link> <pubDate>2014-01-21</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2014/01/21/The-roles-of-computer-programs-in-science.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ science ]]></category> <description><![CDATA[ <br>Why do people write computer programs? The answer seems obvious: in order to produce useful tools that help them (or their clients) do whatever they want to do. That answer is clearly an oversimplification. Some people write programs just for the fun of it, for example. But when we replace "people" by "scientists", and limit ourselves to the scientists' professional activities, we get a<br>statement that rings true: Scientists write programs because these programs do useful work for them. Lengthy computations, for example, or visualization of complex data.<br><br>This perspective of "software as a tool for doing research" is so pervasive in computational science that it is hardly ever expressed. Many scientists even see software, or perhaps the combination of computer hardware plus software as just another piece of lab equipment. A nice illustration is this <a href="https://www.youtube.com/watch?v%3DuF4eZA2HwOU">TEDx lecture</a> by <a href="https://www-s.ks.uiuc.edu/~kschulte/">Klaus Schulten</a> about his "computational microscope", which is in fact <a href="https://www-s.ks.uiuc.edu/Development/">Molecular Dynamics simulation software</a> for studying biological macromolecules such as proteins or DNA.<br><br>To see the fallacy behind equating computer programs with lab equipment, let's take a step back and look at the basic principles of science. The ultimate goal of science is to develop an understanding of the universe that we inhabit. 
The specificity of science (compared to other approaches such as philosophy or religion) is that it constructs precise <i>models</i> for natural phenomena that it validates and improves by repeated confrontation with <i>observations</i> made on the real thing:<br><a href="http://khinsen.files.wordpress.com/2014/01/science.png"><img class="size-full wp-image-427 aligncenter" alt="science" src="http://khinsen.files.wordpress.com/2014/01/science.png" width="200" /></a><br><br>An <i>experiment</i> is just an optimization: it's a setup designed for making a very specific kind of observation that might be difficult or impossible to make by just looking at the world around us. The process of doing science is an eternal cycle: the model is used to make <i>predictions</i> of yet-to-be-made observations, whereas the real observations are compared to these predictions in order to validate the model and, in case of significant discrepancies, to correct it.<br><br>In this cycle of prediction and observation, the role of a traditional microscope is to help make observations of what happens in nature. In contrast, the role of Schulten's computational microscope is to make predictions from a theoretical model. Once you think about this for a while, it seems obvious. To make observations on a protein, you need to <i>have</i> that protein. A real one, made of real atoms. There is no protein anywhere in a computer, so a computer cannot make observations on proteins, no matter which software is being run on it. What you look at with the computational microscope is not a protein, but a <i>model</i> of a protein. If you actually watch Klaus Schulten's video to the end, you will see that this distinction is made at some point, although not as clearly as I think it should be.<br><br>So it seems that the term "a tool for exploring a theoretical model" is a good description of a simulation program. And in fact that's what early simulation programs were. 
The direct ancestors of Schulten's computational microscope are the first Molecular Dynamics simulation programs made for atomic liquids. A classic reference is <a href="http://dx.doi.org/10.1103%252FPhysRev.136.A405">Rahman's 1964 paper</a> on the simulation of liquid argon. The papers of that time specify the model in terms of a few mathematical equations plus some numerical parameters. Molecular Dynamics is basically Newton's equations of motion, discretized for numerical integration, plus a simple model for the interactions between the atoms, known as the Lennard-Jones potential. A simulation program of the time was a rather straightforward translation of the equations into FORTRAN, plus some bookkeeping and I/O code. It was indeed a tool for exploring a theoretical model.<br><br>Since then, computer simulation has been applied to ever bigger and ever more complex systems. The examples shown by Klaus Schulten in his video represent the state of the art: assemblies of biological macromolecules, consisting of millions of atoms. The theoretical model for these systems is still a discretized version of Newton's equations plus a model for the interactions. But this model for the interactions has become extremely complex. So complex in fact that nobody bothers to write it down any more. It's not even clear how you would write it down, since standard mathematical notation is no longer adequate for the task. A full specification requires some algorithms and a database of chemical information. 
Specific aspects of model construction have been discussed at length in the scientific literature (for example how best to describe electrostatic interactions), but a complete and  precise specification of the model used in a simulation-based study is never provided.<br><br>The evolution from simple simulations (liquid argon) to complex ones (assemblies of macromolecules) looks superficially like a quantitative change, but there is in fact a qualitative difference: for today's complex simulations, <b>the computer program <i>is</i> the model.</b> Questions such as "Does program X correctly implement model A?", a question that made perfect sense in the 1960s, have become meaningless. Instead, we can only ask "Does program X implement the same model as program Y?", but that question is impossible to answer in practice. The reason is that the programs are even more complex than the models, because they also deal with purely practical issues such as optimization, parallelization, I/O, etc. This phenomenon is not limited to Molecular Dynamics simulations. The transition from mathematical models to computational models, which can only be expressed in the form of computer programs, is happening in many branches of science. However, scientists are slow to recognize what is happening, and I think that is one reason for the frequent misidentification of software as experimental equipment. Once a theoretical model is complex and drowned in even more complex software, it acquires many of the characteristics of experiments. Like a sample in an experiment, it cannot be known exactly, it can only be studied by observing its behavior. Moreover, these observations are associated with systematic and statistical errors resulting from numerical issues that frequently even the program authors don't fully understand.<br><br>From my point of view (I am a theoretical physicist), this situation is not acceptable. Models play a central role in science, in particular in theoretical science. 
Anyone claiming to be a theoretician should be able to state precisely which models he/she is using. Differences between models, and approximations to them, must be discussed in scientific studies. A prerequisite is that the models can be written down in a human-readable form. Computational models are here to stay, meaning that computer programs as models will become part of the daily bread of theoreticians. What we will have to develop are notations and techniques that permit a separation of the model aspect of a program from all the other aspects, such as optimization, parallelization, and I/O handling. I have presented some ideas for reaching this goal in <a href="http://doi.ieeecomputersociety.org/10.1109/MCSE.2013.104">this article</a> (click <a href="http://online.qmags.com/CISE0913#pg1&amp;mode2">here</a> for a free copy of the issue containing it, it's on page 77), but a lot of details remain to be worked out.<br><br>The idea of programs as a notation for models is not new. It has been discussed in the context of education, for example in <a href="http://dspace.mit.edu/handle/1721.1/6707">this paper</a> by Gerald Sussman and Jack Wisdom, as well as in their <a href="https://en.wikipedia.org/wiki/Structure_and_Interpretation_of_Classical_Mechanics">book</a> that presents classical mechanics in a form directly executable on a computer. The constraint of executability imposed by computer programs forces scientists to remove any ambiguities from their models. The idea is that if you can run it on your computer, it's completely specified. Sussman and Wisdom actually designed a specialized programming language for this purpose. 
They say it's <a href="https://en.wikipedia.org/wiki/Scheme_(programming_language)">Scheme</a>, which is technically correct, but Scheme is a member of the Lisp family of extensible programming languages, and the <a href="http://groups.csail.mit.edu/mac/users/gjs/6946/refman.txt">extensions</a> written by Sussman and Wisdom are highly non-trivial, to the point of including a special-purpose <a href="https://en.wikipedia.org/wiki/Computer_algebra_system">computer algebra system</a>.<br><br>For the specific example that I have used above, Molecular Dynamics simulations of proteins, the model is based on classical mechanics and it should thus be possible to use the language of Sussman and Wisdom to write down a complete specification. Deriving an efficient simulation program from such a model should also be possible, but requires significant research and development effort.<br><br>However, any progress in this direction can happen only when the computational science community takes a step back from its everyday occupations (producing ever more efficient tools for running ever bigger simulations on ever bigger computers) and starts thinking about the place that it occupies in the pursuit of scientific research.<br><br><b>Update</b> (2014-5-26)  I have also written a <a href="http://f1000r.es/3af" target="_blank">more detailed article</a> on this subject.
 ]]></description> </item><item> <title>Python as a platform for reproducible research</title> <link>https://blog.khinsen.net/posts/2013/11/19/Python-as-a-platform-for-reproducible-research.html</link> <pubDate>2013-11-19</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2013/11/19/Python-as-a-platform-for-reproducible-research.html</guid> <category><![CDATA[ reproducible research ]]></category><category><![CDATA[ science ]]></category> <description><![CDATA[ <p>The other day I was looking at the <a href="https://github.com/numpy/numpy/blob/master/doc/release/1.8.0-notes.rst">release notes for the recently published release 1.8 of NumPy</a>, the library that is the basis for most of the Scientific Python ecosystem. As usual, it contains a list of new features and improvements, but also sections such as "dropped support" (for Python 2.4 and 2.5) and "future changes", to be understood as "incompatible changes that you should start to prepare for". Dropping support for old Python releases is understandable: maintaining compatibility and testing it is work that needs to be done by someone, and manpower is notoriously scarce for projects such as NumPy. Many of the announced changes are in the same category: they permit removing old code and thus reduce maintenance effort. Other announced changes have the goal of improving the API, and I suppose they were more controversial than the others, as it is rarely obvious that one API is better than another one.<br></p><br><p><br>From the point of view of reproducible research, all these changes are bad news. They mean that libraries and scripts that work today will fail to work with future NumPy releases, in ways that their users, who are usually not the authors, cannot easily understand or fix. Actively maintained libraries will of course be adapted to changes in NumPy, but much, perhaps most, scientific software is not actively maintained. 
A PhD student doing computational research might well publish his/her software along with the thesis, but then switch subjects, or leave research altogether, and never look at the old code again. There are also specialized libraries developed by small teams who don't have the resources to do as much maintenance as they would like.<br></p><br><p><br>Of course NumPy is not the only source of instability in the Python platform. The most visible change in the Python ecosystem is the evolution of Python itself, whose 3.x series is not compatible with the initial Python language. It is difficult to say at this time for how long Python 2.x will be maintained, but it is well possible that much of today's scientific software written in Python will become difficult to run ten years from now.<br></p><br><p><br>The problem of scientific publications becoming more and more difficult to use is not specific to computational science. A theoretical physicist trying to read Isaac Newton's works would have a hard time, because the mathematical language of physics has changed considerably over time. Similarly, an experimentalist trying to reproduce Galileo Galilei's experiments would find it hard to follow his descriptions. Neither is a problem in practice, because the insights obtained by Newton and Galilei have been reformulated many times since then and are available in today's language in the form of textbooks. Reading the original works is required only for studying the history of science. However, it typically takes a few decades before specific results are universally recognized as important and enter the perpetually maintained canon of science.<br></p><br><p><br>The crucial difference with computations is that computing platforms evolve much faster than scientific research. Researchers in fields such as physics and chemistry routinely consult original research works that are up to thirty years old. 
But scientific software from thirty years ago is almost certainly unusable today without changes. The state of today's software thirty years from now is likely to be worse, since software complexity has increased significantly. Thirty years ago, the only dependencies a scientific program would have were a compiler and perhaps one of a few widely known numerical libraries. Today, even a simple ten-line Python script has lots of dependencies, most of them indirectly through the Python interpreter.<br></p><br><br><p><br>One popular attitude is to say: Just run old Python packages with old versions of Python, NumPy, etc. This is an option as long as the versions you need are recent enough that they can still be built and installed on a modern computer system. And even then, the practical difficulties of working with parallel installations of multiple versions of several packages are considerable, in spite of tools designed to help with this task (have a look at <a href="http://hpcugent.github.io/easybuild/">EasyBuild</a>, <a href="https://github.com/hashdist/hashdist">hashdist</a>, <a href="http://hpcugent.github.io/easybuild/">conda</a>, and <a href="http://nixos.org/nix/">Nix</a> or its offshoot <a href="https://www.gnu.org/software/guix/">Guix</a>).<br></p><br><p><br>An additional difficulty is that the installation instructions for a library or script at best mention a <i>minimum</i> version number for dependencies, but not the last version with which they were tested. There is a tacit assumption in the computing world that later versions of a package are compatible with earlier ones, although this is not true in practice, as the example of NumPy shows. The Python platform would be a nicer place if any backwards-incompatible change were accompanied by a change in package name. Dependencies would then be evident, and the different incompatible versions could easily be installed in parallel. 
Unfortunately this approach is rarely taken, a laudable exception being <a href="https://pypi.python.org/pypi/Pyro4">Pyro</a>, whose latest incarnation is called Pyro4 to distinguish it from its not fully compatible predecessors.<br></p><br><p><br>I have been thinking a lot about this issue recently, because it directly impacts my <a href="https://dirac.cnrs-orleans.fr/plone/software/activepapers">ActivePapers</a> project. ActivePapers solves the dependency versioning problem for all code that lives within the ActivePaper universe, by abandoning the notion of a single collection of "installed packages" and replacing it by explicit references to a specific published version. However, the problem persists for packages that cannot be moved inside the ActivePaper universe, typically because of extension modules written in a compiled language. The most fundamental dependencies of this kind are NumPy and h5py, which are guaranteed to be available in an ActivePapers installation. ActivePapers does record the version numbers of NumPy and h5py (and also HDF5) that were used for each individual computation, but it has currently no way to reproduce that exact environment at a later time. If anyone has a good idea for solving this problem, in a way that the average scientist can handle without becoming a professional systems administrator, please leave a comment!<br></p><br><p><br>As I have pointed out in an <a href="https://khinsen.wordpress.com/2013/08/14/platforms-for-reproducible-research/">earlier post</a>, long-term reproducibility in computational science will become possible only if the community adopts a stable code representation, which needs to be situated somewhere in between  processor instruction sets and programming languages, since both ends of this spectrum are moving targets. In the meantime, we will have to live with workarounds.<br></p><br>
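<p><br>One such workaround is to store, next to each computed result, the versions of the packages that produced it, much as ActivePapers does for NumPy and h5py. The following is only a minimal sketch of that idea and not ActivePapers code: the function name <code>record_environment</code> and the package list are made up for illustration, and it assumes Python 3.8 or later for the standard library module <code>importlib.metadata</code>. Like ActivePapers, it records the environment but cannot recreate it.<br></p><br>
<pre><code class="language-python">import json
import platform
from importlib import metadata


def record_environment(packages):
    """Return a dict mapping each name in `packages` to its installed version.

    Packages that are not installed are recorded as None rather than skipped,
    so the record states explicitly what was missing.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical dependency list; adapt it to the script being archived.
    record = record_environment(["numpy", "h5py"])
    print(json.dumps(record, indent=2, sort_keys=True))
</code></pre><br>
<p><br>Storing this record together with the results at least documents the last versions with which a script is known to have worked, which is the information that installation instructions usually omit.<br></p><br>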
 ]]></description> </item><item> <title>ActivePapers for Python</title> <link>https://blog.khinsen.net/posts/2013/09/27/ActivePapers-for-Python.html</link> <pubDate>2013-09-27</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2013/09/27/ActivePapers-for-Python.html</guid> <category><![CDATA[ reproducible research ]]></category> <description><![CDATA[ Today I have published the first release of ActivePapers for Python, available on <a href="http://pypi.python.org/pypi/ActivePapers.Py/0.1">PyPI</a> or directly from the <a href="https://bitbucket.org/khinsen/active_papers_py">Mercurial repository on Bitbucket</a>. The release coincides with the publication of my first scientific paper for which the complete code and data is in the supplementary material, available through the <a href="http://jcp.aip.org/resource/1/jcpsa6/v139/i12/p124115_s1">J. Chem. Phys. Web site</a> or from <a href="http://dx.doi.org/10.6084/m9.figshare.798825">Figshare</a>. There is a good chance that this is the first fully reproducible paper in the field of biomolecular simulation, but it is of course difficult to verify such a claim.<br><br>ActivePapers is a framework for doing and publishing reproducible research. An ActivePaper is a file that contains code (Python modules and scripts) and data (HDF5 datasets), plus the dependency information between all these pieces. You can change a script and re-run all the computations that depend on it, for example. Once your project is finished, you can publish the ActivePaper as supplementary material to your standard paper. You can also re-use code and data from a published ActivePaper by using DOI-based links, although for the moment this works only for ActivePapers stored on <a href="http://figshare.com/">Figshare</a>.<br><br>I consider this first release of ActivePapers quite usable (I use it, after all), but it's definitely for "early adopters". 
You should be comfortable working with command-line tools, for example, and of course you need some experience with writing Python scripts if you want to create your own ActivePaper. For inspecting data, you can use any HDF5-based tool, such as <a href="http://www.hdfgroup.org/hdf-java-html/hdfview/">HDFView</a>, though this makes sense only for data that generic tools can handle. My first published ActivePaper contains lots of protein structures, which HDFView doesn't understand at all. I expect tool support for ActivePapers to improve significantly in the near future.
 ]]></description> </item><item> <title>Platforms for reproducible research</title> <link>https://blog.khinsen.net/posts/2013/08/14/Platforms-for-reproducible-research.html</link> <pubDate>2013-08-14</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2013/08/14/Platforms-for-reproducible-research.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ reproducible research ]]></category> <description><![CDATA[ This post was motivated by Ian Gent's <a href="http://arxiv.org/abs/1304.3674v1">recomputation manifesto</a> and his <a href="http://www.software.ac.uk/blog/2013-07-09-recomputation-manifesto">blog post</a> about it. While I agree with pretty much everything said there, there is one point that I strongly disagree with, and here I'd like to explain the reasons in some detail. The point in question is <em>"The only way to ensure recomputability is to provide virtual machines"</em>. To be fair, the manifesto specifies that it's the only way "at least for now", so perhaps our disagreement is not as pronounced as it may seem.<br><br>I'll start with a quote from the manifesto that shows that we have similar ideas of the time scales over which computational research should be reproducible:<br><em>"It may be true that code you make available today can be built with only minor pain by many people on current computers. That is unlikely to be true in 5 years, and hardly credible in 20."</em><br><br>So the question is: how can we best ensure that the software used in our computational studies can still be run, with reasonable effort, 20 years from now. To answer that question, we have to look at the possible platforms for computational research.<br><br>By a "platform", I mean the combination of hardware and software that is required to use a given piece of digital information. For example, Flash video requires a Flash player and a computer plus operating system that the Flash player can run on. 
That's what defines the "Flash platform". Likewise, today's "Web platform" (a description that requires a date stamp to be precise, because Web standards evolve so quickly) consists of HTML5, JavaScript, and a couple of related standards. If you want to watch a Flash video in 20 years, you will need a working Flash platform, and if you want to use an archived copy of a 2013 Web site, you need the 2013 Web platform.<br><br>If you plan to distribute some piece of digital information with the hope that it will make sense 20 years from now, you must either have confidence in the longevity of the platform, or be willing and able to ensure its long-term maintenance yourself. For the Flash platform, that means confidence in Adobe and its willingness to keep Flash alive (I wouldn't bet on that). For the 2013 Web platform, you may hope that its sheer popularity will motivate someone to keep it alive, but I wouldn't bet on it either. The Web platform is too complex and too ill-defined to be kept alive reliably when no one uses it in daily life any more.<br><br>Back to computational science. 20 years ago, most scientific software was written in Fortran 77, often with extensions specific to a machine or compiler. Much software from that era relied on libraries as well, but they were usually written in the same language, so as long as their source code remains available, the platform for all that is a Fortran compiler compatible with the one from back then. For standard Fortran 77, that's not much of a problem, whereas most of the vendor-specific extensions have disappeared since. Much of that 20-year-old software can in fact still be used today. However, reproducing a computational study based on that software is a very different problem: it also requires all the input data and an executable description of the computational protocol. 
Even in the rare case that all that information is available, it is likely to depend on lots of other software pieces that may not be easy to get hold of any more. The total computational platform for a given research project is in fact as ill-defined as the 2013 Web platform.<br><br>Today's situation is worse, because we use more diverse software written in more different languages, and also use more interactive software whose use is notoriously non-reproducible. The only aspect where we have gained in standardization is the underlying hardware and OS layer: pretty much all computational science is done today on x86 processors running Linux. Hence the idea of conserving the full operating environment in the form of a virtual machine. Just fire up VirtualBox (or one of the other virtual machine managers) and run an exact copy of the original study's work environment.<br><br>But what is the platform required to run today's virtual machines? It's VirtualBox, or one of its peers. Note however that it's not "any of today's virtual machine managers" because compatibility between their virtual machine formats is not perfect. It may work, or it may not. For simplicity I will use VirtualBox in the following, but you can substitute another name and the basic arguments still hold.<br><br>VirtualBox is a highly non-trivial piece of software, and it has very stringent hardware requirements. Those hardware requirements are met by the vast majority of today's computing equipment used in computational science, but the x86 platform is losing market share rapidly on the wider computing device market. VirtualBox doesn't run on an iPad, for example, and probably it never will. Is VirtualBox likely to be around in 20 years? I won't dare a prediction. If x86 survives for another 20 years AND if Oracle sees a continuing interest in this product, then it will. I won't bet on it though.<br><br>What we really need for long-term recomputability is a simple platform. 
A platform that is simple enough that the scientific community alone can afford to keep it alive for its own needs, even if no one else in the world cares about it.<br><br>Unfortunately there is no suitable platform today, to the best of my knowledge. Which is why virtual machines are perhaps the best option right now, for lack of a satisfactory one. But if we care about recomputability, we should design and develop a good supporting platform, starting as soon as possible.<br><br>For a more detailed discussion of this issue, see <a href="http://www.sciencedirect.com/science/article/pii/S1877050911001190">this paper</a> written by yours truly. It comes to the conclusion that the closest existing approximation to a good platform is the Java virtual machine. What we'd want ideally is something similar to the JVM, but designed and optimized for scientific applications. A basic JVM implementation is quite simple (the complex JIT stuff is not a requirement), a few orders of magnitude simpler than VirtualBox, and it has no specific hardware dependencies. It's even simpler than many of today's scientific software packages, so the scientific community can definitely afford to keep it alive. The tough part is... no, it's not designing or writing the required software, it's agreeing on a specification. Perhaps it will never happen. Perhaps virtual machines will remain the best choice for lack of a satisfactory one. Or perhaps we will end up compiling our software to <a href="http://asmjs.org/">asm.js</a> and running it in the browser, just because someone else will keep that platform alive for us, no matter how ill-adapted it is to our needs. But don't say you haven't been warned.
 ]]></description> </item><item> <title>Bye bye Address Book, welcome BBDB</title> <link>https://blog.khinsen.net/posts/2013/06/03/Bye-bye-Address-Book-welcome-BBDB.html</link> <pubDate>2013-06-03</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2013/06/03/Bye-bye-Address-Book-welcome-BBDB.html</guid> <category><![CDATA[ emacs ]]></category> <description><![CDATA[ About two years ago I wrote a <a href="http://khinsen.wordpress.com/2011/01/04/bye-bye-ical-welcome-org-mode/">post</a> about why and how I abandoned Apple's iCal for my agenda management and moved to Emacs org-mode instead. Now I am in the process of making the second step in the same direction: I am abandoning Apple's Address Book and starting to use the "<a href="http://savannah.nongnu.org/projects/bbdb/">Big Brother DataBase</a>", the most popular contact management system from the Emacs universe.<br><br>What started to annoy me seriously about Address Book is a bug that makes the database and its backups grow over time, even if no contacts are added, because the images for the contacts keep getting copied and never deleted under certain circumstances. I ended up having address book backups of 200 MB for just 500 contacts, which is ridiculous. A quick Web search shows that the problem has been known for years but has not yet been fixed.<br><br>When I upgraded from MacOS 10.6 to 10.7 about a year ago (I am certainly not an early adopter of new MacOS versions), I had a second reason to dislike Address Book: the user interface had been completely re-designed and become a mess in the process. Every time I use it I have to figure out again how to navigate groups and contacts.<br><br>I had been considering moving to BBDB for a while, but I hadn't found any good solution for synchronizing contacts with my Android phone. 
That changed when I discovered <a href="http://karra-asynk.appspot.com/">ASynK</a>, which does a bi-directional synchronization between a BBDB database and a Google Contacts account. That setup actually works better than anything I ever tried to synchronize Address Book with Google Contacts, so I gained more than I expected in the transition.<br><br>At first glance, it may seem weird to move from technology of the 2000's to technology of the 1970's. But the progress over that period in managing rather simple data such as contact information has been negligible. The big advantage of the Emacs platform over the MacOS platform is that it doesn't try to take control over my data. A BBDB database is just a plain text file whose structure is apparent after five minutes of study, whereas an Address Book database is stored in a proprietary format. A second advantage is that the Emacs developer community fixes bugs a lot faster than Apple does. A less shiny (but perfectly usable) user interface is a small price to pay.
 ]]></description> </item><item> <title>A critical view of altmetrics</title> <link>https://blog.khinsen.net/posts/2013/05/08/A-critical-view-of-altmetrics.html</link> <pubDate>2013-05-08</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2013/05/08/A-critical-view-of-altmetrics.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ <a href="http://altmetrics.org/manifesto">Altmetrics</a> is one of the hotly debated topics in the Open Science movement today. In summary, the idea is that traditional bibliometric measures (citation counts, impact factors, h factors, ...) are too limited because they miss all the scientific activity that happens outside of the traditional journals. That includes the production of scientific contributions that are not traditional papers (i.e. datasets, software, blog posts, etc.) and the references to scientific contributions that are not in the citation list of a traditional paper (blogs, social networks, etc.). Note that the altmetrics manifesto describes altmetrics as a tool to help scientists find publications worth reading. I find it hard to believe that its authors have not thought of applications in evaluation of researchers and institutions, which will inevitably happen if altmetrics ever takes off.<br><br>At first sight, altmetrics appear as an evident "update" to traditional bibliometry. It sounds pretty obvious that, as scientific communication moves on to new media and finds new forms of expression, bibliometry should adapt. On the other hand, bibliometry is considered a more or less necessary evil by most scientists. Many deplore today's "publish or perish" culture and correctly observe that it is harmful to science in the long term, giving more importance to the marketing of research studies than to their careful design and meticulous execution. 
I haven't yet seen any discussion of this aspect in the context of altmetrics, so I'd like to start such a discussion with this post.<br><br>First of all, why is bibliometry so popular, and why is it harmful in the long run? Second, how will this change if and when altmetrics are adopted by the scientific community?<br><br>Bibliometry provides measures of scientific activity that have two important advantages: they are objective, based on data that anyone can check in principle, and they can be evaluated by anyone, even by a computer, without any need to understand the contents of scientific papers. On the downside, those measures can only indirectly represent scientific quality precisely because they ignore the contents. Bibliometry makes the fundamental assumption that the way specific articles are received by the scientific community can be used as a proxy for quality. That assumption is, of course, wrong, and that's how bibliometry ultimately harms the progress of science.<br><br>The techniques that people use to improve their bibliometrical scores without contributing to scientific progress are well known: dilution of content (more articles with less content per article), dilution of authorship (agreements between scientists to add each others' names to their works), marketing campaigns for getting more citations, application of a single technique to lots of very similar applications even if that adds no insight whatsoever. Altmetrics will cause the same techniques to be applied to datasets and software. For example, I expect scientific software developers to take Open Source libraries and re-publish them with small modifications under a new name, in order to have their name attached to them. Unless we come up with better techniques for software installation and deployment, this will probably make the management of scientific software a bit more complicated because we will have to deal with lots of small libraries. 
That's a technical problem that can and should be solved with a technical solution.<br><br>However, these most direct and most discussed negative consequences of bibliometry are not the only ones and perhaps not the worst. The replacement of expert judgement by majority vote, which is the basis of bibliometry, also in its altmetrics incarnation, leads to a phenomenon which I will call "scientific bubbles" in analogy to market bubbles in economics. A market bubble occurs if the price of a good is determined not by the people who buy it to satisfy some need, but by traders and speculators who try to estimate the future price of the good and make a profit from a rise or fall relative to the current price. In science, the "client" whose "need" is fulfilled by a scientific study is mainly future science, plus, in the case of applied research, engineering and product development. The role of traders and speculators is taken by referees and journal editors. A scientific bubble is a fashionable topic that many people work on not because of its scientific interest but because of the chance it provides to get a highly visible publication. Like market bubbles, scientific bubbles eventually explode when people realize that the once fashionable topic was a dead end. But before exploding, a bubble has wasted much money and intellectual energy. It may also have blocked alternative and ultimately more fruitful research projects that were refused funding because they were in contradiction with the dominating fashionable point of view.<br><br>My prediction is that altmetrics will make bubbles more numerous and more severe. One reason is the wider basis of sources from which references are counted. In today's citation-based bibliometry, citations come from articles that went through some journal's peer-reviewing process. No matter how imperfect peer review is, it does sort out most of the unfounded and obviously wrong contributions.  
To get a paper published in a journal whose citations count, you need a minimum of scientific competence. In contrast, anyone can publish an opinion on Twitter or Facebook. Since for any given topic the number of experts is much smaller than the number of people with just some interest, a wider basis for judgement automatically means less competence on average. As a consequence, high altmetrics scores are best obtained by writing articles that appeal to the masses who can understand what the work is about but not judge if it is well-founded. Another reason why altmetrics will contribute to bubbles is the positive feedback loop created by people reading and citing publications because they are already widely read and cited. That effect is dampened in traditional bibliometry because of the slowness of the publishing and citation mechanism.<br><br>My main argument ends here, but I will try to anticipate some criticisms and reply to them immediately.<br><br>One objection I expect is that the analysis of citation graphs can be used to assign a kind of reputation to each source and weight references by this reputation. That is the principle of Google's famous PageRank algorithm. However, any analysis of the citation graph suffers from the same fundamental problem as bibliometry itself: a method that only looks at relations between publications but not at their contents can't distinguish a gem from a shiny bubble. There will be reputation bubbles just like there are topic bubbles. No purely quantitative analysis can ever make a statement about quality. The situation is similar to mathematical formalisms, with citation graph analysis taking the role of formal proof and scientific quality the role of truth in Gödel's incompleteness theorem.<br><br>Another likely criticism is that the concept of the scientific bubble is dubious. Many paths of scientific explorations have turned out to be failures, but no one could possibly have predicted this in the beginning. 
In fact, many ultimately successful strategies have initially been criticized as hopeless. Moreover, exploration of a wrong path can still lead to scientific progress, once the mistake has been understood. How can one distinguish promising but ultimately wrong ideas from bubbles? The borderline is indeed fuzzy, but that doesn't mean that the concept of a bubble is useless. It's the same for market bubbles, which exist but are less severe when a good is traded both for consumption and for speculation. My point is that the bubble phenomenon exists and is detrimental to scientific progress.
 ]]></description> </item><item> <title>Lessons from sixteen years of molecular simulation in Python</title> <link>https://blog.khinsen.net/posts/2013/04/10/Lessons-from-sixteen-years-of-molecular-simulation-in-Python.html</link> <pubDate>2013-04-10</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2013/04/10/Lessons-from-sixteen-years-of-molecular-simulation-in-Python.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ mmtk ]]></category><category><![CDATA[ python ]]></category><category><![CDATA[ scientific computing ]]></category> <description><![CDATA[ A while ago I was chatting with two users of my <a href="http://dirac.cnrs-orleans.fr/MMTK/">Molecular Modelling Toolkit</a> (MMTK), a library for molecular simulations written in Python. One of them asked me what I would do differently if I were to write MMTK today. That's an interesting question, but not the kind of question I can answer in a sentence or two, so I promised to write a blog post about this. Here it is.<br><br>First, a bit of history. The first version of MMTK was released about 16 years ago. I don't have the exact date, but the first message on the MMTK mailing list, announcing MMTK release 1.0b2, is dated 29 May 1997. Back then Python 1.4 was the state of the art and Numerical Python was a young project that was just beginning to stabilize. MMTK was one of the first domain-specific scientific libraries written in Python, at a time when the scientific Python community was very small and its members were mostly considered cranks by their peers. MMTK was designed from the start as a Python library, with relatively small bits of C code for the time-critical stuff (mainly energy evaluation and MD integration), with NumPy arrays at the Python-C interface. 
This has since become one of the two main approaches to using Python in scientific computing, the other one being wrapper code around libraries written in C/C++ or Fortran.<br><br>So what would I do differently if I were to start writing MMTK today? Many things, for different reasons. Let's first get the obvious stuff out of the way: the Python ecosystem has evolved significantly since 1997, and of course I would use Python 3, and <a href="http://www.cython.org/">Cython</a> instead of C for the time-critical parts. I would also adopt many of the conventions that the community has developed but which weren't around in 1997. I might even be tempted to use bleeding-edge tools like <a href="http://numba.pydata.org/">Numba</a>, although with hesitation: Numba is not only a moving target at this time, but also requires dependencies (I am thinking mostly of LLVM) which are big and non-trivial to install. One lesson I have learned in 16 years of scientific Python is that dependencies can cause more trouble than they are worth. It's nice in theory to re-use existing tested code, but it also makes installation and deployment more cumbersome.<br><br>So much for changes in the Python ecosystem. What has changed as well, though at a slower pace, is the role of computation in science and in particular in molecular simulation. Back in 1997, there were a few molecular simulation ecosystems that operated almost in isolation. The big players were the <a href="http://www.charmm.org/">CHARMM</a>, <a href="http://ambermd.org/">AMBER</a>, and <a href="http://www.gromos.net/">GROMOS</a>/<a href="http://www.gromacs.org/">GROMACS</a> communities. Each of them had their own software, their own file formats, and their own force fields. Members of these communities would of course talk about science to each other, but not share any software or data. Developing new computational methods required a serious investment into one of these ecosystems. 
That was in fact my main motivation for developing MMTK: I figured that I would be more efficient (not to mention more satisfied) writing a new system from scratch using modern development tools than trying to get familiar with crufty Fortran code. But I adopted basically the same approach with MMTK: I created a new ecosystem without much regard to sharing code or data with the rest of the world. As an illustration, MMTK defines its own trajectory format which I still consider superior to what the rest of the world is doing, but which is undeniably hard to use without MMTK, given that the definition of a universe is stored as an executable Python expression. MMTK also encourages storing data as Python pickle files, which are even harder to deal with for other programs.<br><br>Today we are seeing a change in attitude in computational science that I am sure will soon reach the molecular simulation community as well. People are starting to realize that computational results have serious reliability problems. The most publicized case in the structural biology community was the <a href="http://www.ncbi.nlm.nih.gov/pubmed/17185570">retraction</a> of a few important published protein structures following the discovery of a bug in the data processing software that led to completely wrong final structures. This and similar events point to the urgent need for better validation of computational results. One aspect of validation is re-running the same computation with different tools. Another aspect is publishing both software and raw data, enabling other scientists to inspect them and check their validity. Technology for sharing scientific code and data exists today (have a look at <a href="https://github.com/">GitHub</a>, <a href="http://bitbucket.org/">Bitbucket</a>, and <a href="http://figshare.com/">figshare</a>, for example). 
But in molecular simulation, there are still important practical barriers to such validation attempts, in particular the use of program-specific and badly documented file formats. While MMTK's file formats are documented, they are still program-specific and thus incompatible with the requirements of the future.<br><br>The sentence that I would like to write now is "If I were to rewrite MMTK today, I would use the exchange data formats accepted by the molecular simulation community". But those formats don't exist yet, although there are a few initiatives to develop them. My own contribution to this effort is the <a href="http://bitbucket.org/molsim/mosaic/">Mosaic</a> data model and data formats - if you are interested in this subject, please have a look at it and send me your feedback. Mosaic will of course find its way into future versions of MMTK.<br><br>Finally, there are things I would do differently because the experience with MMTK has shown that a few initial design decisions were not the best ones. Number one is the absence of stable atom numbers. In MMTK, each atom and molecule is represented by a unique Python object, and there are ways to refer uniquely to everything by using Python expressions. But there is no such thing as a unique order of atoms that would assign a number to each one. Atoms do have numbers by which the low-level C code refers to them, but these numbers can be different every time you run a Python script. My original design goal was to discourage the use of numbers to refer to atoms, because this is an important source of mistakes if the simulated system undergoes changes. But every other molecular simulation program out there uses numbers to refer to atoms, so people are used to them. For interoperability with other programs, atom numbers are fundamental. 
There are ways to handle such situations, of course, but it's a constant source of headaches.<br><br>The other design aspect that I would change if I were to rewrite MMTK today is the hierarchy of chemical objects. MMTK has Atoms, Groups, Molecules, and Complexes, plus specializations such as AminoAcidResidue (a special Group), PeptideChain (a special Molecule), and Protein (a special Complex). While all of these correspond to some chemical reality, the system is more complex than required for molecular simulation, leading in some situations to code that is bloated by irrelevant special cases. Today I'd go for just Atoms and Groups, with special features of specific kinds of groups indicated by attributes rather than specific classes.
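<br><br>To make that last idea concrete, here is a minimal Python sketch of what an Atoms-and-Groups design with stable atom numbers might look like. It is not MMTK code: the names <code>Atom</code>, <code>Group</code>, <code>kind</code>, <code>attributes</code>, and <code>all_atoms</code> are assumptions made up for this illustration, and the glycine example lists only the backbone atoms.<br>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """A single atom carrying a stable number assigned once at creation."""
    index: int      # stable atom number (hypothetical field, not an MMTK attribute)
    element: str
    name: str


@dataclass
class Group:
    """A generic container whose chemical role is an attribute, not a subclass."""
    name: str
    kind: str = "generic"   # illustrative labels: "residue", "chain", "complex"
    atoms: list = field(default_factory=list)
    subgroups: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)

    def all_atoms(self):
        """Collect the atoms of this group and its subgroups, sorted by stable number."""
        found = list(self.atoms)
        for sub in self.subgroups:
            found.extend(sub.all_atoms())
        return sorted(found, key=lambda atom: atom.index)


# A glycine residue (backbone atoms only) inside a one-residue chain.
gly = Group(
    "GLY1",
    kind="residue",
    atoms=[Atom(0, "N", "N"), Atom(1, "C", "CA"), Atom(2, "C", "C"), Atom(3, "O", "O")],
    attributes={"residue_type": "glycine"},
)
chain = Group("chain_A", kind="chain", subgroups=[gly])

# The same numbers come out on every run, which is what other programs expect.
backbone_numbers = [atom.index for atom in chain.all_atoms()]
```
</pre>
In such a layout, a residue, a peptide chain, and a protein are all plain <code>Group</code> objects that differ only in their <code>kind</code> and <code>attributes</code>, and every atom keeps the number it was given when it was created.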
 ]]></description> </item><item> <title>Integrating scientific software and datasets into the citation record</title> <link>https://blog.khinsen.net/posts/2012/11/14/Integrating-scientific-software-and-datasets-into-the-citation-record.html</link> <pubDate>2012-11-14</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2012/11/14/Integrating-scientific-software-and-datasets-into-the-citation-record.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ <p>This morning I read <a href="http://ivory.idyll.org/blog/w4s-overview.html">C. Titus Brown's blog post</a> on how science could be so much better if scientific data and the software used to work with it were openly available for reuse. One problem he mentions, like many others have done before, is the lack of incentive for publishing anything else but standard scientific papers. What matters for a scientist's career and for grant applications is papers, papers, papers. Any contribution that's not in a scientific journal with a reputation and an impact factor is usually ignored, even if its real impact exceeds that of many papers that nobody really wants to read.</p><p>Ideally, published scientific data and software should be treated just like a paper: it should be citeable and it should appear in the citation databases that are used to calculate impact factors, h factors, and whatever other metrics bibliometrists come up with and evaluation committees appreciate for their ease of use.</p><p>Treating text (i.e. papers), data, and code identically also happens to be useful for making scientific publications more useful to the reader, by adding interactive visualization and exploration of procedures (such as varying parameters) to the static presentation of results in a standard paper. 
This idea of "executable papers" has generated a lot of interest recently, as shown by <a href="http://www.executablepapers.com/">Elsevier's Executable Paper Challenge</a> and the <a href="https://sites.google.com/site/beyondthepdf/">Beyond the PDF</a> workshop. For a technical description of how this can be achieved, see my <a href="http://dirac.cnrs-orleans.fr/plone/software/activepapers/">ActivePapers</a> project and/or the <a href="http://www.sciencedirect.com/science/article/pii/S1877050911001190">paper</a> describing it. In the ActivePapers framework, a reference to code being called, or to a dataset being reused, is exactly identical to a reference to a published paper. It would then be much easier for citation databases to include all references rather than filter out the ones that are "classical" citations. And that's a good motivation to finally treat all scientific contributions equally.</p><p>Since the executable papers idea is much easier to sell than the idea of an updated incentive system, a seemingly innocent choice in technology could end up helping to change the way scientists and research projects are evaluated.</p>
 ]]></description> </item><item> <title>The ultimate calculator for Android and iOS</title> <link>https://blog.khinsen.net/posts/2012/09/07/The-ultimate-calculator-for-Android-and-iOS.html</link> <pubDate>2012-09-07</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2012/09/07/The-ultimate-calculator-for-Android-and-iOS.html</guid> <category><![CDATA[ computational science ]]></category><category><![CDATA[ programming ]]></category> <description><![CDATA[ Calculators are among the most popular applications for smartphones, and therefore it is not surprising that the Google Play Store has more than 1000 calculators for the Android platform. Having used HP's scientific calculators for more than 20 years, I picked RealCalc when I got my Android phone and set it to RPN mode. It works fine, I have no complaints about it. But I no longer use it because I found something much more powerful.<br><br>It's called "<a href="http://www.jsoftware.com/">J</a>", which isn't exactly a very descriptive name. And that's probably a good idea because describing it is not so easy. J is much more than a calculator, but it does the calculator job very well. It's actually a full programming language, but one that differs substantially from everything else that goes by that label. The best description for J I can come up with is "executable mathematical notation". You type an expression, and you get the result. That's in fact not very different from working interactively with Python or Matlab, except that the expressions are very different. You can write traditional programs in J, using loops, conditionals, etc., but you can get a lot of work done without ever using these features.<br><br>The basic data structure in J is the array, which can have any number of dimensions. Array elements can be numbers, characters, or other arrays. Numbers (zero-dimensional arrays) and text strings (one-dimensional arrays of characters) are just special cases. 
In J jargon, which takes its inspiration from linguistics, data items are called "nouns". Standard mathematical operators (such as <code>+</code> or <code>-</code>) are called "verbs" and can have one or two arguments (one left, one right). An expression is called a "sentence". There are no precedence rules, the right argument of any verb being everything to its right. Given the large number of verbs in J, this initially unfamiliar rule makes a lot of sense. A simple example (also showing the use of arrays) is<br><pre>   2 * 3 + 10 20 30<br>26 46 66</pre><br>Up to here, J expressions are not very different from Python or Matlab expressions. What J doesn't have is functions with the familiar <em>f(x, y, z)</em> syntax, accepting any number of arguments. There are only verbs, with one or two arguments. But what makes J really different from the well-known languages for scientific computing are the "parts of speech" that have no simple equivalent elsewhere: adverbs and conjunctions.<br><br>An adverb takes a verb argument and produces a derived verb from it. For example, the adverb <code>~</code> takes a two-argument verb (a <em>dyad</em> in J jargon) and turns it into a one-argument verb (a <em>monad</em>) that's equivalent to using the dyad with two equal arguments. With <code>+</code> standing for plain addition, <code>+~</code> thus doubles its argument:<br><pre>   +~ 1 5 10 20<br>2 10 20 40</pre><br>meaning it is the same as<br><pre>   1 5 10 20 + 1 5 10 20<br>2 10 20 40</pre><br>A conjunction combines a verb with a noun or another verb to produce a derived verb. An example is ^:, the power conjunction, which applies a verb several times:<br><pre>   +~(^:2) 1 5 10 20<br>4 20 40 80<br>   +~(^:3) 1 5 10 20<br>8 40 80 160</pre><br>The parentheses are required to separate the argument of the power conjunction (2 or 3) from the array that is the argument to the resulting derived verb. 
To see the real power of the power conjunction, consider that it accepts <em>negative</em> arguments as well:<br><pre>   +~(^:_1) 1 5 10 20<br>0.5 2.5 5 10</pre><br>You have seen right: J can figure out that the inverse of adding a number to itself is dividing that number by two!<br><br>Pretty much any programming language permits you to assign values to names for re-use in later expressions. J is no exception:<br><pre>   data =. 1 5 10 20<br>   double =. +~<br>   double data<br>2 10 20 40<br>   inv =. ^:_1<br>   halve =. double inv<br>   halve data<br>0.5 2.5 5 10</pre><br>As you can see, names can be given not just to nouns (i.e. data), but also to verbs, adverbs, and conjunctions. Most J programs are just pieces of expressions that are assigned to names. Which means that the short summary of J that I have given here could well be all you ever need to know about the language - apart from the fact that you will have to acquire a working knowledge of many more verbs, adverbs, and conjunctions.<br><br>Before you rush off to the Play Store looking for J, let me add that J is not yet there, although it's supposed to arrive soon. For now, you have to <a href="https://github.com/mdykman/jconsole_for_android/tree/master/dist">download the APK</a> and install it yourself, using your preferred Android file manager. I should also point out that J is not just for Android. It's been around for more than 20 years, and you can get J for all the common computing platforms from <a href="http://jsoftware.com/">Jsoftware</a>. There's also an <a href="http://itunes.apple.com/us/app/j-programming-language/id532587550?mt%3D8">iOS</a> version for the iPhone and iPad. J's extreme terseness is a perfect fit for smartphones, where screen space is a scarce resource and where every character you don't have to type saves you a lot of time.
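<br><br>For readers more at home in Python, the adverb and conjunction examples above can be roughly mimicked with higher-order functions. This is only a sketch under stated assumptions: the names <code>reflexive</code>, <code>power</code>, and <code>add</code> are invented for this illustration, arrays are modelled as plain lists, and only non-negative repeat counts are handled, because plain Python has no counterpart to J's automatic inverses.<br>
<pre>
```python
def reflexive(dyad):
    """Mimic J's ~ adverb: derive a one-argument function that calls dyad(y, y)."""
    return lambda y: dyad(y, y)


def power(verb, n):
    """Mimic J's ^: conjunction for n of zero or more: apply verb n times."""
    def derived(y):
        for _ in range(n):
            y = verb(y)
        return y
    return derived


def add(x, y):
    """Element-wise addition of two equal-length lists, standing in for J's +."""
    return [a + b for a, b in zip(x, y)]


double = reflexive(add)                 # J: double =. +~
data = [1, 5, 10, 20]                   # J: data =. 1 5 10 20

doubled = double(data)                  # J: +~ 1 5 10 20
quadrupled = power(double, 2)(data)     # J: +~(^:2) 1 5 10 20
octupled = power(double, 3)(data)       # J: +~(^:3) 1 5 10 20
```
</pre>
The three results match the J sessions shown earlier (2 10 20 40, then 4 20 40 80, then 8 40 80 160). The Python version needs three named helper functions to say what J expresses in a handful of characters, and it still cannot derive <code>halve</code> from <code>double</code>.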
 ]]></description> </item><item> <title>The Nix package manager in computational science</title> <link>https://blog.khinsen.net/posts/2012/05/14/The-Nix-package-manager-in-computational-science.html</link> <pubDate>2012-05-14</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2012/05/14/The-Nix-package-manager-in-computational-science.html</guid> <category><![CDATA[ reproducible research ]]></category> <description><![CDATA[ <p>In an <a href="http://khinsen.wordpress.com/2012/04/10/unifying-version-control-and-dependency-management-for-reproducible-research/">earlier post</a>, I mentioned the <a href="http://nixos.org/">Nix package management system</a> as a candidate for ensuring reproducibility in computational science. What distinguishes Nix from the better known package managers (Debian, RPM, ...) is that it permits the installation of different versions of the same package in parallel, with a dependency tracking system that refers to a precise version of everything, including the versions of the development tools (compilers, ...) that were used to build the libraries and executables. Nix thus remembers for each package the complete details of how it can be reconstructed, which is what we would like to see for ensuring reproducibility.<br><br><p>There are, however, two caveats. First of all, Nix was designed for software installation management and not for computation. While in principle one could define the results (figures, tables, datasets) of some computation as a Nix package and perform the computation by installing the package, such an approach is quite cumbersome with the Nix support tools designed with a different task in mind. However, computation-specific support tools would probably suffice to fix this. Second, while the design of Nix looks quite sound, it is a young project with much less manpower behind it than the big package managers of the Linux world. 
This means there are fewer package definitions and they are overall less reliable. For example, I haven't yet managed to install my research computing environment (Python, NumPy, matplotlib, plus a few more packages) using Nix under MacOS X, because some packages simply fail to build. Again this is not an insurmountable problem, but it requires some serious effort to fix.<br><br><p>The Nix documentation is pretty good at describing how to use the package manager and the collection of package definitions for Linux and MacOS X named Nixpkgs. It is not so good at giving a basic understanding of how Nix works, which becomes important when you want to use it for something other than traditional package management. The following overview is the result of my own explorations of Nix. I am not a Nix authority, so be warned that there may be mistakes or misunderstandings.<br><br><p>At the heart of Nix is the "Nix store", a central database where everything managed by Nix is kept. Its default location is <code>/nix/store</code> and if you look at it you see an overwhelmingly long list of cryptic filenames. Let's zoom in on something to see what's going on. 
Here is what <code>ls -l /nix/store/*zlib*</code> shows on my machine:<br><pre><br>-r--r--r-- 1 hinsen staff 1000 Jan  1  1970<br> /nix/store/12vkkhs36xffzpqjaaa3vqhqv2yc97vs-zlib-1.2.6.drv<br>-r--r--r-- 1 hinsen staff 1181 Jan  1  1970<br> /nix/store/gymcn145ihhmymm6yk2wxqfd49s5dzdq-zlib-1.2.6.drv<br>dr-xr-xr-x 5 hinsen staff  170 Jan  1  1970<br> /nix/store/mrdqnzzr80rkfnm59q6aywdba6776f66-zlib-1.2.6<br>-r--r--r-- 1 hinsen staff 1000 Jan  1  1970<br> /nix/store/sj8l48kfc40wh8adb5pa843lwy38hskb-zlib-1.2.6.drv<br>-r--r--r-- 1 hinsen staff 1686 Jan  1  1970<br> /nix/store/xpm2xja2zv5agmdzgi362jqd5xx9ny10-zlib-1.2.6.tar.gz.drv<br></pre><br>The single directory in that list actually contains the <code>zlib</code> installation in the familiar Unix file layout that you find under <code>/usr</code> or <code>/usr/local</code>:<br><pre><br>~&gt; ls -R /nix/store/mrdqnzzr80rkfnm59q6aywdba6776f66-zlib-1.2.6<br>/nix/store/mrdqnzzr80rkfnm59q6aywdba6776f66-zlib-1.2.6:<br>include  lib  share<br><br>/nix/store/mrdqnzzr80rkfnm59q6aywdba6776f66-zlib-1.2.6/include:<br>zconf.h  zlib.h<br><br>/nix/store/mrdqnzzr80rkfnm59q6aywdba6776f66-zlib-1.2.6/lib:<br>libz.1.2.6.dylib  libz.1.dylib	libz.a	libz.dylib  pkgconfig<br><br>/nix/store/mrdqnzzr80rkfnm59q6aywdba6776f66-zlib-1.2.6/lib/pkgconfig:<br>zlib.pc<br><br>/nix/store/mrdqnzzr80rkfnm59q6aywdba6776f66-zlib-1.2.6/share:<br>man<br><br>/nix/store/mrdqnzzr80rkfnm59q6aywdba6776f66-zlib-1.2.6/share/man:<br>man3<br><br>/nix/store/mrdqnzzr80rkfnm59q6aywdba6776f66-zlib-1.2.6/share/man/man3:<br>zlib.3.gz<br></pre><br>Note that it contains just <code>zlib</code>, and nothing else, in particular not <code>zlib</code>'s dependencies. Each library or application has its own directory in the Nix store.<br><br><p>Next, let's look at all the other files, those with the extension <code>.drv</code> (for "derivation", a Nix term for any artefact derived from human-provided input). 
There are three files that end in <code>zlib-1.2.6.drv</code> and one that ends in <code>zlib-1.2.6.tar.gz.drv</code>. Let's look at the contents of the last one first. I have made it more readable by adding whitespace:<br><pre><br>Derive(<br>   [("out",<br>     "/nix/store/s9qgdh7g22nx433y3lk62igm5zh48dxj-zlib-1.2.6.tar.gz",<br>     "sha256",<br>     "21235e08552e6feba09ea5e8d750805b3391c62fb81c71a235c0044dc7a8a61b")],<br>   [("/nix/store/lhc0qhfdrw32rj1z7s5p90nbjfnkydhb-stdenv.drv",<br>     ["out"]),<br>    ("/nix/store/pawry9l3415kwfbfh4zrhgnynwfb10bs-mirrors-list.drv",<br>     ["out"])],<br><br>   ["/nix/store/01w11lngp8s4lxllyr6xbmjfyrfkrn43-builder.sh"],<br><br>   "x86_64-darwin",<br>   "/bin/bash",<br>   ["-e",<br>    "/nix/store/01w11lngp8s4lxllyr6xbmjfyrfkrn43-builder.sh"],<br><br>   [("buildInputs",""),<br>    ("buildNativeInputs",""),<br>    ("builder","/bin/bash"),<br>    ("id",""),<br>    ("impureEnvVars","http_proxy https_proxy ftp_proxy all_proxy no_proxy NIX_CURL_FLAGS NIX_HASHED_MIRRORS NIX_MIRRORS_apache NIX_MIRRORS_bitlbee NIX_MIRRORS_cpan NIX_MIRRORS_debian NIX_MIRRORS_fedora NIX_MIRRORS_gcc NIX_MIRRORS_gentoo NIX_MIRRORS_gnome NIX_MIRRORS_gnu NIX_MIRRORS_gnupg NIX_MIRRORS_hashedMirrors NIX_MIRRORS_imagemagick NIX_MIRRORS_kde NIX_MIRRORS_kernel NIX_MIRRORS_metalab NIX_MIRRORS_oldsuse NIX_MIRRORS_opensuse NIX_MIRRORS_postgresql NIX_MIRRORS_savannah NIX_MIRRORS_sf NIX_MIRRORS_sourceforge NIX_MIRRORS_ubuntu NIX_MIRRORS_xorg"),<br>    ("mirrorsFile","/nix/store/mmk41rbja1fvclbr7ghirzcigxlzl6f0-mirrors-list"),<br>    ("name","zlib-1.2.6.tar.gz"),<br>    ("out","/nix/store/s9qgdh7g22nx433y3lk62igm5zh48dxj-zlib-1.2.6.tar.gz"),<br>    ("outputHash","06x6m33ls1606ni7275q5z392csvh18dgs55kshfnvrfal45w8r1"),<br>    ("outputHashAlgo","sha256"),<br>    ("preferHashedMirrors","1"),<br>    ("preferLocalBuild","1"),<br>    ("propagatedBuildInputs",""),<br>    ("propagatedBuildNativeInputs",""),<br>    ("showURLs",""),<br>    
("stdenv","/nix/store/9fnvs0bvhrszazham5cnl13h52hvm1rk-stdenv"),<br>    ("system","x86_64-darwin"),<br>    ("urls","http://www.zlib.net/zlib-1.2.6.tar.gz mirror://sourceforge/libpng/zlib/1.2.6/zlib-1.2.6.tar.gz")])<br></pre><br>If that looks like a computational expression in a programming language, that's because it is. Don't worry, it's not something you are expected to write yourself: these expressions are created from the package definitions written in a more user-friendly syntax called "Nix expressions", which is very well documented in the Nix documentation. The expression shown above defines how to make (or "realise" in Nix jargon) the derivation <code>/nix/store/s9qgdh7g22nx433y3lk62igm5zh48dxj-zlib-1.2.6.tar.gz</code>, which is a rather simple one because the file is simply downloaded and verified against a known checksum. But even such a simple derivation has dependencies: the "standard environment" <code>stdenv</code> and the list of download mirror sites, <code>mirrors-list</code>.<br><br><p>It's time to say something about those funny 32-character prefixes in all the file names in the Nix store. You may have noticed that the <code>zlib</code> file list above contains two entries for <code>zlib-1.2.6.drv</code> that are identical except for this prefix. It looks as if the prefix is there to distinguish things that would otherwise be identical. This is true, and the information encoded in the prefix (which is a hash code) is the complete set of dependencies. The two zlib derivations differ in the version of the standard environment they were built with. I have both of these in my Nix store because I have played around with different releases of Nixpkgs. Nix really tries to keep track of every single dependency, including the exact versions of the various tools (mainly compilers) that were used in building a binary installation. 
That means you can keep lots of different versions of every single item on your system at the same time, and trace back exactly how they were built. You can also send a copy of the relevant derivation files (those with the <code>.drv</code> extension) to someone else, who can reproduce the exact same environment by "realising" those derivations again.<br><br><p>With so many zlibs floating around, which one does Nix use when you ask it to install some application that uses zlib? The one you specify. When some application requires zlib as a dependency, you have to tell Nix exactly which zlib derivation you want to be used. You don't normally do this manually for every single build (though you could), you'd rather use a coherent set of package definitions (such as Nixpkgs) that specifies all the interdependencies among hundreds of packages. The package definitions take the form of "Nix expressions", which are written in a language specifically designed for this purpose. Files containing Nix expressions have the extension <code>.nix</code>. Since the language is rather well documented in the Nix manual, I won't say any more about it here. A good starting point is to explore Nixpkgs. It helps to know that the central file is <code>pkgs/top-level/all-packages.nix</code>. This file imports the definitions of individual packages from their respective packages and makes a consistent package collection from them. When you build a particular derivation from Nixpkgs, only the packages listed explicitly as its dependencies are available in the build environment that is set up specifically for this build operation. 
No "default library" (such as <code>/usr/lib</code>) is used at all.<br><br><p>There is one more layer to Nix, whose role is twofold: making it convenient for users to work with programs installed through Nix, and permitting the removal of packages that were installed but are no longer needed.<br>Let's start with the second aspect because it is the simpler one: packages can be removed as soon as nobody needs them any more. This requires a way to figure out which packages are still needed. Obviously the packages that some user on the system wants to access are "needed", and that's why cleanup is related to user profiles, which I will cover in a minute. The remaining needed packages are the dependencies of other needed packages. So once we know the packages that all users put together request to use, we can figure out which packages can safely be deleted. This clean-up operation is called "garbage collection" and handled by the command <code>nix-store --gc</code>.<br><br><p>Nix user environments are managed using the command <code>nix-env</code>, and if you don't care about <i>how</i> Nix works, that command is the only one you may ever need. Each user has his/her own environment, of course, which consists mainly of a directory named <code>$HOME/.nix-profile</code>. That directory contains subdirectories called <code>bin</code>, <code>lib</code>, <code>man</code> etc. whose names should sound familiar. They contain nothing but symbolic links into the Nix store. These links define which package the user actually accesses, by putting <code>$HOME/.nix-profile/bin</code> on the <code>PATH</code> environment variable. When you use <code>nix-env</code> to install a package, Nix builds it and puts it into the Nix store (unless it's already there), and then creates symbolic links in your Nix profile, which may replace links to some different version of a package. It is important to understand that your user profile never enters into the build process of any Nix derivation. 
Your profile is exclusively for your own use and has no impact on Nix package management other than protecting the packages you use from being removed during garbage collection.<br><br><p>So much for a first report on my exploration of Nix. I will continue trying to get my computational environment built with Nix, so that I can start to explore how to use it for reproducible computations. Watch this space for news.<br><br><p>PS: After I published this post initially, the friendly people on the Nix mailing list pointed out some additional material for learning about Nix. First of all, there is <a href="http://www.st.ewi.tudelft.nl/~dolstra/pubs/phd-thesis.pdf">Eelco Dolstra's thesis</a> entitled "The Purely Functional Software Deployment Model", which is what you should read if you really want to know <em>everything</em> about Nix. There's also <a href="http://sandervanderburg.blogspot.com/">Sander van der Burg's blog</a> which has some very detailed posts about Nix and what it can be used for. You could start with <a>this</a> introduction.
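<br><br><p>The garbage collection rule described earlier in this post (an item in the store is kept if a user profile links to it, or if it is a dependency of something that is kept) amounts to a reachability computation on the dependency graph. Here is a minimal sketch of that idea in Python. It is not how Nix implements it: the package names and versions, the <code>deps</code> mapping, and the function <code>live_items</code> are all made up for illustration.
<pre>
```python
# Illustrative sketch only: this is not Nix code, and the store contents
# below are invented for the example.

def live_items(roots, deps):
    """Return every item reachable from the roots through dependencies."""
    live = set()
    stack = list(roots)
    while stack:
        item = stack.pop()
        if item in live:
            continue
        live.add(item)
        # Dependencies of a needed item are needed as well.
        stack.extend(deps.get(item, ()))
    return live


# Hypothetical store: each item maps to the items it depends on.
deps = {
    "python-2.7": ["zlib-1.2.6", "openssl-1.0"],
    "numpy-1.6": ["python-2.7"],
    "openssl-1.0": ["zlib-1.2.6"],
    "zlib-1.2.5": [],
    "zlib-1.2.6": [],
    "old-tool-0.1": ["zlib-1.2.5"],
}

# Items that user profiles link to (the roots of the collection).
roots = ["numpy-1.6"]

live = live_items(roots, deps)
garbage = sorted(set(deps) - live)
print(garbage)  # ['old-tool-0.1', 'zlib-1.2.5']
```
</pre>
In this toy store, two versions of zlib coexist; only the one that is reachable from a profile survives the collection.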
 ]]></description> </item><item> <title>Unifying version control and dependency management for reproducible research</title> <link>https://blog.khinsen.net/posts/2012/04/10/Unifying-version-control-and-dependency-management-for-reproducible-research.html</link> <pubDate>2012-04-10</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2012/04/10/Unifying-version-control-and-dependency-management-for-reproducible-research.html</guid> <category><![CDATA[ programming ]]></category><category><![CDATA[ reproducible research ]]></category><category><![CDATA[ software ]]></category><category><![CDATA[ source code repositories ]]></category> <description><![CDATA[ When the Greek philosopher Heraclitus pronounced his famous "πάντα ῥεῖ" (everything flows), he most probably was not thinking about software. But it applies to software as much as to other aspects of life: software is in perpetual change, being modified to remove bugs, add features, and adapt it to changing environments. The management of change is now a well-established part of software engineering, with the most emblematic tool being version control. If you are developing software without using version control, stop reading this immediately and learn about <a href="http://mercurial.selenic.com/">Mercurial</a> or <a href="http://git-scm.com/">Git</a>, the two best version control systems available today. That's way more important than reading the rest of this post.<br><br>Software developers use version control to keep track of the evolution of their software, to coordinate team development, and to manage experimental features. But version control is also of interest for software users: it permits them to refer to a specific version of a piece of software they use in a unique and reproducible way, even if that version is not the current one, nor perhaps even an official numbered release. In fact, official numbered releases are becoming a relict of the past. 
They make little sense in an Open Source universe where everyone has access to source code repositories under version control. In that situation, an official release is nothing but a bookmark pointing to a specific commit number. There is no need for a release number.<br><br>Why would you want to refer to a specific version of a piece of software, rather than always use the latest one? There are many reasons. As software evolves, some bugs get fixed but others sneak in. You may prefer the bugs you know to the ones that could surprise you. Sometimes later versions of some software are not fully compatible with their predecessors, be it by design or by mistake. And even if you want to use the very latest version at any time, you might still want to note which version you used for a specific application. In scientific computing, this is one of the fundamental principles of reproducible research: note carefully, and publish, the exact versions of all pieces of software that were used for obtaining any published research result. It's the only way for you and others to be able to understand exactly what happened when you look at your work many years later.<br><br>Another undeniable reality of modern software, in particular in the Open Source universe, is that it's modular. Developers use other people's software, especially if it's well written and has the reputation of being reliable, rather than reinventing the wheel. The typical installation instructions of a piece of Open Source software start with a list of dependencies, i.e. packages you have to install before you can install the current one. And of course the packages in the dependency list have their own dependency list. The number of packages to install can be overwhelming. 
The difficulties of dependency management are so widespread that the term "dependency hell" has been coined to refer to them.<br><br>Systems programmers have come up with a solution to that problem as well: dependency management tools, better known as package managers. Such tools keep a database of what is installed and which package depends on which other ones. The well-known Linux distributions are based on such package managers, of which the ones developed by <a href="http://en.wikipedia.org/wiki/Advanced_Packaging_Tool">Debian</a> and <a href="http://en.wikipedia.org/wiki/RPM_Package_Manager">RedHat</a> are the most popular ones and are now used by other distributions as well. For MacOS X, <a href="http://www.macports.org/">MacPorts</a> and <a href="http://www.finkproject.org/">Fink</a> are the two most popular package managers, and I suspect that the Windows world has its own ones.<br><br>One of the major headaches that many computer users face is that version management and dependency management don't cooperate. While most package managers permit stating a minimal version number for a dependency, they don't permit prescribing a precise version number. There is a good reason for this: the way software installation is managed traditionally on Unix systems makes it impossible to install multiple versions of the same package in parallel. If packages A and B both depend on C, but require different versions of it, there is simply no simple solution. Today's package managers sweep this problem under the rug and pretend that higher version numbers are always at least as good as their predecessors. 
They will therefore install the higher of the two version numbers required by A and B, forcing one of them to use a version different from its preference.<br><br>Anyone who has been using computers intensively for a few years has probably run into such a problem, which manifests itself by some program not working correctly any more after another one, seemingly unrelated, has been installed. Another variant is that an installation fails because some dependency is available in a wrong version. Such problems are part of "dependency hell".<br><br>This situation is particularly problematic for the computational scientist who cares about the reproducibility of computed results. At worst, verifying results from 2005 by comparing to results from 2009 can require two completely separate operating system installations running in separate virtual machines. Under such conditions, it is difficult to convince one's colleagues to adopt reproducible research practices.<br><br>While I can't propose a ready-to-use solution, I can point out some work that shows that there is hope for the future. One interesting tool is the <a href="http://nixos.org/nix/">Nix package manager</a>, which works much like the package managers by Debian or RedHat, but permits installing multiple versions of the same package in parallel, and registers dependencies with precise versions. It could be used as a starting point for managing software for reproducible research, the main advantage being that it should work with all existing software. The next step would be to make each result dataset or figure a separate "package" whose complete dependency list (software and datasets) is managed by Nix with references to precise version numbers. 
I am currently exploring this approach; watch this space for news about my progress.<br><br>For a system even better suited to the needs of reproducible computational science, I refer to my own <a href="http://dirac.cnrs-orleans.fr/plone/software/activepapers">ActivePapers</a> framework, which combines dependency management and version control for code and data with mechanisms for publishing code+data+documentation packages and re-use code from other publications in a secure way. I have to admit that it has a major drawback as well: it requires all code to run on the Java Virtual Machine (in order to guarantee portability and secure execution), which unfortunately means that most of today's scientific programs cannot be used. Time will tell if scientific computing will adopt some virtual machine in the future that will make such a system feasible in real life. Reproducible research might actually become a strong argument in favour of such a development.
 ]]></description> </item><item> <title>Julia: a new language for scientific computing</title> <link>https://blog.khinsen.net/posts/2012/04/04/Julia-a-new-language-for-scientific-computing.html</link> <pubDate>2012-04-04</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2012/04/04/Julia-a-new-language-for-scientific-computing.html</guid> <category><![CDATA[ programming ]]></category> <description><![CDATA[ <p>New programming languages are probably invented every day, and even those that get developed and published are too numerous to mention. New programming languages developed specifically for science and engineering are very rare, however, and that's why such a rare event deserves some publicity. A while ago, I saw an announcement for <a href="http://julialang.org/">Julia</a>, which announces itself as "a fresh approach to technical computing". I couldn't resist the temptation to download, install, and test-drive it. Here are my first impressions.<br><br><p>The languages used today for scientific computing can be grouped into four categories:<br><ul><br><li> Traditional compiled languages optimized for number crunching. The big player in this category is of course Fortran, but some recent languages such as X10, Chapel, or Fortress are trying to challenge it.</li><br><br><li> Rapid-development domain-specific languages, usually interpreted. Well-known examples are Matlab and R.</li><br><br><li> General-purpose statically compiled languages with libraries for scientific computing. C and C++ come to mind immediately.</li><br><br><li> General-purpose dynamic languages with libraries for scientific computing. The number one here is Python with its vast library ecosystem.</li><br></ul><br><br><p>What sets Julia apart is that it sits somewhere between the first two categories. It's compiled, but fully interactive: there is no separate compilation phase. 
It is statically typed, allowing for efficient compilation, but also has the default type "Any" that makes it work just like dynamically typed languages in the absence of type declarations. Type inference makes the mix even better. If that sounds like the best of both worlds, it actually is. It has been made possible by modern code transformation techniques that don't really fit into the traditional categories of "compilers" and "interpreters". Like many other recent languages and language implementations, Julia uses <a href="http://llvm.org/">LLVM</a> as its infrastructure for these code transformations.<br><br><p>Julia has a well-designed type system with a clear orientation towards maths and number crunching: there is support for complex numbers, and first-class array support. What may seem surprising is that Julia is not object-oriented. This is neither an oversight nor a nostalgic return to the days of Fortran 77, but a clear design decision. Julia has type hierarchies and function polymorphism with dispatch on the types of all arguments. For scientific applications (and arguably for some others), this is more useful than OO style method dispatch on a single value.<br><br><p>Another unusual feature of Julia is a metaprogramming system that is very similar to Lisp macros, although it is slightly more complicated by the fact that Julia has a traditional syntax layer, whereas Lisp represents code by data structures.<br><br><p>So much for a summary of the language. The real question is: does it live up to its promises? Before I try to answer that question, I would like to point out that Julia is a young language that is still in flux and for now has almost no development tool support. For many real-life problems, there is no really good solution at the moment but it is clear that a good solution can be provided; it just needs to be done. 
What I am trying to evaluate is not if Julia is ready for real-life use (it is not), but whether there are any fundamental design problems.<br><br><p>The first question I asked myself is how well Julia can handle non-scientific applications. I just happened to see <a href="http://www.johndcook.com/blog/2012/04/02/why-scipy/">a blog post by John D. Cook</a> explaining why it's preferable to write math in a general-purpose language than to write non-math in a math language. My experience is exactly the same, and that's why I have adopted Python for most of my scientific programming. The point is that any non-trivial program sooner or later requires solving non-math problems (I/O, Web publishing, GUIs, ...). If you use a general-purpose language, you can usually just pick a suitable library and go ahead. With math-only languages such as Matlab, your options are limited, with interfacing to C code sometimes being the only way out.<br><br><p>So is it feasible to write Web servers or GUI libraries in Julia? I would say yes. All the features of general-purpose languages are there or under consideration (I am thinking in particular of namespaces there). With the exception of systems programming (device drivers and the like), pretty much every programming problem can be solved in Julia with no more effort than in most other languages. The real question is if it will happen. Julia is clearly aimed at scientists and engineers. It is probably good enough for doing Web development, but it has nothing to offer for Web developers compared to well-established languages. Will scientists and engineers develop their own Web servers in Julia? Will Web developers adopt Julia? I don't know.<br><br><p>A somewhat related question is that of interfacing to other languages. That's a quick way to make lots of existing code available. 
Julia has a C interface (which clearly needs better tool support, but I am confident that it will come), which can be used for other sufficiently C-like languages. It is not clear what effort will be required to interface Julia with languages like Python or Ruby. I don't see why it couldn't be done, but I can't say yet whether the result will be pleasant to work with.<br><br><p>The second question I explored is how well Julia is suited to my application domain, which is molecular simulations and the analysis of experimental data. Doing molecular simulation in Julia looks perfectly feasible, although I haven't really implemented any non-trivial algorithm yet. What I concentrated on first is data analysis, because that's where I could profit most from Julia's advantages. The kinds of data I mainly deal with are (1) time series and frequency spectra and (2) volumetric data. For time series, Julia works just fine. My biggest stumbling block so far has been volumetric data.<br><br><p>Volumetric data is usually stored in a 3-dimensional array where each axis corresponds to one spatial dimension. Typical operations on such data are interpolation, selection of a plane (2-d subarray) or line (1-d subarray), element-wise multiplication of volume, plane, or line arrays, and sums over selected regions of the data. Using the general-purpose array systems I am familiar with (languages such as APL, libraries such as NumPy for Python), all of this is easy to handle.<br><br><p>Julia's arrays are different, however. Apparently the developers' priority was to make the transition to Julia easy for people coming from Matlab. Matlab is based on the principle that "everything is a matrix", i.e. a two-dimensional array-like data structure. Matlab vectors come in two flavors, row and column vectors, which are actually matrices with a single row or column, respectively. Matlab scalars are considered 1x1 matrices. Julia is different because it has arrays of arbitrary dimension. 
However, array literals are made to resemble Matlab literals, and array operations are designed to behave as similarly as possible to Matlab operations, in particular for linear algebra functions. In Julia, as in Matlab, matrix multiplication is considered more fundamental than elementwise multiplication of two arrays.<br><br><p>For someone used to arrays that are nothing more than data structures, the result looks a bit messy. Here are some examples:<br><br><pre><br>julia&gt; a = [1; 2]<br>[1, 2]<br><br>julia&gt; size(a)<br>(2,)<br><br>julia&gt; size(transpose(a))<br>(1,2)<br><br>julia&gt; size(transpose(transpose(a)))<br>(2,1)<br></pre><br>I'd expect that the transpose of the transpose is equal to the original array, but that's not the case. But what does transpose do to a 3d array? Let's see:<br><pre><br>julia&gt; a = [x+y+z | x=1:4, y=1:2, z = 1:3]<br>4x2x3 Int64 Array:<br>...<br><br>julia&gt; transpose(a)<br>no method transpose(Array{Int64,3},)<br> in method_missing at base.jl:60<br></pre><br>OK, so it seems this was not considered important enough, but of course that can be fixed.<br><br><p>Next comes indexing:<br><pre><br>julia&gt; a = [1 2; 3 4]<br>2x2 Int64 Array:<br> 1  2<br> 3  4<br><br>julia&gt; size(a)<br>(2,2)<br><br>julia&gt; size(a[1, :])<br>(1,2)<br><br>julia&gt; size(a[:, 1])<br>(2,1)<br><br>julia&gt; size(a[1, 1])<br>()<br></pre><br>Indexing a 2-d array with a single number (all other indices being the all-inclusive range <code>:</code>) yields a 2-d array. Indexing with two number indices yields a scalar. So how do I extract a 1-d array? This generalizes to higher dimensions: if the number of number indices is equal to the rank of the array, the result is a scalar, otherwise it's an array of the same rank as the original.<br><br><p>Array literals aren't that frequent in practice, but they are used a lot in development, for quickly testing functions. 
Here are some experiments:<br><pre><br>julia&gt; size([1 2])<br>(1,2)<br><br>julia&gt; size([1; 2])<br>(2,)<br><br>julia&gt; size([[1;2] ; [3;4]])<br>(4,)<br><br>julia&gt; size([[1;2] [3;4]])<br>(2,2)<br><br>julia&gt; size([[1 2] [3 4]])<br>(1,4)<br><br>julia&gt; size([[[1 2] [3 4]] [[5 6] [7 8]]])<br>(1,8)<br></pre><br>Can you guess the rules? Once you have them (or looked them up in the Julia manual), can you figure out how to write a 3-d array literal? I suspect it's not possible.<br><br><p>Next, summing up array elements:<br><pre><br>julia&gt; sum([1; 2])<br>3<br><br>julia&gt; sum([1 2; 3 4])<br>10<br></pre><br>Apparently <code>sum</code> doesn't care about the shape of my array, it always sums the individual elements. Then how do I do a sum over all the rows?<br><br><p>I have tried to convert some of my basic data manipulation code from Python/NumPy to Julia, but found that I always spent most of the time fighting against the built-in array operations, which are clearly not made for my kind of application. In some cases a change of attitude may be sufficient. It seems natural to me that a plane extracted from volumetric data should be a 2-d array, but maybe if I decide that should be a 3-d array of "thickness" 1, everything will be easy.<br><br><p>I haven't tried yet, because I know there are cases that cannot be dealt with in that way. Suppose I have a time series of volumetric data that I store in a 4-d array. Obviously I want to be able to apply functions written for static volumetric data (i.e. 3-d arrays) to an element of such a time series. Which means I <i>do</i> need a way to extract a 3-d array out of a 4-d array.<br><br><p>I hope that what I need is there and I just didn't find it yet. Any suggestions are welcome. For now, I must conclude that test-driving Julia is a frustrating experience: the language holds so many promises, but fails for my needs due to superficial but practically very important problems.<br>
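<br><p>To make the difference between the two indexing conventions concrete, here is a small sketch in plain Python that computes result shapes only. The function names and the <code>ALL</code> marker are hypothetical. <code>shape_keeping_rank</code> merely encodes the rule I inferred from the session shown earlier and says nothing about how Julia implements indexing; <code>shape_dropping_axes</code> encodes the convention I am used to, where every number index removes its axis.
<pre>
```python
# Sketch of the two indexing conventions, working on shapes only.
# ALL stands for the all-inclusive range ":"; any int is a number index.
# All names here are invented for this illustration.

ALL = ":"


def shape_dropping_axes(shape, index):
    """Convention where every number index removes its axis."""
    return tuple(n for n, i in zip(shape, index) if i == ALL)


def shape_keeping_rank(shape, index):
    """Rule inferred from the Julia session shown in this post."""
    if all(i != ALL for i in index):
        return ()  # only number indices: the result is a scalar
    # Otherwise the rank is kept and indexed axes shrink to length 1.
    return tuple(n if i == ALL else 1 for n, i in zip(shape, index))


# One time step out of a 4-d time series of volumetric data.
series_shape = (10, 4, 2, 3)
step = (1, ALL, ALL, ALL)
print(shape_dropping_axes(series_shape, step))  # (4, 2, 3)
print(shape_keeping_rank(series_shape, step))   # (1, 4, 2, 3)
```
</pre>
With the first convention the extracted time step is a 3-d array that can be passed directly to functions written for static volumetric data; with the second it remains a 4-d array with a leading axis of length 1.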
 ]]></description> </item><item> <title>Binary operators in Python</title> <link>https://blog.khinsen.net/posts/2012/03/29/Binary-operators-in-Python.html</link> <pubDate>2012-03-29</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2012/03/29/Binary-operators-in-Python.html</guid> <category><![CDATA[ programming ]]></category> <description><![CDATA[ <p>A two-hour train journey provided the opportunity to watch the video recording of the <a href="http://marakana.com/s/2012_pydata_workshop_panel_with_guido_van_rossum,1091/index.html">Panel with Guido van Rossum</a> at the recent <a href="https://pydataworkshop.eventbrite.com/">PyData Workshop</a>. The lengthy discussion about <a href="http://www.python.org/dev/peps/pep-0225/">PEP 225</a> (which proposes to add additional operators to Python that would make it possible to have both elementwise and aggregate operations on the same objects, in particular for providing both matrix and elementwise multiplication on arrays with a nice syntax) motivated me to write up my own thoughts about what's wrong with operators in Python from my computational scientist's point of view.</p><p>The real problem I see is that operators map to methods. In Python, <code>a*b</code> is just syntactic sugar for <code>a.__mul__(b)</code>. This means that it's the type of <code>a</code> that decides how to do the multiplication. The method implementing this operation can of course check the type of <code>b</code>, and it can even decide to give up and let <code>b</code> handle everything, in which case Python does <code>b.__rmul__(a)</code>. But this is just a kludge to work around the real weakness of the operators-map-to-methods approach. Binary operators fundamentally require a dispatch on <i>both</i> types, the type of <code>a</code> and the type of <code>b</code>. 
What <code>a*b</code> <i>should</i> map to is <code>__builtins__.__mul__(a, b)</code>, a global function that would then implement a binary dispatch operation. Implementing that dispatch would in fact be the real problem to solve, as Python currently has no multiple dispatch mechanism at all.</p><p>But would multiple dispatch solve the issue addressed by PEP 225? Not directly. But it would make some of the alternatives mentioned there feasible. A proper multiple dispatch system would allow NumPy (or any other library) to decide what multiplication of its own objects by a number means, no matter if the number is the first or the second factor.</p><p>More importantly, multiple dispatch would allow a major cleanup of many scientific packages, including NumPy, and even clean up the basic Python language by getting rid of <code>__rmul__</code> and friends. NumPy's current aggressive handling of binary operations is actually more of a problem for me than the lack of a nice syntax for matrix multiplication.</p><p>There are many details that would need to be discussed before binary dispatch could be proposed as a PEP. Of course the old method-based approach would need to remain in place as a fallback, to ensure compatibility with existing code. But the real work is defining a good multiple dispatch system that integrates well with Python's dynamic type system and allows the right kind of extensibility. That same multiple dispatch method could then also be made available for use in plain functions.</p>
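<p>To give a rough idea of what a global multiplication function with binary dispatch could look like, here is a minimal sketch in plain Python. Everything in it is hypothetical: <code>mul</code>, <code>register_mul</code> and the toy <code>Vector</code> class are not part of Python or NumPy, and the lookup strategy (walk both method resolution orders and take the first registered pair, giving priority to the first argument) is a deliberate simplification of what a real multiple dispatch system would have to do.</p>
<pre>
```python
# Hypothetical sketch of binary dispatch for multiplication.
# Nothing here is an existing Python or NumPy API.

_mul_table = {}  # maps (type of a, type of b) to an implementation


def register_mul(type_a, type_b):
    """Decorator that registers an implementation for a pair of types."""
    def decorator(func):
        _mul_table[(type_a, type_b)] = func
        return func
    return decorator


def mul(a, b):
    """Dispatch on the types of BOTH arguments.

    Simplification: the first registered pair found while walking both
    MROs wins, with the first argument's MRO taking priority.
    """
    for type_a in type(a).__mro__:
        for type_b in type(b).__mro__:
            func = _mul_table.get((type_a, type_b))
            if func is not None:
                return func(a, b)
    raise TypeError("no multiplication defined for this pair of types")


class Vector:
    """Toy container used only for this example."""
    def __init__(self, *items):
        self.items = tuple(items)


@register_mul(Vector, int)
def _vector_times_int(v, n):
    return Vector(*(x * n for x in v.items))


@register_mul(int, Vector)
def _int_times_vector(n, v):
    # No __rmul__ needed: the pair (int, Vector) is registered directly.
    return Vector(*(n * x for x in v.items))


@register_mul(Vector, Vector)
def _vector_times_vector(v, w):
    # Elementwise product; a library could register a different meaning.
    return Vector(*(x * y for x, y in zip(v.items, w.items)))


print(mul(Vector(1, 2), 3).items)             # (3, 6)
print(mul(3, Vector(1, 2)).items)             # (3, 6)
print(mul(Vector(1, 2), Vector(3, 4)).items)  # (3, 8)
```
</pre>
<p>The point of the sketch is that the library defining <code>Vector</code> decides what multiplication by a number means for both argument orders, without the left operand's type getting the first say.</p>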
 ]]></description> </item><item> <title>Python becomes a platform</title> <link>https://blog.khinsen.net/posts/2012/03/15/Python-becomes-a-platform.html</link> <pubDate>2012-03-15</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2012/03/15/Python-becomes-a-platform.html</guid> <category><![CDATA[ programming ]]></category> <description><![CDATA[ <p>The recent announcement of <a title="clojure-py" href="https://github.com/halgari/clojure-py">clojure-py</a> made some noise in the <a href="http://clojure.org/">Clojure</a> community, but not, as far as I can tell, in the Python community. For those who haven't heard of it before, clojure-py is an implementation of the Clojure language in Python, compiling Clojure code to bytecode for Python's virtual machine. It's still incomplete, but already usable if you can live with the subset of Clojure that has been implemented.</p><p>I think that this is an important event for the Python community, because it means that Python is no longer just a language, but is becoming a platform. One of the stated motivations of the clojure-py developers is to tap into the rich set of libraries that the Python ecosystem provides, in particular for scientific applications. Python is thus following the path that Java took before it: the Java virtual machine, initially designed only to support the Java language, became the target of many different language implementations which all provide interoperation with Java itself.</p><p>It will of course be interesting to see if more languages will follow once people realize it can be done. The prospect of speed through PyPy's JIT, another stated motivation for the clojure-py community, could also get more language developers interested in Python as a platform.</p><p>Should Python programmers care about clojure-py? I'd say yes. Clojure is strong in two areas in which Python isn't. 
One of them is metaprogramming, a feature absent from Python which Clojure had from the start through its Lisp heritage. The other feature is persistent immutable data structures, for which clojure-py provides an implementation in Python. Immutable data structures make for more robust code, in particular but not exclusively for concurrent applications.</p><p> </p>
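<p>For readers who have not met persistent immutable data structures before, here is a minimal sketch of the idea in plain Python. It is not how clojure-py implements them (Clojure's vectors and maps are tree-based); it is just an immutable linked list, with the hypothetical names <code>Cons</code>, <code>push</code> and <code>to_tuple</code>, showing that "modifying" such a structure yields a new value that shares storage with the old one.</p>

```python
from typing import NamedTuple, Optional


class Cons(NamedTuple):
    """One immutable cell of a singly linked list (None is the empty list)."""
    head: object
    tail: Optional["Cons"]


def push(lst, item):
    """Return a new list with item in front; lst itself is left untouched."""
    return Cons(item, lst)


def to_tuple(lst):
    """Collect the items of a list into a tuple, front to back."""
    items = []
    while lst is not None:
        items.append(lst.head)
        lst = lst.tail
    return tuple(items)


# Two different "updates" of the same base list.
base = push(push(None, 3), 2)
left = push(base, 1)
right = push(base, 0)

if __name__ == "__main__":
    print(to_tuple(base), to_tuple(left), to_tuple(right))
    # Both new lists reuse the cells of base instead of copying them.
    print(left.tail is right.tail)
```

<p>Because no cell can ever change, <code>base</code> can be handed to other threads or kept as an old version without defensive copying, which is where the robustness comes from.</p>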
 ]]></description> </item><item> <title>Teaching parallel computing in Python</title> <link>https://blog.khinsen.net/posts/2012/02/06/Teaching-parallel-computing-in-Python.html</link> <pubDate>2012-02-06</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2012/02/06/Teaching-parallel-computing-in-Python.html</guid> <category><![CDATA[ programming ]]></category> <description><![CDATA[ <p>Every time I teach a class on parallel computing with Python using the multiprocessing module, I wonder if multiprocessing is really mature enough that I should recommend using it. I end up deciding for it, mostly because of the lack of better alternatives. But I am not happy at all with some features of multiprocessing, which are particularly nasty for non-experts in Python. That category typically includes everyone in my classes.</p><p>To illustrate the problem, I'll start with a simple example script, the kind of example you put on a slide to start explaining how parallel computing works:</p><br><pre>from multiprocessing import Pool<br>import numpy<br>pool = Pool()<br>print pool.map(numpy.sqrt, range(100))<br></pre><br><p>Do you see the two bugs in this example? Look again. No, it's nothing trivial such as a missing comma or inverted arguments in a function call. This is code that I would actually expect to work. But it doesn't.</p><p>Imagine your typical student typing this script and running it. 
Here's what happens:</p><br><pre><br>Process PoolWorker-1:<br>Process PoolWorker-2:<br>Traceback (most recent call last):<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap<br>Traceback (most recent call last):<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap<br> self.run()<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 88, in run<br> self._target(*self._args, **self._kwargs)<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 57, in worker<br> task = get()<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/queues.py", line 352, in get<br> return recv()<br>UnpicklingError: NEWOBJ class argument has NULL tp_new<br> self.run()<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 88, in run<br> self._target(*self._args, **self._kwargs)<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 57, in worker<br> task = get()<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/queues.py", line 352, in get<br> return recv()<br>UnpicklingError: NEWOBJ class argument has NULL tp_new<br></pre><br><p>Python experts will immediately see what's wrong: numpy.sqrt is not picklable. This is mostly a historical accident. Nothing makes it impossible or even difficult to pickle C functions such as numpy.sqrt, but pickling was invented and implemented long before parallel computing, at a time when pickling functions was pretty pointless, so it's not possible. 
Implementing it today within the framework of Python's existing pickle protocol is unfortunately not trivial, and that's why it hasn't been implemented.</p><p>Now try to explain this to non-experts who have basic Python knowledge and want to do parallel computing. It doesn't hurt of course if they learn a bit about pickling, since it also has a performance impact on parallel programs. But due to restrictions such as this one, you have to explain this right at the start, although it would be better to leave this for the "advanced topics" part.</p><p>OK, you have gotten the message across, and your students fix the script:</p><br><pre><br>from multiprocessing import Pool<br>import numpy<br><br>pool = Pool()<br><br>def square_root(x):<br>    return numpy.sqrt(x)<br><br>print pool.map(square_root, range(100))<br></pre><br><p>And then run it:</p><br><pre><br>Process PoolWorker-1:<br>Traceback (most recent call last):<br>Process PoolWorker-2:<br>Traceback (most recent call last):<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap<br> self.run()<br> self.run()<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 88, in run<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 88, in run<br> self._target(*self._args, **self._kwargs)<br> self._target(*self._args, **self._kwargs)<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 57, in worker<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 57, in worker<br> task = get()<br> File 
"/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/queues.py", line 352, in get<br> return recv()<br>AttributeError: 'module' object has no attribute 'square_root'<br> task = get()<br> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/queues.py", line 352, in get<br> return recv()<br>AttributeError: 'module' object has no attribute 'square_root'<br></pre><br><p>At this point, even many Python experts would start scratching their heads. In order to understand what is going on, you have to know how multiprocessing creates its process pools. And since the answer (on Unix systems) is "fork", you have to have a pretty good idea of Unix process creation to see the cause of the error: the worker processes are forked when the pool is created, before <code>square_root</code> has been defined, so they never see it. That knowledge then leads to a trivial fix:</p><br><pre><br>from multiprocessing import Pool<br>import numpy<br><br>def square_root(x):<br>    return numpy.sqrt(x)<br><br>pool = Pool()<br><br>print pool.map(square_root, range(100))<br></pre><br><p>Success! It works! But... how do you explain this to your students?</p><p>To make matters worse, this script works but is still not correct: it has a portability bug because it doesn't work under Windows. So you add a section on Windows process management to the section on Unix process management. In the end, you have spent more time explaining the implementation restrictions in multiprocessing than how to use it. A great way to reinforce the popular belief that parallel computing is for experts only.</p><p>These issues with multiprocessing are a classic case of a <a href="https://en.wikipedia.org/wiki/Leaky_abstraction">leaky abstraction</a>: multiprocessing provides a "pool of worker processes" abstraction to the programmer, but in order to use it, the programmer has to understand the implementation. In my opinion, it would be preferable to have a less shiny API, but one which reflects the implementation restrictions. 
The pickle limitations might well go away one day (see <a href="http://www.python.org/dev/peps/pep-3154/">PEP 3154</a>, for example), but until this really happens, I'd prefer an API that does not suggest possibilities that don't exist.</p><p>I actually thought about this myself a long time ago, when designing the API of <a href="http://dirac.cnrs-orleans.fr/ScientificPython/ScientificPythonManual/Scientific.DistributedComputing.MasterSlave-module.html">my own parallel computing framework</a> for Python (which differs from multiprocessing in being designed for distributed-memory machines). I ended up with an API that forces all functions that implement tasks executed in parallel to be methods of a single class, or functions of a single module. My API also contains an explicit "run parallel job now" call at the end. This is certainly less elegant than the multiprocessing API, but it actually works as expected.</p>
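<p>For reference, here is a sketch of what a version without the portability bug might look like, written for Python 3. Two things are assumptions of mine rather than part of the scripts above: <code>math.sqrt</code> stands in for <code>numpy.sqrt</code> so that only the standard library is needed, and <code>parallel_square_roots</code> is a hypothetical helper name. The essential points are that the task function is defined at module level and that the pool is only created under the <code>__main__</code> guard.</p>

```python
import math
from multiprocessing import Pool


def square_root(x):
    """Task function. It lives at module level so workers can look it up by name."""
    return math.sqrt(x)


def parallel_square_roots(values, processes=2):
    """Map square_root over values with a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(square_root, values)


if __name__ == "__main__":
    # Without this guard, platforms that start workers by launching a fresh
    # interpreter and re-importing this file (such as Windows) would try to
    # create a new pool in every worker.
    print(parallel_square_roots(range(10)))
```

<p>Note that the guard is yet another implementation detail that leaks through the abstraction: it has nothing to do with square roots and everything to do with how worker processes are started.</p>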
 ]]></description> </item><item> <title>A rant about mail clients</title> <link>https://blog.khinsen.net/posts/2011/11/04/A-rant-about-mail-clients.html</link> <pubDate>2011-11-04</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2011/11/04/A-rant-about-mail-clients.html</guid> <category><![CDATA[ rants ]]></category> <description><![CDATA[ A while ago I described why I <a href="http://khinsen.wordpress.com/2011/01/04/bye-bye-ical-welcome-org-mode/">migrated my agendas from iCal to orgmode</a>. To sum it up, my main motivation was to gain more freedom in managing my information: where iCal imposes a rigid format for events and insists on storing them in its own database, inaccessible to other programs, orgmode lets me mix agenda information with whatever else I like in plain text files. Today's story is a similar one, but without the happy ending. I am as fed up with mail clients as I was with iCal, and for much the same reasons, but I haven't yet found anything I could migrate to.<br><br>From an information processing point of view, an e-mail message is not very different from lots of other pieces of data. It's a sequence of bytes respecting a specific format (defined by a handful of standards) to allow its unambiguous interpretation by various programs in the processing chain. An e-mail message can perfectly well be stored in a file and in fact most e-mail clients permit saving a message to a file. Unfortunately, the number of e-mail clients able to open and correctly display such a file is already much smaller. But when it comes to collections of messages, information processing freedom ends completely.<br><br>Pretty much every mail client's point of view is that all of a user's mail is stored in some database, and that it (the client) is free to handle this database in whatever way it likes. The user's only access to the messages is <em>the</em> mail client. The one and only. 
The only exception is server-based mail databases handled via the IMAP protocol, where multiple clients can work with a common database. If you don't use IMAP, you have no control over how and where your mail is stored, who has access to it, etc.<br><br>What I'd like to do is manage mail just like I manage other files. A mailbox should just be a directory containing messages, one per file. Mailboxes could be stored anywhere in the file system. Mailboxes could be shared through the file system, and backed up via the file system. They could be grouped with whatever other information in whatever way suits me. I would double-click on a message to view it, or double-click on a mailbox directory to view a summary, sorted in the way I like it. Or I would use command-line tools to work on a message or a mailbox. I'd pick the best tool for each job, just like I do when working with any other kind of file.<br><br>Why all that isn't possible remains a mystery to me. The technology has been around for decades. The good old Maildir format would be just fine for storing mailboxes anywhere in the file system, as would the even more venerable mbox format. But even mail clients that use mbox or Maildir internally insist that all such mailboxes must reside in a single master directory. Moreover, they won't let me open a mailbox from outside; I have to run the mail client and work through its hierarchical presentation of mailboxes to get to my destination.<br><br>Before I get inundated by comments pointing out that mail client X has feature Y from the list above: Yes, I know, there are small exceptions here and there. But unless I have the complete freedom to put my mail where I want it, the isolated feature won't do me much good. If someone knows of a mail client that has all the features I am asking for, plus the features we all expect from a modern mail client, then please do leave a comment!
 ]]></description> </item><item> <title>EuroSciPy 2011</title> <link>https://blog.khinsen.net/posts/2011/08/30/EuroSciPy-2011.html</link> <pubDate>2011-08-30</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2011/08/30/EuroSciPy-2011.html</guid> <category><![CDATA[ programming ]]></category><category><![CDATA[ science ]]></category> <description><![CDATA[ Another EuroSciPy conference is over, and like last year it was very interesting. Here is my personal list of highlights and comments.<br><br>The two keynote talks were particularly inspiring. On Saturday, <a href="http://mcs.open.ac.uk/mp8/" title="Marian Petre">Marian Petre</a> reported on her studies of how people in general and scientists in particular develop software. The first part of her presentation was about how "experts" design and implement software, the definition of an expert being someone who produces software that actually works, is finished on time, and doesn't exceed the planned budget. The second part was about the particularities of software development in science. But perhaps the most memorable quote of the keynote was Marian's reply to a question from the audience about how to deal with unreasonable decisions coming from technically less competent managers. She recommended learning how to manage management - a phrase that I heard repeated several times in discussions throughout the conference.<br><br>The Sunday keynote was given by <a href="http://fperez.org/" title="Fernando Perez">Fernando Perez</a>. As was to be expected, <a href="http://ipython.org/" title="IPython">IPython</a> was his number one topic and there was a lot of new stuff to show off. I won't mention all the new features in the recently released version 0.11 because they are already discussed in detail <a href="http://stronginference.com/weblog/2011/7/15/innovations-in-ipython.html">elsewhere</a>. 
What I find even more exciting is the new Web notebook interface, available only directly from the <a href="https://github.com/ipython/ipython">development site at github</a>. A notebook is a trace of an interactive session that can be edited, saved, stored in a repository, or shared with others. It contains inputs <em>and</em> outputs of all commands. Inputs are cells that can consist of more than one line. Outputs are by default what Python prints to the terminal, but IPython provides a mechanism for displaying specific types of objects in a special way. This makes it possible to show images (in particular plots) inline, but also to turn SymPy expressions into mathematical formulas typeset in LaTeX.<br><br>A more alarming aspect of Fernando's keynote was his statistical analysis of contributions to the major scientific libraries of the Python universe. In summary, the central packages are maintained by a grand total of about 25 people in their spare time. This observation caused a lot of debate, centered around how to encourage more people to contribute to this fundamental work.<br><br>Among the other presentations, as usual mostly of high quality, the ones that impressed me most were <a href="http://www.imp.ac.at/research/andrew-straw/" title="Andrew Straw">Andrew Straw's</a> presentation of <a href="http://www.ros.org/">ROS</a>, the Robot Operating System, <a href="http://cbsu.tc.cornell.edu/staff/myers/">Chris Myers'</a> presentation about <a href="http://sloppycell.sourceforge.net/">SloppyCell</a>, and <a href="http://www.chimie-paristech.fr/annuaire/spip.php?article328">Yann Le Du's</a> talk about large-scale machine learning running on a home-made GPU cluster. Not to forget the numerous posters with lots more interesting stuff.<br><br>For the first time, EuroSciPy was complemented by domain-specific satellite meetings. I attended <a href="http://www.euroscipy.org/blogentry/4556">PyPhy</a>, the Python in Physics meeting. 
Physicists are traditionally rather slow in accepting new technology, but the meeting showed that a lot of high-quality research is based on Python tools today, and that Python has also found its way into physics education at various universities.<br><br>Finally, conferences are good also because of what you learn during discussions with other participants. During EuroSciPy, I discovered a new scientific journal called <a href="http://www.openresearchcomputation.com/">Open Research Computation</a>, which is all about software for scientific research. Scientific software developers regularly complain about the lack of visibility and recognition that their work receives from the scientific community and in particular from evaluation and grant attribution committees. A dedicated journal might just be what we need to improve the situation. I hope this will be a success.<br>
 ]]></description> </item><item> <title>Executable Papers</title> <link>https://blog.khinsen.net/posts/2011/06/03/Executable-Papers.html</link> <pubDate>2011-06-03</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2011/06/03/Executable-Papers.html</guid> <category><![CDATA[ programming ]]></category><category><![CDATA[ science ]]></category> <description><![CDATA[ The last two days I participated in the "Executable Papers workshop" at this year's <a href="http://www.iccs-meeting.org/" title="International Conference on Computational Science">ICCS conference</a>. It was not just another workshop among the many ICCS workshops. The participants had all submitted a proposal to the "<a href="http://www.executablepapers.com/">Executable Paper Grand Challenge</a>" run by Elsevier, one of the biggest scientific publishers. On the first day, the nine finalists presented their work, and on the second day, the remaining accepted proposals were presented.<br><br>The term "executable papers" stands for the expected next revolution in scientific publishing. The move from printed journals to electronic on-line journals (or a combination of both) has changed little for authors and readers. It is the libraries that have seen the largest impact because they now do little more than pay subscription fees. Readers obtain papers as PDF files directly from the publishers' Web sites. The one change that does matter to scientists is that most journals now offer to distribute "supplementary material" in addition to the main paper. This can in principle be any kind of file; in practice it is mostly used for additional explanations, images, and tables, i.e. to keep the main paper shorter. Occasionally there are also videos, a first step towards exploring the new possibilities opened up by electronic distribution. 
The step to executable papers is a much bigger one: the goal is to integrate computer-readable data and executable program code together with the text part of a paper. The goals are a richer reader experience (e.g. interactive visualizations), verifiability of results by both referees and readers (by re-running part of the computations described in the paper), and re-use of data and code in later work by the same or other authors. There is some overlap in these goals with the "<a href="http://reproducibleresearch.net/index.php/Main_Page">reproducible research</a>" movement, whose goal is to make computational research reproducible by providing tools and methods that make it possible to store a trace of everything that entered into some computational procedure (input data, program code, description of the computing environment) such that someone else (or even the original author a month later) can re-run everything and obtain the same results. The new aspect in executable papers is the packaging and distribution of everything, as well as the handling of bibliographic references.<br><br>The proposals' variety mostly reflected the different backgrounds of the presenters. A mathematician documenting proofs obviously has different needs than an astrophysicist simulating a supernova on a supercomputer. Unfortunately this important aspect was never explicitly discussed. Most presenters did not even mention their field of work, much less what it implies in terms of data handling. 
This was probably due to the enormous time pressure; 15 to 20 minutes for a presentation plus demonstration of a complex tool was clearly not enough.<br><br>The proposals could roughly be grouped into three categories:<br><ul><br>	<li>Web-based tools that permit the author to compose his executable paper by supplying data, code, and text, and permit the reviewer and reader to consult this material and re-run computations.</li><br>	<li>Systems for preserving the author's computational environment in order to permit reviewers and readers to use the author's software with little effort and without any security risks.</li><br>	<li>Semantic markup systems that make parts of the written text interpretable by a computer for various kinds of processing.</li><br></ul><br>Some proposals covered two of these categories but with a clear emphasis on one of them. For the details of each proposal, see the <a href="http://www.sciencedirect.com/science/journal/18770509">ICCS proceedings</a> which are freely available.<br><br>While it was interesting to see all the different ideas presented, my main impression of the Executable Paper Workshop is that of a missed opportunity. Having all those people who had thought long and hard about the various issues in one room for two days would have been a unique occasion to make progress towards better tools for the future. In fact, none of the solutions presented cover the needs of all the domains of computational science. They make assumptions about the nature of the data and the code that are not universally valid. One or two hours of discussion might have helped a lot to improve everyone's tools.<br><br>The implementation of my own proposal, which addresses the questions of how to store code and data in a flexible, efficient, and future-proof way, is available <a href="http://dirac.cnrs-orleans.fr/plone/software/activepapers">here</a>. 
It contains a multi-platform binary (MacOS, Linux, Windows, all on the x86 platform) and requires version 6 of the Java Runtime Environment. The source code is also included, but there is no build system at the moment (I use a collection of scripts that have my home-directory hard-coded in lots of places). There is, however, a tutorial. Feedback is welcome!<br>
 ]]></description> </item><item> <title>Text input on mobile devices</title> <link>https://blog.khinsen.net/posts/2011/05/02/Text-input-on-mobile-devices.html</link> <pubDate>2011-05-02</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2011/05/02/Text-input-on-mobile-devices.html</guid> <category><![CDATA[ mobile computing ]]></category> <description><![CDATA[ <p>I have been using mobile (pocket size) computers for about 15 years, starting with the Palm Pilot. Currently I use an Android smartphone (Samsung Galaxy S). While mobile devices are mostly used for consulting rather than for entering information, text entry has always been a hot topic of debate.</p><br><p>Apple's Newton Messagepad, probably the first mobile computing device in the modern sense, pursued the ambitious goal of handwriting recognition. It was both an impressive technical achievement and a practical failure. I don't think anyone ever managed to use the Newton's handwriting recognition satisfactorily in daily life.</p><br><p>The Palm Pilot had a more modest but also more achievable goal: its Graffiti technology was based on single letter recognition with simplified letter shapes. It took a while to become fluent with Graffiti, but many people managed and I don't remember anyone complaining about the learning curve.</p><br><p>I don't remember when I first saw a miniature QWERTY keyboard on the screen of a mobile device, but it may well have been on one of the first iPhones. I was definitely not enthusiastic about it. The keys are much too small for touch-typing, and the layout was already a bad choice for desktop computer keyboards. The only argument in its favor is familiarity, but is that a good enough reason to cripple oneself for a long time to come?</p><br><p>When I got my Android phone, I was rapidly confronted with this issue in practice. Samsung left me the choice between the standard keyboard and Swype. 
Both had the same problem: the keys were too small for my fingers. I turned to the Android market and found many more QWERTY keyboards. And... Graffiti, my old friend from my Palm days. What a relief!</p><br><p>Of course, my phone is not a Palm. The biggest difference is that the Palm had a stylus whereas today's smartphones are meant to be manipulated with the fingers. But Graffiti works surprisingly well without a stylus. I find that I can write about equally well with the index finger or the thumb. Graffiti definitely is a good choice for Android, especially for Palm veterans.</p><br><p>Recently I discovered another alternative input method and I like it enough that I might end up preferring it over Graffiti. It's called MessagEase and it consists of a 3x3 grid of comfortably large keys that display the 9 most frequent characters. The remaining characters, plus punctuation etc., are available by drawing lines outward from the center of a key. The technique doesn't require much time to master, but writing fluently requires a lot of practice because the layout needs to be memorized.</p><br><p>I started using MessagEase about two weeks ago and have reached about the same speed I get with Graffiti. I wrote this whole article with MessagEase as a real-life exercise. Time will tell if I actually get faster than with Graffiti, but MessagEase definitely is a serious candidate for mobile texting in the post-QWERTY era. If you have an Android phone or an iPhone, give it a try.</p>
 ]]></description> </item><item> <title>Bye bye iCal, welcome org-mode</title> <link>https://blog.khinsen.net/posts/2011/01/04/Bye-bye-iCal-welcome-org-mode.html</link> <pubDate>2011-01-04</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2011/01/04/Bye-bye-iCal-welcome-org-mode.html</guid> <category><![CDATA[ emacs ]]></category> <description><![CDATA[ I have been using Macintosh computers since 2003, and overall I have been happy with the personal information management (PIM) tools provided by Apple: AddressBook, Mail, Safari (for bookmark management). The one tool I have never liked is iCal. Its user interface is fine for consulting my agenda, but entering information is too complicated and the todo-list management is particularly clumsy. But more importantly, I regularly found myself wanting to add information for which no entry field was provided. I ended up putting it into the "notes" section, or leaving it out. Another unpleasant feature of iCal is that all the information is stored in a complex proprietary database, making synchronization between several computers impossible except through cloud-based server solutions such as Apple's <a href="http://www.me.com/">MobileMe</a> (quite expensive) or <a href="http://fruux.com/">fruux</a> (much nicer in my opinion, but it still requires trusting your data to a cloud service).<br><br>Being unhappy with a tool for an important task implies looking for better options, but I didn't find anything that I liked. Until one day I discovered, mostly by accident, the <a href="http://orgmode.org/">org-mode</a> package that has been distributed with Emacs for a while. org-mode is one of those pieces of software that is so powerful that it is difficult to describe to someone who has never used it. 
Basically, org-mode uses plain text files with a special lightweight markup syntax for things like todo items or time stamps (but there is a lot more), and then provides sophisticated and very configurable functions for working with this data. It can be used for keeping agendas, todo lists, journals, simple databases such as bookmark lists, spreadsheets, and much more. Most importantly, all of these can coexist in a single text file if you want, and the contents of this file can be structured in any way you like. You can even add pieces of executable code and thus use org-mode for literate programming, but that's a topic for another post.<br><br>To be more concrete, my personal information database in org-mode consists of several files at the top level: <code>work.org</code> for organizing my workday, <code>home.org</code> for tasks and appointments related to private life, <code>research.org</code> for notes about research projects, <code>programming.org</code> for notes (mostly bookmarks) about software development, etc. Inside my <code>work.org</code>, there is a section on research projects, one on teaching, one on my editorial work for <a href="http://cise.aip.org/">CiSE</a>, one for refereeing, etc. Inside each of these sections, there are agenda entries (seminars, meetings, courses etc.) and todo entries with three priority levels and optional deadlines. Any of them can be accompanied by notes of any kind, including links, references to files on my disk, and even executable shell commands. There is no limit to what you can store there.<br><br>In October 2010 I started the transition from iCal to org-mode. Initially I entered all data twice, to make sure I could continue to rely on iCal. After a week I was confident enough to enter everything just once, using org-mode. I then transferred all agenda items for 2011 to org-mode and decided to stop using iCal on January 1, 2011. That day has arrived, and the iCal icon has disappeared from my dock. 
Without any regrets.<br><br>Conclusion: If you need a powerful PIM system and you don't fear Emacs, have a look at org-mode.<br>
 ]]></description> </item><item> <title>The future of Python</title> <link>https://blog.khinsen.net/posts/2010/07/19/The-future-of-Python.html</link> <pubDate>2010-07-19</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2010/07/19/The-future-of-Python.html</guid> <category><![CDATA[ programming ]]></category> <description><![CDATA[ I have received a number of questions and remarks about my <a href="http://dirac.cnrs-orleans.fr/plone/Members/hinsen/presentations/scientific-computing/EuroSciPy_2010_Keynote.pdf/view">keynote talk at EuroSciPy 2010</a>, ranging from questions about technical details to an inquiry about the release date of Python 4.0! Rather than writing lengthy replies to everyone, I try to address all these issues here.<br><br>First of all, my intentions behind the keynote were<br><ol><br><br>	<li>Encourage scientists to look at new tools and developments that I believe to be important in the near future (Python 3, Cython) and at others that might become important to scientific applications (JIT compilers, alternative implementations).</li><br><br>	<li>Make computational scientists think about future commodity hardware (which is what we use most of the time) and its implications for programming, in particular the generalization of massively parallel computing.</li><br><br>	<li>Show that easy-to-use parallel programming paradigms, in particular deterministic ones, exist today. Computational scientists need to realize that MPI and OpenMP are not the last word on parallel programming.</li><br><br>	<li>Make my ideas concrete by showing how they could be implemented in Python.</li><br><br></ol><br><br>My "Python 4.0" is completely fictitious and will probably never exist in exactly that form. However, it is important to realize that it <em>could</em> be implemented right now. With the GIL-free Python implementations (Jython, IronPython), it would even be rather straightforward to implement. 
For CPython, any implementation not removing the GIL would probably be too inefficient to be of practical interest.<br><br>Most of the ingredients for implementing my "Python 4.0" are well-known and have already been used in other languages or libraries:<br><ul><br>	<li>The "declarative concurrency" programming paradigm has been used in <a href="http://en.wikipedia.org/wiki/Oz_(programming_language)">Oz</a> and <a href="http://en.wikipedia.org/wiki/Flow_Java">FlowJava</a>, but to the best of my knowledge not in any mainstream programming language. It is explained very well in the book <em><a href="http://www.info.ucl.ac.be/~pvr/book.html">Concepts, Techniques, and Models of Computer Programming</a>,</em> by Peter van Roy and Seif Haridi, and also in the freely downloadable essay <em><a href="http://www.info.ucl.ac.be/~pvr/VanRoyChapter.pdf">Programming paradigms for dummies</a></em>. Basically, it is the functional paradigm extended with annotations that identify computations to be done in parallel. Remove those annotations, and you get plain functional programs that yield the same results. Declarative concurrency is free of deadlocks and race conditions, which I think is a critical property for any parallel programming paradigm to be considered for a high-level language such as Python. Another nice feature of declarative concurrency is that data parallelism, including nested data parallelism, is a special case that can be implemented on top of it. Data parallelism is a useful paradigm for many scientific applications.</li><br>	<li><a href="http://en.wikipedia.org/wiki/Future_(programming)">Futures</a> are asynchronous tasks provided as library functions for various languages. A Python library for futures is the subject of <a href="http://www.python.org/dev/peps/pep-3148/">PEP 3148</a>; an implementation is already available.</li><br>	<li>The importance of effect-free functions for all kinds of code transformations (automatic or not) is widely recognized. 
It is equally recognized that a useful program needs to have effects. The two basic approaches to dealing with this contradiction are (a) allow effects, but make it as easy as possible to avoid them, encouraging a mostly effect-free style, and (b) design a language without effects (a pure functional language) and provide a mechanism for adding effects through special annotations or syntax, clearly marked as an exceptional feature. The best-known language in the second category is Haskell with its use of monads for controlling effects. Most functional languages are in the first category.</li><br>	<li>Efficient data structures for functional programs have been a subject of research for quite a while and quite a few good ones are known. It would be straightforward to replace Python's tuple implementation by something more efficient in typical functional settings, or to add an efficient immutable dictionary implementation. The standard reference is Chris Okasaki's book <a href="http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1279556920&amp;sr=1-1"><em>Purely Functional Data Structures</em></a>.</li><br><br></ul><br><br>Futures may seem to provide most of what declarative concurrency promises, but this is not quite true. Futures are objects representing computations. They have a method that client code must call to wait for the result and retrieve it. Since waiting is an explicit operation on a standard object, it is easy to create a situation in which two futures wait for each other: a deadlock. This can only be avoided by not having futures accessible as standard objects. The language implementation must recognize futures as special and insert a wait call before any access to the value of the result. For this reason, declarative concurrency cannot be implemented as a library.<br><br>Another important condition for implementing declarative concurrency with futures is that code inside a future must be effect-free. 
Otherwise multiple concurrently running futures can modify the same object and create a race condition.<br><br>Probably the only truly original contribution in my "Python 4.0" scenario is the dynamically verified effect-free subset of the Python language. Most languages, even functional ones, provide no way for a compiler or a run-time system to verify that a given function is effect-free. Haskell is perhaps the only exception in having a static type system that can identify effect-free code. In Python, that is not a viable approach because everything is dynamic. But why not provide at least a run-time check for effect-free code where useful? It's still better to have a program crash with an exception saying "you did I/O in what should have been an effect-free function" than get wrong results silently.<br><br>Here is an outline of how such an approach could be implemented. Each function and method would have a flag saying "I am supposed to be effect-free." In my examples, this flag is set by the decorator @noeffects, but other ways are possible. Built-in functions would of course have that flag set correctly as well. As soon as the interpreter enters a function marked as effect-free, it goes into "functional mode" until it returns from that function again. In functional mode, it raises an exception whenever an unflagged function or method is called.<br><br>Some details to consider:<br><ul><br>	<li>Effect-free functions may not contain global or nonlocal statements. Probably the only way to enforce this is to have special syntax for defining effect-free functions and methods (rather than a decorator) and make those statements syntactically illegal inside.</li><br><br>	<li>It would be much more useful to have a "referentially transparent" subset rather than an "effect-free" subset, but this is also much harder to implement. 
A referentially transparent function guarantees to return the same result for the same input arguments, but may modify mutable objects that it has created internally. For example, a typical matrix inversion function allocates an array for its result and then uses an imperative algorithm that modifies the elements of that array repeatedly before returning it. Such a function can be used as an asynchronous task without risk, but its building blocks cannot be safely run concurrently.</li><br></ul><br><br>Finally, a comment on a minor issue. I have been asked if the "async" keyword is strictly necessary. The answer is no, but it makes the code much more readable. The main role of async is to write a function call without having it executed immediately. The same problem occurs in callbacks in GUI programming: you have to specify a function call to be executed at a later time. The usual solution is a parameter-free lambda expression, and that same trick could be used to make async a function rather than a keyword. But readability suffers a lot.<br>
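<br>As a rough illustration of that last point, here is a minimal sketch of async written as an ordinary function that takes a parameter-free lambda. It assumes the <code>concurrent.futures</code> module (the outcome of PEP 3148) as found in the current Python standard library. The names <code>async_call</code>, <code>square</code>, and the shared thread pool are hypothetical and not part of the "Python 4.0" scenario; <code>async</code> itself cannot serve as a function name in current Python, where it is a reserved word.<br><br>

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared pool; a real design would manage its lifetime explicitly.
_pool = ThreadPoolExecutor(max_workers=4)


def async_call(thunk):
    """Schedule a parameter-free callable and return a future for its result."""
    return _pool.submit(thunk)


def square(x):
    """An effect-free function used as the asynchronous task."""
    return x * x


# The lambda delays the call: square(7) is evaluated in the pool,
# not at the point where async_call is written.
future = async_call(lambda: square(7))

# Waiting is explicit here, unlike in declarative concurrency,
# where the language would insert the wait automatically.
result = future.result()
print(result)
```

Compared to a keyword, every asynchronous call has to be wrapped in a lambda and every result has to be retrieved explicitly, which is the readability cost mentioned above.<br>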
 ]]></description> </item><item> <title>EuroSciPy 2010</title> <link>https://blog.khinsen.net/posts/2010/07/12/EuroSciPy-2010.html</link> <pubDate>2010-07-12</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2010/07/12/EuroSciPy-2010.html</guid> <category><![CDATA[ programming ]]></category><category><![CDATA[ science ]]></category> <description><![CDATA[ This weekend I attended the <a href="http://www.euroscipy.org/conference/euroscipy2010">EuroSciPy 2010</a> conference in Paris, dedicated to scientific applications of the programming language Python. This was the third EuroSciPy conference, but the US-based SciPy conference has been a regular event for many years already, and recently SciPy India joined the crowd. It looks like Python is becoming ever more popular in scientific computing. Next year, EuroSciPy will take place in Paris again.<br><br>There were lots of interesting presentations and announcements, and the breaks provided a much appreciated opportunity for exchanges between the participants. I won't try to provide an exhaustive summary, but rather list my personal highlights. Obviously this choice reflects my personal interests more than the quality of the presentations, and I will even list things that were not presented but that I learned about from other participants during the breaks.<br><br><strong>Teaching</strong><br><br>The <a href="http://www.euroscipy.org/talk/876">opening keynote</a> was given by <a href="http://vefur.simula.no/~hpl/">Hans-Petter Langtangen,</a> who is best known for his books about Python for scientific computing. 
His <a href="http://www.amazon.com/Scientific-Programming-Computational-Science-Engineering/dp/3642024742/ref=sr_1_2?ie=UTF8&amp;s=books&amp;qid=1250102359&amp;sr=1-2">latest book</a> is a textbook for a course on scientific programming for beginning science students, and the first part of his keynote was about this same course that he is teaching at the University of Oslo. As others have noted as well, he observed that the students have no problem at all with picking up Python and using it productively in science. The difficulties with using Python are elsewhere: it is hard to convince the university professors that Python is a good choice of programming language for such a course!<br><br>Another important aspect of his presentation was the observation that teaching scientific programming to beginning science students provides more than just training in some useful technique. Converting equations into programs and running them also provides a much better insight into the structure and applicability of the equations. Computational science thus helps to better educate future scientists.<br><br><strong>Reproducible research</strong><br><br>The <a href="http://reproducibleresearch.net/index.php/Main_Page">reproducible research</a> movement has the goal of improving the standards in computational science. At the moment, it is almost always impossible to reproduce published computational results from the information provided by the authors. Making these results reproducible requires a careful recording of what was calculated using which version of which software running on which machine, and of course making this information available along with the publication.<br><br>At EuroSciPy, <a href="http://www.euroscipy.org/talk/1960">Andrew Davison</a> presented <a href="http://neuralensemble.org/trac/sumatra/">Sumatra</a>, a Python library for tracking this information (and more) for computational procedures written in Python. 
The library is in an early stage, with more functionality to come, but those interested in reproducible research should check it out now and contribute to its development.<br><br><a href="http://www.euroscipy.org/talk/2393">Jarrod Millman</a> addressed the same topic in his presentation of the plans for creating a Foundation for Mathematical and Scientific Computing, whose goal is to fund development of tools and techniques that improve computational science.<br><br><strong>NumPy and Python 3</strong><br><br>As a couple of active contributors to the NumPy project were attending the conference, I asked about the state of the porting effort to Python 3. The good news is that the port is done and will soon be released. Those who have been waiting for NumPy to be ported before starting to port their own libraries can go to work right now: check out the <a href="http://svn.scipy.org/svn/numpy/trunk/">NumPy Subversion repository</a>, install, and use!<br><br><strong>Useful maths libraries</strong><br><br>Three new maths libraries that were presented caught my attention: <a href="http://www.euroscipy.org/talk/2045">Sebastian Walter</a>'s talk about algorithmic differentiation contained demos of <a href="http://github.com/b45ch1/algopy">algopy</a>,  a rather complete library for algorithmic differentiation in Python. During the Lightning talks on the last day, two apparently similar libraries for working with uncertain numbers (numbers with error bars) were shown: <a href="http://packages.python.org/uncertainties/">uncertainties</a>, by Eric Lebigot, and upy, by Friedrich Romstedt. Both do error propagation and take correlations into account. Those of us working with experimental data or simulation results will appreciate this.<br><br>There was a lot more interesting stuff, of course, and I hope others will write more about it. 
I'll just point out that the slides for my own keynote about the future of Python in science are available <a href="http://dirac.cnrs-orleans.fr/plone/Members/hinsen/presentations/scientific-computing/EuroSciPy_2010_Keynote.pdf/view">from my Web site</a>. And of course express my thanks to the organizing committee who invested a lot of effort to make this conference a big success!<br><br>
 ]]></description> </item><item> <title>Science and free will</title> <link>https://blog.khinsen.net/posts/2010/07/01/Science-and-free-will.html</link> <pubDate>2010-07-01</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2010/07/01/Science-and-free-will.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ The question of whether living beings, in particular those of our own species, possess "free will", and how it works if it exists, has recently become fashionable again. The new idea that brought the topic back into discussion was that our sense of free will might just be an illusion. According to this idea, we would be machines whose fate is entirely determined by the laws of physics (which might themselves be deterministic or not), even though we perceive ourselves as actors who pursue goals and take decisions that are not even in principle predictable by a physical analysis of our bodies, no matter at what level of detail.<br><br>The topic itself is an old one, perhaps as old as humanity. I won't go into its philosophical and religious aspects, but limit myself to the scientist's point of view: is free will compatible with scientific descriptions of our world? Perhaps even necessary for such descriptions? Or, on the contrary, in contradiction to the scientific approach? Can the scientific method be used to understand free will or show that it's a useless concept from the past?<br><br>What prompted me to write this post is a <a href="http://www.pnas.org/content/107/10/4499">recent article by Anthony Cashmore in PNAS</a>. In summary, Cashmore says that the majority of scientists do not believe in the existence of free will any more, and that society should draw conclusions from this, in particular concerning the judicial system, whose concept of responsibility for one's acts is based on a view of free will that the author no longer considers defensible. 
But don't take my word for it, read the article yourself. It's well written and covers many interesting points.<br><br>First of all, let me say that I don't agree at all with Cashmore's view that the judicial system should be reformed based on the prevailing view of today's scientists. I do believe that a modern society should take into account scientific findings, i.e. scientific hypotheses that have withstood a number of attempts at falsification. But mere beliefs of a small subpopulation, even if they are scientists, are not sufficient to justify a radical change of anything. As I will explain below, the question "do human beings possess free will" does not even deserve the label "scientific hypothesis" at this moment, because we have no idea of how we could answer it based on observation and experiment. We cannot claim either to be able to fully understand human behavior in terms of the laws of physics, which would allow us to call free will an unnecessary concept and invoke Occam's razor to get rid of it. Therefore, at this time, the existence of free will remains the subject of beliefs and scientists' beliefs are worth no more than anyone else's.<br><br>There is also a peculiar circularity to any argument about what "should" be done as a consequence of the non-existence of free will: if that hypothesis is true, nobody can decide anything! If humans have no free will, then societies don't have it either, and our judicial system is just as much a consequence of the laws of nature as my perceived decision to take coffee rather than tea for breakfast this morning.<br><br>Back to the main topic of this post: the relation between science and free will. It starts with the observation of a clear conflict. Science is about identifying regularities in the world that surrounds us, which permit the construction of detailed and testable theories. 
The first scientific theories were all about deterministic phenomena: given the initial state of some well-defined physical system (think of a clockwork, for example), the state of the system at any time in the future can be predicted with certainty. Later, stochastic phenomena entered the scientific world view. With stochasticity, the detailed behavior of a system is no longer predictable, but certain average properties still are. For example, we can predict how the temperature and pressure of water will change when we heat it, even though we cannot predict how each individual molecule will move. It is still a subject of debate whether stochastic elements exist in the fundamental laws of nature (quantum physics being the most popular candidate), or if they are merely a way of describing complex systems whose state we cannot analyze in detail due to insufficient resources. But scientists agree that a scientific theory may contain two forms of causality: determinism and stochasticity.<br><br>Free will, if it exists, would have to be added as a third form of causality. But it is hard to see how this could be done. The scientific method is based on identifying conditions from which exact predictions can be made. The decisions of an agent that possesses free will are by definition unpredictable, and therefore any theory about a system containing such an agent would be impossible to verify. Therefore the scientific method as we know it today cannot possibly take into consideration the existence of free will. Obviously this makes it impossible to examine the existence of free will as a scientific hypothesis. 
It also means that a hard-core scientist, who considers the scientific method as the only way to establish truth, has to deny the existence of free will, or else accept that some important aspects of our universe are forever inaccessible to scientific investigation.<br><br>However, there is another aspect to the relation between science and free will, which I haven't seen discussed yet anywhere: the existence of free will is in fact a requirement for the scientific method! Not as part of a system under scientific scrutiny, but as part of the scientist who runs an investigation. Testing a scientific hypothesis requires at the very least <em>observing</em> a specific phenomenon, but in most cases also <em>preparing</em> a well-defined initial state for some system that will then become the subject of observation. A scientist <em>decides</em> to create an experimental setup to verify some hypothesis. If the scientist were just a complex machine whose behavior is governed by the very same laws that he believes to be studying, then his carefully thought-out experiment is nothing but a particularly probable outcome of the laws of nature. We could still draw conclusions from observing it, of course, but these observations then only provide anecdotal evidence that is no more relevant than what we get from passively watching things happen around us.<br><br>In summary, our current scientific method supposes the existence of free will as an attribute of scientists, but also its absence from any system subjected to scientific scrutiny. This poses limits to what scientific investigation can yield when applied to humans.
 ]]></description> </item><item> <title>Eclipse experiences</title> <link>https://blog.khinsen.net/posts/2010/01/19/Eclipse-experiences.html</link> <pubDate>2010-01-19</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2010/01/19/Eclipse-experiences.html</guid> <category><![CDATA[ programming ]]></category> <description><![CDATA[ A few months ago I decided to take a closer look at <a href="http://www.eclipse.org/">Eclipse</a>, since several people I know seemed to be quite fond of it. I had tried it earlier on my old iBook G4, but quickly abandoned it because it was much too slow. But my new MacBook Pro should be able to handle it.<br><br>Last week I finally decided to retire my Eclipse installation. I didn't remove it yet, since it might be useful for some specific tasks that I have to deal with rarely (such as analyzing someone else's big C++ code). But I don't use it any more for my own work. Here's a summary of my impressions of Eclipse, the good and the bad.<br><br>In terms of features, Eclipse is as impressive as it looks. Anything you might wish for in an IDE is there, either in the base distribution or in the form of a plugin - there are hundreds if not thousands of those. And contrary to what one might expect, all those features are relatively easy to get used to. The user interface is very systematic and the most frequent functions are easy to spot. In terms of user interface design, I would call Eclipse a success.<br><br>However, in terms of usability it turned out to be a disappointment. Basically there are two major issues: Eclipse is a resource hog, and it isn't as stable as I expect an IDE to be.<br><br>The two resources that Eclipse can't get enough of are CPU time and disk space. Even on a brand-new machine (and not a low-end one at that), starting Eclipse takes a good ten seconds and I get to see the Macintosh's spinning colour wheel quite often. 
What's worse is that the spinning wheel prevents me from typing, at unpredictable moments. This is not acceptable for an IDE. I don't care if it takes a break in background compilation now and then, but I want to be able to type when <strong>I</strong> want. Execution times for various commands can also vary unpredictably. Rebuilding all my projects typically took about a minute, but once I waited for 15 minutes for no apparent reason.<br><br>In terms of disk space, Eclipse is less of a resource hog, but it creates and updates impressive amounts of data, again for no clear reason. I noticed this because I make incremental backups regularly. Just starting and quitting Eclipse, with no action in between, resulted in a few MB of files to back up again. It's not that I can't live with that, but is this really necessary?<br><br>Finally, stability. I had only a single crash in which I lost data (the most recently entered code), which is not so bad for a big application (unfortunately...). But I had Eclipse hanging very often, and displaying verbose yet unintelligible error messages almost daily. All this is not reassuring, and together with the spinning-wheel issue this is what made me abandon Eclipse in the end.<br><br>Now I am a 100% Emacs user again, with no regrets. Emacs may look old-fashioned, and have fewer high-powered features, but it is reliable and fast.
 ]]></description> </item><item> <title>Scientific computing needs deterministic programming paradigms</title> <link>https://blog.khinsen.net/posts/2009/09/09/Scientific-computing-needs-deterministic-programming-paradigms.html</link> <pubDate>2009-09-09</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2009/09/09/Scientific-computing-needs-deterministic-programming-paradigms.html</guid> <category><![CDATA[ programming ]]></category><category><![CDATA[ science ]]></category> <description><![CDATA[ Programmers, scientific and otherwise, spend a lot of time discussing which programming languages, libraries, and development tools to use. In such discussions, the notion of a programming paradigm is rarely mentioned, and yet it is a very fundamental one. It becomes particularly important for parallel and concurrent programming, where the most popular languages and libraries do not necessarily provide the best programming paradigm. In this post, I will explain what a programming paradigm is and why its choice matters more than the choice of a language.<br><br>A programming paradigm defines a general approach to writing programs. It defines the concepts and abstractions in terms of which a program is expressed. For example, a programming paradigm defines how data is described, how control flow is handled, how program elements are composed, etc. Well-known programming paradigms are <a href="http://en.wikipedia.org/wiki/Structured_programming">structured programming</a>, <a href="http://en.wikipedia.org/wiki/Object-oriented_programming">object-oriented programming</a>, and <a href="http://en.wikipedia.org/wiki/Functional_programming">functional programming</a>.<br><br>The implementation of a programming paradigm consists of a programming language, its runtime system, libraries, and sometimes coding conventions. 
Some programming languages are optimized for a specific paradigm, whereas others are explicitly designed to support multiple paradigms. Paradigms that the language designer did not have in mind can sometimes be implemented by additional conventions, libraries, or preprocessors.<br><br>The list of programming paradigms that have been proposed and/or used is already quite long (see the <a href="http://en.wikipedia.org/wiki/Programming_paradigm">Wikipedia entry</a>, for example), but the ones that are practically important and significantly distinct are much less numerous. A good overview and comparison is given in the book chapter "<a href="http://www.info.ucl.ac.be/~pvr/VanRoyChapter.pdf">Programming paradigms for dummies</a>" by <a href="http://www.info.ucl.ac.be/~pvr/cvvanroy.html">Peter van Roy</a>. I will concentrate on one aspect discussed in van Roy's text (look at section 6 in particular), which I consider of particular relevance for scientific computing: determinism.<br><br>A deterministic programming paradigm is one in which every possible program has a fully deterministic behaviour: given the same input, it executes its steps in the same order and produces the same output. This is in fact what most of us would intuitively expect from a computer program. However, there are useful programs that could not be written with this restriction. A Web server, for example, has to react to external requests which are outside of its control, and take into account resource usage (e.g. database access) and possible network errors in deciding when and in which order to process requests. This shows that there is a need for non-deterministic programming paradigms. For the vast majority of scientific applications, however, determinism is a requirement, and a programming paradigm that enforces determinism is a big help in avoiding bugs. 
Most scientific applications that run serially have been written using a deterministic programming paradigm, as implemented by most of the popular programming languages.<br><br>Parallel computing has changed the situation significantly. When several independent processors work together on the execution of an algorithm, fully deterministic behavior is no longer desirable, as it would imply frequent synchronizations of all processors. The precise order in which independent operations are executed is typically left unspecified by a program. What matters is that the output of the program is determined only by the input. As long as this is guaranteed, it is acceptable and even desirable to let compilers and run-time systems optimize the scheduling of individual subtasks. In Peter van Roy's classification, this distinction is called "observable" vs. "non-observable" non-determinism. A programming paradigm for scientific computing should permit non-determinism, but should exclude observable non-determinism. While observable non-determinism makes the implementation of certain programs (such as Web servers) possible, it also opens the way to bugs that are particularly nasty to track down: deadlocks, race conditions, results that change with the number of processors or when moving from one parallel machine to another one.<br><br>Unfortunately, two of the most popular programming paradigms for parallel scientific applications do allow observable non-determinism: message passing, as implemented by the MPI library, and multi-threading. Those who have used either one have probably suffered the consequences. The problem is thus well known, but the solutions aren't. Fortunately, they do exist: there are several programming paradigms that encapsulate non-determinism in such a way that it cannot influence the results of a program. One of them is widely known and used: OpenMP, which is a layer above multi-threading that guarantees deterministic results. 
However, OpenMP is limited to shared-memory multiprocessor machines.<br><br>For the at least as important category of distributed-memory parallel machines, there are also programming paradigms that don't have the non-deterministic features of message passing, and they are typically implemented as a layer above MPI. One example is the <a href="http://www.bsp-worldwide.org/">BSP</a> model, which I have presented in an <a href="http://www2.computer.org/portal/web/csdl/doi/10.1109/MCSE.2007.117">article</a> in the magazine <a href="http://www.computer.org/cise">Computing in Science and Engineering</a>. Another example is the parallel skeletons model, <a href="http://www2.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.57">presented by Joël Falcou in the same magazine</a>. Unfortunately, these paradigms are little known and not well supported by programming tools. As a consequence, most scientific applications for distributed-memory machines are written using the message passing paradigm.<br><br>Finally, a pair of programming paradigms discussed by van Roy deserves special mention, because it might well become important in scientific computing in the near future: functional programming and declarative concurrency. I have written about functional programming <a href="http://khinsen.wordpress.com/2009/06/23/functional-programming-for-scientific-computing/">earlier</a>; its main advantage is the possibility to apply mathematical reasoning and automatic transformations to program code, leading to better code (in the sense of correctness) and to better optimization techniques. Declarative concurrency is functional programming plus annotations for parallelization. The nice feature is that these annotations (not very different in principle from OpenMP pragmas) don't change the result of the program, they only influence its performance on a parallel machine. 
Starting from a correct functional program, it is thus possible to obtain, by automatic or manual (but automatically verified) transformations, an equivalent parallel program that is guaranteed to behave identically except for performance. Correctness and performance can thus be separated, which should be a big help in writing correct and efficient parallel programs. I say "should" because this approach hasn't been used very much, and isn't yet supported by any mainstream programming tools. This may change in a couple of years, so you might want to watch for developments in this area.<br>
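<br>The difference between non-observable and observable non-determinism can be illustrated with a minimal sketch. It uses Python threads rather than MPI or OpenMP, it is about scheduling order rather than speed, and the function names (<code>slow_square</code>, <code>parallel_squares</code>, <code>completion_order_squares</code>) are hypothetical, chosen only for this illustration.<br>

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(x):
    # Sleep for a random time so that tasks finish in an unpredictable order.
    time.sleep(random.uniform(0.0, 0.01))
    return x * x


def parallel_squares(values, workers=4):
    # Non-observable non-determinism: the scheduling varies from run to run,
    # but Executor.map hands back the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_square, values))


def completion_order_squares(values, workers=4):
    # Observable non-determinism: results are collected as tasks finish,
    # so the order of the output list can change between runs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(slow_square, v) for v in values]
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    data = list(range(8))
    # The first result depends only on the input, not on the number of workers.
    print(parallel_squares(data, workers=1) == parallel_squares(data, workers=4))
    # The second result may be ordered differently each time it is run.
    print(completion_order_squares(data))
```

<br>The first function is the kind of construct a paradigm for scientific computing should offer: the run-time system is free to schedule the subtasks as it likes, yet the output is determined only by the input. The second one exposes the scheduling to the caller, which is where results that change with the number of processors come from.<br>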
 ]]></description> </item><item> <title>Sheldrake&#039;s New Science of Life</title> <link>https://blog.khinsen.net/posts/2009/08/25/Sheldrakes-New-Science-of-Life.html</link> <pubDate>2009-08-25</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2009/08/25/Sheldrakes-New-Science-of-Life.html</guid> <category><![CDATA[ science ]]></category> <description><![CDATA[ One of the books I read during my summer vacation is the recently published second edition of Rupert Sheldrake's "A New Science of Life". It is one of the most controversial books in science, having been both praised and condemned; a review of the first edition in the renowned science journal <em>Nature</em> concluded that this book should be burnt!<br><br>The question that Sheldrake addresses in this book is where form comes from. What defines the arrangements of atoms in a molecule? Or in a crystal? Why do proteins fold into their characteristic structures? How do biological molecules assemble into cells? And how do cells divide and specialize to form an embryo?<br><br>The standard reply to these questions you can find in science textbooks is that all these forms come from the fundamental interactions of physics. Molecules are simply energetically favorable arrangements of atoms. Proteins fold in such a way that free energy is minimized. Cells assemble as a result of complex attractive interactions between their constituents, which ultimately can be reduced to fundamental physics. Embryos develop according to a "genetic program" stored in the fertilized egg's DNA.<br><br>As Sheldrake rightly emphasizes, even though it may come as a surprise to most non-experts, these affirmations cannot be verified. They express a common belief among practicing scientists, and they are compatible with everything we know about nature, but they may well be wrong. We simply cannot verify them because the fundamental equations of physics can be solved only for very simple systems. 
Even for one of the simplest molecules, water, we cannot predict the arrangement of its atoms directly from the basic principles of physics. What we use in practice are approximations, but these approximations have been selected <em>because</em> they make it possible to predict the known molecular structures. We cannot use such approximations to settle more fundamental questions.<br><br>Sheldrake proposes an alternative theory, based on what he calls "morphogenetic fields". From my point of view as a physicist, the name is not very well chosen because these entities do not correspond to what a physicist would call a field, but of course this term may be perfectly clear to biologists. It's a minor point because Sheldrake explains this concept very clearly in his book. In summary, his theory says that forms exist because they have existed before; atoms, molecules, and cells arrange themselves into patterns that they "remember" from the past. His morphogenetic fields are a giant database of forms that the universe keeps around forever.<br><br>The main "problem" with this theory is that if it is right, even just approximately, then standard science, from physics to biology, is very much wrong. It is probably for this reason that his book has attracted so much criticism from the science establishment. Otherwise, there is little one could criticize: Sheldrake explains his theory and its consequences for chemistry and biology, and he proposes a large number of experiments that would make it possible to test it. This is science at its best. Of course his theory may turn out to require modifications, or even be completely wrong, but that is true of <em>any</em> scientific theory when it is first formulated.<br><br>In fact, I recommend this book to anyone interested in the scientific process because of its detailed discussion of how scientific discovery works. 
I haven't seen many books accessible to non-specialists that explain the limits of verifiability of a scientific theory, for example. Nor have I seen any other book that makes the distinction between verified theories and widely accepted but untested beliefs as clear as Sheldrake does. Even if you don't care about his theory, you can gain a lot from reading this book.<br>
 ]]></description> </item><item> <title>Functional programming for scientific computing</title> <link>https://blog.khinsen.net/posts/2009/06/23/Functional-programming-for-scientific-computing.html</link> <pubDate>2009-06-23</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2009/06/23/Functional-programming-for-scientific-computing.html</guid> <category><![CDATA[ programming ]]></category> <description><![CDATA[ With the increasing importance of parallel computers, ranging from multi-core desktop machines to massively parallel machines such as IBM's BlueGene, functional programming could well become an important technique for scientific software development, as it facilitates program transformations (including those for automatic or semi-automatic parallelization) considerably. It also appeals to the mathematical bent of many scientists in making it possible to apply mathematical reasoning to computer programs. The downside: there is a steep learning curve for those familiar with traditional programming (called "imperative").<br><br>I have written an introduction to functional programming for scientists for the July issue of <a href="http://www.computer.org/cise"><em>Computing in Science and Engineering</em></a>. It is also available (free access) via IEEE's <em>Computing Now</em> portal: <a href="http://www2.computer.org/portal/web/computingnow/0609/whatsnew/cise">http://www2.computer.org/portal/web/computingnow/0609/whatsnew/cise</a><br><br>While I don't expect functional programming to be adopted rapidly by computational scientists, I am convinced that ten years from now, it will be an essential item in everyone's toolbox. Better start preparing yourself now!
 ]]></description> </item><item> <title>Static typing and code clutter</title> <link>https://blog.khinsen.net/posts/2009/05/12/Static-typing-and-code-clutter.html</link> <pubDate>2009-05-12</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2009/05/12/Static-typing-and-code-clutter.html</guid> <category><![CDATA[ programming ]]></category> <description><![CDATA[ Among the many characteristics that distinguish programming languages, static vs. dynamic typing is one of the most debated ones. The main advantages claimed by the advocates of static typing are that compile-time type checks make code more robust and that static typing allows a compiler to do better optimizations. The dynamic typing camp points out the simplicity and flexibility of a language that requires no type declarations and that permits a piece of code to handle data objects defined well after it was written. Both sides are right and the choice is ultimately one of personal preference.<br><br>I have used various programming languages over the years, including both statically typed and dynamically typed ones. But when given a choice, I have always preferred dynamic typing. Since 1995, my main programming language has been <a href="http://www.python.org/">Python</a>, and more recently I have started to use <a href="http://clojure.org/">Clojure</a>. One of the reasons for this preference is something that I have never seen expressed before: static typing often adds visual clutter to the code that makes it harder to read.<br><br>An important property of any non-trivial computer program is its clarity to human readers. Both verification of a program's correctness and the overall utility of a piece of code in a context of changing requirements depend on this. 
Well-written specifications and unit tests help as well, but if you want my advice on the quality of a piece of code, or if you want my help with modifying it, my judgement will mostly be based on its clarity. If it's an effort to understand what's going on, I wouldn't want to work with it.<br><br>This criterion for code quality immediately translates into a criterion for programming languages: they should be able to express as many concepts of software engineering as possible in a direct, explicit way and without imposing any clutter or obfuscation. Static type systems often get in the way, either by imposing clutter or by encouraging a less clear programming style.<br><br>In my examples, I will use the languages <a href="http://www.haskell.org/">Haskell</a> (static typing) and <a href="http://clojure.org/">Clojure</a> (dynamic typing) for illustration. Haskell has one of the best type systems available at the moment, so if Haskell can't avoid the problems that I point out, it is likely that no other current language will do a better job. Clojure is a good comparison because, like Haskell, it is designed for a functional programming style. Of course, it also helps that I am reasonably familiar with both languages.<br><br><strong>Example 1: abstract data types</strong><br><br>The idea behind abstract data types is that the concrete representation of some data structure should be hidden from client code, which accesses the data structure only through a set of interface functions. Let's look at how this is typically implemented in Haskell, using the <a href="http://web.engr.oregonstate.edu/~erwig/pfp/">PFP library</a> for probabilistic programming as the example (just because I happen to know it; many other libraries could serve the same purpose). 
In PFP, a probability distribution is represented by an abstract data type <code>Dist a</code> defined as<br><pre>newtype Dist a = D {unD :: [(a,ProbRep)]}</pre><br>This says that internally, <code>Dist</code> is a list of <code>(a, ProbRep)</code> pairs. The single constructor <code>D</code> converts such a list to the abstract data type <code>Dist</code>, whereas <code>unD</code> does the inverse: it makes the contents of a <code>Dist</code> value accessible for inspection.<br><br>The problem with this is that all of the implementation code for PFP is littered with <code>D</code> and <code>unD</code>, although they don't do anything and add nothing to the clarity of the code. They are there only to make sure that the signature of the functions contains the abstract type <code>Dist a</code> instead of the internal representation <code>[(a,ProbRep)]</code>. For the reader of the PFP code trying to understand how it works, this is clutter. There are also a couple of functions that exist only for dealing with the artificial distinction between <code>Dist a</code> and <code>[(a,ProbRep)]</code>, for example<br><pre>sizeD :: Dist a -&gt; Int<br>sizeD = length . unD</pre><br>which replaces the list function <code>length</code> (familiar to every Haskell programmer) by a special version whose purpose the reader has to remember.<br><br>A Clojure library that is essentially equivalent to PFP (look at the <a href="http://code.google.com/p/clojure-contrib/source/browse/trunk/src/clojure/contrib/probabilities/finite_distributions.clj">source code</a>) is much shorter, and in my opinion much clearer. It represents a probability distribution by a map (known in other languages as a dictionary or an associative array) and directly uses Clojure's map operations to work on it. No visual overhead, no clutter. 
Of course, as static typing advocates would be quick to point out, no protection of the internal representation either: client code could directly manipulate the maps used to represent distributions, potentially creating maps that are not valid probability distributions. I have never run into such a problem in 15 years of using dynamically typed languages, but in principle it is possible.<br><br>It would be possible to avoid the code obfuscation due to abstract data types by recognizing that abstract data types are an interface issue and not a type issue. A language could provide an explicit declaration of an interface for a module where the function signatures would be given with the abstract data type, even though the concrete representation is used in the implementations. The compiler could verify the coherence of everything. But I haven't seen anything like this in any statically typed language.<br><br>Note also that something very similar could be implemented in Clojure: A couple of macros would provide wrappers around the exported functions that add type verification at runtime. However, this says more about the advantage of having a powerful macro system than about the advantages of dynamic typing.<br><br><strong>Example 2: monads</strong><br><br>A monad is a package consisting of a data structure (or, more precisely, certain properties that a data structure must have) and two functions <code>bind</code> and <code>result</code>. A subclass of monads also has a special value called <code>zero</code> and a subclass of this subclass has one more function called <code>plus</code>. All these definitions must obey certain rules to make a valid monad.<br><br>In Haskell, there is a typeclass <code>Monad</code> that defines <code>bind</code> (called <code>&gt;&gt;=</code>) and <code>result</code> (called <code>return</code>), and another typeclass <code>MonadPlus</code> that defines <code>mzero</code> and <code>mplus</code>. 
A monad is defined by providing instances for concrete data types. When monadic operations are used, the type inference system identifies the data type and selects the corresponding operations. From the client's point of view, a monad thus is defined by a type.<br><br>There are a few drawbacks of this setup:<br><ul><br>	<li>It is not possible to define a monad with a <code>zero</code> but no <code>plus</code>. This is a technical detail (<code>MonadPlus</code> could well be split into two typeclasses), but it's still a limitation in practical Haskell programming that is due to the rigidity of a type system.</li><br>	<li>It is impossible to have two monads for the same data type, although sometimes this would make sense. For example, there are (at least) two practically relevant ways to define <code>plus</code> for the list monad.</li><br>	<li>It is cumbersome to use the same monad operations for two different concrete data structures that are similar enough in behaviour to be used with the same monad definition.</li><br></ul><br>In Clojure, monads are values, not types. In client code, a monad is selected explicitly by the programmer by surrounding the monadic code by a <code>with-monad</code> form that specifies the monad to be used. Usually the monad is named explicitly, but since monads are values, they can also be represented by a variable. A Clojure monad can be used with any data type that the definitions of the monadic operations accept, and any number of monads can be defined for a data type.<br><br>In Clojure it is the data structure that almost disappears from monad handling; the constraints on the monadic values are given only as documentation for the human reader. As with other aspects of dynamic typing, this provides more flexibility and less protection.<br><br>For standard monads, I'd say that the clarity of code is roughly equivalent in Haskell and Clojure. 
Haskell gains a bit in making the data structure explicit, Clojure gains a bit in making the monad explicit at the point of use. Both work pretty well.<br><br>This changes when monad transformers come into play. In the Haskell world, this is perhaps the most frightening concept to newcomers. Monad transformers are surrounded by an aura of mystery: they are only for the real experts. I think that this is at least in part due to the complexity of defining monad transformers in a type system.<br><br>Here's how Haskell defines the list monad transformer; only the relevant parts are shown:<br><pre>newtype ListT m a = ListT { runListT :: m [a] }<br><br>instance (Monad m) =&gt; Monad (ListT m) where<br>    return a = ListT $ return [a]<br>    m &gt;&gt;= k  = ListT $ do<br>        a &lt;- runListT m<br>        b &lt;- mapM (runListT . k) a<br>        return (concat b)</pre><br>As in the case of abstract data types, there is a data type definition for <code>ListT</code> that does nothing but introduce a new notation for the type <code>m [a]</code>. <code>ListT</code> and <code>runListT</code> are just notation converters that don't actually do anything useful. But unlike in the case of abstract data types, they are indispensable here. Monads <em>are</em> types, and therefore there has to be a new type to make a new monad. It doesn't help that the name <code>runListT</code> is a particularly bad choice: it suggests an action where there is none.<br><br>The definitions of <code>return</code> and <code>&gt;&gt;=</code> aren't masterpieces of clarity either. 
It takes a careful analysis of each function and its (inferred, and thus unwritten) type to understand which monad is being used where.<br><br>For comparison, here is the corresponding monad transformer in Clojure, again reduced to the basics:<br><pre>(defn sequence-t [m]<br>   (monad<br>     [m-result (with-monad m<br>                 (fn [v]<br>                   (m-result (list v))))<br>      m-bind   (with-monad m<br>                 (fn [mv f]<br>                   (domonad<br>                     [a mv<br>                      b (m-map f a)]<br>                     (flatten b))))]))</pre><br>Since monads are values, monad transformers are simply functions that take a monad argument and return another monad. It is also clear at a glance that inside the definition of each monad operation, all references to monad operations are to be interpreted in the inner monad. This doesn't make monad transformers trivial to understand, of course, but it is a lot clearer.
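<br><br>For readers more at home in Python than in Clojure, the idea that a monad transformer is just a function from monads to monads can be sketched as follows. This is an illustration under simplifying assumptions, not the API of any existing library: the names <code>Monad</code>, <code>maybe_m</code>, <code>m_map</code>, and <code>sequence_t</code> are hypothetical, and the maybe monad is reduced to its bare minimum by letting <code>None</code> stand for failure.<br>

```python
from collections import namedtuple
from functools import reduce

# A monad is an ordinary value: a record holding its two operations.
Monad = namedtuple("Monad", ["result", "bind"])

# Simplified maybe monad: None stands for failure, anything else is a success.
maybe_m = Monad(
    result=lambda v: v,
    bind=lambda mv, f: None if mv is None else f(mv),
)


def m_map(m, f, xs):
    # Apply the monadic function f to each element of xs and collect the
    # plain results in a list wrapped in monad m.
    def step(acc, x):
        return m.bind(acc, lambda ys: m.bind(f(x), lambda y: m.result(ys + [y])))
    return reduce(step, xs, m.result([]))


def sequence_t(m):
    # Monad transformer: takes the inner monad m as an argument and returns
    # a new monad whose values are lists wrapped in m.
    def result(v):
        return m.result([v])

    def bind(mv, f):
        def flatten(nested):
            return m.result([y for ys in nested for y in ys])
        return m.bind(mv, lambda a: m.bind(m_map(m, f, a), flatten))

    return Monad(result, bind)


# A sequence monad layered on top of the maybe monad.
seq_maybe = sequence_t(maybe_m)
```

<br>As in the Clojure version, every call to <code>m.result</code> and <code>m.bind</code> inside <code>sequence_t</code> refers to the inner monad, which is passed around as a plain argument. With these definitions, <code>seq_maybe.bind([1, 2], lambda x: [x, x * 10])</code> evaluates to <code>[1, 10, 2, 20]</code>, and the whole computation becomes <code>None</code> as soon as one step returns <code>None</code>.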
 ]]></description> </item><item> <title>Monads in Clojure</title> <link>https://blog.khinsen.net/posts/2009/04/22/Monads-in-Clojure.html</link> <pubDate>2009-04-22</pubDate> <author>Konrad Hinsen</author> <guid isPermaLink="true">https://blog.khinsen.net/posts/2009/04/22/Monads-in-Clojure.html</guid> <category><![CDATA[ programming ]]></category> <description><![CDATA[ One of my hobby projects over the last months has been the exploration of monads. Monads are packages consisting of a data structure and associated control structures that are used as abstractions in functional programming. They were popularized by the Haskell language, where they play a central role in introducing side effects (such as I/O) in a controlled way into a language that is otherwise purely functional.<br><br>Since I was also exploring <a href="http://clojure.org/">Clojure</a>, an interesting new dialect of Lisp that strongly encourages a purely functional programming style (but doesn't enforce it), I decided to explore monads by writing a <a>monad library</a> for Clojure. My experience is that monads are quite useful in Clojure as well, and that once you get used to monads, you see occasions for using them almost everywhere. If you have been hesitating to tackle monads seriously, I can only encourage you to go on!<br><br>I have also written a monad tutorial for Clojure programmers, which I published on the <a title="OnClojure blog" href="http://onclojure.com/">OnClojure</a> blog. It consists of four parts:<br><ol><br>	<li><a href="http://onclojure.com/2009/03/05/a-monad-tutorial-for-clojure-programmers-part-1/">Part 1</a> introduces the concept of monads and illustrates it with the identity and maybe monads.</li><br>	<li><a href="http://onclojure.com/2009/03/06/a-monad-tutorial-for-clojure-programmers-part-2/">Part 2</a> explains the importance of <code>m-result</code> using the sequence monad as an example. 
It also covers the monad laws.</li><br>	<li><a href="http://onclojure.com/2009/03/23/a-monad-tutorial-for-clojure-programmers-part-3/">Part 3</a> is about m-zero and m-plus, and explains the state monad.</li><br>	<li><a href="http://onclojure.com/2009/03/23/a-monad-tutorial-for-clojure-programmers-part-4/">Part 4</a> covers the probability monad and monad transformers.</li><br></ol><br>I hope that this tutorial facilitates a first contact with monads for those who are more familiar with Lisp syntax than with Haskell syntax.
 ]]></description> </item> </channel> </rss>