An open letter to software engineers criticizing Neil Ferguson's epidemics simulation code
Dear software engineers,
Many of you were horrified at the sight of the C++ code that Neil Ferguson and his team wrote to simulate the spread of epidemics. I feel with you. The only reason why I am less horrified than you is that I have seen a lot of similar-looking code before. It is in fact quite common in scientific computing, in particular in research projects that have been running for many years. But like you, I don't have much trust in that code being a faithful and trustworthy implementation of the epidemiological models that it is supposed to implement, and I don't want to defend bad code in science.
However, many of your specific criticisms show a lack of familiarity with today's academic research. This code is not the sole result of 13 years of tax-payer-funded research. The core of that research is building and applying the model implemented by the code; the code itself is merely a means to this end. The scientists who wrote this horrible code most probably had no training in software engineering, and no funding to hire software engineers. And the senior or former scientists who decided to give tax-payer money to this research group are probably even more ignorant of the importance of code for science. Otherwise they would surely have allocated money for software development, and verified the application of best practices.
But the main message of this letter is something different: it's about your role in this story. That's of course a collective you, not you the individual reading this letter. It's you, the software engineering community, that is responsible for tools like C++ that look as if they were designed for shooting yourself in the foot. It's also you, the software engineering community, that has made no effort to warn the non-expert public of the dangers of these tools. Sure, you have been discussing these dangers internally, even a lot. But to outsiders, such as computational scientists looking for implementation tools for their models, these discussions are hard to find and hard to understand. There are lots of tutorials teaching C++ to novices, but I have yet to see a single one that starts with a clear warning about the dangers. You know, the kind of warning that every instruction manual for a microwave oven starts with: don't use this to dry your dog after a bath. A clear message saying "Unless you are willing to train for many years to become a software engineer yourself, this tool is not for you."
As a famous member of your community once said, software is eating the world. That gives you, dear software engineers, a lot of power in modern society. But power comes with responsibility. If you want scientists to construct reliable implementations of models that matter for public health decisions, the best you can do is make good tools for that task, but the very least you must do is put clear warning signs on tools that you do not want scientists to use - always keeping in mind that scientists are not software engineers, and have neither the time nor the motivation to become software engineers.
Consider what you, as a client, expect from engineers in other domains. You expect cars to be safe to use by anyone with a driver's license. You expect household appliances to be safe for anyone to use after a cursory glance at the instruction manuals. Is it reasonable, then, to expect your clients to become proficient in your craft just to be able to use your products responsibly? Worse, is it reasonable to impose that expectation tacitly?
Some of you have helped with a first round of code cleanup, which I think is the most constructive attitude you can adopt in the short term. But this is not a sustainable approach for the future. We can't ask software experts for a code review every time we do something important. We computational scientists need you software engineers to help us build a better future for computer-aided research. Which means pretty much all research, because software has been eating science as well for a while. Can we count on your help?
PS added 2020-05-19T10:30: This post has provoked a lively discussion not only in the comments below but also on Twitter. There are way too many comments for me to reply to each one individually, so I decided to address recurrent topics in this follow-up.
Many people seem to have read my post as putting the main responsibility for the problems related to the cited simulation code on software engineers. This was most certainly not my intention. Scientists, policy makers, and journalists have all contributed to a less than satisfactory outcome. My open letter is clearly addressed to a particular group of people (software engineers criticizing the Imperial College Covid-19 simulations on the basis of code quality) and clearly states its focus on the role of software technology, which is what the target audience seems to overlook. A focus is always an arbitrary choice of an author for the sake of brevity or clarity. A glance at the rest of my blog should suffice to show that I do consider computational scientists responsible for their technological choices and their consequences. However, my main intention was not assigning blame for events in the past, but to outline what needs to change to prevent similar events in the future.
The car analogy was another frequent target of critical comments. Cars are a mature technology, in which many professions (engineers, workers, mechanics, driving instructors, drivers, etc.) have well-defined roles and everyone involved has a general understanding of the role of everyone else. Software is an immature technology in which roles remain fuzzy and everyone has an even fuzzier view of which other roles exist and who fills them. The discussion of my open letter has provided ample evidence for this all-encompassing fuzziness. What we collectively need to work on is turning software into a mature technology. That requires all stakeholders to make their own role views explicit and then negotiate shared role definitions with everyone else.
Several commenters have pointed out the emergence of research software engineers (RSEs) as a sign of progress, and I completely agree. But even the role of RSEs remains fuzzy at this time. Should they work as collaborators on research projects, with a particular specialization? Or as occasional consultants or service providers to researchers? Their interaction with the software engineering universe is even less clear. For now it is mostly one-way, in that RSEs bring software technology from the outside into research labs. What my letter argues for is action in the opposite direction: make software technology evolve to adapt to the specific needs of scientists.
A big problem is culture clash. In academia, scientists are traditionally on top of the power pyramid and are used to everyone else working for them (even though the top position is now held by managers, but that's a different story). In the tech world, it's software engineers who are kings and used to everyone else, including their clients, obeying their directives. In the worst case, RSEs might find themselves trapped in the valley between two power pyramids. In the ideal case (from my point of view), they will be diplomats working towards a merger of the two kingdoms, with a simultaneous transformation into a democracy.
Comments retrieved from Disqus
- cd:
I have been involved in both sides of this. And my code for academic research purposes was shit. It was written to get the job done. I gave no thought to performance, maintainability, or anything else for that matter; it wasn't even structured.
When I got a job as a professional I got a real culture shock. The standards that are required are orders of magnitude higher.
You might say, well, scientists have to do other bits of research too, as well as write the code. And that is true. But it also pains me to say that before becoming a professional software engineer I also worked as a scientist in a commercial company. And again, the standard of research and development was much, much higher.
Academia is sloppy and peer review is sloppy.
- Brian L. McMichael:
This is like trying to build a house without any previous experience and then blaming professional homebuilders for not making it easier for commonfolk to nail 2x4's together.
- David Sarma:
For the type of software that's under discussion (a concrete realization of a mathematical model), what the scientist cares about is the mathematical model, not the realization of it. This is why "software quality" is shunned as a concern: ideally, it should NOT be something that one has to be concerned about. The ideal scenario would be an algorithmic translation of the mathematical model into computer instructions, with no human there to introduce inconsistency and bugs into the process.
The direction things are headed is pointed to by projects like CVXPY / CVXR. We want a compiler for mathematical language, whose output we for the most part don't have to look at or care about, in the same sense that programmers do not for the most part inspect the assembly language output of their programs, and criticize them for being poorly organized, verbose and unreadable monstrosities. The *solvers* that the model uses of course should be under the most intense scrutiny by the most skilled software engineers... but this goes beyond the scope of the scientific part of the project, in the same sense that we depend on linear algebra libraries working correctly, but modeling greenhouse gases is NOT linear algebra.
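A rough sketch of that declarative style, using CVXPY's standard interface (the matrix and vector below are made-up illustration data, not from any real model):
```python
# Declarative model specification with CVXPY: the scientist states the math,
# the "compiler" picks and drives a solver whose internals stay out of sight.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # made-up design matrix
b = rng.standard_normal(20)        # made-up observations

x = cp.Variable(5)                                   # the unknowns of the model
objective = cp.Minimize(cp.sum_squares(A @ x - b))   # the model, stated as math
constraints = [x >= 0, cp.sum(x) == 1]               # domain knowledge as constraints

problem = cp.Problem(objective, constraints)
problem.solve()                                      # generated solver code is never inspected by hand

print(problem.status, x.value)
```
The point of the sketch is the division of labour: the solver underneath deserves intense engineering scrutiny, but the modeler only ever touches the declarative layer.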
In other scenarios, flipping to the dual marks the maturation of a field (ex. "classical" renderers transitioning to physically-based rendering), the end of certain classes of conflict and stress (caustic situations and antagonistic relationships), and the ability to focus on content rather than technology (telling good stories vs attaining photorealism). (Other side effects are deprecations and job loss, industry-wide collapse in some cases, or transitions into other business models.) The injection of constraint solvers into mainstream software engineering (in the manner that Rust does) will likely lead to similar outcomes: the end of certain classes of free-for-all improvisation, and better ability to focus on the content under discussion.
- Konrad Hinsen:
Thanks for pointing out that there are indeed some developments pointing in the right direction!
- Undercover modeller:
I think there is a point that is being missed. This software is essentially repurposed software. It's software built for academic purposes being repurposed as business/nation-critical software for making life-and-death decisions affecting thousands of people and the livelihoods of many millions of people.
I write business-critical models for a living, using software engineering processes that I've been taught over the years. However, if I were asked to write, say, safety-critical software for a plane, I would not apply the same processes, nor would I know what processes should be applied.
The issue lies in those who commissioned the software, and to a certain and lesser extent, the academics who built it, who should have known that using academic software development techniques was inappropriate for business critical software that might have such a major impact on people's lives.
- Konrad Hinsen:
That's an interesting remark. Yes, the software has been repurposed. But: that happens all the time with research software. The small function written for the exploration of a dataset ends up in community-managed software and then maybe in industrial applications. Nobody ever commissions software in academia. It's very much bottom-up.
- Undercover modeller:
Alas that is true. That's why we never incorporate open source components into our models without either rewriting them or subjecting them to our own testing program.
- Colin Gillespie:
So while I sort of agree with your argument, I do think that academics hold much of the blame.
An analogous situation is statistics. Any statistician will tell you that performing even a vaguely complex analysis requires training and experience, yet many scientists are happy to just copy and paste code/analysis from random parts of the internet.
When it comes to building software, the REF (run by academics) actively snubs contributions to software. Instead, academics are encouraged to follow the "Facebook"-type model: publish often and fast. How often are papers retracted if the software is wrong or has a bug?
- Michael Höhle:
First page of the OpenBugs Manual - http://www.openbugs.net/Man... https://uploads.disquscdn.c...
- Konrad Hinsen:
Excellent - thanks for this example!
- Brian Sides:
There is a department of Computing at Imperial College London
https://www.imperial.ac.uk/...
Where they teach computer programming: "Welcome to the Department of Computing. Computers are the most significant and exciting technological innovations of the last hundred years. In the future, they will play an even more considerable role in medicine, the sciences, industry, communication and the arts. It's safe to say that the science of Computing will remain a vitally important part of modern civilisation and will be responsible for many of the most important changes in the world in which we live."
Career prospects: "Our graduates have the highest average salary for a computing degree in the UK and have gone into a range of careers including Media, Software, Finance and Research with employers such as Google, Microsoft, Facebook, Amazon and Bloomberg. A career in Computing opens the door to a wide range of careers."
Yet over a period of more than 20 years a pandemic computer model was developed.
This is the same pandemic model used for Neil Ferguson's previous predictions that were so far off: 2001 mad cow disease, leading to six million cattle and sheep slaughtered, and millions spent buying vaccines against swine flu in 2009. If this was some internal test program put together quickly, then you might expect this quality of code. But even then some bad practices have been employed.
I have emailed some of those in charge of the computing department asking for comment on the code, but got no reply.
Obviously the code was not developed by the computer programming department.
Those developing the pandemic model thought they were so clever they did not need to bother with things like documentation and testing, or checking with those who know how to programme. The crude method of projecting numbers forward is as questionable as the code.
Suppose I write a computer program to calculate how many chip shops there will be in my small town, and, as Neil Ferguson's Report 9 says, assume exponential growth doubling every 5 days.
The program will predict that in six months there will be over 8 billion chip shops in my small town.
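Written out, that naive extrapolation is essentially a one-liner (a sketch; the starting count of one chip shop is my own assumption):
```python
# Naive exponential extrapolation: one chip shop, doubling every 5 days.
days = 180                      # roughly six months
doubling_time = 5               # days per doubling, as in the comment above
shops = 1 * 2 ** (days / doubling_time)
print(f"{shops:,.0f} chip shops after {days} days")  # tens of billions
```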
Computers do not have brains. They cannot know anything, such as that 8 billion chip shops in a small town is impossible.
- elgato:
As a physicist who develops software for my own research and for fun, I couldn't agree more with this letter. I code in C++ and HPC, mainly GPUs. In academia there is a certain lack of appreciation for developing good code. Scientists who, like me, try to make things better are seen as people who waste time instead of producing results. Arrogance is also a problem. Learning the idioms of a given programming language is easy, but writing maintainable code is not. A long time ago, after many years of programming, I learnt design patterns, which changed the way I code. From there I moved into more depth about software and how to write it. But the truth is that in school nobody bothered to tell us about it. We learn by simply doing, with no formal education whatsoever, which is bad, really bad. This piece of code is only an example of much code out in the wild used by people on a day-to-day basis; it just happens that this one might affect the decisions of policy makers. Also, for the software developers in here: not all scientists produce the kind of crap being discussed here, please don't put all the people in the same bag.
- Brian Sides:
The original code was written in C, not C++; there are some Fortran functions that are supported by the C library (some think the code was originally written in Fortran and ported to C). The code has now been ported to C++ and split into multiple files, but without using the object-oriented features. The code is still mostly just C. Bugs have been found during the conversion process.
The code was written over a period of more than 20 years. Many thousands of man-hours. At the end they had produced one single file of 15,000 lines of code; that is less than 2 lines of code a day.
The code is undocumented, with a host of single-letter variables.
Data is read and written without error checking; data is not verified; there is no file signature or checksum.
It is very simplistic simulation code.
All code needs to be tested. Important code needs to be independently tested. These are highly qualified, highly paid people. They had a team working on this.
There has been a large investment. Where was the management oversight? It is clear from comments by Neil Ferguson that he thought the thousands of lines of undocumented code he kept in his head were not a problem; he was kind of proud of it.
As well as the code, the method is very questionable: taking some data of questionable choice, then making many assumptions and applying these to a limited simulation with a small set of real-world statistics, in a way that takes no account of how these and other factors interact.
There is no excuse. SAGE failed to check that the model had been properly tested.
The predictions from this faulty model have misinformed the Government and led to this ill-informed lockdown.
- sde-2243:
I think this consists of a few different topics that are hard to mix.
1) When I buy a car, I might get (and might not get) some warnings. However, nobody expects the car manufacturer to teach me how to drive. This is a skill, and I spent literally months and thousands of dollars honing this skill. Still, even though I got to the level where I can participate in racing events, I would not be arrogant enough to try to drive an 18-wheeler. Or a bus. And if I tried, I would *not* blame others for a collision.
Somehow we think that because we have a computer, we possess the skills necessary to develop software. Or that if we learn a language, we learn software engineering. This is wrong: it is an acquired skill. Junior engineers coming from college have spent years learning how to develop robust systems quickly. By analogy: I have a chef's knife, so why can't I cook like a chef? Oh, and by the way -- there was not a single word of warning on the knife when I bought it. There was no video on how to hold it properly, or on when I should use a chef's knife, a peeling knife, and so on. Not a squeak on how to maintain it, how to wash it, how to store it, how to sharpen it.
2) The world is changing quickly. What used to be a highly professional activity is quickly becoming a side skill for people professional in a different area, be it biologists, physicists, or computational scientists. Apparently, there is *an emerging market* for development tools for these non-specialists.
However, it is hardly reasonable to expect these languages and tools to come from industry *evolution.* [At some point Niklaus Wirth was asked why he does not participate in language standardization. He answered that he is teaching. To teach students, he needs a modern language. So he creates one. Standardization is needed by industry -- so let industry do standardization.] Industry does not know what academia needs. And does not care -- justly. But the market means that some company, some group of people, might start to work on a product that is needed for this market, and start to sell it. [Stephen Wolfram's Mathematica is a great example of such a product.]
3) Why can't this solution emerge from academia? There are computer science / software engineering departments. So why do you want somebody else to solve your problems, instead of stopping by your colleague's office?
Again, we have examples. Some quite interesting facts in mathematics are proved by software. It was academia that developed the tools for proving them, and created means of validation for these tools. In fact, academia is more interested in formal validation of programs than the software industry (as a whole). So, if it is possible for mathematicians, why not for others?
- David Frenk:
C++ isn't the problem. Academic researchers write terrible code in every language they use. Python is pretty much the most user-friendly language imaginable, and most academic python code is spaghetti too. Peer review (especially in an open source context wherever possible), and better software engineering training for academics who need to write code are the best solutions here.
- Jef “Credible Hulk” Spaleta:
When peer review of the code itself becomes as important to career advancement as publication of the scientific results... things will get better. Otherwise, they won't. Academic researchers by and large are not incentivized to write maintainable code. For projects with a large enough budget, you start seeing staff engineers hired to maintain critical codebases, but if the researcher is writing it, it's really not expected to be maintainable.
And while there is effort put into peer review of the published articles that appear in scientific journals, the same effort is not usually required for the digital artifacts (the software) that were used to produce the results expressed in the articles. As it stands, career advancement is not predicated on being proficient at producing readable, reusable, robust code. Publish or perish doesn't generally apply to the software... it is what it is.
- Konrad Hinsen:
That is indeed an important point, but it's also important to realize that improving the situation is not easy. Reviewing scientific code upon publication requires (1) accepted standards for code quality and (2) reviewers compensated in some way for the significant effort that code review represents. Which is one of the reasons why I ask for better tools: to reduce the effort in code reviews.
- Michael:
Coming from a non-CS academic background, I disagree with you. This sentence in particular:
It’s also you, the software engineering community, that has made no effort to warn the non-expert public of the dangers of these [C++] tools.
does not match my experience at all. Every conversation with a scientist that touched upon C++ I can remember has amounted to "yeah, but C++ is very hard, let me use MATLAB/Python/R instead." If you want to convince yourself, go around university departments and poll graduate students on whether or not they think C++ is easy. This will render the fact that you cannot identify "clear warnings" irrelevant, since if everyone agrees that C++ is not easy, that provides evidence those warnings from the software community are coming through to non-experts.
That does not mean Ferguson's code doesn't shed light on a serious problem. The code would not have looked much different in MATLAB/Python/R. The problem is not the language, but the inherent practices that should be involved in developing scientific code: version control, unit tests, documentation, reproducibility. Most academic groups, however, provide little to no training or emphasis on the importance of these tools. The value is instead placed on publishing papers. The purpose of most academic code is usually to be just good enough to plot the graphs that are used for a paper. This is the case even in fields that should be CS-minded, such as applied mathematics. You will never get tenure for writing good code, and your graduate students have no incentive to write good code -- unless, of course, they want to solve a real problem. But then they often go to industry.
If you want to avoid this, educate the older academics. This is already happening. Software engineering, in my view, is extremely transparent about your main gripe: C++ is not for beginners.
- Ariel Fogel:
Thanks for writing this. As someone who left software development to go back to academia in a field not directly related to CS, one thing I notice is that there's often not a ton of time to implement best practices unless that's part of the culture of the lab (read: unless your lab is led by a computer scientist who is well trained in software engineering).
Even if there was enough time, more often than not you're writing one-off scripts or things that are what I referred to as a spike in a developer context. And unfortunately, sometimes those spikes continue to get developed. But a lot of times they aren't, because research is inherently about taking well-informed stabs at the unknown and seeking to uncover something new. It's hard to know when it's worthwhile to start with best practices, or when the tech debt is high enough that it necessitates a refactor. And it's even more difficult if you're trying to get funding for that. I'm not sure my lab would be able to write grants that also ensure we TDD every piece, or even some pieces, of research we produce.
And that's with me having been exposed to software development practices and having access to some professional programmers who help with our research, which most of my peers haven't and don't. Speaking of which, I'm going to go back to writing crappy code now :)
- David Hicks:
I've come here from Hacker News where there's a little outrage going on right now ...
I think computational scientists are increasingly going to need to get code reviewed by experts, particularly in areas where that code affects public policy. There are a bunch of ways this might be achieved, and publishing of source code openly under a FOSS license could help here. But it may be that you need to pay people to build models (or pay people to design some sort of extensible model framework) for you.
To look at your analogy here - "You expect cars to be safe to use by anyone with a driver’s license."
Yes, I do. But I don't expect to be able to go to the Ford factory, pick up some tools and make a car that meets road use regulations without some training. By using C++ you've wandered in and had a go with an arc welder, and now you're annoyed with us at the result?
- Konrad Hinsen:
"I think computational scientists are increasingly going to need to get code reviewed by experts"...
Let me translate: "We, the software industry, set the rules by which everyone has to play for using computers. If scientists want to do computations, they will have to consult with us and pay us for that."
That's "software is eating the world" at its best. And that's exactly what my open letter is arguing against. You may of course disagree, as this is a question of policy, but then there is no need for further discussion: you and me have conflicting interests.
- David Hicks:
I'd also like to ask, respectfully, if you would resent the suggestion of getting an Architect to look over the plans you drew up for a house you were building? And paying them to do so?
Software Engineering is a skilled profession, we spend a lifetime learning, practising and perfecting it, but it's somehow wrong to suggest that you might want to consult with someone to help get it right?
- Konrad Hinsen:
Houses are like cars: mature technologies where all the roles are well defined. There are no "house planning for dummies" books that lure people into designing their own house without help from an architect.
I am perfectly fine with software engineering becoming as mature as architecture, and being left to qualified professionals. But computational scientists need to be able to do their job autonomously. Which is not the case as long as badly designed systems programming languages are almost inevitable for implementing scientific models.
- David Hicks:
So now we should set the rules, and be gatekeepers of knowledge? I'm getting very mixed messages here.
I'm not a C++ coder by trade, very often, so I'll leave them to answer your criticism that it's poorly designed.
You're asking that computational scientists be able to produce work as well as experienced software engineers can, with no training and with no oversight, without engaging with experienced people to help build out your models, and certainly without paying for any of their insight. Why do you think that should even be possible? Are you haranguing chemical engineers because anyone should be able to build an oil distillation column and it's their fault yours blew up?
Our discipline is almost uniquely open, you can learn, you can build, we give access to tools and platforms, we share amongst ourselves and with anyone that wants to learn. But that doesn't mean that after reading a couple of intros to C++ you're going to make flawless programs and frankly I find it arrogant that you think you should be able to just bypass the training and achieve comparable results. There's a reason your university has a whole department for computer science.
- Konrad Hinsen:
I am not asking that computational scientists should be able to do zero-effort software engineering. They should be able to develop and evaluate scientific models on their own, using tools designed by software engineers. Much like ordinary people write letters using word processing software.
To give an example for how this could work (I am not saying this particular approach will work, but I think it's worth investigating): design a stack of ever more specialized DSLs, with a general-purpose programming language at the bottom and each successive layer on top of it specializing towards a scientific application domain. Most scientists could then work most of the time at a level they can manage on their own. When they hit the limits of their DSL, they'd work with RSEs on a more appropriate DSL for their specific problems.
However, what I outlined above is not a technology fix. Those DSLs should each correspond to a role and a competence profile. It's not just a software stack with layers of abstractions introduced to facilitate maintenance by teams of people who have basically all the same profile. Another important point is interoperability. Lots of specialized DSLs can only work in practice if the epidemiology DSL can interoperate with the statistics DSL and the ODE DSL.
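One way such a stack could look, sketched very roughly in Python (the SIRModel class and simulate helper below are hypothetical names of my own, not an existing library; SciPy's solve_ivp stands in for the general-purpose bottom layer):
```python
# Illustrative only: a toy "epidemiology layer" resting on a generic ODE layer.
from dataclasses import dataclass
from scipy.integrate import solve_ivp  # general-purpose bottom layer

@dataclass
class SIRModel:                  # top layer: domain vocabulary only
    beta: float                  # transmission rate
    gamma: float                 # recovery rate

    def to_ode(self):            # translation into the generic ODE layer
        def rhs(t, y):
            s, i, r = y
            return [-self.beta * s * i,
                    self.beta * s * i - self.gamma * i,
                    self.gamma * i]
        return rhs

def simulate(model, y0, t_end):  # middle layer: generic ODE solving
    return solve_ivp(model.to_ode(), (0.0, t_end), y0, dense_output=True)

# The scientist works only with epidemiological terms:
result = simulate(SIRModel(beta=0.3, gamma=0.1), y0=[0.99, 0.01, 0.0], t_end=160)
print(result.y[:, -1])           # final S, I, R fractions
```
When the epidemiology layer stops being expressive enough, the scientist would drop down one level to the ODE layer, or work with an RSE to extend the domain layer, which is exactly where the role and competence profiles mentioned above come in.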
- David Hicks:
I strongly disagree with your (mis)characterisation there, particularly as I suggested publishing open source as a way to get more eyes on the code.
We don't set the rules, clearly, as you can find and use a good many of our tools for free, in whichever way you want, as demonstrated here. But if you're not trained or experienced you're not always going to get the best results, and perhaps you should be looking for outside help.
Me, I don't expect to be any good at arc welding without some help and training either.
(edit - I don't even expect to be any good at writing software without getting other people to review it!)
- Ondřej Čertík:
I think Konrad is arguing for domain scientists to be able to write software by themselves, without needing CS experts (whether paid or open source) to help fix up their code. I agree with that 100%.
- David Hicks:
I would argue that anyone producing software that is going to be relied upon for published scientific results, particularly scientific results that are used to inform public policy, should have such software reviewed by peers, and probably a wider audience than that if the peers are similarly non-expert.
You might not wish to involve CS 'experts', (and this isn't really CS, but Software Eng) but perhaps some of the habits of such people should be explored. I wouldn't dream of deploying something that hadn't had other eyes on it.
I agree in the abstract that it's a good thing to create tools for scientists to need as little assistance as possible, and it looks like you're working towards that end - good stuff :)
But I also think that fundamentally, to produce good software, you need more than one person and you need experienced eyes. It's in the nature of the game.
- Ondřej Čertík:
David, thanks for the comment -- I agree that one should not work in isolation and the more reviews the better. At the same time I like what Konrad said below that computational scientists need to be able to do their job autonomously. It's not mutually exclusive, we should strive for both.
- Ondřej Čertík:
Yes I agree that it's always good to have more than just one person to look over any code.
- Ondřej Čertík:
Thanks for the post, Konrad. I have a couple of thoughts on this. One is that Fortran would be a great fit for this kind of code, and one thing I am planning to do with the LFortran (https://lfortran.org/) compiler once it is more mature is to give "pedantic" (so to say) warnings or even errors on code constructs that should not be used, even though they are perfectly legal Fortran: little things like enforcing "implicit none", not allowing "implied save", not specifying a precision for floating point, and other typical pitfalls. And in the long run, I am hoping the compiler can detect a lot more constructs that should be discouraged, such as using pointers instead of allocatable arrays; even things like a subroutine having a side effect or a global variable being declared could trigger a warning, and you would have to put in some kind of comment documenting / acknowledging that that is what you really want. That way I believe a compiler with excellent warning and error messages can greatly help teach non-expert programmers how to write higher quality code. Part of this is also that in Debug mode, it should check absolutely everything, from integers wrapping around to any kind of memory issue such as dangling pointers. I think all of this can technically be done.
However, ultimately this goes much beyond just better compilers, and that is the main point of your blog post I think. I personally like C++ for things like writing compilers, but for scientific computing I think it's not great, because every big C++ code that I have seen requires to have CS experts on the team to keep fixing up issues that the domain scientists make. As you also imply in your post.
Fortran is much better suited, but currently it is falling short of its mission: it's lacking tooling, the compiler quality is not great, it does not run on modern hardware such as GPUs, etc. I am trying to fix all that; see, e.g., some of our recent efforts:
https://ondrejcertik.com/bl...
But this is something that should have been done 20 years ago, because even if we are 100% successful in our vision, it will still take 5 to 10 years before Fortran achieves it.
But I think it goes even beyond that. Even with a language that is better suited for numerical programming, and an excellent compiler that can guide the user to write using the "best practices", I think one also needs to adopt "modern social practices", which is to post the code as open source at GitHub or GitLab, and build a community around it.
Summary: I think there is a huge opportunity to provide high quality tools for domain scientists to use and we have a long way to go.
- Themos:
The NAG Fortran compiler can check array bounds, integer overflow, undefined variables, dangling pointers, memory leaks and more. But getting unreliable numbers faster and cheaper has been a siren's call few can resist.
https://wg5-fortran.org/N19... addresses Fortran vulnerabilities. Documents exist for other languages.
In my view, the fundamental problem is that (non-CS) research codes are not derived from specifications. Huge parameter spaces abound and they are not explored adequately.
Without careful tuning of incentives, I can't see how we will end up in a better place.
- Ondřej Čertík:
Themos, thanks for the comment. Indeed we use the NAG compiler, it's great and the number of things it can catch is awesome. In my comment above I suggest we explore ways to go even beyond what the NAG compiler can currently catch. Thanks for the link to the N1965 document.
- Konrad Hinsen:
Thanks for your comments Ondřej. Your work on improving Fortran is very much in line with what I think we (computational science) need. And I certainly agree about developing best practices, which is fortunately already going on.
- orca:
This is quite off the mark. The author sets the bar too low for himself by criticizing only the most easily (and, to be fair, legitimately) dismissed criticisms of the Imperial College model by software engineers. Here's a better laid-out critique that the OP doesn't speak to:
The Imperial College modelers released the source code a couple of days ago to the model that shut down the world economy. It's not the original model code but rather the original source code turned over to volunteer programmers who re-wrote it so that it is more readable. I have done some model review of financial models in the past, but without the source code I would not be able to do a full review of the Imperial College model. Now that we have the source code (sort of), I can.
Any such model ought to have been independently reviewed before it is ever used for real policy decisions. Policy analysis is awash in models but no one ever really checks them. Going forward, health policy makers should ask for and disclose independent validation of any model before using its results to make recommendations of any consequence.
Normally, model reviews are long technical documents but there would also be a summary section. Here's what I think a summary should have looked like.
...
Overall conclusion: this model cannot be relied on to guide coronavirus policy. Even if the documentation, coding, and testing problems were fixed, the model logic is fatally flawed, which is evidenced by its poor forecasting performance.
https://www.facebook.com/sc...
- Konrad Hinsen:
This is a very different critique that I actually mostly agree with. Policy decisions should indeed be based not just on "science", but on trustworthy scientific findings. How to do that in an emergency is of course a different question again.
- MaxSchumacher:
The analogy to cars is flawed, because C++ isn't an end product for untrained users. If you want to stick to the car industry, then C++ is a blowtorch, a tool used by professionals. The scientists shouldn't have used tools they don't understand and based policy recommendations on the output of a black box they cannot reason about; admitting ignorance is vastly better than pretending to understand.
I don't believe in the perfect separation of model and implementation: you learn about the world once the code is running and results are produced. One can argue that if you cannot build it, you don't understand it.
- Konrad Hinsen:
We seem to agree that C++ is not an end user product. But show me a single C++ tutorial aimed at novices that clearly says so! How are scientists supposed to realize that they don't understand a product if all the descriptions of that product tell them "don't worry, it's easy"?
- MaxSchumacher:
Nobody in the history of the world has ever uttered the phrase "don't worry, it's easy" to refer to C++. It is a famously complex and large language. Plenty of C++ books talk about how to write good code and how to use the language; violating those recommendations is akin to putting your dog in the microwave.
The basics of software quality aren't arcane knowledge uniquely accessible to greybeards; you'll find them in countless entry-level books and blog posts (a minimal sketch follows the list below):
- use descriptive names for variables and functions
- try to keep functions small
- use comments for difficult spots
- document your work
- test your code vigorously
- get at least one review of the code
- use a version control system
I wouldn't conduct brain surgery and, after failing miserably, complain to the people making the scalpel: "Hey! You should have put a warning label on this!"
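A minimal sketch of a few of those basics (descriptive names, a small documented function, a unit test; pytest is assumed as the test runner, and the function itself is just an illustrative example):
```python
# growth.py -- small, descriptively named, documented.
def doubling_projection(initial_count: float, days: int, doubling_time: float) -> float:
    """Project a quantity forward assuming it doubles every `doubling_time` days."""
    return initial_count * 2 ** (days / doubling_time)


# test_growth.py -- run with `pytest`; one behaviour per test.
def test_doubling_projection_doubles_after_one_period():
    assert doubling_projection(100, days=5, doubling_time=5) == 200
```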
- Konrad Hinsen:
Me neither. The people I'd complain to are the authors of "Brain surgery for dummies", as well as brain surgeons performing live on television, explaining their techniques. The problem is not proposing power tools, but advertising them to non-specialists.
- boromict cumbordor:
first hit for "c++" "don't worry" "it's easy": https://books.google.com/bo...