Think of all the things you hate about using computers in doing research. Software installation. Getting your colleagues’ scripts to work on your machine. System updates that break your computational code. The multitude of file formats and the eternal need for conversion. That great library that’s unfortunately written in the wrong language for you. Dependency and provenance tracking. Irreproducible computations. They all have something in common: they are consequences of the difficulty of composing digital information. In the following, I will explain the root causes of these problem. That won’t make them go away, but understanding the issues will perhaps help you to deal with them more efficiently, and to avoid them as much as possible in the future.
Yesterday a blog post by Cyrille Rossant entitled “Moving away from HDF5” caught my eye. My own tendency at the moment is to use HDF5 more and more, so I was interested in why someone else would want to do the opposite. Here is my conclusion after reading his post, plus some ideas about where scientific data management is or should be heading in my opinion.
We all know that software deployment in a research environment can be a pain, but knowing this as a fact is not quite the same as experiencing it in reality. Over the last days, I spent way more time that I would have imagined on what sounds like a simple task: installing a scientific application written in Python on a Linux machine for use by a group of students in a training session. Here is an outline of the difficulties, in the hope that it will (1) help others who face similar problems and (2) contributes a little bit to improving the situation.