# Posts from 2009-05

## Static typing and code clutter

I have used various programming languages over the years, including both statically typed and dynamically typed ones. But when given a choice, I have always preferred dynamic typing. Since 1995, my main programming language has been Python, and more recently I have started to use Clojure. One of the reasons for this preference is something that I have never seen expressed before: static typing often adds visual clutter to the code that makes it harder to read.

An important property of any non-trivial computer program is its clarity to human readers. Both verification of a program's correctness and the overall utility of a piece of code in a context of changing requirements depend on this. Well-written specifications and unit tests help as well, but if you want my advice on the quality of a piece of code, or if you want my help with modifying it, my judgement will mostly be based on its clarity. If it's an effort to understand what's going on, I wouldn't want to work with it.

This criterion for code quality immediately translates into a criterion for programming languages: they should be able to express as many concepts of software engineering as possible in a direct, explicit way and without imposing any clutter or obfuscation. Static type systems often get in the way, either by imposing clutter or by encouraging a less clear programming style.

In my examples, I will use the languages Haskell (static typing) and Clojure (dynamic typing) for illustration. Haskell has one of the best type systems available at the moment, so if Haskell can't avoid the problems that I point out, it is likely that no other current language will do a better job. Clojure is a good comparison because like Haskell it is designed for a functional programming style. Of course, it also helps that I am reasonably familiar with both languages.

**Example 1: abstract data types**

The idea behind abstract data types is that the concrete representation of some data structure should be hidden from client code, which accesses the data structure only through a set of interface functions. Let's look at how this is typically implemented in Haskell, using the PFP library for probabilistic programming as the example (just because I happen to know it, many other libraries could serve the same purpose). In PFP, a probability distribution is represented by an abstract data type

`Dist a`

defined asnewtype Dist a = D {unD :: [(a,ProbRep)]}

This says that internally,

`Dist`

is a list of `(a, ProbRep)`

pairs. The single constructor `D`

converts such a list to the abstract data type `Dist`

, whereas `unD`

does the inverse: it makes the contents of a `Dist`

value accessible for inspection.The problem with this is that all of the implementation code for PFP is littered with

`D`

and `unD`

, although they don't do anything and add nothing to the clarity of the code. They are there only to make sure that the signature of the functions contains the abstract type `Dist a`

instead of the internal representation `[(a,ProbRep)]`

. For the reader of the PFP code trying to understand how it works, this is clutter. There are also a couple of functions that exist only for dealing with the artificial distinction between `Dist a`

and `[(a,ProbRep)]`

, for examplesizeD :: Dist a -> Int

sizeD = length . unD

which replaces the list function

`length`

(familiar to every Haskell programmer) by a special version whose purpose the reader has to remember.A Clojure library that is essentially equivalent to PFP (look at the source code) is much shorter, and in my opinion much clearer. It represents a probability distribution by a map (known in other languages as a dictionary or an associative array) and directly uses Clojure's map operations to work on it. No visual overhead, no clutter. Of course, as static typing advocates would be quick to point out, no protection of the internal representation either: client code could directly manipulate the maps used to represent distributions, potentially creating maps that are not valid probability distributions. I have never run into such a problem in 15 years of using dynamically typed languages, but in principle it is possible.

It would be possible to avoid the code obfuscation due to abstract data types by recognizing that abstract data types are an interface issue and not a type issue. A language could provide an explicit declaration of an interface for a module where the function signatures would be given with the abstract data type, even though the concrete representation is used in the implementations. The compiler could verify the coherence of everything. But I haven't seen anything like this in any statically typed language.

Note also that something very similar could be implemented in Clojure: A couple of macros would provide wrappers around the exported functions that add type verification at runtime. However, this says more about the advantage of having a powerful macro system than about the advantages of dynamic typing.

**Example 2: monads**

A monad is package consisting of a data structure (or, more precisely, certain properties that a data structure must have) and two functions

`bind`

and `result`

. A subclass of monads also has a special value called `zero`

and a subclass of this subclass has one more function called `plus`

. All these definitions must obey certain rules to make a valid monad.In Haskell, there is a typeclass

`Monad`

that defines `bind`

(called `>>=`

) and result (called `return`

), and another typeclass `MonadPlus`

that defines `mzero`

and `mplus`

. A monad is defined by providing instances for concrete data types. When monadic operations are used, the type inference system identifies the data type and selects the corresponding operations. From the client's point of view, a monad thus is defined by a type.There are a few drawbacks of this setup:

- It is not possible to define a monad with a
`zero`

but no`plus`

. This is a technical detail (`MonadPlus`

could well be split into two typeclasses), but it's still a limitation in practical Haskell programming that is due to the rigidity of a type system. - It is impossible to have two monads for the same data type, although sometimes this would make sense. For example, there are (at least) two practically relevant ways to define
`plus`

for the list monad. - It is cumbersome to use the same monad operations for two different concrete data structures that are similar enough in behaviour to be used with the same monad definition.

In Clojure, monads are values, not types. In client code, a monad is selected explicitly by the programmer by surrounding the monadic code by a

`with-monad`

form that specifies the monad to be used. Usually the monad is named explicitly, but since monads are values, they can also be represented by a variable. A Clojure monad can be used with any data type that the definitions of the monadic operations accept, and any number of monads can be defined for a data type.In Clojure it is the data structure that almost disappears from monad handling; the constraints on the monadic values are given only as documentation for the human reader. As with other aspects of dynamic typing, this provides more flexibility and less protection.

For standard monads, I'd say that the clarity of code is roughly equivalent in Haskell and Clojure. Haskell gains a bit in making the data structure explicit, Clojure gains a bit in making the monad explicit at the point of use. Both work pretty well.

This changes when monad transformers come into play. In the Haskell world, this is perhaps the most frightening concept to newcomers. Monad transformers are surrounded by an aura of mystery, they are only for the real experts. I think that this is at least in part due to the complexity of defining monad transformers in a type system.

Here's how Haskell defines the list monad transformer; only the relevant parts are shown:

newtype ListT m a = ListT { runListT :: m [a] }

instance (Monad m) => Monad (ListT m) where

return a = ListT $ return [a]

m >>= k = ListT $ do

a <- runListT m

b <- mapM (runListT . k) a

return (concat b)

As in the case of abstract data types, there is a data type definition for

`ListT`

that does nothing but introduce a new notation for the type `m[a]`

. `ListT`

and `runListT`

are just notation converters that don't actually do anything useful. But unlike in the case of abstract data types, they are indispensable here. Monads *are*types, and therefore there has to be a new type to make a new monad. It doesn't help that the name

`runListT`

is a particularly bad choice: it suggest an action where there is none.The definitions of

`return`

and `>>=`

aren't masterpieces of clarity either. It takes a careful analysis of each function and its (inferred, and thus unwritten) type to understand which monad is being used where.For comparison, here is the corresponding monad transformer in Clojure, again reduced to the basics:

(defn sequence-t [m]

(monad

[m-result (with-monad m

(fn [v]

(m-result (list v))))

m-bind (with-monad m

(fn [mv f]

(domonad

[a mv

b (m-map f a)]

(flatten b))))]))

Since monads are values, monad transformers are simply functions that take a monad argument and return another monad. It is also clear at a glance that inside the definition of each monad operation, all references to monad operations are to be interpreted in the inner monad. This doesn't make monad transformers trivial to understand, of course, but it is a lot clearer.

Tags: computational science, computer-aided research, emacs, mmtk, mobile computing, programming, proteins, python, rants, reproducible research, science, scientific computing, scientific software, social networks, software, source code repositories, sustainable software

By month: 2023-11, 2023-10, 2022-08, 2021-06, 2021-01, 2020-12, 2020-11, 2020-07, 2020-05, 2020-04, 2020-02, 2019-12, 2019-11, 2019-10, 2019-05, 2019-04, 2019-02, 2018-12, 2018-10, 2018-07, 2018-05, 2018-04, 2018-03, 2017-12, 2017-11, 2017-09, 2017-05, 2017-04, 2017-01, 2016-05, 2016-03, 2016-01, 2015-12, 2015-11, 2015-09, 2015-07, 2015-06, 2015-04, 2015-01, 2014-12, 2014-09, 2014-08, 2014-07, 2014-05, 2014-01, 2013-11, 2013-09, 2013-08, 2013-06, 2013-05, 2013-04, 2012-11, 2012-09, 2012-05, 2012-04, 2012-03, 2012-02, 2011-11, 2011-08, 2011-06, 2011-05, 2011-01, 2010-07, 2010-01, 2009-09, 2009-08, 2009-06, 2009-05, 2009-04