Thursday, June 19, 2014

The Safyness of Static Typing

I like static (manifest) typing. This may come as a shock to those who have read other posts of mine, but it is true. I certainly am more comfortable with having a MPWType1FontInterper *interpreter rather than id interpreter. Much more comfortable, in fact, and this feeling extends to Xcode saying "0 warnings" and the clang static analyzer agreeing.
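
To make that concrete, here is a minimal Objective-C sketch (the interface shown is a stand-in, not the real class): with the typed declaration the compiler checks the messages I send; with id it cannot.

    #import <Foundation/Foundation.h>

    // Stand-in interface, just to illustrate the point.
    @interface MPWType1FontInterper : NSObject
    - (void)interpretFont:(NSData *)fontData;
    @end

    void feedFont(MPWType1FontInterper *typed, id untyped, NSData *data)
    {
        [typed interpretFont:data];    // checked: the compiler knows the class declares this selector
        // [typed interperFont:data];  // a typo like this is flagged at compile time ("no visible @interface declares ...")
        [untyped interpretFont:data];  // accepted because *some* visible class declares the selector;
                                       // whether this object actually responds is only discovered at runtime
    }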

Safety

The question though is: are those feelings actually justified? The rhetoric on the subject is certainly strong, and very rigid/absolute. I recently had a Professor of Computer Science state unequivocally that anyone who doesn't use static typing should have their degree revoked. In a room full of Squeakers. And that's not an extreme or isolated case. Just about any discussion on the subject seems to quickly devolve into proponents of static typing claiming absolutely that dynamic typing invariably leads to programs that are steaming piles of bugs and crash left and right in production, whereas statically typed programs have their bugs caught by the compiler and are therefore safe and sound. In fact, Milner has supposedly made the claim that "well typed programs cannot go wrong". Hmmm...

That the compiler can catch (some) bugs using static type checks is undeniably true. However, it is just as obviously true that not all bugs are type errors: most of the 25 top software errors don't look like type errors to me, neither goto fail; nor Heartbleed looks like a type error, and neither do the top errors in my different projects. So having the type-checker give our programs a clean bill of health does not make them bug-free; it eliminates one particular class of bugs.
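
To illustrate, here is a self-contained sketch of the goto fail; pattern (not the verbatim Apple code): every line is perfectly well-typed C, yet the accidentally duplicated goto skips the actual check.

    #include <stdio.h>

    // Minimal reproduction of the "goto fail;" pattern. The duplicated goto is
    // unconditional, so the signature check below it is never reached and err
    // stays 0 ("success") -- a serious bug that no type checker objects to.
    static int verifySignature(int headerOK, int signatureOK)
    {
        int err = 0;
        if (!headerOK)
            goto fail;
            goto fail;          /* duplicated by accident: always taken */
        if (!signatureOK)       /* the actual verification: unreachable */
            err = -1;
    fail:
        return err;
    }

    int main(void)
    {
        printf("bad signature accepted? %s\n", verifySignature(1, 0) == 0 ? "yes" : "no");
        return 0;
    }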

With that, we can take the question from the realm of religious zealotry to the realm of reasoned inquiry: how many bugs does static type checking catch?

Alas, this is not an easy question to answer, because we are looking for something that is not there. However, we can invert the question: what is the incidence of type errors in dynamically typed programs, which do not benefit from the bug removal a static type system provides and should therefore be steaming piles of exactly those type errors?

With the advent of public source repositories, we now have a way of answering that question, and Robert Smallshire did the grunt work to come up with an answer: 2%.

The 2%

He talks about this some more in his talk The Unreasonable Effectiveness of Dynamic Typing, which I heartily recommend. And it isn't the only source: there was, for example, a study titled An experiment about static and dynamic type systems: doubts about the positive impact of static type systems on development time (pdf), which found that not only were development times significantly shorter on average with dynamically typed languages, but so were debug times.

So all those nasty type errors were actually not having any negative impact on debug times; in fact, the reverse was true. Which of course makes sense if the incidence of type errors is anywhere near 2%, because then other factors are almost certain to dominate. Completely.

There are more studies, for example on generics: Do developers benefit from generic types?: an empirical comparison of generic and raw types in java. The authors found a documentation benefit, no error-fixing benefits and a negative impact on extensibility.

Others have said it more eloquently than I can:

Some people are completely religious about type systems and as a mathematician I love the idea of type systems, but nobody has ever come up with one that has enough scope. If you combine Simula and Lisp—Lisp didn’t have data structures, it had instances of objects—you would have a dynamic type system that would give you the range of expression you need.
Even stringent advocates of strong typing such as Uncle Bob Martin, with whom I sparred many a time on that and other subjects in comp.lang.object, have come around to this point of view: yeah, it's nice, maybe, but just not that important. In fact, he has actually reversed his position, as seen in this video of him debating static typing with Chad Fowler.

Truthiness and Safyness

What I find interesting is not so much which side is right or wrong or better, but rather the disparity between the vehemence of the rhetoric, at least on one side of the debate ("revoke degrees!", "can't go wrong!"), and the evidence: there is no empirical support for the claimed benefit (there is some against), and whatever effect exists appears to be small.

Stephen Colbert coined the term "truthiness" for "a 'truth' that a person making an argument or assertion claims to know intuitively 'from the gut' or because it 'feels right' without regard to evidence, logic, intellectual examination, or facts." [Wikipedia]

To me it looks like a similar effect is at play here: it just feels so much safer when the computer tells you that there are no type errors, especially when it takes quite a bit of effort to get to that state, which it does. I notice that effect myself, despite the fact that I know the evidence is not there, and despite having been a long-time friendly skeptic.

So it looks like static typing is "safy": people just know intuitively that it must be safe, without regard to evidence. And that is what makes the debate both so heated and so impossible to settle rationally, just like political debates on "truthy" subjects.

Discuss on Hacker News.

22 comments:

  1. "I recently had a Professor of Computer Science state unequivocally that anyone who doesn't use static typing should have their degree revoked."

    Holy cow.

    I guess he should call up Caltech and ask them to revoke Knuth's degree, because TAOCP is (for the most part) not statically typed.

    Maybe we should call up Eindhoven, too, to see if they can revoke Dijkstra's degree posthumously, because he didn't use static typing, either.

  2. I think that most people tend to think in types, anyway. Types are the assumptions you make about your data.

    I expect a string input, or at least something it makes sense to write to a console. I'll give you something you can multiply.

    Just documenting that sort of thing in the interface makes it much easier to understand, because I'm no longer guessing, or deriving the properties of some value from natural language descriptions, which tend to get very long.

    I also think types can lead to abstractions that would be very difficult to use without them, and a powerful type system can lead to new programming styles.

    It's also possible to encode notions of trust into types, so for instance, you can't use an unescaped string where an escaped one is expected. It's just that people use plain strings for far, far too many things. If you don't mean "arbitrary stream of characters", then String is not a good enough type.

    I'm not saying that dynamic languages are unusable. They're not. But the notion of types you're using seems somewhat limited.

  3. The 'Top 25 Software Errors' listed are not the most common errors; just the most 'dangerous'. Many of the bugs relate to security concerns.

    It's possible to have strongly typed programming languages without ridiculous type declarations cluttering things up. Look at Haskell and ML-derived languages such as OCaml and F#. We need to do away with object inheritance; then our programs can know what kind of thing a type is, and types can be inferred using Hindley-Milner-style algorithms. That removes much of the clutter from function signatures.

    My preference for strong typing is about writing bug-free, literate code which can be confidently refactored with editor tools. The last time I emphasised ease of refactoring to a dynamic-typing fanboy I was told that code should be "re-written every 3 years anyway". I've yet to read a defence of Ruby/Python/Clojure that even admits the need to keep complex code working for decades, with constant feature updates.

    "how many bugs does static type checking catch?"
    1. The research needs to be redone to treat ADT-style types (Haskell, F#) separately from object type systems.
    2. It would be a huge mistake to assume that catching bugs is the main point behind static typing. Consider ease of refactoring and added expressiveness as equally important.

  4. Heartbleed is actually a prime example of a type error. Here's someone who rewrote the code from the unsafe language C to the safe typed systems language ATS to demonstrate that Heartbleed would have been caught as a type error:

    http://bluishcoder.co.nz/2014/04/11/preventing-heartbleed-bugs-with-safe-languages.html

  5. I'm more and more thinking that combining strong type systems and "shape of data" checks could be a good way of ensuring safety. I.e., check that an input is a string, and that this string is between n and m characters with only authorized chars. The same goes for more complex data structures (vectors, maps, lists, enums, etc.). Maybe some of those checks would need to be done at runtime, though.

  6. My preference for typed languages has nothing to do with how many bugs remain in the program, it's when I find certain (trivial) bugs.

    I frequently do very silly things that blow up in a very obvious way (thankfully). I prefer it if the syntax highlighting tells me about that error rather than having to start the program.

    This is primarily a big deal in GUI applications, which often involve a lot of clicking before you even get to the newly written code, and where TDD doesn't work all that well. Scripts and web applications work much better with dynamic typing.

    Also note that ALL the injection and XSS bugs listed in that top 25 are type errors. The problem is that "String" is abused as a universal type, while actually something like "user input" and "website output" should be different types so that they cannot simply be concatenated. But that's strongly versus weakly typed, not static versus dynamic. Using "int" and "String" for everything almost always navigates around the safe parts of the type system.
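
    To sketch that idea in Objective-C (the class and method names here are hypothetical): raw input and escaped output live in distinct types, and the only route from one to the other is an explicit escaping step, so mixing them up is flagged by the compiler instead of becoming an injection bug.

    #import <Foundation/Foundation.h>

    @interface UserInput : NSObject
    @property (copy) NSString *raw;
    @end
    @implementation UserInput
    @end

    @interface HTMLOutput : NSObject
    @property (copy) NSString *markup;
    + (instancetype)outputByEscaping:(UserInput *)input;   // the only bridge between the two types
    - (void)append:(HTMLOutput *)fragment;                 // only already-escaped text can be appended
    @end

    @implementation HTMLOutput
    + (instancetype)outputByEscaping:(UserInput *)input
    {
        HTMLOutput *out = [self new];
        // Toy escaping; a real implementation would handle more than '<'.
        out.markup = [input.raw stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
        return out;
    }
    - (void)append:(HTMLOutput *)fragment
    {
        self.markup = [(self.markup ?: @"") stringByAppendingString:fragment.markup];
    }
    @end

    void renderComment(HTMLOutput *page, UserInput *comment)
    {
        // [page append:comment];    // flagged at compile time: incompatible pointer types
        [page append:[HTMLOutput outputByEscaping:comment]];   // forced through the escaper
    }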

  7. This comment has been removed by the author.

  8. This old essay by Steve Yegge is food for thought: Is Weak Typing Strong Enough.

  9. Hi there,
    Few things:

    1) Java is irrelevant to the debate on static type systems.

    Its generics derive from parametric polymorphism in functional languages. Google for this, and parametricity instead:
    http://ttic.uchicago.edu/~dreyer/course/papers/wadler.pdf
    http://dl.dropboxusercontent.com/u/7810909/media/doc/parametricity.pdf

    In a proper type system like Haskell, or ML, the benefits here are enormous; it gives you free theorems which allow you to know a great deal about the functionality without even seeing the implementation. Java's syntactic penalty is non-existent in Haskell.

    2) If you think of types as mini-bug-catchers, you're not really understanding what they do. While they certainly rule out incalculably many incorrect programs, they are more valuable as a means to design and reason about software: you can truly _know_ what something does, not guess, or believe, or think. Given that maintenance takes vastly more time than initial creation, this is (IMHO) utterly indispensable.

    3) Computer Science is more concerned with maths and engineering than social studies, although these have their place. There's no reason to confer a mystical Sciencey authority on them above all else; studies vary wildly in their rigour and reproducibility, and software development is a notoriously difficult area here, given the huge number of unrepeatable variables. Again, there's a place for it, but they are unlikely to be suitable for broad sweeping conclusions.

    Maths is correct before science gets out of bed; there is provably, mathematically more you can know about software, less that can go wrong, and less you need to fit in your head in statically type-checked code.

    Does this matter in practice? Well, let's have that conversation! My experience says yes, enormously; maybe yours doesn't.

    I'd like it though, if we could at least put aside distractions like Java-as-strawman, oddly restrictive definitions of "type", and bald appeals to authority.

  10. I am not sure you know what a type system is.

    Also, you confuse type correctness with soundness.

  11. One difficulty in checking the impact of static typing on software robustness is that quite often, the same problem is solved quite differently in a statically typed and a dynamically typed language. This is most visible at the extremes: a Haskell programmer is likely to design his types first and write the rest of the code to fit them. At the other extreme, a Python programmer often writes classes made to fit into an existing "duck typing" framework (e.g. Python's informal sequence protocol).

  12. Milner did indeed claim that well-typed programs can't "go wrong", and he was absolutely correct. It's just that he didn't mean what it sounds like. It just means that the operational semantics for the language can't get in a state where no evaluation rule applies to a term.

    I think it's most helpful to consider a modern static type system (like those in ML or Haskell) to be a lightweight formal modelling tool integrated with the compiler. Using it provides a couple of benefits:

    1. Parametric polymorphism protects abstractions; this helps you reason about the program as you are designing the model.

    2. Your code is automatically checked for conformance to the model you designed in the types.

    If you only consider a type system as a tool to find obvious screw-ups, they don't sound quite as valuable as they can be.

    They also don't automatically confer the benefits I mentioned; you have to learn to think of them as a modeling tool and understand how to take advantage of that aspect. I think the studies you quote don't take that factor into account, assuming they're the studies I think you're quoting. They compare statically typed languages and dynamically typed languages as *tools to translate a model to an executable implementation of it* rather than as *tools to develop a correct model* in the first place.

    From the viewpoint of formal modeling and static verification, ML and Haskell are the 'dynamic, lightweight, high power-to-weight ratio' tools of the trade. They're the "dynamic languages" of the modeling world, and they happen to also be great programming languages as well.

  13. @Marcel: "most of the 25 top software errors don't look like type errors to me"

    I disagree: most, if not all, of those component interaction and resource management bugs should be totally prevented by the type system. If your type system isn't sufficiently expressive to describe and enforce the necessary checks and constraints, get yourself a better one. Frankly, it scares me silly the number of professional developers who absolutely believe that stuff like this is a Good Thing:

    int i = 0;

    It's not: it is worse than useless. Look, we can already see `i` is an integer (the `0` is kind of a giveaway). But what is `i`'s minimum bound; what is its maximum? Are there any numbers in-between those bounds that it should also avoid (e.g. should only even integers be accepted)? If the developer wishes it to have unrestricted range, that's fine, but how then should it behave if it exceeds a machine-/runtime-imposed limitation such as MAXINT? And so on.

    And yet, functional languages, and even some imperative OO languages like Eiffel, have been thinking about, and addressing, these sorts of things for years - and doing it without stupid insipid ints infesting your code like roaches.

    Using a type system that expresses such constraints doesn't necessarily mean that all errors will be caught at compile-time; for example, no compiler can perform a length check on externally supplied data that won't be received until run-time. However, encoding such constraints within the type definition means that the compiler can check as much as is practical at compilation time, and bake the remaining checks into the executable so that they are automatically applied wherever and whenever such checks should be performed at run-time. i.e. The developer is no longer required to include explicit tests at every point in the program code where new data enters the system for the first time, or where existing variables are rebound.

    I suspect a lot of problems might disappear amazingly quickly if static-vs-dynamic religionists (especially looking at you, C and Ruby) were sent out of the room for a few hours. For all their big talk and foot-stamping bluster, they rarely seem to possess any genuine understanding of types and type systems, and seem utterly uninterested in educating themselves either. But all their noise makes it terribly hard for the grownups to talk.

  14. @ Anonymous: "I'm more and more thinking that combining strong type systems and "shape of data" checks could be a good way of ensuring safety. IE. check that an input is a string, and that this string is between n and m characters with only authorized chars."

    Absolutely. Bounds checking is precisely the sort of thing a real type system should do. Note that "real type system" here means pay no attention to C. Hell, C doesn't even have a type system: it has a handful of compiler directives for allocating memory on the stack, and delusions of competence for everything else. C++, Java, etc. also come from the same pants-down school of language design; ignore them too.

    In fact, I'd go further: having come up by way of dynamic imperative "scripting" languages, but now in the language design game myself and discovering all the weird and wonderful declarative idioms also available, I now wonder why imperative languages don't run (e.g.) H-M checks as standard. The only difference should be: do they prevent you running a program until all warnings/errors are cleared, or let you freely proceed and just ask if you'd like to know more about any concerns found? Indeed, such checks shouldn't even wait till the full, formal compilation stage: a lot of this feedback really should be reaching the user as they're typing that code, as that's when it's of greatest value to them.

    Even the most open, dynamic languages like Python and Ruby would greatly benefit from the presence of first-class features for applying type and constraint checks at inputs and interfaces. The weakly, dynamically typed language I'm developing provides only two 'basic' data types (strings and lists). Most of its expressive power instead comes from declarative 'type specifiers', which apply coercions and verifications as needed. For example, want to make sure your input value is a list containing exactly four whole/decimal numbers between 0 and 100? Simple:

    list (number (0, 100), 4, 4)

    Which can be named and documented for easy reuse:

    R> define type (CMYK color, list (number (0, 100), 4, 4))

    R> ^^ (100, 50, 5, 0)
    R> CMYK color
    # (100, 50, 5, 0)

    R> ^^ Process Red
    R> CMYK color
    ERROR: The 'cmyk color' type rule can't coerce the following text to cmyk color: "Process Red"

    And used in new rules' interface definitions to ensure all input values are valid before they're passed to the implementation:

    define rule (
      product appearance (
        (variety name, non-empty text, "a short description..."),
        (variety color, CMYK color, "another short description...")
      ),
      ... [[ implementation goes here ]],
      ... [[ other documentation and metadata goes here ]]
    )

    So this one signature not only provides run-time checks of input/output values, but also a key piece of auto-generated human-readable documentation, and eventually entry assistance and auto-completion in the editor, and even auto-generated GUI input forms for end users.

    Every time I drop back into Python I find myself increasingly wishing it let me annotate its function signatures in even half this detail. No more ad-hoc 'assert'/'if not...raise' guards. No more docstring-based parameter descriptions useless to the interpreter. There is so much extra value you can extract once your type information is formally encoded as first-class language structures, fully introspectable and applicable across all stages of authoring and execution. Low-hanging fruit, and then some...

  15. I think nobody can reasonably say that static typing is safe. As you well noted, static typing is not 100% self-sufficient; that is the pipe dream of Java developers.

    What I can say is that static typing is safer, and code using static types will typically run as fast as or faster than dynamically typed (is there such a thing?) code. Where possible, of course.

  16. The 2% is terribly misleading. Apples and oranges.

    Open your mind with this: consider an assignment where you must write a 16-hour project without a compiler or a runtime environment. Clone yourself and put the clones in two separate rooms with plenty of caffeine, where one of them develops with a statically typed language in an IDE with a solid type checker and the other one develops with a dynamically typed language in a sleek code editor. At the end of the assignment, the respective programs are examined for bugs. Which one are you OBJECTIVELY more confident about? There's no question that the dynamically typed language is going to let through typos and mistakes that the statically typed IDE would have complained about while you were coding, etc.

    The reason type bugs account for only 2% is that they are obvious at RUN TIME. They are identified and fixed before the code is ever released. You have to hunt for them at run time because they are not obvious at code WRITE TIME. So you see, there is a filter bias in the 2% statistic, because the loss is actually incurred during the development process. Developers need to bang their heads against their keyboards many times to get those bugs down to 2% in order to release a product that actually works. With the great type-checking and intellisense tools that only typed languages enable, these kinds of bugs practically fix themselves while the code is being written, and there is no head-banging to get what becomes an even better result of 0% for this class of bugs.

    In short, the 2% statistic is misleading because it does not represent the pain. It represents what has come after the pain.

  17. @David Pesta:

    Exactly: your scenario is so hypothetical that it's more like apples and slide projectors. The point of dynamic languages is that they are dynamic, that is, the distinction between compile time and run time is at most blurry and ideally non-existent.

    So saying you're not allowed to run the program is silly; it's like a static language not being allowed to run the type checker until run-time. Or having to develop your program without the assistance of an IDE, using only PowerPoint.

    The way you develop programs in a dynamic language is that you continuously run the program as you are developing/extending, which gives you much higher bandwidth of feedback than even the best IDE can provide. In theory, having additional feedback from static checks is useful, in practice that tends to be subsumed by the dynamic feedback you get from running it.

    So the 2% is not misleading at all.


  18. Thank you for your response. It's not apples and slide projectors. Hear me out. My example provided an abstraction to conceptually simplify a real phenomenon that is actually much more detailed and tedious to illustrate perfectly. Your feedback did expose a weakness in the communication of the point I was making through that abstraction. Thank you.

    The reality of dynamic programming is, as you say, not that you sit down and write the whole thing out without the aid of rerunning it often to guide the development process. That is true. In fact, the advantage of this iterative development process is also available to those working with statically typed languages (compiled or interpreted). The added advantage of the static language is that you less often need to rerun the program to gain assurance that types aren't breaking everywhere as you proceed. So, while I agree with you that my abstraction wasn't an entirely accurate example, since the extra debugging isn't really all piled up at the end of the project, it still helped me introduce the point that you've got all this extra overhead distributed throughout the development process. All of my other conclusions still apply.

    You yourself just said that dynamic programming can overcome my illustrated weakness by rerunning the program often for iterative development. But that plays right into my point: it is something extra that you must do more often to make up for the lack of type safety. I am not speaking out of lack of experience with this. I've been working with both static and dynamic languages for years.

  19. @David:

    > is that you less often need to rerun the program

    That is exactly the opposite of the approach you want with a dynamic language. You want to "rerun" the program as often as possible, because that gives you an incredibly high rate + bandwidth + quality of feedback. I've implemented a little program called "CodeDraw", which does "live" programming by rerunning the program on every keystroke. The difference is indescribable; you really have to experience it. Would it be nice to also have static types? Probably, but they don't matter that much, and when it comes to having this experience or static types, it's just no contest.

  20. Hi Marcel. What you said stuck in my mind over the last few months. You invited me to experience something you created called "CodeDraw". Is this something that is available online somewhere? I'm interested in taking a look at it. Many thanks.

  21. Contrary to this post, there is now moderate empirical evidence at a large scale correlating type safety with better code quality. See https://dl.acm.org/doi/pdf/10.1145/3126905

  22. @Caleb: Good find, but it turns out that the purported findings of that study were invalid, as discussed in the OOPSLA/TOPLAS 2019 paper "On the Impact of Programming Languages on Code Quality".

    https://2019.splashcon.org/details/splash-2019-oopsla/75/On-the-Impact-of-Programming-Languages-on-Code-Quality


    "This reanalysis uncovers a number of serious flaws that reduce the number of languages with an association with defects down from 11 to only 4. Moreover, the practical effect size is exceedingly small. These results thus undermine the conclusions of the original study. Correcting the record is important, as many subsequent works have cited the 2014 article and have asserted, without evidence, a causal link between the choice of programming language for a given task and the number of software defects. Causation is not supported by the data at hand; and, in our opinion, even after fixing the methodological flaws we uncovered, too many unaccounted sources of bias remain to hope for a meaningful comparison of bug rates across languages."

    https://dl.acm.org/doi/10.1145/3340571
