Monday, April 14, 2014

cc -Osmartass

I have to admit I am a bit startled to see people seriously (?) advocate exploiting "undefined behavior" in the C standard to just eliminate the offending code altogether, arguing that "undefined" means literally anything is OK. I've certainly seen it justified many times. Apart from being awful, this idea smacks of hubris on the part of the compiler writers.

The job of the compiler is to do the best it can at turning the programmer's intent, as expressed by the program, into executable machine code. It is not to show how clever the optimizer writer is, how good they are at lawyering the language standard, or to wring a 0.1% performance improvement out of <benchmark-of-choice>, at least not when that conflicts with the primary goal.

And let's not pretend that these optimizations are actually useful or significant: Proebsting's law suggests that compiler optimizations as a whole have been at best 1/10th as effective at improving performance as hardware advances, and recent research suggests that even that may be optimistic.

That doesn't mean that I don't like my factor 2 or 3 improvement in performance for code where basic optimizations apply. But almost all of those gains come at the lowest optimization levels; the more sophisticated stuff brings little if any additional benefit. (There's a reason Apple recommends -Os rather than -O3 as the default.) So don't get ahead of yourselves: other, non-compiler optimizations can often achieve 2-3 orders of magnitude improvement, and for a lot of Objective-C code, for example, the compiler's optimizations barely register at all. Again: perspective!

Furthermore, the purpose of "undefined behavior" was (and I'm not sure it still is) to be inclusive: compilers for machines with slightly odd architectures could still be called ANSI C without having to do unnatural things on those architectures in order to conform to over-specification. Sometimes, undefined behavior is needed for programs to work at all.

So integer overflow, for example, is not a license to silently perform dead code elimination at certain optimization levels; it is license to do the natural thing on the platform, which on most platforms these days is to let the integer wrap around, because that is what a C programmer is likely to expect. In addition, feel free to emit a warning. The same goes for optimizing away an out-of-bounds array access that is intended to terminate a loop: if you are smart enough to figure out that the access is out of bounds, warn about it and then emit the code anyway. Eliminating the check and turning a terminating loop into an infinite loop is never the right answer.
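To make that concrete, here is a minimal sketch of both patterns (illustrative code of my own, not taken from any particular project). With a typical optimizing compiler, say clang or gcc at -O2, the overflow guard in the first function may be folded to "always true" and the second function may be "optimized" into unconditionally returning 1; exact behavior varies by compiler and version:

    #include <limits.h>

    /* Intended overflow guard. Because signed overflow is undefined,
       the compiler may assume x + 1 > x always holds and delete the
       fallback branch, instead of letting the integer wrap. */
    int increment_checked(int x)
    {
        if (x + 1 > x)          /* meant to catch x == INT_MAX */
            return x + 1;
        return INT_MAX;         /* may be eliminated as dead code */
    }

    /* Intended loop termination via an out-of-bounds read.
       Because reading table[4] is undefined, the compiler may drop the
       bounds check or conclude the loop always finds v, rather than
       warning and emitting the obvious code. */
    static int table[4];

    int exists_in_table(int v)
    {
        for (int i = 0; i <= 4; i++)   /* off-by-one: table[4] is out of bounds */
            if (table[i] == v)
                return 1;
        return 0;
    }

In neither case does the programmer get what they plainly asked for; in neither case is there so much as a warning by default.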

So please don't do this; you're not producing value. Those optimizations will cease to "help" the moment programmers "fix" their code, and any additional gains are extremely modest compared to the cost. So please stop doing it, certainly stop doing it on purpose, and carefully evaluate the cost/benefit ratio when introducing optimizations that cause this to happen as a side effect...and then don't. Or do, and label them appropriately.

3 comments:

Unknown said...

This is not right. When a program invokes undefined behaviour, it is not that the compiler is sabotaging the program on purpose; rather, this is a case it is not able or willing to handle. It is like throwing your smartphone in the water: the smartphone will invoke undefined behaviour and will most likely stop working properly. But that is not a design choice by the manufacturer, with a kill-switch or whatever; it is simply outside the range of proper usage of your smartphone.

Anonymous said...

The linked article was sarcastic, which is clear if you read some later blog posts by that author.

Anonymous said...

Undefined behaviour is one of the defining characteristics of C, and C compiler writers' particular attitude towards optimization is one of the reasons behind the language's enduring popularity. If you want Java, or D, or Pascal, or Rust, you know where to find them.