Thursday, June 26, 2014

How to Swiftly Destroy a $370 Million Dollar Rocket with Overflow "Protection"

Apple's new Swift programming language has been heavily promoted as being a safer alternative to Objective-C, with a much stronger emphasis on static typing, for example. While I am dubious about the additional safety of static typing, I argue that it produces far more safyness than actual safety, this post is going to look at a different feature: overflow protection.

Overflow protection means that when an arithmetic operation on an integer exceeds the maximum value for that integer type, the value doesn't wrap around as it does on most CPU ALUs, and by extension C. Instead the program signals an exception and since Swift has no exception handling the program crashes.

While this looks a little like the James Bond anti theft device in For Your Eyes Only, which just blows up the car, the justification is that the program should be protected from operating on values that have become bogus. While I understand the reasoning, I am dubious that it really is safer to have every arithmetic operation on integers and every conversion from higher precision to lower in the entire program become a potential crash site, when before those operations could never crash (except for division by zero).

While it would be interesting to see what evidence there is for this argument, I can give at least one very prominent example against it. On June 4th 1996, ESA's brand new Ariane 5 rocket blew up during launch, due to a software fault, with a total loss of US $370 million, apparently one of the most expensive software faults in history. What was that software fault? An overflow protection exception triggered by a floating point to (short) integer conversion.

The resulting core-dump/diagnostics were then interpreted by the next program in line as valid data, causing effectively random steering inputs that caused the rocket to break up (and self destruct when it detected it was breaking up).

What's interesting is that almost any other handling of the overflow apart from raising an exception would have been OK and saved the mission and $370 million. Silently truncating/clamping the value to the maximum permissible range (which some in the static typing community incorrectly claim was the problem) would have worked perfectly and was the actual solution used for other values.

Even wraparound might have worked, at least there would have been only one bogus transition after which values would have been mostly monotonic again. Certainly better than effectively random values.

Ideally, the integer would have just overflowed into a higher precision as in a dynamic language such as Smalltalk, or even Postscript. Even JavaScript's somewhat wonky idea that all numbers are floats, but some just don't know it yet would have been better in this particular case. Considering the limitations of the hardware those languages weren't options, but nowadays the required computational horsepower is there.

In Ada you at least could potentially trap the exception generated by overflow, but in Swift the only protection is to manually trace back the inputs of every arithmetic operation on integers and enforce ranges for all possible combinations of inputs that do not result in that operation overflowing. For any program with external inputs and even slightly complex data paths and arithmetic, I would venture to say that that is next to impossible.

The only viable method for avoiding arithmetic overflow is to not use integer arithmetic with any external input, ever. Hello JavaScript!

You can try the Ada code with GNAT, or online:

with Ada.Text_IO,Ada.Integer_Text_IO;
use Ada.Text_IO,Ada.Integer_Text_IO;
procedure Hello is
  b : FLOAT;
  a : INTEGER;
end Hello;
You can watch your Swift playground crash using the following code:

var a = 2
var b:Int16
for i in 1..100 {
Note that neither the Ada nor Swift compilers have static checks that detect the overflow, even when all the information is statically available, for example in the following Swift code:

var a:UInt8
a = 254
a += 2
What's even worse is that the -Ofast flag will remove the checks, the integer will just wrap around. Optimization flags in general should not change visible program behavior, except for performance. Or maybe this is good, since it looks like we need that flag to get decent performance at all, we also remove the overflow crashers...

Discuss here or on Hacker News.


Levi Brown said...

What Radar number did you file so we can dupe it?

Marcel Weiher said...

Excellent point: 17472835

Anonymous said...

On the space shuttle FCOS would trap all arithmetical overflows and underflows and "correct" to prevent crashes, e.g. a divide by zero would be corrected by setting the result to the largest magnitude of the variable type with the correct sign, log the error, and continue execution. Even with that we had to protect against such errors by explicit checks:
IF single_float_divisor < 10E-8
single_float_divisor = 10E-8
proceed with division