Thursday, September 11, 2014

iPhone 6 Plus and The End of Pixels

It's been a long time coming. NeXTStep in 1989 featured DisplayPostscript, and therefore a device independent imaging model that meant you did not specify graphics in pixels, but rather in physical units. The default was a variant of the printer's point at 1/72nd of an inch, which happened to be close to the typical pixel resolution of displays at the time. However, 1 point never meant 1 pixel, it meant 1/72nd of an inch, and the combination of floating point coordinates and transformation matrices meant you could use pretty much any unit you wanted. When NeXT bought Apple, it brought this imaging model with it, although with some modifications due to Adobe intransigence about licensing and the addition of anti-aliasing.

However, despite the device-independent APIs, we still have pixel-based content, and "pixel-accurate" graphics. This has made less and less sense over time, with retina displays making pixel-accuracy moot (no more screen fonts!), scaled modes making it impossible, and both iOS 7 and OS X 10.10 going for a more geometric look. Still, the design community has resisted, talking about @3x pixel art etc.

No more.

The iPhone 6 Plus has a 1920x1080 panel, but the simulator renders at 3x. These two resolutions don't match and so the pixels will need to be downsampled to the display resolution. Whether that is accomplished by downsampling pixel art (which happens automagically with Quartz and the proper device transform set) or as a separate step that downsamples the entire rendered framebuffer doesn't matter (much). Either way, there are no more "pixel perfect" pre-rendered designs.

Device-independent graphics, here we come at last. We're only a quarter century late.

Update: "Its 401 PPI display is the first display I’ve ever used on which, no matter how close I hold it to my eyes, I can’t perceive the pixels." - John Gruber (emphasis mine)

Wednesday, September 10, 2014

collect is what for does

I recently stumbled on Rob Napier's explanation of the map function in Swift. So I am reading along yadda yadda when suddenly I wake up and my eyes do a double take:
After years of begging for a map function in Cocoa [...]
Huh? I rub my eyes, probably just a slip up, but no, he continues:
In a generic language like Swift, “pattern” means there’s a probably a function hiding in there, so let’s pull out the part that doesn’t change and call it map:
Not sure what he means with a "generic language", but here's how we would implement a map function in Objective-C.
#import <Foundation/Foundation.h>

typedef id (*mappingfun)( id arg );

static id makeurl( NSString *domain ) {
  return [[[NSURL alloc] initWithScheme:@"http" host:domain path:@"/"] autorelease];
}

NSArray *map( NSArray *array, mappingfun theFun )
{
  NSMutableArray *result=[NSMutableArray array];
  for ( id object in array ) {
    id objresult=theFun( object );
    if ( objresult ) {   // nil results are skipped, NSArray cannot hold nil
       [result addObject:objresult];
    }
  }
  return result;
}

int main(int argc, char *argv[]) {
  NSArray *source=@[ @"", @"", @"" ];
  NSLog(@"%@",map(source, makeurl ));
}

This is less than 7 non-empty lines of code for the mapping function, and took me less than 10 minutes to write in its entirety, including a trip to the kitchen for an extra cookie, recompiling 3 times and looking at the qsort(3) manpage because I just can't remember C function pointer declaration syntax (though it took me less time than usual, maybe I am learning?). So really, years of "begging" for something any mildly competent coder could whip up between bathroom breaks or during a lull in their twitter feed?

Or maybe we want a version with blocks instead? Another 2 minutes, because I am a klutz:

#import <Foundation/Foundation.h>

typedef id (^mappingblock)( id arg );

NSArray *map( NSArray *array, mappingblock theBlock )
{
  NSMutableArray *result=[NSMutableArray array];
  for ( id object in array ) {
    id objresult=theBlock( object );
    if ( objresult ) {   // nil results are skipped, NSArray cannot hold nil
       [result addObject:objresult];
    }
  }
  return result;
}

int main(int argc, char *argv[]) {
  NSArray *source=@[ @"", @"", @"" ];
  NSLog(@"%@",map(source, ^id ( id domain ) {
    return [[[NSURL alloc] initWithScheme:@"http" host:domain path:@"/"] autorelease];
  }));
}

Of course, we've also had collect for a good decade or so, which turns the client code into the following, much more readable version (Objective-Smalltalk syntax):
NSURL collect URLWithScheme:'http' host:#('' '') each path:'/'.

As I wrote in my previous post, we seem to be regressing to a mindset about computer languages that harkens back to the days of BASIC, where everything was baked into the language, and things not baked into the language or provided by the language vendor do not exist.

Rob goes on to write "The mapping could be performed in parallel [..]", for example like parcollect? And then "This is the heart of good functional programming." No. This is the heart of good programming.

Having processed that shock, I fly over a discussion of filter (select) and stumble over the next whopper:

It’s all about the types

Again...huh?? Our map implementation certainly didn't need (static) types for the list, and all the Smalltalkers and LISPers that have been gleefully using higher order techniques for 40-50 years without static types must also not have gotten the memo.

We [..] started to think about the power of functions to separate intent from implementation. [..] Soon we’ll explore some more of these transforming functions and see what they can do for us. Until then, stop mutating. Evolve.
All modern programming separates intent from implementation. Functions are a fairly limited and primitive way of doing so. Limiting power in this fashion can be useful, but please don't confuse the power of higher order programming with the limitations of functional programming, they are quite distinct.

Tuesday, September 9, 2014

No Virginia, Swift is not 10x faster than Objective-C

About a month ago, Jesse Squires published a post titled Apples to Apples, documenting benchmark results that he claims show Swift now with a roughly 10x performance advantage over Objective-C. Although completely bogus, the post was retweeted by Chris Lattner (who should know better, and was supposedly mostly interested in highlighting the improvements in the Swift optimizer, rather than the bogus comparison) and has now been referenced a number of times as background knowledge as to the state of Swift. More importantly, though the actual mistake Jesse makes is pretty basic and not that interesting, it does point to some deeper misunderstandings about performance and language that I at least do find interesting.

So what's the mistake? Ironically, given the post's title, it is that he is comparing apples to oranges, so to speak. The following table, which shows the time in milliseconds to sort an array of 10000 numbers 10 times, illustrates the problem:

(Table columns: NSNumber, native integer)
Jesse compared the two versions highlighted, so native Swift integers with Objective-C NSNumber object wrappers. All times are for binaries with optimization enabled; the machine was a 13" MBPR with 2.9 GHz Intel Core i7 and 8GB of RAM. The integer sort was done using a C integer array and the system qsort() function. When you compare apples to apples, Objective-C has a roughly 2x edge with NSNumbers and is around 18% slower for native integers, at least when using qsort().

Why the 18% disadvantage? The qsort() function is made generically applicable to different types of arrays using a function pointer parameter for the comparison function that itself is parametrized using pointers to the elements to be compared. This means there is an overhead of one function call and two pointer dereferences per comparison. That overhead overwhelms the actual comparison operation, which is a single machine instruction on most processors.

Swift, on the other hand, appears to produce a version of the sort function that is specialized to the integer type, with the comparison function inlined to the generated function so there is no function call or pointer dereference overhead. That is very clever and a Good Thing™ for performance. Sort of. The drawback is that this breaks separate compilation, because the functions actually have to be combined during the compile/link process every time it is used (I assume there is caching going on so we only get one per type combination).

Apart from making the compiler/linker slower, possibly significantly so (like C++ headers, though I presume they use LLVM bitcode to optimize the process), it also likely bloats the executable, causing cache and memory pressure. So it's a tradeoff, as usual, and while I think having the ability to specialize at compile-time is good, not being able to control it is not.

Objective-C doesn't have this ability to automagically adapt a function or method to parameters; if you want inlining, the relationship has to be known at definition, not at point of use. However, if the benefit of inlining is only 21% for the most primitive type, a machine integer, then it is clear that the set of types for which compile-time specialization is beneficial at all is small.

Cocoa of course already provides specialized collection classes for the byte and unichar types, NSData and NSString respectively. I never quite understood why this wasn't extended to the other primitive types, particularly integer and float/double. On the other hand, the omission never bothered me much, I just implemented those classes myself in MPWFoundation. MPWRealArray even has support for DisplayPostscript binary object sequences, it's that old!

Both MPWRealArray and the corresponding MPWIntArray classes are small and fairly trivial to implement, and once I have them, using a specialized integer or real array is at least as convenient as using an NSArray, just a lot faster. They could also be quite a bit smaller than they are, sharing code either via subclassing or poor-man's generic programming via include files. Once I have a nice OO interface, I can swap out the implementation for something really quick like a dual-pivot integer sort I found in Java-land and adapted to C. (It is surprising just how similar they are at that level). With that sort, the test time drops to 0.56 ms, so 42% faster than the Swift version and almost twice as fast as the system qsort() function.

So the takeaway is that if you are using NSNumber objects in performance-sensitive code: stop. This is always a mistake. The native number types for Objective-C are int, float, double and friends, not NSNumber. After all, how do you perform arithmetic? Either directly on a primitive or by unboxing the NSNumber and then performing arithmetic on the primitive and then reboxing. Use primitive scalar types as much as possible where they make sense.

A second takeaway is that the question "which language is faster" doesn't really make sense, a more relevant and interesting question is "does this language make it hard/possible/easy to write fast code". Objective-C lets you write really fast code, if you want to, because it has the low-level chops and an understandable performance model. Swift so far can achieve reasonable performance at times, ludicrously bad at other times (especially with the optimizer turned off, which hardly fazes Objective-C), with as far as I can tell fairly little predictability or control. Having 10% faster (or slower) performance for code I don't particularly care about is not worth nearly as much as the knowledge that I can get the 1-5% of code that I do care about in shape no matter what. Swift is definitely not there yet, and given the direction it is taking I am not sure whether it will allow that kind of control, at least in comprehensible ways.

A third point is something more general about language. The whole argument that NSNumber and NSArray are "built in" somehow and int is not displays a lack of understanding of Objective-C that to me seems staggering. Even more so, the whole idea that you must only use what comes provided with Cocoa and are not allowed to build your own flies in the face of modern language design, throwing us back to the times of BASIC (Cathy Doser, in the comments):

I had added graphics primitives to Dartmouth Basic around 1976 and developed an X-Y pen-plotter to carry out graphics commands mixed in with the text being sent to Teletype terminals.
The idea is that a language is a bundle of features, or to put it linguistically, a language is a list of words to be used as is.

Both C and Pascal introduced me to a new notion: that languages are not lists of words, but means of constructing your own words. For example, C did/does not have I/O as a language feature. I/O was just another set of functions placed in a library that you included just like any of your own functions/libraries. And there were two sets of them, the stdio package and the raw Unix I/O.

At around the same time I was introduced to both top-down and bottom-up programming. Both assume there is a recursive de-composition of the problem at hand (assuming the problem sufficiently complex to warrant it).

In bottom-up programming, you build up the vocabulary (the procedures and functions) that are necessary to succinctly describe your top-level problem, and then you describe your program in terms of that vocabulary you created. In top-down programming, you start at the other end and write your top-level program in terms of the vocabulary you wish you had to optimally describe the problem. Then you fill in the blanks.

In both, you define your own language to fit the problem, then you solve the problem using the language you defined. You would not add plotting commands to the language, you would either add plotting commands as a library or, if that were not possible, a way of adding plotting commands as a library. You would not look at whether plotting comes with the "standard library" or not. To quote Guy Steele in Growing a Language:

This is the nub of what I want to say. A language design can no longer be a thing. It must be a pattern—a pattern for growth—a pattern for growing the pattern for defining the patterns that programmers can use for their real work and their main goal.
So build your own libraries, your own abstractions. It's easy, fun and useful. It's the heart of Domain Driven Design, probably the most productive and effective software construction technique we as an industry have come up with to date. See what abstractions you can build easily and which ones are hard. Analyze the latter and you have started on the road to modern language design.

Saturday, August 30, 2014

So how are those special Swift initializers working out?

If you're building a UIView subclass that needs to set up a mess of subviews this can get old really quick. Best option I've found so far? Just initialize them with a default value like you would a regular variable. Now the compiler's off your back and and you can move on with your life, or at least what's left of it after choosing software development as a career.
Justin Driscoll

This is something people who create elaborate mechanisms to force people to "Do the Right Thing" never seem to understand: they hardly ever achieve what they are trying to achieve. Instead, people will do the minimal amount of work to get the compiler off their backs. Compare Java's checked exceptions.

Friday, July 11, 2014


I just took my car to its biennial TÜV inspection and apart from the tires that had simply worn out everything was A-OK, nothing wrong at all. Kind of surprising for a 7-year-old mechanical device that has been used: daily commute from Mountain View to Oakland, tight cornering in the foothills, shipped across the Atlantic twice and now that it is back in its native country, occasional and sometimes prolonged sprints at 200 km/h. All that with not all that much maintenance, because the owner is not exactly a car nut.

Cars used to not be nearly this reliable, and getting there wasn't easy, it took the industry both plenty of time and a lot of effort. It's not that the engineers didn't know how to build reliable cars, but making them reliable and keeping them affordable and still allowing car companies to turn a profit, that was hard.

One particular component is the alternator belt, which had to be changed so frequently that engine compartments were specially designed to make the belt easily accessible. That's no longer the case, and the characteristic screeching sound of a worn belt is one that I haven't heard in a long time.

My late dad, who was in the business, told me how it went down, at least at Volkswagen. As other problems had been whittled away over the decades, alternator belts were becoming a real issue on the reliability reports compiled by motoring magazines, and the engineers were tasked with the job of fixing the problem. And fix it they did: they came up with a design that would "never" break or wear out, and no I don't know the details of how that was supposed to work.

Problem was: it was a tad expensive. Much more expensive than the existing solution and simply too expensive for the price bracket they were aiming for (this may seem odd to outsiders considering the total cost of a car, but pennies matter). Which of course was one reason why they had put up with unreliable belts for so long. Then word came in that the Japanese had solved the problem as well, and were offering it on their cheap(er) models. Next auto-show, they went to the booth of one of those Japanese companies and popped the hood.

The engineers scoffed: the Japanese design was cheaper because it was much, much more primitive than the one they had come up with, and it would, in fact, also wear out much more quickly. But exactly how much more quickly would it wear out? In other words, what was the expected lifetime of this cheaper, inferior alternator belt design?

About the expected lifetime of the car.

Ahh. As far as I can tell, the Japanese design or variants thereof conquered the world. I can't recall the last time I heard the screech of a worn out belt, engine compartments these days are not designed with accessibility in mind and cars are still affordable, although changing the belt if it does break will cost more in labor because of the less accessible placement.

What do alternator belts have to do with software development? Probably nothing, but to me at least, the situation reminds me of the one I write about in The Safyness of Static Typing. I am actually with those commenters who scoffed at the idea that the safety benefit of static typing is only around 2%, because theoretically having a tight specification of possible values checked at compile-time absolutely should bring a greater benefit.

For example, when static typing and protocols were introduced to Objective-C, I absolutely expected them to catch my errors, so I was quite surprised when it turned out that in practice they didn't: because I could actually compile/run/test my code without having to specify static types, by the time I added static types the code simply no longer had type errors, because the vast majority of those were caught by running it. The dynamic safety also helped, because instead of a random crash, I got a nice clean error message "object abc doesn't understand message xyz".

My suspicion is that although dynamic typing and the practices that go with it may only be, let's say, 50% as good at catching type errors as a good static type system, they are actually 98% effective at catching real world type errors. So if static type systems are twice as good, they would be 196% effective at catching real world type errors, which just like the perfect, German-engineered alternator belts, is simply more than is actually needed (96% more with my hypothetical numbers).

There are obviously other factors at play, but I think this may account for a good part of the perceived discrepancy.

What do you think? Comments welcome here or on Hacker News.

Saturday, June 28, 2014

Compiler Writers Gone Wild: ARC Madness

In this week's episode of CWGW: This can't possibly crash, yet crash it does.

In a project I am currently working on, the top crash for the last week or so has been the following NSOutlineView delegate method:

- (BOOL)outlineView:(NSOutlineView *)outlineView isGroupItem:(id)item
{
    return NO;
}
The team had been ignoring it, because it just didn't make any sense and they had other things to do. (Fortunately not too many other crashes, the app is pretty solid at this point). When they turned to me, I was also initially puzzled, because all this should do on x86 is stuff a zero into %eax and return. This cannot possibly crash[1], so everyone just assumed that the stack traces were off, as they frequently are.

Fortunately I had just looked at the project settings and noticed that we were compiling with -O0, so optimizations disabled, and my suspicion was that ARC was doing some unnecessary retaining. That suspicion turned out to be on the money, otool -Vt revealed that ARC had turned our innocuous return NO; into the following monstrosity:

-[SomeOutlineViewDelegeate outlineView:isGroupItem:]:
00000001001bfdb0        pushq   %rbp
00000001001bfdb1        movq    %rsp, %rbp
00000001001bfdb4        subq    $0x30, %rsp
00000001001bfdb8        leaq    -0x18(%rbp), %rax
00000001001bfdbc        movq    %rdi, -0x8(%rbp)
00000001001bfdc0        movq    %rsi, -0x10(%rbp)
00000001001bfdc4        movq    $0x0, -0x18(%rbp)
00000001001bfdcc        movq    %rax, %rdi
00000001001bfdcf        movq    %rdx, %rsi
00000001001bfdd2        movq    %rcx, -0x30(%rbp)
00000001001bfdd6        callq   0x10027dbaa             ## symbol stub for: _objc_storeStrong
00000001001bfddb        leaq    -0x20(%rbp), %rdi
00000001001bfddf        movq    $0x0, -0x20(%rbp)
00000001001bfde7        movq    -0x30(%rbp), %rsi
00000001001bfdeb        callq   0x10027dbaa             ## symbol stub for: _objc_storeStrong
00000001001bfdf0        leaq    -0x20(%rbp), %rdi
00000001001bfdf4        movabsq $0x0, %rsi
00000001001bfdfe        movl    $0x1, -0x24(%rbp)
00000001001bfe05        callq   0x10027dbaa             ## symbol stub for: _objc_storeStrong
00000001001bfe0a        movabsq $0x0, %rsi
00000001001bfe14        leaq    -0x18(%rbp), %rax
00000001001bfe18        movq    %rax, %rdi
00000001001bfe1b        callq   0x10027dbaa             ## symbol stub for: _objc_storeStrong
00000001001bfe20        movb    $0x0, %r8b
00000001001bfe23        movsbl  %r8b, %eax
00000001001bfe27        addq    $0x30, %rsp
00000001001bfe2b        popq    %rbp
00000001001bfe2c        retq
00000001001bfe2d        nopl    (%rax)
Yikes! Of course, this is how ARC works: it generates an insane amount of retains and releases (hidden inside objc_storeStrong()), then relies on a special optimization pass to remove the insanity and leave behind the necessary retains/releases. Turn on the "standard" optimization -Os and we get the following, much more reasonable result:
-[WLTaskListsDataSource outlineView:isGroupItem:]:
00000001000e958a        pushq   %rbp
00000001000e958b        movq    %rsp, %rbp
00000001000e958e        xorl    %eax, %eax
00000001000e9590        popq    %rbp
00000001000e9591        retq
Much better!

It isn't clear why those retains/releases were crashing, all the objects involved looked OK in the debugger, but at least we will no longer be puzzled by code that can't possibly crash...crashing, and therefore have a better chance of actually debugging it.

Another issue is performance. I just benchmarked the following equivalent program:


#import <Foundation/Foundation.h>

@interface Hi:NSObject {}
-(BOOL)doSomething:arg1 with:arg2;
@end

@implementation Hi
-(BOOL)doSomething:arg1 with:arg2
{
  return NO;
}
@end

int main( int argc, char *argv[] )
{
  Hi *hi=[Hi new];
  for (int i=0;i < 100000000; i++ ) {
    [hi doSomething:hi with:hi];
  }
  return 0;
}
On my 13" MBPR, it runs in roughly 0.5 seconds with ARC disabled and in 13 seconds with ARC enabled. That's 26 times slower, meaning we now have a highly non-obvious performance model, where performance is extremely hard to predict and control. The simple and obvious performance model was one of the main reasons Objective-C code tended to actually be quite fast if even minimal effort was expended on performance, despite the fact that some parts of Objective-C aren't all that fast.

I find the approach of handing off all control and responsibility to the optimizer writers worrying. My worries stem partly from the fact that I've never actually had that work in the past. With ARC it also happens that the optimizer can't figure out a retain/release isn't needed, so you need to sprinkle a few __unsafe_unretained qualifiers throughout your code (not many, but you need to figure out which).

Good optimization has always been something that needed a human touch (with automatic assistance), the message "just trust the compiler" doesn't resonate with me. Especially since, and this is the other part I am worried about, compiler optimizations have been getting crazier and crazier, clang for example thinks there is nothing wrong with producing two different values for de-referencing the same pointer (at the same time, with no stores in-between) (source):

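#include <stdio.h>
#include <stdlib.h>

int main() {
  int *p = (int*)malloc(sizeof(int));
  int *q = (int*)realloc(p, sizeof(int));
  /* p is deliberately used after realloc(); that is the case under discussion */
  *p = 1;
  *q = 2;
  if (p == q)
    printf("%d %d\n", *p, *q);
}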
I tested this with clang-600.0.34.4 on my machine and it also gives this nonsensical result: 1 2. There are more examples, which I also wrote about in my post cc -Osmartass. Of course, Swift moves further in this direction, with expensive default semantics and reliance on the compiler to remove the resulting glaring inefficiencies.

In what I've seen reported and tested myself, this approach results in differences between normal builds and -Ofast-optimized builds of more than a factor of 100. That's not close to being OK, and it makes code much harder to understand and optimize. My guess is that we will be looking at assembly a lot more when optimizing Swift than we ever did in Objective-C, and then scratching our heads as to why the optimizer didn't manage to optimize that particular piece of code.

I fondly remember the "Java optimization" WWDC sessions back when we were supposed to rewrite all our code in that particular new hotness. In essence, we were given a model of the capabilities of HotSpot's JIT optimizer, so in order to optimize code we had to know what the resulting generated code would be, what the optimizer could handle (not a lot), and then translate that information back into the original source code. At that point, it's simpler to just write yourself the assembly that you are trying to goad the JIT into emitting for you. Or portable Macro Assembler. Or object-oriented portable Macro Assembler.

[1] Well, it could if the stack had previously reached its limit.
Discuss here or on HN

Thursday, June 26, 2014

How to Swiftly Destroy a $370 Million Dollar Rocket with Overflow "Protection"

Apple's new Swift programming language has been heavily promoted as being a safer alternative to Objective-C, with a much stronger emphasis on static typing, for example. While I am dubious about the additional safety of static typing (I argue that it produces far more safyness than actual safety), this post is going to look at a different feature: overflow protection.

Overflow protection means that when an arithmetic operation on an integer exceeds the maximum value for that integer type, the value doesn't wrap around as it does on most CPU ALUs, and by extension C. Instead, the program signals an exception, and since Swift has no exception handling, the program crashes.

While this looks a little like the James Bond anti-theft device in For Your Eyes Only, which just blows up the car, the justification is that the program should be protected from operating on values that have become bogus. While I understand the reasoning, I am dubious that it really is safer to have every arithmetic operation on integers and every conversion from higher precision to lower in the entire program become a potential crash site, when before those operations could never crash (except for division by zero).

While it would be interesting to see what evidence there is for this argument, I can give at least one very prominent example against it. On June 4th 1996, ESA's brand new Ariane 5 rocket blew up during launch, due to a software fault, with a total loss of US $370 million, apparently one of the most expensive software faults in history. What was that software fault? An overflow protection exception triggered by a floating point to (short) integer conversion.

The resulting core-dump/diagnostics were then interpreted by the next program in line as valid data, causing effectively random steering inputs that caused the rocket to break up (and self destruct when it detected it was breaking up).

What's interesting is that almost any other handling of the overflow apart from raising an exception would have been OK and saved the mission and $370 million. Silently truncating/clamping the value to the maximum permissible range (which some in the static typing community incorrectly claim was the problem) would have worked perfectly and was the actual solution used for other values.

Even wraparound might have worked, at least there would have been only one bogus transition after which values would have been mostly monotonic again. Certainly better than effectively random values.

Ideally, the integer would have just overflowed into a higher precision as in a dynamic language such as Smalltalk, or even Postscript. Even JavaScript's somewhat wonky idea that all numbers are floats, but some just don't know it yet would have been better in this particular case. Considering the limitations of the hardware those languages weren't options, but nowadays the required computational horsepower is there.

In Ada you at least could potentially trap the exception generated by overflow, but in Swift the only protection is to manually trace back the inputs of every arithmetic operation on integers and enforce ranges for all possible combinations of inputs that do not result in that operation overflowing. For any program with external inputs and even slightly complex data paths and arithmetic, I would venture to say that that is next to impossible.

The only viable method for avoiding arithmetic overflow is to not use integer arithmetic with any external input, ever. Hello JavaScript!

You can try the Ada code with GNAT, or online:

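with Ada.Text_IO,Ada.Integer_Text_IO;
use Ada.Text_IO,Ada.Integer_Text_IO;
procedure Hello is
  b : FLOAT;
  a : INTEGER;
begin
  -- Body assumed for illustration; the original statements are missing.
  b := 1.0e10;      -- larger than INTEGER'Last
  a := INTEGER(b);  -- out-of-range conversion, expected to raise CONSTRAINT_ERROR
  Put(a);
end Hello;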
You can watch your Swift playground crash using the following code:

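var a = 2
var b:Int16
for i in 1...100 {
  a = a * 2      // loop body assumed; the original statements are missing
  b = Int16(a)   // traps once a no longer fits into an Int16
}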
Note that neither the Ada nor Swift compilers have static checks that detect the overflow, even when all the information is statically available, for example in the following Swift code:

var a:UInt8
a = 254
a += 2
What's even worse is that the -Ofast flag will remove the checks, the integer will just wrap around. Optimization flags in general should not change visible program behavior, except for performance. Or maybe this is good, since it looks like we need that flag to get decent performance at all, we also remove the overflow crashers...

Discuss here or on Hacker News.