Wednesday, March 18, 2015

Why overload operators?

One of the many things that's been puzzling me for a long time is why operator overloading appears to be at the same time problematic and attractive in languages such as C++ and now Swift. I know I certainly feel the same way: it's somehow very cool to massage the language that way, but at the same time the thought of having everything redefined underneath me fills me with horror, and what little I've seen and heard of C++ with heavy overloading confirms that horror, except for very limited domains. What's really puzzling is that binary messages in Smalltalk, which are effectively the same feature (special characters like *, + etc. can be used as message names taking a single argument), do not seem to have either of these effects: they are neither particularly attractive to Smalltalk programmers, nor are their effects particularly worrisome. Odd.

Of course we simply don't have that problem in C or Objective-C: operators are built-in parts of the language, and neither the C part nor the Objective part has a comparable facility, which is a large part of the reason we don't have a useful number/magnitude hierarchy in Objective-C and numeric/array libraries aren't that popular: writing [number1 multipliedBy:number2] is just too painful.

Some recent articles and talks that dealt with operator overloading in Apple's new Swift language just heightened my confusion. But as is often the case, that heightened confusion seems to have been the last bit of resistance that pushed through an insight.

Anyway, here is an example from NSHipster Matt Thompson's excellent post on Swift Operators, an operator for exponentiation wrapping the pow() function:

infix operator ** { associativity left precedence 160 }

func ** (left: Double, right: Double) -> Double {
    return pow(left, right)
}
This is introduced as "the arithmetic operator found in many programming languages, but missing in Swift [is **]". Here is an example of the difference:
pow( left, right )
left ** right
pow( 2, 3 )
2 ** 3
How come this is seen as an improvement (and to me it is)? There are two candidates for what the difference might be: the fact that the operation is now written in infix notation, and the fact that it uses special characters. Do these two factors contribute evenly, or is one more important than the other? Let's look at the same example in Smalltalk syntax, first with a normal keyword message and then with a binary message (Smalltalk uses raisedTo:, but let's stick with pow: here to make the comparison similar):
left pow: right.
left ** right.
2 pow: 3.
2 ** 3.
To my eyes at least, the binary-message version is no improvement over the keyword message; in fact it seems somewhat worse. So the attractiveness of infix notation appears to be a strong candidate for why operator overloading is desirable. Of course, having to use operator overloading to get infix notation is problematic, because special characters generally do not convey the meaning of the operation nearly as well as names, conventional arithmetic aside.

Note that dot notation for message sends/method calls does not really seem to have the same effect, even though it could technically also be considered an infix notation:

left.pow( right )
left ** right
2.pow( 3 )
2 ** 3
There is more anecdotal evidence. In Chris Eidhof's talk on functional Swift, scrub to around the 10-minute mark. There you'll find the following code with some nested and curried function calls:
let result = colorOverlay( overlayColor)(blur(blurRadius)(image))
"This does not look to nice [..] it gets a bit unreadable, it's hard to see what's going on" is the quote.
Having a special compose function doesn't actually make it better:
let myFilter = composeFilters(blur(blurRadius),colorOverlay(overlayColor))
let result = myFilter(image)
Infix to the rescue! Using the |> operator:
let myFilter = blur(blurRadius) |> colorOverlay(overlayColor)
let result = myFilter(image)
Chris is very fair-minded about this: he mentions that due to the special characters involved, you can't really infer what |> means from looking at the code, you have to know, and having many of these sorts of operators makes code effectively incomprehensible. Or as one Twitter user put it: like most things in engineering, it's a trade-off. Though my guess is the trade-off would shift if we had infix without requiring non-sensical characters.

Built in
I do believe that there is another factor involved, one that is more psychologically subtle, having to do with the idea of language as a (pre-defined) thing vs. a mechanism for building your own abstractions, which I mentioned in my previous post on Swift performance.

In that post, I mentioned BASIC as the primary example of the former, a language as a collection of built-in features, with C and Pascal as (early) examples of the latter, languages as generic mechanisms for building your own features. However, those latter languages don't treat all constructs equally. Specifically, all the operators are built-in, not user-definable or -overridable. They also correspond closely to those operations that are built into the underlying hardware and map to single instructions in assembly language. In short: even in languages with a strong "user-defined" component, there is a hard line between "user-defined" and "built-in", and that line just happens to map almost 1:1 to the operator/function boundary.

Hackers don't like boundaries. Or rather: they love boundaries, the overcoming of. I'd say that overloaded operators are particularly attractive (to hacker mentalities, but that's probably most of us) in languages where this boundary between user-defined and built-in stuff exists, because overloaded operators let you cross that boundary and do things normally reserved for language implementors.

If you think this idea is too crazy, listen to John Siracusa, Guy English and Rene Ritchie discussing Swift language features and operator overloading on Debug Podcast Number 49, Siracusa Round 2, starting at 45:45. I've transcribed a bit below, but I really recommend you listen to the podcast, it's very good:

  • 45:45 Swift is a damning comment on C++ [benefits without the craziness]
  • 46:06 You can't do what Swift did [putting basic types in the standard library] without operator overloading. [That's actually not true, because in Swift the operators are just syntax -> but it is exactly the idea I talked about earlier]
  • 47:50 If you're going to add something like regular expressions to the language ... they should have some operators of their own. That's a perfect opportunity for operator overloading
  • 48:07 If you're going to add features to the language, like regular expressions or so [..] there is well-established syntax for this from other languages.
  • 48:40 ...or range operators. Lots of languages have range operators these days. Really it's just a function call with two different operands. [..] You're not trying to be clever. All you're trying to do is make it natural to use features that exist in many other languages. The thing about Swift is you don't have to add syntax to the language to do it. Because it's so malleable. If you're not adding a feature, like I'm adding regular expressions to the language. If you're not doing that, don't try to get clever. Consider the features as existing for the benefit of the expansion of the language, so that future features look natural in it and not bolted on even though technically everything is in a library. Don't think of it as in my end user code I'm going to come up with symbols that combine my types in novel ways, because what are you even doing there?
  • 50:17 if you have a language like this, you need new syntax and new behavior to make it feel natural. [new behavior strings array] and it has the whole struct thing. The basics of the language, the most basic things you can do, have to be different, look different and behave different for a modern language.
  • 51:52 "using operator overloading to add features to the language" [again, not actually true]
The interesting thing about this idea of a boundary between "language things" and "user things" is that in Swift it does not actually align with the distinction between operators and named functions, but apparently it still feels like it does, so "extending the language" is seen as roughly equivalent to "adding some operators", with all the sound caveats that apply.

In fact, going back to Matt Thompson's article from above, it is kind of odd that he talks about the exponentiation operator as missing from the language, when in fact the operation is available in the language. So if the operation crosses the boundary from function to operator, then and only then does it become part of the language.

In Smalltalk, on the other hand, the boundary has disappeared from view. It still exists in the form of primitives, but those are well hidden all over the class hierarchy and not something that is visible to the developer. So in addition to having infix notation available for named operations, Smalltalk doesn't have the notion of something being "part of the language" rather than "just the library" just because it uses non-sensical characters. Everything is part of the library, the library is the language and you can use names or special characters as appropriate, not because of asymmetries in the language.

And that's why operator overloading is a thing even in languages like Swift, whereas it is a non-event in Smalltalk.

Thursday, September 11, 2014

iPhone 6 Plus and The End of Pixels

It's been a long time coming. NeXTStep in 1989 featured DisplayPostscript, and therefore a device independent imaging model that meant you did not specify graphics in pixels, but rather in physical units. The default was a variant of the printer's point at 1/72nd of an inch, which happened to be close to the typical pixel resolution of displays at the time. However, 1 point never meant 1 pixel, it meant 1/72nd of an inch, and the combination of floating point coordinates and transformation matrices meant you could use pretty much any unit you wanted. When NeXT bought Apple, it brought this imaging model with it, although with some modifications due to Adobe intransigence about licensing and the addition of anti-aliasing.
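For illustration, here is what drawing in physical units looks like with the Quartz API that inherited this model (a sketch; the function name and the choice of centimeters are made up for the example):

#import <CoreGraphics/CoreGraphics.h>

// Coordinates are floats, and the CTM maps user-space units to device pixels,
// so we can draw in centimeters and let the device worry about its resolution.
static void drawTwoCentimeterBar( CGContextRef context )
{
    CGFloat pointsPerCentimeter = 72.0 / 2.54;              // 1 point = 1/72 inch
    CGContextScaleCTM( context, pointsPerCentimeter, pointsPerCentimeter );
    CGContextFillRect( context, CGRectMake( 0, 0, 2, 1 ) ); // a 2cm x 1cm bar, on any device
}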

However, despite the device-independent APIs, we still have pixel-based content and "pixel-accurate" graphics. This has made less and less sense over time, with retina displays making pixel-accuracy moot (no more screen fonts!), scaled modes making it impossible, and both iOS 7 and OS X 10.10 going for a more geometric look. Still, the design community has resisted, talking about @3x pixel art etc.

No more.

The iPhone 6 Plus has a 1920x1080 panel, but the simulator renders at 3x. These two resolutions don't match and so the pixels will need to be downsampled to the display resolution. Whether that is accomplished by downsampling pixel art (which happens automagically with Quartz and the proper device transform set) or as a separate step that downsamples the entire rendered framebuffer doesn't matter (much). Either way, there are no more "pixel perfect" pre-rendered designs.
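To put rough numbers on it (assuming the 414x736-point layout that has been reported for the 6 Plus; the figures are just for illustration):

double renderedHeight = 736.0 * 3.0;                     // 2208 pixels tall, rendered at @3x
double panelHeight    = 1920.0;                          // what the physical panel provides
double scaleFactor    = panelHeight / renderedHeight;    // ~0.87 -- every frame gets resampled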

Device-independent graphics, here we come at last. We're only a quarter century late.

Update: "Its 401 PPI display is the first display I’ve ever used on which, no matter how close I hold it to my eyes, I can’t perceive the pixels. " - John Gruber (emphasis mine)

Wednesday, September 10, 2014

collect is what for does

I recently stumbled on Rob Napier's explanation of the map function in Swift. So I am reading along yadda yadda when suddenly I wake up and my eyes do a double take:
After years of begging for a map function in Cocoa [...]
Huh? I rub my eyes, probably just a slip up, but no, he continues:
In a generic language like Swift, “pattern” means there’s probably a function hiding in there, so let’s pull out the part that doesn’t change and call it map:
Not sure what he means by a "generic language", but here's how we would implement a map function in Objective-C.
#import <Foundation/Foundation.h>

typedef id (*mappingfun)( id arg );

static id makeurl( NSString *domain ) {
  return [[[NSURL alloc] initWithScheme:@"http" host:domain path:@"/"] autorelease];
}

NSArray *map( NSArray *array, mappingfun theFun )
{
  NSMutableArray *result=[NSMutableArray array];
  for ( id object in array ) {
    id objresult=theFun( object );
    if ( objresult ) {
       [result addObject:objresult];
    }
  }
  return result;
}

int main(int argc, char *argv[]) {
  NSArray *source=@[ @"apple.com", @"objective.st", @"metaobject.com" ];
  NSLog(@"%@",map(source, makeurl ));
}

This is less than 7 non-empty lines of code for the mapping function, and took me less than 10 minutes to write in its entirety, including a trip to the kitchen for an extra cookie, recompiling 3 times and looking at the qsort(3) manpage because I just can't remember C function pointer declaration syntax (though it took me less time than usual, maybe I am learning?). So really, years of "begging" for something any mildly competent coder could whip up between bathroom breaks or during a lull in their twitter feed?

Or maybe we want a version with blocks instead? Another 2 minutes, because I am a klutz:


#import <Foundation/Foundation.h>

typedef id (^mappingblock)( id arg );

NSArray *map( NSArray *array, mappingblock theBlock )
{
  NSMutableArray *result=[NSMutableArray array];
  for ( id object in array ) {
    id objresult=theBlock( object );
    if ( objresult ) {
       [result addObject:objresult];
    }
  }
  return result;
}

int main(int argc, char *argv[]) {
  NSArray *source=@[ @"apple.com", @"objective.st", @"metaobject.com" ];
  NSLog(@"%@",map(source, ^id ( id domain ) {
    return [[[NSURL alloc] initWithScheme:@"http" host:domain path:@"/"] autorelease];
        }));
}

Of course, we've also had collect for a good decade or so, which turns the client code into the following, much more readable version (Objective-Smalltalk syntax):
NSURL collect URLWithScheme:'http' host:#('objective.st' 'metaobject.com') each path:'/'.

As I wrote in my previous post, we seem to be regressing to a mindset about computer languages that harkens back to the days of BASIC, where everything was baked into the language, and things not baked into the language or provided by the language vendor do not exist.

Rob goes on to write "The mapping could be performed in parallel [..]", for example like parcollect? And then "This is the heart of good functional programming." No. This is the heart of good programming.
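For what it's worth, parallelizing the block-based map above is also just a few lines with GCD. Here is a sketch, assuming manual reference counting like the code above (the retain/release pair guards against the worker threads' autorelease pools draining before the results are collected):

#import <dispatch/dispatch.h>

NSArray *parallelMap( NSArray *array, mappingblock theBlock )
{
  NSUInteger count=[array count];
  id *results=(id *)calloc( count, sizeof(id) );
  dispatch_apply( count, dispatch_get_global_queue( DISPATCH_QUEUE_PRIORITY_DEFAULT, 0 ), ^(size_t i){
    results[i]=[theBlock( [array objectAtIndex:i] ) retain];  // each iteration writes only its own slot
  });
  NSMutableArray *result=[NSMutableArray arrayWithCapacity:count];
  for (NSUInteger i=0; i<count; i++ ) {
    if ( results[i] ) {
      [result addObject:results[i]];
      [results[i] release];
    }
  }
  free( results );
  return result;
}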

Having processed that shock, I fly over a discussion of filter (select) and stumble over the next whopper:

It’s all about the types

Again...huh?? Our map implementation certainly didn't need (static) types for the list, and all the Smalltalkers and LISPers that have been gleefully using higher order techniques for 40-50 years without static types must also not have gotten the memo.

We [..] started to think about the power of functions to separate intent from implementation. [..] Soon we’ll explore some more of these transforming functions and see what they can do for us. Until then, stop mutating. Evolve.
All modern programming separates intent from implementation. Functions are a fairly limited and primitive way of doing so. Limiting power in this fashion can be useful, but please don't confuse the power of higher order programming with the limitations of functional programming; they are quite distinct.

Tuesday, September 9, 2014

No Virginia, Swift is not 10x faster than Objective-C

About a month ago, Jesse Squires published a post titled Apples to Apples, documenting benchmark results that he claims show Swift now with a roughly 10x performance advantage over Objective-C. Although completely bogus, the post was retweeted by Chris Lattner (who should know better, and was supposedly mostly interested in highlighting the improvements in the Swift optimizer, rather than the bogus comparison) and has now been referenced a number of times as background knowledge as to the state of Swift. More importantly, though the actual mistake Jesse makes is pretty basic and not that interesting, it does point to some deeper misunderstandings about performance and language that I at least do find interesting.

So what's the mistake? Ironically, given the post's title, it is that he is comparing apples to oranges, so to speak. The following table, which shows the time in milliseconds to sort an array of 10000 numbers 10 times, illustrates the problem:

              NSNumber    native integer
Objective-C   6.04        0.97
Swift         11.92       0.8
Jesse compared the two mismatched versions: native Swift integers (0.8 ms) against Objective-C NSNumber object wrappers (6.04 ms). All times are for binaries with optimization enabled; the machine was a 13" MBPR with a 2.9 GHz Intel Core i7 and 8GB of RAM. The integer sort was done using a C integer array and the system qsort() function. When you compare apples to apples, Objective-C has a roughly 2x edge with NSNumbers and is around 18% slower for native integers, at least when using qsort().

Why the 18% disadvantage? The qsort() function is made generically applicable to different types of arrays using a function pointer parameter for the comparison function, which itself is parametrized using pointers to the elements to be compared. This means there is an overhead of one function call and two pointer dereferences per comparison. That overhead overwhelms the actual comparison operation, which is a single machine instruction on most processors.
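To make that overhead concrete, here is roughly what the generic machinery looks like for a C int array (a sketch; numbers and count are placeholders, and the actual benchmark code may have differed):

// Comparison callback for qsort(): invoked through a function pointer once per
// comparison, with pointers to the two elements.
static int compare_ints( const void *a, const void *b )
{
    int x = *(const int *)a;                  // pointer dereference #1
    int y = *(const int *)b;                  // pointer dereference #2
    return (x > y) - (x < y);                 // the actual work: about one instruction
}

// ...and the call site, for an int array numbers with count elements:
qsort( numbers, count, sizeof(int), compare_ints );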

Swift, on the other hand, appears to produce a version of the sort function that is specialized to the integer type, with the comparison function inlined into the generated function, so there is no function call or pointer dereference overhead. That is very clever and a Good Thing™ for performance. Sort of. The drawback is that this breaks separate compilation, because the functions actually have to be combined during the compile/link process every time the sort is used (I assume there is caching going on, so we only get one copy per type combination).

Apart from making the compiler/linker slower, possibly significantly so (like C++ headers, though I presume they use LLVM bitcode to optimize the process), it also likely bloats the executable, causing cache and memory pressure. So it's a tradeoff, as usual, and while I think having the ability to specialize at compile-time is good, not being able to control it is not.

Objective-C doesn't have this ability to automagically adapt a function or method to parameters; if you want inlining, the relationship has to be known at definition time, not at the point of use. However, if the benefit of inlining is only 21% for the most primitive type, a machine integer, then it is clear that the set of types for which compile-time specialization is beneficial at all is small.

Cocoa of course already provides specialized collection classes for the byte and unichar types, NSData and NSString respectively. I never quite understood why this wasn't extended to the other primitive types, particularly integer and float/double. On the other hand, the omission never bothered me much, I just implemented those classes myself in MPWFoundation. MPWRealArray even has support for DisplayPostscript binary object sequences, it's that old!

Both MPWRealArray and the corresponding MPWIntArray classes are small and fairly trivial to implement, and once I have them, using a specialized integer or real array is at least as convenient as using an NSArray, just a lot faster. They could also be quite a bit smaller than they are, sharing code either via subclassing or poor-man's generic programming via include files. Once I have a nice OO interface, I can swap out the implementation for something really quick like a dual-pivot integer sort I found in Java-land and adapted to C. (It is surprising just how similar they are at that level). With that sort, the test time drops to 0.56 ms, so 42% faster than the Swift version and almost twice as fast as the system qsort() function.

So the takeaway is that if you are using NSNumber objects in performance-sensitive code: stop. This is always a mistake. The native number types for Objective-C are int, float, double and friends, not NSNumber. After all, how do you perform arithmetic? Either directly on a primitive or by unboxing the NSNumber and then performing arithmetic on the primitive and then reboxing. Use primitive scalar types as much as possible where they make sense.
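Spelled out in code (a minimal sketch, not taken from the benchmark), the boxed round trip looks like this:

// Boxed: unbox both operands, do the arithmetic on primitives, re-box the result.
NSNumber *a = @2, *b = @3;
NSNumber *boxedSum = @( [a intValue] + [b intValue] );

// Unboxed: just the arithmetic.
int sum = 2 + 3;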

A second takeaway is that the question "which language is faster" doesn't really make sense, a more relevant and interesting question is "does this language make it hard/possible/easy to write fast code". Objective-C lets you write really fast code, if you want to, because it has the low-level chops and an understandable performance model. Swift so far can achieve reasonable performance at times, ludicrously bad at other times (especially with the optimizer turned off, which hardly fazes Objective-C), with as far as I can tell fairly little predictability or control. Having 10% faster (or slower) performance for code I don't particularly care about is not worth nearly as much as the knowledge that I can get the 1-5% of code that I do care about in shape no matter what. Swift is definitely not there yet, and given the direction it is taking I am not sure whether it will allow that kind of control, at least in comprehensible ways.

A third point is something more general about language. The whole argument that NSNumber and NSArray are "built in" somehow and int is not displays a lack of understanding of Objective-C that to me seems staggering. Even more so, the whole idea that you must only use what comes provided with Cocoa and are not allowed to build your own flies in the face of modern language design, throwing us back to the times of BASIC (Cathy Doser, in the comments):

I had added graphics primitives to Dartmouth Basic around 1976 and developed an X-Y pen-plotter to carry out graphics commands mixed in with the text being sent to Teletype terminals.
The idea is that a language is a bundle of features, or to put it linguistically, a language is a list of words to be used as-is.

Both C and Pascal introduced me to a new notion: that languages are not lists of words, but means of constructing your own words. For example, C did/does not have I/O as a language feature. I/O was just another set of functions placed in a library that you included just like any of your own functions/libraries. And there were two sets of them, the stdio package and the raw Unix I/O.

At around the same time I was introduced to both top-down and bottom-up programming. Both assume there is a recursive decomposition of the problem at hand (assuming the problem is sufficiently complex to warrant it).

In bottom-up programming, you build up the vocabulary (the procedures and functions) that are necessary to succinctly describe your top-level problem, and then you describe your program in terms of that vocabulary you created. In top-down programming, you start at the other end and write your top-level program in terms of the vocabulary you wish you had to optimally describe the problem. Then you fill in the blanks.

In both, you define your own language to fit the problem, then you solve the problem using the language you defined. You would not add plotting commands to the language, you would either add plotting commands as a library or, if that were not possible, a way of adding plotting commands as a library. You would not look at whether plotting comes with the "standard library" or not. To quote Guy Steele in Growing a Language:

This is the nub of what I want to say. A language design can no longer be a thing. It must be a pattern—a pattern for growth—a pattern for growing the pattern for defining the patterns that programmers can use for their real work and their main goal.
So build your own libraries, your own abstractions. It's easy, fun and useful. It's the heart of Domain Driven Design, probably the most productive and effective software construction technique we as an industry have come up with to date. See what abstractions you can build easily and which ones are hard. Analyze the latter and you have started on the road to modern language design.

Saturday, August 30, 2014

So how are those special Swift initializers working out?

Splendidly:
If you're building a UIView subclass that needs to set up a mess of subviews this can get old really quick. Best option I've found so far? Just initialize them with a default value like you would a regular variable. Now the compiler's off your back and you can move on with your life, or at least what's left of it after choosing software development as a career.
Justin Driscoll

This is something people who create elaborate mechanisms to force people to "Do the Right Thing" never seem to understand: they hardly ever achieve what they are trying to achieve. Instead, people will do the minimal amount of work to get the compiler off their backs. Compare Java's checked exceptions.

Friday, July 11, 2014

Overspeccing

I just took my car to its biennial TÜV inspection, and apart from the tires, which had simply worn out, everything was A-OK, nothing wrong at all. Kind of surprising for a 7-year-old mechanical device that has been used: daily commutes from Mountain View to Oakland, tight cornering in the foothills, shipped across the Atlantic twice, and, now that it is back in its native country, occasional and sometimes prolonged sprints at 200 km/h. All that with not all that much maintenance, because the owner is not exactly a car nut.

Cars used to not be nearly this reliable, and getting there wasn't easy; it took the industry both plenty of time and a lot of effort. It's not that the engineers didn't know how to build reliable cars, but making them reliable while keeping them affordable and still allowing car companies to turn a profit, that was hard.

One particular component is the alternator belt, which had to be changed so frequently that engine compartments were specially designed to make the belt easily accessible. That's no longer the case, and the characteristic screeching sound of a worn belt is one that I haven't heard in a long time.

My late dad, who was in the business, told me how it went down, at least at Volkswagen. As other problems had been whittled away over the decades, alternator belts were becoming a real issue on the reliability reports compiled by motoring magazines, and the engineers were tasked with the job of fixing the problem. And fix it they did: they came up with a design that would "never" break or wear out, and no I don't know the details of how that was supposed to work.

Problem was: it was a tad expensive. Much more expensive than the existing solution and simply too expensive for the price bracket they were aiming for (this may seem odd to outsiders considering the total cost of a car, but pennies matter). Which of course was one reason why they had put up with unreliable belts for so long. Then word came in that the Japanese had solved the problem as well, and were offering it on their cheap(er) models. Next auto-show, they went to the booth of one of those Japanese companies and popped the hood.

The engineers scoffed: the Japanese design was cheaper because it was much, much more primitive than the one they had come up with, and it would, in fact, also wear out much more quickly. But exactly how much more quickly would it wear out? In other words, what was the expected lifetime of this cheaper, inferior alternator belt design?

About the expected lifetime of the car.

Ahh. As far as I can tell, the Japanese design or variants thereof conquered the world. I can't recall the last time I heard the screech of a worn-out belt; engine compartments these days are not designed with accessibility in mind, and cars are still affordable, although changing the belt if it does break will cost more in labor because of the less accessible placement.

What do alternator belts have to do with software development? Probably nothing, but to me at least, the situation reminds me of the one I write about in The Safyness of Static Typing. I am actually with those commenters who scoffed at the idea that the safety benefit of static typing is only around 2%, because theoretically having a tight specification of possible values checked at compile-time absolutely should bring a greater benefit.

For example, when static typing and protocols were introduced to Objective-C, I absolutely expected them to catch my errors, so I was quite surprised when it turned out that in practice they didn't: because I could actually compile/run/test my code without having to specify static types, by the time I added static types the code simply no longer had type errors, because the vast majority of those were caught by running it. The dynamic safety also helped, because instead of a random crash, I got a nice clean error message "object abc doesn't understand message xyz".

My suspicion is that although dynamic typing and the practices that go with it may only be, let's say, 50% as good at catching type errors as a good static type system, they are actually 98% effective at catching real-world type errors. So if static type systems are twice as good, they would be 196% effective at catching real-world type errors, which, just like the perfect, German-engineered alternator belts, is simply more than is actually needed (96% more with my hypothetical numbers).

There are obviously other factors at play, but I think this may account for a good part of the perceived discrepancy.

What do you think? Comments welcome here or on Hacker News.

Saturday, June 28, 2014

Compiler Writers Gone Wild: ARC Madness

In this week's episode of CWGW: This can't possibly crash, yet crash it does.

In a project I am currently working on, the top crash for the last week or so has been the following NSOutlineView delegate method:

- (BOOL)outlineView:(NSOutlineView *)outlineView isGroupItem:(id)item
{
    return NO;
}
The team had been ignoring it, because it just didn't make any sense and they had other things to do. (Fortunately not too many other crashes, the app is pretty solid at this point). When they turned to me, I was also initially puzzled, because all this should do on x86 is stuff a zero into %eax and return. This cannot possibly crash[1], so everyone just assumed that the stack traces were off, as they frequently are.

Fortunately I had just looked at the project settings and noticed that we were compiling with -O0, so optimizations disabled, and my suspicion was that ARC was doing some unnecessary retaining. That suspicion turned out to be on the money: otool -Vt revealed that ARC had turned our innocuous return NO; into the following monstrosity:

-[SomeOutlineViewDelegeate outlineView:isGroupItem:]:
00000001001bfdb0        pushq   %rbp
00000001001bfdb1        movq    %rsp, %rbp
00000001001bfdb4        subq    $0x30, %rsp
00000001001bfdb8        leaq    -0x18(%rbp), %rax
00000001001bfdbc        movq    %rdi, -0x8(%rbp)
00000001001bfdc0        movq    %rsi, -0x10(%rbp)
00000001001bfdc4        movq    $0x0, -0x18(%rbp)
00000001001bfdcc        movq    %rax, %rdi
00000001001bfdcf        movq    %rdx, %rsi
00000001001bfdd2        movq    %rcx, -0x30(%rbp)
00000001001bfdd6        callq   0x10027dbaa             ## symbol stub for: _objc_storeStrong
00000001001bfddb        leaq    -0x20(%rbp), %rdi
00000001001bfddf        movq    $0x0, -0x20(%rbp)
00000001001bfde7        movq    -0x30(%rbp), %rsi
00000001001bfdeb        callq   0x10027dbaa             ## symbol stub for: _objc_storeStrong
00000001001bfdf0        leaq    -0x20(%rbp), %rdi
00000001001bfdf4        movabsq $0x0, %rsi
00000001001bfdfe        movl    $0x1, -0x24(%rbp)
00000001001bfe05        callq   0x10027dbaa             ## symbol stub for: _objc_storeStrong
00000001001bfe0a        movabsq $0x0, %rsi
00000001001bfe14        leaq    -0x18(%rbp), %rax
00000001001bfe18        movq    %rax, %rdi
00000001001bfe1b        callq   0x10027dbaa             ## symbol stub for: _objc_storeStrong
00000001001bfe20        movb    $0x0, %r8b
00000001001bfe23        movsbl  %r8b, %eax
00000001001bfe27        addq    $0x30, %rsp
00000001001bfe2b        popq    %rbp
00000001001bfe2c        retq
00000001001bfe2d        nopl    (%rax)
Yikes! Of course, this is how ARC works: it generates an insane amount of retains and releases (hidden inside objc_storeStrong()), then relies on a special optimization pass to remove the insanity and leave behind the necessary retains/releases. Turn on the "standard" optimization -Os and we get the following, much more reasonable result:
-[WLTaskListsDataSource outlineView:isGroupItem:]:
00000001000e958a        pushq   %rbp
00000001000e958b        movq    %rsp, %rbp
00000001000e958e        xorl    %eax, %eax
00000001000e9590        popq    %rbp
00000001000e9591        retq
Much better!
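
For the curious, the four objc_storeStrong() calls in the -O0 version correspond roughly to the following source, written out here with manual retain/release purely for illustration (a sketch, not actual compiler output):

- (BOOL)outlineView:(NSOutlineView *)outlineView isGroupItem:(id)item
{
    id strongOutlineView = [outlineView retain];   // storeStrong #1: retain the first argument
    id strongItem        = [item retain];          // storeStrong #2: retain the second argument
    BOOL result = NO;
    [strongItem release];                          // storeStrong #3: store nil, releasing item
    [strongOutlineView release];                   // storeStrong #4: store nil, releasing outlineView
    return result;
}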

It isn't clear why those retains/releases were crashing, all the objects involved looked OK in the debugger, but at least we will no longer be puzzled by code that can't possibly crash...crashing, and therefore have a better chance of actually debugging it.

Another issue is performance. I just benchmarked the following equivalent program:

#import <Foundation/Foundation.h>

@interface Hi:NSObject {}
-(BOOL)doSomething:arg1 with:arg2;
@end

@implementation Hi
-(BOOL)doSomething:arg1 with:arg2
{
  return NO;
}
@end

int main( int argc, char *argv[] ) 
{
  Hi *hi=[Hi new];
  for (int i=0;i < 100000000; i++ ) {
    [hi doSomething:hi with:hi];
  } 
  return 0;
}
On my 13" MBPR, it runs in roughly 0.5 seconds with ARC disabled and in 13 seconds with ARC enabled. That's 26 time slower, meaning we now have a highly non-obvious performance model, where performance is extremely hard to predict and control. The simple and obvious performance model was one of the main reasons Objective-C code tended to actually be quite fast if even minimal effort was expended on performance, despite the fact that some parts of Objective-C aren't all that fast.

I find the approach of handing off all control and responsibility to the optimizer writers worrying. My worries stem partly from the fact that I've never actually seen that work out in the past. With ARC it also happens that the optimizer can't always figure out that a retain/release isn't needed, so you need to sprinkle a few __unsafe_unretained qualifiers throughout your code (not many, but you need to figure out which ones).

Good optimization has always been something that needed a human touch (with automatic assistance); the message "just trust the compiler" doesn't resonate with me. Especially since, and this is the other part I am worried about, compiler optimizations have been getting crazier and crazier. Clang, for example, thinks there is nothing wrong with producing two different values for dereferencing the same pointer at the same time, with no stores in-between (source: http://blog.regehr.org/archives/767):

#include <stdio.h>
#include <stdlib.h>
 
int main() {
  int *p = (int*)malloc(sizeof(int));
  int *q = (int*)realloc(p, sizeof(int));
  *p = 1;
  *q = 2;
  if (p == q)
    printf("%d %d\n", *p, *q);
}
I tested this with clang-600.0.34.4 on my machine and it also gives this non-sensical result: 1 2. There are more examples, which I also wrote about in my post cc -Osmartass. Of course, Swift moves further in this direction, with expensive default semantics and reliance on the compiler to remove the resulting glaring inefficiencies.

In what I've seen reported and tested myself, this approach results in differences between normal builds and -Ofast-optimized builds of more than a factor of 100. That's not close to being OK, and it makes code much harder to understand and optimize. My guess is that we will be looking at assembly a lot more when optimizing Swift than we ever did in Objective-C, and then scratching our heads as to why the optimizer didn't manage to optimize that particular piece of code.

I fondly remember the "Java optimization" WWDC sessions back when we were supposed to rewrite all our code in that particular new hotness. In essence, we were given a model of the capabilities of HotSpot's JIT optimizer, so in order to optimize code we had to know what the resulting generated code would be, what the optimizer could handle (not a lot), and then translate that information back into the original source code. At that point, it's simpler to just write yourself the assembly that you are trying to goad the JIT into emitting for you. Or portable Macro Assembler. Or object-oriented portable Macro Assembler.

[1] Well, it could if the stack had previously reached its limit.
Discuss here or on HN