In a project I am currently working on, the top crash for the last week or so has been in the following NSOutlineView delegate method:

- (BOOL)outlineView:(NSOutlineView *)outlineView isGroupItem:(id)item
{
    return NO;
}

The team had been ignoring it because it just didn't make any sense and they had other things to do. (Fortunately there were not too many other crashes; the app is pretty solid at this point.) When they turned to me, I was also initially puzzled, because all this method should do on x86 is stuff a zero into %eax and return. This cannot possibly crash[1], so everyone just assumed that the stack traces were off, as they frequently are.
Fortunately, I had just looked at the project settings and noticed that we were compiling with -O0, i.e. with optimizations disabled, and my suspicion was that ARC was doing some unnecessary retaining. That suspicion turned out to be on the money: otool -Vt revealed that ARC had turned our innocuous return NO; into the following monstrosity:
-[SomeOutlineViewDelegate outlineView:isGroupItem:]:
00000001001bfdb0 pushq %rbp
00000001001bfdb1 movq %rsp, %rbp
00000001001bfdb4 subq $0x30, %rsp
00000001001bfdb8 leaq -0x18(%rbp), %rax
00000001001bfdbc movq %rdi, -0x8(%rbp)
00000001001bfdc0 movq %rsi, -0x10(%rbp)
00000001001bfdc4 movq $0x0, -0x18(%rbp)
00000001001bfdcc movq %rax, %rdi
00000001001bfdcf movq %rdx, %rsi
00000001001bfdd2 movq %rcx, -0x30(%rbp)
00000001001bfdd6 callq 0x10027dbaa ## symbol stub for: _objc_storeStrong
00000001001bfddb leaq -0x20(%rbp), %rdi
00000001001bfddf movq $0x0, -0x20(%rbp)
00000001001bfde7 movq -0x30(%rbp), %rsi
00000001001bfdeb callq 0x10027dbaa ## symbol stub for: _objc_storeStrong
00000001001bfdf0 leaq -0x20(%rbp), %rdi
00000001001bfdf4 movabsq $0x0, %rsi
00000001001bfdfe movl $0x1, -0x24(%rbp)
00000001001bfe05 callq 0x10027dbaa ## symbol stub for: _objc_storeStrong
00000001001bfe0a movabsq $0x0, %rsi
00000001001bfe14 leaq -0x18(%rbp), %rax
00000001001bfe18 movq %rax, %rdi
00000001001bfe1b callq 0x10027dbaa ## symbol stub for: _objc_storeStrong
00000001001bfe20 movb $0x0, %r8b
00000001001bfe23 movsbl %r8b, %eax
00000001001bfe27 addq $0x30, %rsp
00000001001bfe2b popq %rbp
00000001001bfe2c retq
00000001001bfe2d nopl (%rax)
Yikes! Of course, this is how ARC works: it generates an insane number of retains and releases (hidden inside objc_storeStrong()), then relies on a special optimization pass to remove the insanity again and leave behind only the retains/releases that are actually needed.
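To see where all that code comes from, it helps to know what objc_storeStrong() actually does. Per the Clang ARC specification, it is equivalent to roughly the following (a sketch of the documented semantics, not the actual runtime source; it is itself non-ARC code, hence the explicit messages):

// objc_storeStrong(), per the Clang ARC specification:
// retain the new value, store it, release the old value.
void objc_storeStrong(id *object, id value)
{
    value = [value retain];
    id oldValue = *object;
    *object = value;
    [oldValue release];
}

So the method above makes four of these runtime calls: two that retain the outlineView and item arguments into freshly zeroed stack slots, and two that store nil into those slots to release the arguments again before returning.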
Turn on the "standard" optimization -Os and we get the following, much more reasonable result:
-[WLTaskListsDataSource outlineView:isGroupItem:]:
00000001000e958a pushq %rbp
00000001000e958b movq %rsp, %rbp
00000001000e958e xorl %eax, %eax
00000001000e9590 popq %rbp
00000001000e9591 retq
Much better! It isn't clear why those retains/releases were crashing; all the objects involved looked OK in the debugger. But at least we will no longer be puzzled by code that can't possibly crash...crashing, and therefore have a better chance of actually debugging it.
Another issue is performance. I just benchmarked the following equivalent program:
#import <Foundation/Foundation.h>

@interface Hi : NSObject {}
- (BOOL)doSomething:arg1 with:arg2;
@end

@implementation Hi
- (BOOL)doSomething:arg1 with:arg2
{
    return NO;
}
@end

int main( int argc, char *argv[] )
{
    Hi *hi = [Hi new];
    for ( int i = 0; i < 100000000; i++ ) {
        [hi doSomething:hi with:hi];
    }
    return 0;
}
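For reference, the two variants can be built and timed roughly as follows (a sketch: the file name hi.m is an assumption, -O0 matches the project settings above, and -fobjc-arc/-fno-objc-arc are the standard clang switches for toggling ARC):

clang -O0 -fno-objc-arc -framework Foundation hi.m -o hi-noarc && time ./hi-noarc
clang -O0 -fobjc-arc -framework Foundation hi.m -o hi-arc && time ./hi-arc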
On my 13" MBPR, it runs in roughly 0.5 seconds with ARC disabled and in 13 seconds with
ARC enabled. That's 26 times slower, meaning we now have a highly non-obvious performance
model, where performance is extremely hard to predict and control. The simple and obvious
performance model was one of the main reasons Objective-C code tended to actually be
quite fast if even minimal effort was expended on performance, despite the fact that
some parts of Objective-C aren't all that fast.
I find the approach of handing off all control and responsibility to the optimizer writers worrying. My worries stem partly from the fact that I've never actually seen that work out in the past. With ARC, it also happens that the optimizer can't figure out that a retain/release isn't needed, so you need to sprinkle a few __unsafe_unretained qualifiers throughout your code (not many, but you need to figure out which).
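For illustration, the fix tends to look something like this (a hypothetical snippet; the variable and method names are made up): marking a local __unsafe_unretained suppresses ARC's retain/release traffic for it, in exchange for the programmer guaranteeing that the object stays alive.

// Hypothetical hot loop. With a plain (__strong) local, ARC may emit a
// retain/release pair per iteration; __unsafe_unretained suppresses that,
// and the programmer guarantees 'items' keeps its elements alive meanwhile.
for ( NSUInteger i = 0; i < [items count]; i++ ) {
    __unsafe_unretained id item = items[i];
    [item doSomethingCheap];   // made-up selector, for illustration only
}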
Good optimization has always been something that needed a human touch (with automatic assistance); the message "just trust the compiler" doesn't resonate with me. Especially since, and this is the other part I am worried about, compiler optimizations have been getting crazier and crazier: clang, for example, thinks there is nothing wrong with producing two different values for dereferencing the same pointer at the same time, with no stores in between (source: http://blog.regehr.org/archives/767):
#include <stdio.h>
#include <stdlib.h>

int main() {
    int *p = (int*)malloc(sizeof(int));
    int *q = (int*)realloc(p, sizeof(int));
    *p = 1;
    *q = 2;
    if (p == q)
        printf("%d %d\n", *p, *q);
}
I tested this with clang-600.0.34.4 on my machine and it also gives this nonsensical result: 1 2. There are more examples, which I also wrote about in my post cc -Osmartass.
Of course, Swift moves further in this direction, with expensive default semantics
and reliance on the compiler to remove the resulting glaring inefficiencies.
In what I've seen reported and tested myself, this approach results in differences between normal builds and -Ofast-optimized builds of more than a factor of 100. That's
not close to being OK, and it makes code much harder to understand and optimize. My
guess is that we will be looking at assembly a lot more when optimizing Swift than
we ever did in Objective-C, and then scratching our heads as to why the optimizer
didn't manage to optimize that particular piece of code.
I fondly remember the "Java optimization" WWDC sessions back when we were supposed to rewrite all our code in that particular new hotness. In essence, we were given a model of the capabilities of HotSpot's JIT optimizer, so in order to optimize code we had to know what the resulting generated code would be and what the optimizer could handle (not a lot), and then translate that information back into the original source code. At that point, it's simpler to just write by hand the assembly you are trying to goad the JIT into emitting for you. Or portable Macro Assembler. Or object-oriented portable Macro Assembler.
[1] Well, it could if the stack had previously reached its limit.
Comments:

I would think that dereferencing "p" after the call to "realloc" (in particular, storing something into that location without first making sure that p == q) invokes Undefined Behaviour (TM), since "realloc" could just as well have moved the memory, invalidating the originally allocated address. So, technically, clang would have the right to make the snippet do whatever it feels like...
That is the justification, yes. Doesn't change the fact that it is silly.
How is it silly? If you write something that invokes undefined behaviour, the compiler is free to do anything, up to and including making demons fly out of your nose. You can't write broken code and complain that the compiler didn't agree with your invented semantics of the chosen language. Well, you can, but you won't get much sympathy.
Best to assume that realloc always *re*-allocs. If the standard explicitly declared that realloc does not realloc sometimes (it declares no such thing: http://man.cx/realloc), then you could do what you did; otherwise, the previous pointer points to freed memory, and what you did was summon arbitrary demons.
Otherwise, I liked the post. Automatic optimization is always a bit hairy and very hard to lean on. For all the noise that gets made about optimization and smart compilers, there's really something to be said for a transparent execution/performance model.
Have you seen this?: http://prog21.dadgum.com/40.html
I don't get your point from the benchmark program. 13 seconds was definitely obtained from a Debug build. Please use the Release configuration for any benchmark claims.