Saturday, May 3, 2014

The sp(id)y subset, or Avoiding Copland 2010 with Objective-C 1984

In my recent post on Cargo Cult Typing, I mentioned a concept I called the id subset. Briefly, it is the subset of Objective-C that deals only with object pointers, or id's. There has been some misunderstanding that I am opposed to types. I am not, but more on that another time.

One of the many nice properties of the (transitive) id subset is that it is dynamically (memory) safe, just like Smalltalk. That is, as long as all arguments and return values of your messages are objects, you can never dereference a pointer incorrectly. The worst that can happen is that you get a "Message not understood", which can be caught and handled by the object in question or raised as an exception. The reason this is safe is that objc_msgSend() will make sure that methods are only ever invoked on objects of the correct class, no matter what the (possibly incorrect, or unavailable) static type says.
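As a minimal sketch of that behavior, assuming Apple's Foundation framework: the helper name below is hypothetical, and the example relies on NSNumber not implementing -length. The static type is deliberately wrong, yet the send ends in a catchable exception rather than a stray memory access.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the name of the exception raised when an
// object receives a message its real class does not implement, or nil if
// no exception was raised.
static NSString *NameOfExceptionFromMistypedSend(void)
{
    id value = @42;                          // really an NSNumber
    NSString *mistyped = (NSString *)value;  // the static type is now wrong
    @try {
        // Dispatch goes by the receiver's real class, which has no -length,
        // so the runtime reports an unrecognized selector.
        (void)[mistyped length];
    } @catch (NSException *exception) {
        return [exception name];
    }
    return nil;
}
```

On Apple's runtime, the default handling of an unrecognized selector raises NSInvalidArgumentException, which is the name this helper would return.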

So no dereferencing an incorrect pointer, no scribbling over random bits of memory. In fact, this is the vaunted "pointer safety" that John Siracusa says requires ditching native compiled languages like Objective-C for VM-based languages. The idea that a VM with an interpreter or a JIT was required for pointer safety was never true, of course, and it's interesting that both Google and Microsoft are turning to Ahead of Time (AOT) compilation in their newest SDKs, for performance reasons.

Did someone mention "performance"? :-)

Another nice aspect of the id subset is that it makes reflective code a lot simpler. And simplicity usually also translates to speed. How much speed? Apple's NSInvocation class has to deal with interpreting C type information at runtime to then construct proper stack frames dynamically for all possible C types. I think it uses libffi, though it may be some equivalent library. This is slow, around 340.1ns per message send on my 13" MBPR. By restricting itself to the id subset, my own MPWFastInvocation class's dispatch is much simpler, just a switch invoking objc_msgSend() with a different number of arguments.
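A minimal sketch of that kind of dispatch, assuming Apple's Objective-C runtime. This is not the actual MPWFastInvocation source: the function name, its fixed two-argument limit, and its parameter layout are all made up for illustration.

```objc
#import <Foundation/Foundation.h>
#import <objc/message.h>

// Hypothetical helper (not MPWFastInvocation itself): sends a message whose
// arguments and return value are all objects, i.e. stays in the id subset.
// Only 0 to 2 arguments are handled here; unused arguments are passed as nil.
static id SendIdMessage(id target, SEL selector, NSUInteger argc, id arg0, id arg1)
{
    // objc_msgSend has to be cast to the exact function type being called.
    // Because every argument is an id, the cases differ only in arity.
    switch (argc) {
        case 0:
            return ((id (*)(id, SEL))objc_msgSend)(target, selector);
        case 1:
            return ((id (*)(id, SEL, id))objc_msgSend)(target, selector, arg0);
        case 2:
            return ((id (*)(id, SEL, id, id))objc_msgSend)(target, selector, arg0, arg1);
        default:
            [NSException raise:NSInvalidArgumentException
                        format:@"unsupported argument count %lu", (unsigned long)argc];
            return nil;
    }
}
```

For example, SendIdMessage(@"abc", @selector(stringByAppendingString:), 1, @"def", nil) sends the same message as [@"abc" stringByAppendingString:@"def"]. No type encodings need to be parsed at runtime, because the only type that can occur is id.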

The simplicity of MPWFastInvocation also pays off in speed: 6.2ns per message send on the same machine. That's roughly 55 times faster than NSInvocation and only 2-3x slower than a normal message send. In fact, once you're that close, things like IMP-caching (4ns) start to make sense, especially since they can be hidden behind a nice interface. Using a C macro and the IMP stashed in a public instance variable takes the time down to 3ns, making the reflective call via an object effectively as fast as the non-reflective code emitted by the compiler. Which is nice, because it makes reflective techniques feasible for a much wider variety of code, which would be a good thing.
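IMP-caching in general looks roughly like the following sketch, assuming Apple's Foundation. The typedef and function names are hypothetical, and this shows neither the macro nor the instance-variable layout that MPWFastInvocation uses.

```objc
#import <Foundation/Foundation.h>

// Function type for a zero-argument method that returns an object.
typedef id (*IdMethod0)(id, SEL);

// Hypothetical helper: sends the same zero-argument, object-returning
// message `count` times, but looks the implementation up only once.
static id SendRepeatedly(id target, SEL selector, NSUInteger count)
{
    // -methodForSelector: returns the IMP that a normal send would reach.
    // The cached pointer is only valid for receivers of this same class.
    IdMethod0 cached = (IdMethod0)[target methodForSelector:selector];
    id result = nil;
    for (NSUInteger i = 0; i < count; i++) {
        result = cached(target, selector);   // direct call, no lookup
    }
    return result;
}
```

The trade-off is that the cached pointer bypasses dynamic lookup, so it has to be refreshed if the receiver's class or its method implementations change.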

The speed improvement is not because MPWFastInvocation is better than NSInvocation (it is decidedly not). It is because it is solving a much, much simpler problem, by sticking to the safe id subset.

On HN.


John Siracusa said...

I don't think I said that a VM is required for memory safety. (Just look at Perl, for example: no VM, not interpreted, memory-safe.) Also, you should read my Copland 2010 Revisited article from (appropriately) 2010. I also did a long, rambling podcast on this topic with Guy English last month.

Marcel Weiher said...

Thanks, I both read the "revisited" article (it's actually one of the links above) and listened to the podcast, all good stuff! Heck, Guy and I had a little Twitter exchange about him mangling my last name :-)

The, er, point stands that the pointer-safe language you are looking for is already present inside Objective-C, we just need to give it a chance to come out. And I have reason to believe that it isn't even that hard...not necessarily to make such errors impossible, but to make you have to really go out of your way to get them, which seems sufficient to me.

John Siracusa said...

Yes, the "Objective-C without the C" idea. As I think I mentioned on the podcast, that sure appears to be where Apple is going, but it still seems like a bit of an uncomfortable truce between order and chaos to me.

Marcel Weiher said...

I think you're right that Apple's current direction is an uncomfortable truce between order and chaos.

I wouldn't characterize it as "Objective-C without the C", though. In fact, it seems more the direction of Java or C++, keeping the intermingling of objects and structs/pointers that is the source of the problems and then trying to get out of the mess through static analysis and static restrictiveness, but with the default still being raw pointer access that can and will crash.

As examples, see CoreAnimation and more recently SceneKit, which both use structs fairly randomly in unexpected places where they could easily have used objects instead. And performance really isn't the issue here: these abstractions move enough other information that the object overheads shouldn't be significant (as also demonstrated by Quartz's use of equally heavy CF-style objects).

Or the "NSInteger" madness, which causes great confusion because programmers who don't have the history don't understand that these are primitives that behave differently from objects such as NSNumber (and how could they?).