Wednesday, March 5, 2014

Cargo-cult typing, or: Objective-C's default type is id

In discussing some feedback to a chapter of my upcoming book, I was surprised to get the following code flagged:
-objectAtIndex:(NSUInteger)anIndex
{
   if ( anIndex < [self count] ) {
      return objects[anIndex];
   }
   return nil;
}

The feedback was, effectively: "This code is incorrect, it is missing a return type". Of course, the code isn't incorrect in the least: the return type is id, because that is the default type. In fact, you will see this style in both Brad Cox's book:

[Image: Objc orig]

and the early NeXTStep documentation:

[Image: Nextstep doku]

Having a default type for objects isn't entirely surprising, because at that time id was not just the default type, it was the only type available for objects; the optional static typing for objects wasn't introduced into Objective-C until later. In addition, the template for Objective-C's object system was Smalltalk, which doesn't use static types; you just use variable names.
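
As a minimal sketch of what that default means in practice (SimpleList and its methods are hypothetical names invented for this illustration, not code from the book chapter), the declarations below mix omitted and explicit types. An omitted return or argument type is id, an explicit (id) declares exactly the same thing, and only the non-object cases such as (void) or (NSUInteger) have to be written out.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class, for illustration only.
@interface SimpleList : NSObject
{
    NSMutableArray *objects;
}
- objectAtIndex:(NSUInteger)anIndex;   // no return type written: it is id
- (id)firstObject;                     // explicit (id): the same return type
- addObject:anObject;                  // an untyped argument is id as well
- (void)removeAllObjects;              // "returns nothing" needs an explicit (void)
- (NSUInteger)count;                   // non-object returns must be spelled out
@end

@implementation SimpleList

- init
{
    self = [super init];
    if (self) {
        objects = [[NSMutableArray alloc] init];
    }
    return self;
}

- (NSUInteger)count
{
    return [objects count];
}

- objectAtIndex:(NSUInteger)anIndex
{
    if (anIndex < [self count]) {
        return objects[anIndex];
    }
    return nil;
}

- (id)firstObject
{
    return [self objectAtIndex:0];
}

- addObject:anObject
{
    [objects addObject:anObject];
    return self;   // returning self allows chaining
}

- (void)removeAllObjects
{
    [objects removeAllObjects];
}

@end
```

As far as the compiler is concerned, objectAtIndex: and firstObject have the same return type; the only difference is how much was typed to say so.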

Cargo-cult typing

So while it is possible (and apparently common) to write -(id)objectAtIndex:(NSUInteger)anIndex, it certainly isn't any more correct. In fact, it's worse, because it is just syntactic noise [1][2]. It is arguably even worse than what Fowler describes, because it isn't actually mandated by the language: the noise is inflicted needlessly.

And while we could debate as to whether it is better or not to write things that are redundant syntactic noise, we could also not, as that was settled almost 800 years ago: entia non sunt multiplicanda praeter necessitatem. You could also say KISS or "when in doubt, leave it out", all of which just say that the burden of proof is on whoever wants to add the redundant pieces.

What's really odd about this phenomenon is that we really don't gain anything from typing out these explicit types; the code certainly doesn't become more readable. It's as if we think that by following the ritual of explicitly typing out a type, we made the proper sacrifice to the gods of type-safety and they will reward us with correctness. But just like those Pacific islanders who built wooden planes, radios and control towers, the ritual is empty, because it conveys no information to the type system, or the reader.

The id subset

Now, I personally don't really care whether you put in a redundant (id) or not; I certainly have been reading over it (and not even really noticing) for my last two decades of Objective-C coding. However, the mistaken belief that it has to be there, rather than being a personal choice you make, does worry me.

I think the problem goes a little deeper than just slightly odd coding styles, because it seems to be part and parcel of a drive towards making Objective-C look like an explicitly statically typed language along the lines of C++ or maybe Java, with one of the types being id. That's not the case: Objective-C is an optionally statically typed language. This means that you may specify type information if you want to, but you generally don't have to. I also want to emphasize that you can at best get Objective-C to look like such a language; the holes in the type system are way too big for this to actually gain much safety.

Properties started this trend, and now the ARC variant of the language needlessly turns what used to be warnings about unknown selectors into hard compiler errors. Of course, there are some who plausibly argue that this always should have been an error, or actually, that it always was an error and we just didn't know about it.

That's hogwash, of course. There is a subset of the language, which I'd like to call the id subset, where all the arguments and returns are object pointers, and for this it was always safe to not have additional type information, to the point where the compiler didn't actually have that additional type information. You could also call it the Smalltalk subset.
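
Here is a small sketch of code that stays inside the id subset, assuming Foundation; describeAll is a hypothetical helper made up for this example. Every argument, return value and local is an object pointer, so nothing beyond id is written anywhere, and the same function accepts any collection that supports fast enumeration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, for illustration only.
// Takes any enumerable collection and returns an array holding the
// description of each element. Only object pointers go in and out.
static id describeAll(id collection)
{
    id result = [NSMutableArray array];
    for (id each in collection) {
        // -description is declared by NSObject, so any object answers it.
        [result addObject:[each description]];
    }
    return result;
}
```

Calling describeAll(@[@1, @"two"]) or describeAll([NSSet setWithObject:@"x"]) both work without a single static type at the call site. The sketch only sends selectors that Foundation declares, so it should compile under ARC as well.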

Another thing that's odd about this move to rigidify Objective-C in the face of the success of more dynamic languages is that we actually have been moving in the right direction at the language base-level (disregarding the type-system): in general programming style, with new syntax support for object literals and subscripting and with SmallInteger-style NSNumbers, modern Objective-C consists much more of pure objects than was traditionally the case. And as long as we are dealing with pure objects, we are in the id subset.

A dynamic language

What's great about the id subset is that it makes incremental, explorative programming very easy and lots of fun, much like other dynamic languages such as Smalltalk, Python or Ruby. (Not entirely like them, due to the need to compile to native code, but compilers are fast these days and there are possible fixes such as Objective-Smalltalk.)

The newly enforced rigidity is starting to make explorative programming in Objective-C much harder, and a lot less fun. In fact, it feels much more like C++ or Java and much less like the dynamic language that it is, which in my opinion is the wrong direction: we should be making our language more dynamic, and of course that's what I've been doing. So while I wouldn't agree with that tradeoff even if it were true, the fact is that we aren't actually getting static type safety; we are just getting a wood prop that will not fly.

Discussion on Hacker News.


UPDATE: Inserted a little clarification that I don't care about bike-shedding your code with regard to (id). The problem is that the mistaken belief that it has to be there, and the reasons people give for it, are symptomatic of the deeper trend I wrote about.

13 comments:

Anonymous said...

Having a return type conveys that the method returns something. That was easy.

Marcel Weiher said...

@Anonymous: Thanks for your comment, but what you write is actually not true. Not having an explicit return type conveys that the method returns something (an object), whereas the fact that nothing is returned is indicated by the explicit return type "void".

Marcin said...

Could you please elaborate a little more on having instancetype as return type in the light of arguments you made?

Anonymous said...

"the code certainly doesn't become more readable"

I disagree. But then, I've only been programming in Objective-C for 10 or 12 years (not as long as several other languages I know), so I grant that it might get easier with time. :-)

There's a whole continuum of "explicitly declare nothing" to "explicitly declare everything". It's not terribly helpful to plant a flag in the ground and declare that it is the Right Place for that line, for 'readability', for all programmers.

I like to think we have a slightly higher standard for docs today than 'early NeXTStep docs'! But it would also be a straightforward thing to test. You could grab NSArray.h and NSArray.html, and make copies with all the (id)'s removed, and do a survey. You could even ask new/intermediate/experienced Objective-C programmers separately, to find out if the preference changes with time. I would love to see that, especially if it showed that I'm in the minority here.

Marcel Weiher said...

@Marcin:

I haven't really come to a conclusion regarding (instancetype). On the one hand, it actually makes some sense, unlike (id), because it conveys additional information that we actually couldn't reasonably express before. So I kind of like the fact that it is there.

In terms of actually helping the user, I am not sure whether the information outweighs the clutter, since it is typically used in highly idiomatic situations where that information is really already known.

I think I should stress that in both cases, instancetype and id, I don't think it makes a huge difference which way you choose. I've certainly been reading over the (id) returns for many years without noticing them much. In fact, I was surprised how far back the convention to put the redundant (id) goes (NeXTStep 3.x, as far as I can tell). Actually adding them, though, is a bit painful (parsimony).

emp said...

I've tried to make Objective-C as Smalltalk-like an experience as possible, namely live coding and restarting an app only when the changes to the app state are not easily patched in the IDE.

My favourite setup is to use http://injectionforxcode.com for hot code reloading.
Reloading hot code however breaks the IDE breakpoints.

But for a nice Squeak/Pharo self halt. experience, you can raise(SIGABRT); which will break execution nicely, albeit a stack frame or two away.

Along with a nice IDE like AppCode, I don't miss Smalltalk as much as I did before.

Anonymous said...

"the code certainly doesn't become more readable"

I don't agree. It may be syntactic 'noise', but it serves to make the semantics more clear to the human programmer.

Marcel Weiher said...

@emp: That's very cool, I'll have to check it out.

Tammo Freese said...

I would suggest making the "return nil" a guard clause:

if (anIndex >= [self count]) {
    return nil;
}
return objects[anIndex];

Or if you don't measure line coverage:

if (anIndex >= [self count]) return nil;

return objects[anIndex];

battlmonstr said...

I was inspired by your post, especially the historical part about omitting id type, which is one of the weirdest things in Objective C for a modern developer. I'm linking to this post from an article about the Objective C id data type. Thank you.

Anonymous said...

"the ritual is empty, because it conveys no information to the type system, or the reader"

It does to me: it tells me that you are intentionally returning an object.

Suppose my wife tells me to pick up my son Kyle on the way home. I text her either (a) nothing, (b) "on my way home", (c) "have kid, on my way home", (d) "have Kyle, on my way home". Each of these conveys different information.

Even though the people in my car are exactly the same, in case (a) or (b), maybe she's not 100% sure if it's because everything's right, or I just forgot. In case (c), it's clear that I remembered, and while the identity of the kid in my car isn't explicit, it's pretty unlikely that I have the wrong one.

These cases, of course, correspond to no type, (id), (instancetype), and (MyTypeName).

I know people who, if I did (b) in real life, would say "Of course -- don't bother texting me if everything is fine". I know people who expect (d) in all situations, and otherwise assume the worst. Personally I think (b) or (c) are most reasonable, as they suggest that I'm on top of things, without being overly explicit in every detail.

Marcel Weiher said...

@Anonymous:

Nice analogy, but wrong. In Objective-C, "I return nothing" is not "nothing", but the (void) return type, whereas saying nothing means you return an id. Furthermore, it is hard to confuse the two, because the compiler will yell at you if you claim that you return something and then don't.

Anonymous said...

You link to Occam's Razor as a justification for omitting the return value, but that's not at all what ol' Billy meant by it. He simply meant that, all else being equal, the mechanism of an observation with the fewest implicit assumptions is usually the correct one. It was never intended to suggest that one should prefer to write something with the fewest number of symbols. It never had anything to do with syntax.

You could write your blog post without any articles ("the", "a", etc), and it would use fewer symbols and convey equivalent meaning, but I don't think anyone would claim this would make it easier to read, or that Occam's Razor says you should do this.