Friday, June 6, 2014

Remove features for greater power, aka: Swift and Objective-C initializers

One of the things I find curious is how Apple's new Swift language rehashes mistakes that were made in other languages. Let's take object construction, or initializers.

Objective-C/Smalltalk

These are the rules for initializers in Smalltalk and Objective-C:
  1. An "initializer" is a normal method and a normal message send.
  2. There is no second rule.
There's really nothing more to it; the rest follows organically and naturally from this simple fact and the various things you'd like to see happen. For example, is there a rule that you have to send the initial initializer (alloc or new) to the class? No there isn't, it's just a convenient and obvious place to put it, since we don't have the instance yet and the class is an obvious place to go for instances of that class. However, we could just as well ask a different class to create the object for us.
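
For illustration (class and variable names invented for the example): creating an object is just two ordinary message sends, and nothing stops some other object from playing factory:

    // +alloc and -init are ordinary messages, sent first to the class,
    // then to the newly allocated instance:
    Point *origin = [[Point alloc] init];

    // But any object can create instances for us; here a (hypothetical)
    // factory object does it instead:
    Point *corner = [pointFactory makePointAtX:10 y:20];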

The same goes with calling super. Yes, that's usually a good idea, because usually you want the superclass's behavior, but if you don't want the superclass's behavior, then don't call. Again, this is not a special rule for initializers, it usually follows from what you want to achieve. And sometimes it doesn't, just like with any other method you override: sometimes you call super, sometimes you do not.

The same goes for assigning the return value, doing the self=[super init]; dance. Again, this is not at all required by the language or the frameworks, although apparently it is a common misconception that it is, a misconception that is, IMHO, promoted by careless creation of "best practices" as "immutable rules", something I wrote about earlier when talking about the useless typing out of the id type in method declarations.

However, returning self and using the returned value is a useful convention, because it makes it possible for init methods to return a different object than what they started with (for example a specific subclass or a singleton).
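
Because an init method is just a normal method returning an object, it is free to hand back something other than the freshly allocated instance. A minimal sketch, assuming manual reference counting and an invented sharedInstance:

    - (id)init
    {
        self = [super init];      // super may already have substituted an object
        if ( sharedInstance ) {
            [self release];                   // discard the freshly allocated object
            self = [sharedInstance retain];   // and return the singleton instead
        }
        return self;
    }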

Swift initializers

Apple's new Swift language has taken a page from the C++ and Java playbooks and made initialization a special case. Well, lots of special cases actually. The Swift book has 30 pages on initialization, and they aren't just illustration and explanation, they are dense with rules and special cases. For example:
  1. You can set a default value of a property in the variable definition.
  2. Or you can set the default value in an initializer.
  3. Designated initializers are now a first class language construct.
  4. Parameterized initializers have local and external parameter names, like methods.
  5. Except that the first parameter name is different, and so Swift automatically provides an external parameter name for all arguments, which it doesn't with methods.
  6. Constant properties aren't constant in initializers.
  7. Swift creates a default initializer for both classes and structs.
  8. Swift also creates a default memberwise initializer, but only for structs.
  9. Initializers can (only) call other initializers, but there are special rules for what is and is not allowed and these rules are different for structs and classes.
  10. Providing specialized initializers removes the automatically-provided default initializers.
  11. Initializers are different from other methods in that they are not inherited, usually.
  12. Except that there are specific circumstances where they are inherited.
  13. Confused yet? There's more!
  14. If your subclass provides no initializers itself, it inherits all the superclass's initializers.
  15. If your subclass overrides all the superclass's designated initializers, it inherits all the convenience initializers (that's also a language construct). How does this not break if the superclass adds initializers? I think we've just re-invented the fragile-base-class problem.
  16. Oh, and you can initialize instance variables with the values returned by closures or functions.
Well, that was easy, but that's probably only because I missed a few. Having all these rules means that this new way of initialization is less powerful than the one before it, because all of these rules restrict the power that a general method has.

In particular, it is not possible to substitute a different value or return nil to indicate failure to initialize, nor is it possible to call other methods (as far as I can tell).

To actually provide these useful features, we need something else:

  1. Use the Factory method pattern to actually do the powerful stuff you need to do (see the sketch below) ...
  2. ...which gets you back to where we were at the beginning with Objective-C or Smalltalk, namely sending a normal message.
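
Such a factory method is, once again, just a normal message sent to a normal object, the class. A sketch with invented names:

    + (instancetype)pointWithX:(double)x y:(double)y
    {
        Point *point = [[self alloc] init];   // plain message sends throughout
        [point setX:x];
        [point setY:y];
        return [point autorelease];           // assuming manual reference counting
    }
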
Of course, we are familiar with this because both C++ and Java also have special constructor language features, plagued by the same problems. They are also the source of the Factory method pattern, at least as a separate "pattern". Smalltalk and Objective-C simply made that pattern the default for object creation; in fact, Brad Cox called classes "Factory Objects" long, long before the GOF patterns book.

So with all due respect to Michael A. Jackson:

First rule of baking programming conventions into the language: Don't do it!
The second rule of baking programming conventions into the language (experts only): Don't do it yet!


p.s.: I have filed a radar, please dup
p.p.s.: HN

Wednesday, May 28, 2014

Why I don't mock

Well, it's impolite, isn't it? But seriously, when I first heard about mock object testing, I was excited, because it certainly sounded like The Right Thing™: message-based, checking relationships instead of state, and the new hip thing.

However, when I looked at actual examples, they looked sophisticated and obscure, the opposite of what I feel unit tests should be: obvious and simple, simplistic to the point of stupidity. I couldn't figure out at a glance what the expected behavior was, what was being tested and what was environment.

So I never used mocks in practice, meaning my opinions could not go beyond being superficial. Fortunately, I was given the task of porting a fairly large Objective-C project to OS X (yes, you read that right: "to OS X"), and it was heavily mock-tested.

As far as I could tell, most of the vague premonitions I had about mock testing were borne out in that project: obscure mock tests, mock tests that didn't actually test anything except their own expectations and mock tests that were deeply coupled to implementation details.
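
For illustration, here is a made-up example of that last kind, in OCMock's classic syntax (Database, service, record and the method names are all hypothetical):

    - (void)testProcessRecordSavesToDatabase
    {
        id database = [OCMockObject mockForClass:[Database class]];
        [[database expect] saveRecord:record];   // restates the implementation

        [service setDatabase:database];
        [service processRecord:record];

        [database verify];   // passes as long as the code does what the test says
    }

Refactor processRecord: to persist through a different collaborator and the test breaks even though the behavior is unchanged; meanwhile nothing checks that anything was actually stored.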

Again, though, that could just be my misunderstanding; certainly people for whom I have a great deal of respect advocate mock tests. But I was heartened to hear in the recent DHH/Fowler/Beck TDD death-matches, er, friendly conversations that neither Kent nor Martin is a great fan of mocking, and certainly not of deeply nested mocks.

However, it was DHH's comments that finally made me realize that what really bothered me was something more subtle, and much more pervasive. The talk is about "mocking the database", or mocking some other component. While not proof positive, this kind of mocking seems indicative of not letting the tests drive the design towards simplicity, because the design is already set in stone.

As a result, you're going to have constant pain, because the tests will continuously try to drive you towards simplifying your design, which you resist by putting in mocks.

Instead of putting in mocks of presumed components, let the tests tell you what counterparts they want. Then build those counterparts, again in the simplest way possible. You will likely discover that a lot of your assumptions about the required environment for your application turn out not to be true.

For example, when building SportStats v2 at the BBC we thought we needed a database for persistence. But we didn't build it in until we needed it, and we didn't mock it out either. We waited until the code told us that we now needed a database.

It never did.

So we discovered that our problem was simpler than we had originally thought, and therefore our architecture could be as well. Mocking eliminates that feedback.

So don't mock. Because it's impolite to not listen to what your code is trying to tell you.

Tuesday, May 27, 2014

Live objects vs. static types for code completion in Objective-Smalltalk

Objective-Smalltalk is now getting into a very nice virtuous cycle of being more useful, therefore being used more, and therefore motivating changes to make it even more useful. One of the recent additions was autocomplete, for both the tty-based and the GUI-based REPLs.

I modeled the autocomplete after the one in bash and other Unix shells: it will insert partial completions without asking, up to the point where they become ambiguous. If there is no unambiguous partial completion, it displays the alternatives. So a typical sequence is: <TAB> -> something is inserted, <TAB> again -> list is displayed, type one character to disambiguate, <TAB> again, and so on. I find that I get to my desired result much quicker and with fewer backtracks than with the mechanism Xcode uses.
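
The heart of that behavior is computing the longest common prefix of the matching candidates. A minimal sketch (not the actual Objective-Smalltalk code):

    // Returns the longest prefix shared by all candidate completions; if it
    // doesn't extend what was already typed, the caller shows the list instead.
    - (NSString *)commonPrefixOfCandidates:(NSArray *)candidates
    {
        NSString *prefix = [candidates firstObject];
        for (NSString *candidate in candidates) {
            prefix = [prefix commonPrefixWithString:candidate
                                            options:NSLiteralSearch];
        }
        return prefix;
    }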

Fortunately, I was able to wrestle NSTextView's completion mechanism (in the ShellView borrowed from the excellent F-Script) into providing these semantics rather than the built-in ones.

Another cool thing about the autocomplete is that it is very precise, unlike for example F-Script's, which as far as I can tell just offers all possible symbols. How can this be, when Objective-Smalltalk is (currently) dynamically typed and we all know that good autocomplete requires static types? The reason is simply that there is one thing that's even better than having the static types available: having the actual objects themselves available!

The two REPLs aren't just syntax-aware, they also evaluate the expression as much as needed and possible to figure out what a good completion might be. So instead of having to figure out the type of the object, we can just ask the object what messages it understands. This was very easy to implement, almost comically trivial compared to a full-blown static type system.
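
A rough sketch of the idea using the Objective-C runtime (the real implementation differs, and a full version would also walk the superclass chain):

    #import <objc/runtime.h>

    // Ask a live object which messages it understands, and keep the
    // selectors that match what the user has typed so far.
    NSArray *completionsForObject(id object, NSString *typedPrefix)
    {
        NSMutableArray *completions = [NSMutableArray array];
        unsigned int count = 0;
        Method *methods = class_copyMethodList(object_getClass(object), &count);
        for (unsigned int i = 0; i < count; i++) {
            NSString *name = NSStringFromSelector(method_getName(methods[i]));
            if ([name hasPrefix:typedPrefix]) {
                [completions addObject:name];
            }
        }
        free(methods);
        return completions;
    }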

So while static types are good for this purpose, live objects are even better! The Self team made a similar discovery when they were working on their optimizing compiler, trying both static type inference and dynamic type feedback. Type feedback was both simpler and performed vastly better and is currently used even for optimizing statically typed languages such as Java.

Finally, autocomplete also works with Polymorphic Identifiers, for example file:./a<TAB> will autocomplete files in the current directory starting with the letter 'a' (and just fi<TAB> will autocomplete to the file: scheme). Completion is scheme-specific, so any schemes you add can provide their own completion logic.

Like all of Objective-Smalltalk, this is still a work in progress: not all syntactic constructs support completions, for example Polymorphic Identifiers don't support complex paths, and there is no bracket matching. However, just like Objective-Smalltalk itself, what is there is quite useful and in some areas already better than what else is out there.

HN

Sunday, May 4, 2014

Satisfying the hunger for type safety?

Tom Adriaenssen riffs on the id subset in "show me some id":
Let me explain: even though you might assume that all those objects are actually going to be DataPoint objects, there’s no actual guarantee that they will actual be DataPoint objects at runtime. Casting them only satisfies your hunger for type safety, but nothing else really.
More importantly, it only seems to satisfy your hunger for type safety, it doesn't actually provide any. It's less nutritious than sugar water in that respect, not even calories, never mind the protein, fiber, vitamins and other goodness. More like a pacifier, really, or the product of a cargo cult.

Saturday, May 3, 2014

The sp(id)y subset, or Avoiding Copeland 2010 with Objective-C 1984

In my recent post on Cargo Cult Typing, I mentioned a concept I called the id subset. Briefly, it is the subset of Objective-C that deals only with object pointers, or id's. There has been some misunderstanding that I am opposed to types. I am not, but more on that another time.

One of the many nice properties of the (transitive) id subset is that it is dynamically (memory) safe, just like Smalltalk. That is, as long as all arguments and return values of your message are objects, you can never dereference a pointer incorrectly, the worst that can happen is that you get a "Message not understood" that can be caught and handled by the object in question or raised as an exception. The reason this is safe is that objc_msgSend() will make sure that methods will only ever be invoked on objects of the correct class, no matter what the (possibly incorrect, or unavailable) static type says.
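
A contrived example: even with a wrong or absent static type, a message the receiver does not understand fails cleanly in the messenger instead of dereferencing garbage:

    id object = @"not an array";     // the static type just says "some object"

    @try {
        [object objectAtIndex:3];    // NSString doesn't understand this:
                                     // raises NSInvalidArgumentException
                                     // instead of scribbling over memory
    } @catch (NSException *exception) {
        NSLog(@"caught: %@", [exception name]);
    }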

So no de-referencing an incorrect pointer, no scribbling over random bits of memory. In fact, this is the vaunted "pointer safety" that John Siracusa says requires ditching native compiled languages like Objective-C for VM based languages. The idea that a VM with an interpreter or a JIT was required for pointer safety was never true, of course, and it's interesting that both Google and Microsoft are turning to Ahead of Time (AOT) compilation in their newest SDKs, for performance reasons.

Did someone mention "performance"? :-)

Another nice aspect of the id subset is that it makes reflective code a lot simpler. And simplicity usually also translates to speed. How much speed? Apple's NSInvocation class has to deal with interpreting C type information at runtime to then construct proper stack frames dynamically for all possible C types. I think it uses libffi, though it may be some equivalent library. This is slow, around 340.1ns per message send on my 13" MBPR. By restricting itself to the id subset, my own MPWFastInvocation class's dispatch is much simpler, just a switch invoking objc_msgSend() with a different number of arguments.
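
By contrast, dispatch restricted to the id subset can be sketched as little more than a switch over the argument count, with objc_msgSend() cast to the matching function type (simplified; the real MPWFastInvocation differs in detail):

    #import <objc/message.h>

    // numArgs follows the NSMethodSignature convention:
    // self and _cmd count as arguments 0 and 1.
    static id invokeWithObjectArguments(id target, SEL selector, id *args, int numArgs)
    {
        switch (numArgs) {
            case 2: return ((id (*)(id, SEL))objc_msgSend)(target, selector);
            case 3: return ((id (*)(id, SEL, id))objc_msgSend)(target, selector, args[0]);
            case 4: return ((id (*)(id, SEL, id, id))objc_msgSend)(target, selector, args[0], args[1]);
            default: return nil;   // a real implementation handles more arities
        }
    }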

The simplicity of MPWFastInvocation also pays off in speed: 6.2ns per message-send on the same machine. That's 50 times faster than NSInvocation and only 2-3x slower than a normal message send. In fact, once you're that close, things like IMP-caching (4 ns) start to make sense, especially since they can be hidden behind a nice interface. Using a C Macro and the IMP stashed in a public instance var takes the time down to 3 ns, making the reflective call via an object effectively as fast as the non-reflective code emitted by the compiler. Which is nice, because it makes reflective techniques much more feasible for wider varieties of code, which would be a good thing.
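
IMP-caching itself is just looking the function pointer up once and then calling it directly (sketch; receiver and selector invented):

    // Look up the implementation once, outside the loop...
    IMP imp = [receiver methodForSelector:@selector(value)];
    id result = nil;

    // ...then call it directly, bypassing dispatch entirely.
    for (long i = 0; i < 1000000; i++) {
        result = ((id (*)(id, SEL))imp)(receiver, @selector(value));
    }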

The speed improvement is not because MPWFastInvocation is better than NSInvocation, it is decidedly not, it is because it is solving a much, much simpler problem. By sticking to the safe id subset.

On HN.

Monday, April 14, 2014

cc -Osmartass

I have to admit I am a bit startled to see people seriously (?) advocate exploiting "undefined behavior" in the C standard to just eliminate the offending code altogether, arguing that undefined means literally anything is OK. I've certainly seen it justified many times. Apart from being awful, this idea smacks of hubris on the part of the compiler writers.

The job of the compiler is to do the best job it can at turning the programmer's intent into executable machine code, as expressed by the program. It is not to show how clever the optimizer writer is, how good at lawyering the language standard, or to wring out a 0.1% performance improvement on <benchmark-of-choice>, at least not when it conflicts with the primary goal.

For let's not pretend that these optimizations are actually useful or significant: Proebsting's law shows that all compiler optimizations have been at best 1/10th as effective at improving performance as hardware advances, and recent research suggests that even that may be optimistic.

That doesn't mean that I don't like my factor 2 or 3 improvement in code performance for codes where basic optimizations apply. But almost all of those performance gains come at the lowest levels of optimization, the more sophisticated stuff just doesn't bring much if any additional benefit. (There's a reason Apple recommends -Os and not -O3 as default). So don't get ahead of yourselves, other non-compiler optimizations can often achieve 2-3 orders of magnitude improvement, and for a lot of Objective-C code, for example, the compiler's optimizations barely register at all. Again: perspective!

Furthermore, the purpose of "undefined behavior" was (not sure it still is) to be inclusive, so for example compilers for machines with slightly odd architectures could still be called ANSI-C without having to do unnatural things on that architecture in order to conform to over-specification. Sometimes, undefined behavior is needed for programs to work.

So when there is integer overflow, for example, that's not a license to silently perform dead code elimination at certain optimization levels, it's a license to do the natural thing on the platform, which on most platforms these days is to let the integer wrap around, because that is what a C programmer is likely to expect. In addition, feel free to emit a warning. The same goes for optimizing away an out-of-bounds array access that is intended to terminate a loop. If you are smart enough to figure out the out-of-bounds access, warn about it and then proceed to emit the code. Eliminating the check and turning a terminating loop into an infinite loop is never the right answer.
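
The canonical example in C is an overflow check that the optimizer may delete outright, because signed overflow is undefined and the compiler therefore assumes it cannot happen:

    /* The compiler may assume x + 1 never wraps, conclude that the
       condition is always false, and silently remove the check. */
    int increment_checked(int x)
    {
        if (x + 1 < x) {
            return -1;    /* meant to catch overflow at INT_MAX */
        }
        return x + 1;
    }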

So please don't do this, you're not producing value: those optimizations will cease to "help" when programmers "fix" their code. You are also not producing value: any additional gains are extremely modest compared to the cost. So please stop doing this, certainly stop doing it on purpose, and please carefully evaluate the cost/benefit ratio when introducing optimizations that cause this to happen as a side effect...and then don't. Or do, and label them appropriately.

Saturday, April 12, 2014

Sophisticated Simplicity

This quote from Steve Jobs is one that's been an inspiration to me for some time:
[...] when you first attack a problem it seems really simple because you don't understand it. Then when you start to really understand it, you come up with these very complicated solutions because it's really hairy. Most people stop there. But a few people keep burning the midnight oil and finally understand the underlying principles of the problem and come up with an elegantly simple solution for it. But very few people go the distance to get there.
In other words:
  1. Naive Simplicity
  2. Sophisticated Complexity
  3. Sophisticated Simplicity
It's from the February 1984 Byte Interview introducing the Macintosh.

UPDATE: Well, it seems that Heinlein got there first:

Every technology goes through three stages: first, a crudely simple and quite unsatisfactory gadget; second, an enormously complicated group of gadgets designed to overcome the shortcomings of the original and achieving thereby somewhat satisfactory performance through extremely complex compromise; third, a final stage of smooth simplicity and efficient performance [..]
(From the novel The Rolling Stones, 1952)