I have to admit I am a bit startled to see people seriously (?) advocate exploiting "undefined behavior" in the C standard to just eliminate the offending code altogether, arguing that
undefined means literally anything is OK. I've certainly seen it justified
many times. Apart from being awful, this idea smacks of hubris on the part of the compiler writers.
The job of the compiler is to do the best job it can at turning the
programmer's intent, as expressed by the program, into executable
machine code. It is not to
show how clever the optimizer writer is, how good at lawyering the language
standard, or to wring out a 0.1% performance
improvement on <benchmark-of-choice>, at least not when that
conflicts with the primary goal.
For let's not pretend that these optimizations are actually useful
or significant: Proebsting's law shows that all compiler optimizations
have been at best 1/10th as effective at improving performance as hardware
advances, and recent research suggests that even that may be optimistic.
That doesn't mean that I don't like my factor 2 or 3 improvement in
code performance for code where basic optimizations apply. But almost
all of those performance gains come at the lowest levels of optimization;
the more sophisticated stuff just doesn't bring much, if any, additional
benefit. (There's a reason Apple recommends -Os and not -O3 as the default.)
So don't get ahead of yourselves: other, non-compiler optimizations can often
achieve 2-3 orders of magnitude improvement, and for a lot of
Objective-C code, for example,
the compiler's optimizations barely register at all. Again: perspective!
Furthermore, the purpose of "undefined behavior" was (I am not sure it still is)
to be inclusive, so that, for example, compilers for machines with slightly odd
architectures could still be called ANSI C without having to do unnatural
things on that architecture in order to conform to over-specification.
Sometimes, undefined behavior is needed for programs to work.
So when there is integer overflow, for example, that's not a license to
silently perform dead code elimination at certain optimization levels, it's
a license to do the natural thing on the platform, which on most platforms
these days is to let the integer overflow, because that is what a C programmer
is likely to expect. In addition, feel free to emit a warning. The
same goes for optimizing away an out-of-bounds array access that is
intended to terminate a loop. If you are smart enough to figure out
the out-of-bounds access, warn about it and then proceed to emit the
code. Eliminating the check and turning a terminating loop into an
infinite loop is never the right answer.
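To make this concrete, here are hedged sketches of the two patterns in question (my own illustrations, not taken from any particular compiler's bug tracker):

    /* 1. Overflow check: because signed overflow is undefined, the
       compiler may conclude that x + 1 < x can never be true and
       delete the check, along with whatever error handling it guards. */
    int would_overflow(int x)
    {
        return x + 1 < x;   /* "natural" on two's complement hardware;
                               "always false" per the standard's rules */
    }

    /* 2. The classic loop that walks one past the end of an array:
       because table[4] would be out of bounds, the compiler may assume
       i <= 4 always holds and delete the test, so a loop the programmer
       expected to terminate need not terminate (or may always "find" v). */
    int table[4];
    int exists_in_table(int v)
    {
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }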
So please don't do this, you're not producing lasting value: those optimizations
will cease to "help" as soon as programmers "fix" their code. You are also
not producing much value in the meantime: any additional gains are extremely
modest compared to the cost. So please stop doing this, certainly stop doing it on purpose,
and please carefully evaluate the cost/benefit ratio when introducing optimizations that cause this to happen as a side effect...and then
don't. Or do, and label them appropriately.
This quote from Steve Jobs is one that's been an inspiration to me for some time:
[...] when you first attack a problem it
seems really simple because you
don't understand it. Then when you
start to really understand it, you
come up with these very complicated
solutions because it's really hairy.
Most people stop there. But a few
people keep burning the midnight oil
and finally understand the underlying
principles of the problem and
come up with an elegantly simple
solution for it. But very few people
go the distance to get there.
In other words:
Naive Simplicity
Sophisticated Complexity
Sophisticated Simplicity
It's from the February 1984 Byte Interview introducing the Macintosh.
UPDATE: Well, it seems that Heinlein got there first:
Every technology goes through three stages: first, a crudely simple and quite unsatisfactory gadget; second, an enormously complicated group of gadgets designed to overcome the shortcomings of the original and achieving thereby somewhat satisfactory performance through extremely complex compromise; third, a final stage of smooth simplicity and efficient performance [..]
I like bindings. I also like Key Value Observing. What they do is undeniably cool: you do some initial setup, and presto: magic! You change a value over here, and another
value over there changes as well. Action at a distance. Power.
What they do is also undeniably valuable. I'd venture that nobody actually
likes writing state
maintenance and update code such as the following: when the user clicks this button, or finishes entering
text in that textfield, take the value and put it over here. If the underlying
value changes, update the textfield. If I modify this value, notify
these clients that the value has changed so they can update themselves accordingly.
That's boring. There is no glory in state maintenance code, just potential for
failure when you screw up something this simple.
Finally, their implementation is also undeniably cool: observing an attribute
of a generic
object creates a private subclass for that object (who says we can't do
prototype-based programming in Objective-C?), swizzles the object's
class pointer to that private subclass and then replaces the attribute's
(KVO-compliant) accessor methods with new ones that hook into the
KVO system.
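For illustration, here is roughly what sets that machinery in motion (a minimal sketch; the account object and its balance key are hypothetical):

    // Registering an observer is what triggers the private-subclass magic:
    [account addObserver:self
              forKeyPath:@"balance"
                 options:NSKeyValueObservingOptionNew
                 context:NULL];
    // From now on, object_getClass(account) reports a private
    // NSKVONotifying_Account subclass, while -class keeps reporting
    // the original class.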
Despite these positives, I have actively removed bindings code from
projects I have worked on, don't use either KVO or bindings myself and
generally recommend staying away from them. Why on earth would I
do that?
Excursion: Constraint Solvers
Before I can answer that question, I have to go back a little and talk about
constraint solvers.
The idea of setting up relationships once and then having the system maintain them
without manually shoveling values back and forth is not exactly new; the first variant
I am aware of was Sketchpad,
Ivan Sutherland's PhD Thesis from 1961/63 (here with narration by Alan Kay):
I still love Ivan's answer to the question as to how he could invent computer graphics,
object orientation and constraint solving in one fell swoop: "I didn't know it was hard".
The first system I am aware of that integrated constraint solving with an object-oriented
programming language was ThingLab, implemented on top of Smalltalk by Alan Borning at Xerox PARC around 1978 (where else...):
While the definition
of a path is simple, the idea behind it has proved quite powerful and has been essential
in allowing constraint- and object-oriented metaphors to be integrated. [..] The notion
of a path helps strengthen [the distinction between inside and outside of an object] by
providing a protected way for an object to provide external reference to its parts and
subparts.
Yes, that's a better version of KVC. From 1981.
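For comparison, the Cocoa analogue of those paths (a minimal sketch; the person object and its properties are hypothetical):

    // Key-value coding: a protected external reference to an object's
    // parts and subparts.
    NSString *street = [person valueForKeyPath:@"address.street"];
    [person setValue:@"Main Street" forKeyPath:@"address.street"];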
Alan Borning's group at the University of Washington continued working on constraint solvers
for many years, with the final result being the Cassowary linear constraint solver (based on the simplex
algorithm) that was picked up by Apple for Autolayout. The papers on Cassowary and
constraint hierarchies should help with understanding why Autolayout does what it does.
A simpler form of constraint is the one-way dataflow constraint.
A one-way, dataflow constraint is an equation of the form y = f(x1,...,xn) in which the formula on the right side
is automatically re-evaluated and assigned to the variable y whenever any variable xi changes.
If y is modified from
outside the constraint, the equation is left temporarily unsatisfied, hence the attribute “one-way”. Dataflow constraints are recognized as a powerful programming methodology in a variety of contexts because of their versatility and simplicity. The most widespread application of dataflow constraints is perhaps embodied by spreadsheets.
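Hand-rolled in Objective-C, such a constraint might look like this minimal sketch (inside some @implementation; the formula and all names are hypothetical):

    // One-way dataflow constraint y = f(x1, x2): re-evaluate f and
    // assign to y whenever one of the inputs changes.
    - (void)setX1:(double)x1 {
        _x1 = x1;
        [self updateY];
    }
    - (void)setX2:(double)x2 {
        _x2 = x2;
        [self updateY];
    }
    - (void)updateY {
        _y = _x1 + 2.0 * _x2;   // the formula f
    }
    // Assigning to y directly leaves the equation unsatisfied until an
    // input changes again, hence "one-way".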
The most important lessons the constraint researchers found were the following:
constraints should be allowed to contain arbitrary code that is written in the underlying toolkit language and does not require any annotations, such as parameter declarations
constraints are difficult to debug and better debugging tools are needed
programmers will readily use one-way constraints to specify the graphical layout of an application, but must be carefully and time-consumingly trained to use them for other purposes.
However, these really are just the headlines, and particularly for Cocoa programmers
the actual reports are well worth reading as they contain many useful pieces of
information that aren't included in the summaries.
Back to KVO and Cocoa Bindings
So what does this history lesson about constraint programming have to do with KVO
and Bindings? You probably already figured it out: bindings are one-way
dataflow constraints, specifically with the equation limited to y = x1;
more complex equations can be obtained by using NSValueTransformers. KVO
is more of an implicit invocation
mechanism that is used primarily to build ad-hoc dataflow constraints.
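Established in code rather than in Interface Builder, such a y = x1 binding looks roughly like this (a sketch; the controller and key path are hypothetical):

    // Keep the text field's value in sync with the model attribute;
    // an NSValueTransformer in the options dictionary would be the
    // only way to get a more complex "formula".
    [textField bind:@"value"
           toObject:personController
        withKeyPath:@"selection.name"
            options:nil];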
The specific problems of the API and the implementation have been documented
elsewhere, for example by Soroush Khanlou and Mike Ash, who not only suggested and
implemented improvements back in 2008, but even followed up on them in 2012. All
these problems and workarounds
demonstrate that KVO and Bindings are very sophisticated, complex and error-prone
technologies for solving what is a simple and straightforward task: keeping
data in sync.
To these implementation problems, I would add performance: even
just adding the willChangeValueForKey: and didChangeValueForKey:
message sends in your setter (these are usually added automagically for you) without triggering any notifications makes that setter 30 times slower (from 5ns to
150ns on my computer) than a simple setter that just sets and retains the object.
Actually having that access trigger a notification takes the penalty to a factor of over 100
(5ns vs. over 540ns), even when there is only a single observer. I am pretty sure
it gets worse when there are lots of observers (there used to be an O(n^3)
algorithm in there, that was fortunately fixed a while ago). While 500ns may
not seem a lot when dealing with UI code, KVO tends to be implemented at
the model layer in such a way that a significant number of model data accesses
incur at least the base penalties. For example KVO notifications were one of the primary
reasons for NSOperationQueue's somewhat anemic performance back when
we measured it for the Leopard release.
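For reference, the kind of setter being timed above looks something like this sketch (manual change notification; names hypothetical, and the automatically generated variant is equivalent):

    - (void)setName:(NSString *)newName
    {
        [self willChangeValueForKey:@"name"];   // this bracketing alone accounts
        _name = newName;                        // for the ~30x slowdown measured
        [self didChangeValueForKey:@"name"];    // above, before any observer fires
    }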
Not only is the constraint graph not available at run time, there is also no
direct representation at coding time. All there is is code or IB settings
that construct such a graph indirectly, so the programmer has to infer the
graph from what is there and keep it in her head. There are also no formulae; the best
we can do are ValueTransformers and
keyPathsForValuesAffectingValueForKey.
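The closest thing to a formula is a dependency declaration like the following (the classic fullName example; names hypothetical):

    // Declares that fullName depends on firstName and lastName, so
    // observers of fullName get notified when either of them changes.
    + (NSSet *)keyPathsForValuesAffectingFullName
    {
        return [NSSet setWithObjects:@"firstName", @"lastName", nil];
    }
    // The "formula" itself stays buried in an ordinary getter:
    - (NSString *)fullName
    {
        return [NSString stringWithFormat:@"%@ %@",
                         self.firstName, self.lastName];
    }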
As best I can tell, the reason for this state of affairs is that there simply
wasn't any awareness of the decades of
research and practical experience with constraint solvers at the time. (How
do I know? I asked; the answer was "Huh?")
Anyway, when you add it all up, my conclusion is that while I would really,
really, really like a good constraint solving system (at least for spreadsheet
constraints), KVO and Bindings are not it. They are too simplistic, too
fragile and solve too little of the actual problem to be worth the trouble.
It is easier to just write that damn state maintenance code, and infinitely
easier to debug it.
I think one of the main communication problems between advocates for and
critics of KVO/Bindings is that the advocates are advocating more for
the concept of constraint solving, whereas critics are critical of the
implementation. How can these critics not see that despite a few flaws,
this approach is obviously
The Right Thing™? How can the advocates not see the
obvious flaws?
Functional Reactive Programming
As far as I can tell, Functional Reactive Programming (FRP) in general and Reactive
Cocoa in particular are another way of scratching the same itch.
[..] is an integration of declarative [..] and imperative object-oriented programming. The primary goal of this integration is to use constraints to express relations among objects explicitly -- relations that were implicit in the code in previous languages.
Sounds like FRP, right? Well, the first "[..]" part is actually "Constraint Imperative Programming" and the second is "constraints",
from the abstract of a 1994 paper. Similarly, I've seen it stated that FRP is like a spreadsheet.
The connection between functional programming and constraint programming is also well
known and documented in the literature, for example the experience report above states the
following:
Since constraints are simply functional programming dressed up with syntactic sugar, it should not be surprising that 1) programmers do not think of using constraints for most programming tasks and, 2) programmers require extensive training to overcome their procedural instincts so that they will use constraints.
However, you wouldn't be able to tell that there's a relationship there from reading
the FRP literature, which focuses exclusively on the connection to functional
programming via functional reactive animations and Microsoft's Rx extensions. Explaining and particularly motivating FRP this way has the
fundamental problem that whereas functional programming, which is by definition
static/timeless/non-reactive, really needs something to become interactive,
reactivity is already inherent in OO. In fact, reactivity is the quintessence of
objects: all computation is modeled as objects reacting to messages.
So adding reactivity to an object-oriented language is, at first blush, nonsensical,
and it certainly causes confusion when explained this way.
I was certainly confused, because until I found this one
paper on reactive imperative programming,
which adds constraints to C++ in a very cool and general way,
none of the documentation, references or papers made the connection that seemed so
blindingly obvious to me. I was starting to question my own sanity.
Architecture
Additionally, one-way dataflow constraints creating relationships between program variables
can, as far as I can tell, always be replaced by a formulation where the dependent
variable simply becomes a method that computes the value on demand. So
instead of setting up a constraint between point1.x and point2.x,
you implement point2.x as a method that uses point1.x to
compute its value and never stores it. Although this may evaluate the formula
more often than necessary, rather than memoizing the value and computing it
just once, the additional cost of managing constraint evaluation is such that
the two probably balance out.
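A sketch of that on-demand formulation (with a hypothetical offset standing in for the "formula"):

    // Instead of a constraint keeping point2.x in sync with point1.x,
    // point2 computes its x on demand and never stores it:
    - (double)x
    {
        return self.point1.x + self.offset;
    }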
However, such an implementation creates permanent coupling and requires dedicated
classes for each relationship. Constraints thus become more of an architectural
feature, allowing existing, usually stateful components to be used together without
having to adapt each component for each individual ensemble it is a part of.
Panta Rhei
Everything flows, so they say. As far as I can tell, two different
communities, the F(R)P people and the OO people, came up with very similar
solutions based on data flow. The FP people wanted to become more reactive/interactive,
and achieved this by modeling time as sequence numbers in streams of values, sort
of like Lucid or other dataflow languages.
The OO people wanted to be able to specify relationships declaratively and have
their system figure out the best way to satisfy those constraints, with
a large and useful subset of those constraints falling into the category of
the one-way dataflow constraints that, at least to my eye, are equivalent
to FRP. In fact, this sort of state maintenance and update-propagation
pops up in lots of different places, for example makefiles or other
build systems, web-server generators, publication workflows etc. ("this
OmniGraffle diagram embedded as a PDF into this LaTeX document that
in turn becomes a PDF document" -> the final PDF should update
automatically when I change the diagram, instead of me having to
save the diagram, export it to PDF and then re-run LaTeX).
What's kind of funny is that these two groups seem to have converged
on essentially the same space, but they seem not to be aware of
each other; maybe they are phase-shifted with respect to each other?
Part of that phase-shift is, again, communication. The FP guys
couch everything in must-destroy-all-humans, er, state rhetoric,
which doesn't do much to convince OO guys who know that for most
of their programs, state isn't an implementation detail but fundamental
to their applications. Also practical experience does not support the
idea that the FP approach is obvious:
Unfortunately, given the considerable amount of time required to train students to use constraints in a non-graphical manner, it does not seem reasonable to expect that constraints will ever be widely used for purposes other than graphical layout. In retrospect this result should not have been surprising. Business people readily use constraints in spreadsheets because constraints match their mental model of the world. Similarly, we have found that students readily use constraints for graphical layout since constraints match their mental model of the world, both because they use constraints, such as left align or center, to align objects in drawing editors, and because they use constraints to specify the layout of objects in precision paper sketches, such as blueprints. However, in their everyday lives, students are much more accustomed to accomplishing tasks using an imperative set of actions rather than using a declarative set of actions.
Of course there are other groups hanging out in this convergence zone, for example the
Unix folk with their pipes and filters. That is also not too surprising if
you look at the history:
So, we were all ready. Because it was so easy to compose processes with shell scripts. We were already doing that. But, when you have to decorate or invent the name of intermediate files and every function has to say put your file there. And the next one say get your input from there. The clarity of composition of function, which you perceived in your mind when you wrote the program, is lost in the program. Whereas the piping symbol keeps it. It's the old thing about notations are important.
I think the familiarity with Unix pipes also increases the itch: why can't I have
that sort of thing in my general purpose programming language? Especially when
it can lead to very concise programs, such as the Quartz-like graphics subsystem
Gezira, written in
under 400 lines of code using the Nile dataflow language.
Moving Forward
I too have heard the siren sing.
I also think that a more spreadsheet-like programming model would not just make my life
as a developer easier, it might also make software more approachable for end-user adaptation and tinkering,
contributing to a more meaningful version of open source. But how do we get there?
Apart from a reasonable implementation and better debugging support, a new system would need much tighter
language integration. Preferably there would be a direct syntax for expressing constraints
such as that available in constraint imperative programming languages or constraint extensions to existing
languages like
Ruby or JavaScript.
This language support should be unified as much as
possible between different constraint systems, not one mechanism for Autolayout and a
completely different one for Bindings.
Supporting constraint programming has always been one of the goals of my Objective-Smalltalk project, and so far that has informed the
PolymorphicIdentifiers that support a uniform interface for data backed by different types of
stores, including one or more constraint stores supporting cooperating solvers, filesystems or web-sites. More needs
to be done, such as extending the data-flow connector hierarchy to conceptually integrate
constraints. The idea is to create a language that does not actually include constraints
in its core, but rather provides sufficient conceptual, expressive and implementation
flexibility to allow users to add such a facility in a non-ad-hoc way so that it is
fully integrated into the language once added. I am not there yet, but all the results
so far are very promising. The architectural focus of Objective-Smalltalk also ties
in well with the architectural interpretation of constraints.
There is a lot to do, but on the other hand I think the payback is huge, and there is
also a large body of existing theoretical,
practical and empirical groundwork to fall back on, so I think the task is doable.
Your feedback, help and pull requests would be very much appreciated!
After thinking about the id subset and being pointed to WebScript, Brent Simmons imagines a scripting language. I have to admit I have been imagining pretty much the same language...and at some
point decided to stop imagining and start building Objective-Smalltalk:
Peer of Objective-C: objects are Objective-C objects, methods are Objective-C methods,
added to the runtime and indistinguishable from the outside.
"You can subclass UIViewController, or write a category on it."
The example is from the site; it was copied
from an actual program. As you can see, interoperability with the C parts of
Objective-C is still necessary, but not bothersome.
This example was also copied from an actual small educational game that was
ported over from Flash.
You also get Higher Order Messaging, Polymorphic Identifiers etc.
Works with the toolchain: this is a little more tricky, but I've made
some progress...part of that is an llvm-based native compiler, part is
tooling that enables some level of integration with Xcode, part is
a separate toolset that has comparable or better capabilities.
While Objective-Smalltalk would not require shipping source code with your applications,
due to the native compiler, it would certainly allow it, and in fact my own
BookLightning imposition program
has been shipping with part of its Objective-Smalltalk source hidden in its Resources
folder for about a decade or so. Go ahead, download it, crack it open and have
a look! I'll wait here while you do.
Did you have a look?
The part that is in Smalltalk is the distilled (but very simple) imposition algorithm
shown here.
What this means is that any user of BookLightning could adapt it to suit their needs,
though I am pretty sure that none have done so to this date. This is partly due to
the fact that this imposition algorithm is too limited to allow for much variation,
and partly due to the fact that the feature is well hidden and completely unexpected.
There are two ideas behind this:
Open Source should be more about being able to tinker with well-made
apps in useful ways, rather than downloading and compiling gargantuan and
incomprehensible tarballs of C/C++ code.
There is no hard distinction between programming and scripting. A
higher level scripting/programming language would not just make developers'
jobs easier, it could also enable the sort of tinkering and adaptation that
Open Source should be about.
I don't think the code samples shown above are quite at the level needed to really
enable tinkering, but maybe they can be a useful contribution to the discussion.
The feedback was, effectively: "This code is incorrect, it is missing a return type". Of course, the code isn't incorrect in the least bit, the return type is id, because that is the default type, and in fact, you will see this style in both Brad Cox's book:
and the early NeXTStep documentation:
Having a default type for objects isn't entirely surprising, because at that time id was not just the default type, it was the only type available for objects; the optional static typing for objects wasn't introduced into Objective-C until later. In addition, the template for Objective-C's object system was Smalltalk, which doesn't use static types: you just use variable names.
Cargo-cult typing
So while it is possible (and apparently common) to write -(id)objectAtIndex:(NSUInteger)anIndex, it certainly isn't any more correct. In fact, it's
worse, because it is just syntactic noise [1][2], and it is arguably even worse than what Fowler describes, because it isn't actually mandated by
the language: the noise is inflicted needlessly.
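Concretely, the following two declarations are exactly equivalent; the first merely spells out the default:

    - (id)objectAtIndex:(NSUInteger)anIndex;   // explicit
    - objectAtIndex:(NSUInteger)anIndex;       // id is the default return type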
And while we could debate as to whether it is better or not to write things that are redundant
syntactic noise, we could also not, as that was settled almost 800 years ago: entia non sunt multiplicanda praeter necessitatem. You could also say KISS or "when in doubt, leave it out", all of which just
say that the burden of proof is on whoever wants to add the redundant pieces.
What's really odd about this phenomenon is that we really don't gain anything from typing
out these explicit types; the code certainly doesn't become more readable. It's as if
we think that by following the ritual of explicitly typing out a type, we made the
proper sacrifice to the gods of type-safety and they will reward us with correctness.
But just like those Pacific islanders that built wooden planes, radios and control
towers, the ritual is empty, because it conveys no information to the type system,
or the reader.
The id subset
Now, I personally don't really care whether you put in a redundant (id)
or not, I certainly have been reading over it (and not even really noticing) for
my last two decades of Objective-C coding. However, the mistaken belief that it
has to be there, rather than being a personal choice you make, does worry me.
I think the problem goes a little deeper than just slightly odd coding styles, because it seems to be part and parcel of a drive towards making Objective-C look like an explicitly statically typed language along the lines of C++ or maybe Java,
with one of the types being id. That's not the case: Objective-C
is an optionally statically typed language. This means that you
may specify type information if you want to, but you generally
don't have to. I also want to emphasize that you can at best get Objective-C
to look like such a language, the holes in the type system are way too big for
this to actually gain much safety.
Properties started this trend, and now the ARC variant of the language turns what used to be warnings about unknown selectors needlessly into hard compiler errors.
Of course, there are some who plausibly argue that this always should have been an error,
or actually, that it always was an error; we just didn't know about it.
That's hogwash, of course. There is a subset of the language, which I'd like
to call the id subset, where all the arguments and returns are object
pointers, and for this it was always safe to not have additional type information,
to the point where the compiler didn't actually have that additional type information.
You could also call it the Smalltalk subset.
Another thing that's odd about this move to rigidify Objective-C in the face of
the success of more dynamic languages is that we actually have been moving in the
right direction at the language base-level (disregarding the type system) and in general programming style: with new syntax support
for object literals and subscripting, and SmallInteger-style NSNumbers, modern
Objective-C consists much more of pure objects than was traditionally the case.
And as long as we are dealing with pure objects, we are in the id subset.
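For example, all of the following stays comfortably inside the id subset (a minimal sketch; the names are made up):

    NSArray *planets = @[ @"Mercury", @"Venus", @"Earth" ];   // array literal
    NSNumber *answer = @42;                                   // number literal
    id first = planets[0];                                    // object subscripting
    NSDictionary *orbits = @{ @"Mercury": @88, @"Earth": @365 };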
A dynamic language
What's great about the id subset is that it makes incremental, explorative
programming very easy and lots of fun, much like other dynamic languages
such as Smalltalk, Python or Ruby.
(Not entirely like them, due to the need to compile to native code, but compilers are fast these
days and there are possible fixes such as Objective-Smalltalk.)
The newly enforced rigidity is starting to make explorative programming in Objective-C much
harder, and a lot less fun. In fact, it feels much more like C++ or Java and much less
like the dynamic language that it is, which in my opinion is the wrong direction: we should
be making our language more dynamic, and of course that's what I've been doing. So while I wouldn't agree with that tradeoff even
if it were true, the fact is that we aren't actually
getting static type safety, we are just getting a wooden prop that will not fly.
Discussion on Hacker News.
UPDATE: Inserted a little clarification that I don't care about bike-shedding your code
with regard to (id). The problem is people's mistaken belief that (and why) it has to be there, which is symptomatic of the deeper trend I wrote about.
Just had a case of codesign telling me my app was fine, only for the same app to be rejected by GateKeeper. The spctl tool fortunately was more truthful, but didn't really say where the problem was.
A little sleuthing determined that although I had signed all my frameworks with the Developer ID, two auxiliary executables were signed with my development certificate.
Lesson learned: don't trust codesign, use spctl to verify your binaries.
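For the record, the kind of invocations involved (the app name is hypothetical, and exact flags may differ between OS versions):

    # codesign happily reported the app as fine:
    codesign --verify --verbose MyApp.app
    # spctl assesses the app the way GateKeeper actually will:
    spctl --assess --type execute --verbose MyApp.app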
Actually: no, it isn't; Transact-SQL got the honors. Apart from the obvious question, "Transact-Who?", it really should have been Objective-C, because Tiobe readjusted the index mid-year in a way that resulted in a drop of 0.5% for the popular languages, which is fine, but without readjusting the historical data! Which is...not...especially if you make judgements based on relative performance.
In this case, Transact-SQL beat Objective-C by 0.17%, far less than the roughly 0.5% drop suffered by Objective-C mid-year. So Objective-C would have easily done the
hat-trick, but I guess Tiobe didn't want that and rigged the game to make sure
it doesn't happen.
Not that it matters...
UPDATE: I contacted Tiobe and they confirmed, both the lack of rebaselining and that Objective-C would likely have won an unprecedented third time in a row.