It's been a long time coming. NeXTStep in 1989 featured DisplayPostscript, and therefore
a device independent imaging model that meant you did not specify graphics in pixels, but
rather in physical units. The default was a variant of the printer's point at 1/72nd of
an inch, which happened to be close to the typical pixel resolution of displays at the
time. However, 1 point never meant 1 pixel, it meant 1/72nd of an inch,
and the combination of floating point coordinates and transformation matrices
meant you could use pretty much any unit you wanted. When NeXT bought Apple, it
brought this imaging model with it, although with some modifications due to Adobe
intransigence about licensing and the addition of anti-aliasing.
However, despite the device-independent APIs, we still have pixel-based content,
and "pixel-accurate" graphics. This has made less and less sense over time,
with retina displays making pixel-accuracy moot (no more screen fonts!), scaled
modes making it impossible, and both iOS 7 and OS X 10.10 going for a more
geometric look. Still, the design community has resisted, talking about
@3x pixel art etc.
No more.
The iPhone 6 Plus has a 1920x1080 panel, but the simulator renders at 3x. These
two resolutions don't match and so the pixels will need to be downsampled to
the display resolution. Whether that is accomplished by downsampling pixel art
(which happens automagically with Quartz and the proper device transform set)
or as a separate step that downsamples the entire rendered framebuffer doesn't
matter (much). Either way, there are no more "pixel perfect" pre-rendered
designs.
Device-independent graphics, here we come at last.
We're only a quarter century late.
Update: "Its 401 PPI display is the first display I’ve ever used on which, no matter how close I hold it to my eyes, I can’t perceive the pixels. " - John Gruber (emphasis mine)
I recently stumbled on Rob Napier's explanation of the map
function in Swift. So I am reading along yadda yadda when suddenly I wake up and
my eyes do a double take:
After years of begging for a map function in Cocoa [...]
Huh? I rub my eyes, probably just a slip up, but no, he continues:
In a generic language like Swift, “pattern” means there’s probably a function hiding in there, so let’s pull out the part that doesn’t change and call it map:
Not sure what he means with a "generic language", but here's how we would implement a map function in Objective-C.
#import <Foundation/Foundation.h>
typedef id (*mappingfun)( id arg );
static id makeurl( NSString *domain ) {
    return [[[NSURL alloc] initWithScheme:@"http" host:domain path:@"/"] autorelease];
}

NSArray *map( NSArray *array, mappingfun theFun )
{
    NSMutableArray *result=[NSMutableArray array];
    for ( id object in array ) {
        id objresult=theFun( object );
        if ( objresult ) {
            [result addObject:objresult];
        }
    }
    return result;
}

int main(int argc, char *argv[]) {
    NSArray *source=@[ @"apple.com", @"objective.st", @"metaobject.com" ];
    NSLog(@"%@",map(source, makeurl ));
}
This is less than 7 non-empty lines of code for the mapping function, and took me less
than 10 minutes to write in its entirety, including a trip to the kitchen for an
extra cookie, recompiling 3 times and looking at the qsort(3) manpage
because I just can't remember C function pointer declaration syntax (though it took
me less time than usual, maybe I am learning?). So really, years of "begging" for
something any mildly competent coder could whip up between bathroom breaks or
during a lull in their twitter feed?
Or maybe we want a version with blocks instead? Another 2 minutes, because I am a klutz:
#import <Foundation/Foundation.h>

typedef id (^mappingblock)( id arg );

NSArray *map( NSArray *array, mappingblock theBlock )
{
    NSMutableArray *result=[NSMutableArray array];
    for ( id object in array ) {
        id objresult=theBlock( object );
        if ( objresult ) {
            [result addObject:objresult];
        }
    }
    return result;
}

int main(int argc, char *argv[]) {
    NSArray *source=@[ @"apple.com", @"objective.st", @"metaobject.com" ];
    NSLog(@"%@",map(source, ^id ( id domain ) {
        return [[[NSURL alloc] initWithScheme:@"http" host:domain path:@"/"] autorelease];
    }));
}
Of course, we've also had collect for a good decade or so, which turns the client code into the following,
much more readable version (Objective-Smalltalk syntax):
NSURL collect URLWithScheme:'http' host:#('objective.st' 'metaobject.com') each path:'/'.
As I wrote in my previous post, we seem to be regressing to a mindset about computer
languages that harkens back to the days of BASIC, where everything was baked into the
language, and things not baked into the language or provided by the language vendor do not exist.
Rob goes on to write "The mapping could be performed in parallel [..]", for example like parcollect? And then "This is the heart of good functional programming." No. This is the heart of good programming.
Having processed that shock, I fly over a discussion of filter (select) and stumble over
the next whopper:
It’s all about the types
Again...huh?? Our map implementation certainly didn't need (static) types for the list, and
all the Smalltalkers and LISPers that have been gleefully using higher order
techniques for 40-50 years without static types must also not have gotten the memo.
We [..] started to think about the power of functions to separate intent from implementation. [..] Soon we’ll explore some more of these transforming functions and see what they can do for us. Until then, stop mutating. Evolve.
All modern programming separates intent from implementation. Functions are a
fairly limited and primitive way of doing so. Limiting power in this fashion can be
useful, but please don't confuse the power of higher order programming with the
limitations of functional programming, they are quite distinct.
About a month ago, Jesse Squires published a post titled Apples to Apples, documenting benchmark results that he
claims show Swift now with a roughly 10x performance advantage over Objective-C.
Although completely bogus,
the post was retweeted by Chris Lattner (who should know better, and was supposedly
mostly interested in highlighting the improvements in the Swift optimizer, rather than
the bogus comparison) and has now been referenced a number of times as background
knowledge as to the state of Swift. More importantly, though the actual mistake
Jesse makes is pretty basic and not that interesting, it does point to some deeper
misunderstandings about performance and language that I at least do find interesting.
So what's the mistake? Ironically, given the post's title, it is that he is comparing
apples to oranges, so to speak. The following table, which shows the time in milliseconds
to sort an array of 10000 numbers 10 times, illustrates the problem:
              NSNumber    native integer
Objective-C   6.04*       0.97
Swift         11.92       0.8*
Jesse compared the two versions marked with an asterisk, i.e. native Swift integers with Objective-C
NSNumber object wrappers.
All times are for binaries with optimization enabled, the machine was a 13" MBPR with
2.9 GHz Intel Core i7 and 8GB of RAM. The integer sort was done using a C integer
array and the system qsort() function. When you compare apples to apples,
Objective-C has a roughly 2x edge with NSNumbers and is around 18% slower for native
integers, at least when using qsort().
Why the 18% disadvantage? The qsort() function is made generically applicable to
different types of arrays using a function pointer parameter for the comparison function
that itself is parametrized using pointers to the elements to be compared. This means
there is an overhead of one function call and two pointer dereferences
per comparison. That overhead overwhelms the actual comparison operation, which is
a single machine instruction on most processors.
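To make that overhead concrete, here is a sketch of sorting a C integer array with qsort() (not the actual benchmark code, just an illustration of the mechanism): every comparison goes through the intcompare function pointer and dereferences the two element pointers.

#include <stdio.h>
#include <stdlib.h>

// Comparison callback: invoked once per comparison via a function pointer,
// with pointers to the two elements.
static int intcompare( const void *a, const void *b )
{
    int x = *(const int *)a;   // pointer dereference 1
    int y = *(const int *)b;   // pointer dereference 2
    return (x > y) - (x < y);  // the actual comparison: a machine instruction or two
}

int main(void)
{
    int numbers[] = { 42, 7, 23, 4, 16 };
    size_t count = sizeof(numbers) / sizeof(numbers[0]);
    qsort( numbers, count, sizeof(int), intcompare );
    for (size_t i = 0; i < count; i++) {
        printf("%d\n", numbers[i]);
    }
    return 0;
}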
Swift, on the other hand, appears to produce a version of the sort function that is
specialized to
the integer type, with the comparison function inlined into the generated function,
so there is no function call or pointer dereference overhead.
That is very clever and a Good Thing™ for performance. Sort of. The drawback is
that this breaks separate compilation, because the functions actually have to be
combined during the compile/link process every time they are used (I assume there is
caching going on, so we only get one specialization per type combination).
Apart from making the compiler/linker slower, possibly significantly so
(like C++ headers, though I presume they use LLVM bitcode to optimize the process),
it also likely bloats the executable, causing cache and memory pressure. So it's
a tradeoff, as usual, and while I think having the ability to specialize at
compile-time is good, not being able to control it is not.
Objective-C doesn't have this ability to automagically adapt a function or method
to its parameters; if you want inlining, the relationship has to be known at definition time,
not at the point of use. However, if the benefit of inlining is only 21% for the
most primitive type, a machine integer, then it is clear that the set of
types for which compile-time specialization is beneficial at all is small.
Cocoa of course already provides specialized collection classes for the byte and
unichar types, NSData and NSString respectively.
I never quite understood why this wasn't extended to the other primitive types, particularly
integer and float/double. On the other hand, the omission never bothered me much; I
just implemented those classes myself in MPWFoundation. MPWRealArray even has support for DisplayPostscript
binary object sequences, it's that old!
Both MPWRealArray and the corresponding MPWIntArray classes are small and fairly trivial to implement, and once I have them,
using a specialized integer or real array is at least as convenient as using an NSArray, just a lot faster. They could also be quite a bit smaller than they are, sharing
code either via subclassing or poor-man's generic programming via include files. Once
I have a nice OO interface, I can swap out the implementation for something really quick
like a dual-pivot integer sort I found in Java-land and adapted to C. (It is surprising
just how similar they are at that level). With that sort, the test time drops to 0.56 ms,
so 42% faster than the Swift version and almost twice as fast as the system qsort()
function.
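A minimal sketch of what such a specialized integer array can look like (hypothetical code, not the actual MPWIntArray; MRC-style like the other examples here):

#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical minimal integer array: an OO interface wrapped around a plain C buffer.
@interface IntArray : NSObject
{
    int        *values;
    NSUInteger count;
    NSUInteger capacity;
}
- (void)addInt:(int)value;
- (int)intAtIndex:(NSUInteger)index;
- (NSUInteger)count;
- (void)sort;     // implementation can be swapped out, e.g. for a dual-pivot sort
@end

@implementation IntArray

- (instancetype)init
{
    if ((self = [super init])) {
        capacity = 16;
        values = malloc( capacity * sizeof(int) );
    }
    return self;
}

- (void)addInt:(int)value
{
    if (count >= capacity) {
        capacity *= 2;
        values = realloc( values, capacity * sizeof(int) );
    }
    values[count++] = value;
}

- (int)intAtIndex:(NSUInteger)index  { return values[index]; }
- (NSUInteger)count                  { return count; }

static int intcompare( const void *a, const void *b )
{
    return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
}

- (void)sort
{
    qsort( values, count, sizeof(int), intcompare );
}

- (void)dealloc
{
    free( values );
    [super dealloc];   // remove under ARC
}

@end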
So the takeaway is that if you are using NSNumber objects in performance-sensitive code: stop.
This is always a mistake. The native number types for Objective-C are int,
float, double and friends, not NSNumber. After all, how
do you perform arithmetic? Either directly on a primitive, or by unboxing the NSNumber,
performing the arithmetic on the primitive and then re-boxing the result. Use primitive scalar
types as much as possible where they make sense.
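As a hypothetical sketch of what that means in code (not taken from the benchmark), the boxed version has to unbox both operands, compute on the primitives and re-box the result, while the primitive version is just the add:

#import <Foundation/Foundation.h>

int main(void)
{
    // Boxed: unbox both operands, add the primitives, box the result again.
    NSNumber *boxedA = @3, *boxedB = @4;
    NSNumber *boxedSum = @([boxedA integerValue] + [boxedB integerValue]);

    // Primitive: just the add.
    NSInteger a = 3, b = 4;
    NSInteger sum = a + b;

    NSLog(@"%@ %ld", boxedSum, (long)sum);
    return 0;
}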
A second takeaway is that the question "which language is faster" doesn't really make
sense; a more relevant and interesting question is "does this language make it
hard/possible/easy to write fast code". Objective-C lets you write really fast code
if you want to, because it has the low-level chops and an understandable performance
model. Swift so far achieves reasonable performance at times and ludicrously bad performance
at other times (especially with the optimizer turned off, which hardly fazes Objective-C),
with as far as I can tell fairly little predictability or control. Having 10% faster
(or slower) performance for code I don't particularly care about is not worth nearly
as much as the knowledge that I can get the 1-5% of code that I do care about in
shape no matter what. Swift is definitely not there yet, and given the direction
it is taking I am not sure whether it will allow that kind of control, at least in
comprehensible ways.
A third point is something more general about language. The whole argument that
NSNumber and NSArray are "built in" somehow and int is not displays a lack of
understanding of Objective-C that to me seems staggering. Even more so, the
whole idea that you must only use what comes provided with Cocoa and are not
allowed to build your own flies in the face of modern language design, throwing
us back to the times of BASIC (Arthur Luehrmann, in the comments):
I had added graphics primitives to Dartmouth Basic around 1976 and developed an X-Y pen-plotter to carry out graphics commands mixed in with the text being sent to Teletype terminals.
The idea is that a language is a bundle of features, or to put it linguistically,
a language is a list of words to be used as-is.
Both C and Pascal introduced me to a new notion: that languages are not lists of
words, but means of constructing your own words. For example, C did/does not have
I/O as a language feature. I/O was just another set of functions placed in a
library that you included just like any of your own functions/libraries.
And there were two sets of them, the stdio package and the raw Unix
I/O.
At around the same time I was introduced to both top-down and bottom-up
programming. Both assume there is a recursive de-composition of the
problem at hand (assuming the problem is sufficiently complex to warrant it).
In bottom-up programming, you build up the vocabulary (the procedures and functions) that are necessary to succinctly describe your top-level problem,
and then you describe your program in terms of that vocabulary you created.
In top-down programming, you start at the other end and write your top-level
program in terms of the vocabulary you wish you had to optimally describe
the problem. Then you fill in the blanks.
In both, you define your own language to fit the problem, then you solve the
problem using the language you defined. You would not add plotting commands
to the language, you would either add plotting commands as a library or, if
that were not possible, add a way
of adding plotting commands as a library. You would not look at whether plotting
comes with the "standard library" or not. To quote Guy Steele in Growing a Language:
This is the nub of what I want to say. A language design can no longer be a thing. It must be a pattern—a pattern for growth—a pattern for growing the pattern for defining the patterns that programmers can use for their real work and their main goal.
So build your own libraries, your own abstractions. It's easy, fun and useful.
It's the heart of Domain Driven Design, probably the most productive and effective
software construction technique we as an industry have come up with to date.
See what abstractions you can build easily and which ones are hard. Analyze
the latter and you have started on the road to modern language design.
CORRECTION (June 4th 2015): I misattributed the Dartmouth BASIC quote to Cathy Doser, when
the comment line on the Macintosh folklore entry clearly said Arthur Luehrmann. (Cathy's
comment was a bit earlier).
If you're building a UIView subclass that needs to set up a mess of subviews this can get old really quick. Best option I've found so far? Just initialize them with a default value like you would a regular variable. Now the compiler's off your back and you can move on with your life, or at least what's left of it after choosing software development as a career.
This is something people who create elaborate mechanisms to force people to "Do the Right Thing"
never seem to understand: they hardly ever achieve what they are trying to achieve.
Instead, people will do the minimal amount of work to get the compiler off their backs.
Compare Java's checked exceptions.
I just took my car to its biennial TüV inspection and apart from the tires that had simply
worn out everything was A-OK, nothing wrong at all. Kind of surprising for a 7 year old
mechanical device that has been used: daily commute from Mountain View to Oakland, tight
cornering in the foothills, shipped across the Atlantic twice and now that it is back in its
native country, occasional and sometimes prolonged sprints at 200 km/h. All that with not all
that much maintenance, because the owner is not exactly a car nut.
Cars used to not be nearly this reliable, and getting there wasn't easy, it took the
industry both plenty of time and a lot of effort. It's not that the engineers
didn't know how to build reliable cars, but making them reliable and keeping them
affordable and still allowing car companies to turn a profit, that
was hard.
One particular component is the alternator belt, which had to be changed so frequently
that engine compartments were specially designed to make the belt easily accessible.
That's no longer the case, and the characteristic screeching sound of a worn belt
is one that I haven't heard in a long time.
My late dad, who was in the business, told me how it went down, at least at Volkswagen.
As other problems had been whittled away over the decades, alternator belts were becoming
a real issue on the reliability reports compiled by motoring magazines, and the engineers
were tasked with the job of fixing the problem. And fix it they did: they came up with
a design that would "never" break or wear out, and no I don't know the details of how
that was supposed to work.
Problem was: it was a tad expensive. Much more expensive than the existing solution
and simply too expensive for the price bracket they were aiming for (this may seem
odd to outsiders considering the total cost of a car, but pennies matter). Which of course
was one reason why they had put up with unreliable belts for so long. Then word came in that
the Japanese had solved the problem as well, and were offering it on their cheap(er)
models. At the next auto show, they went to the booth of one of those Japanese companies
and popped the hood.
The engineers scoffed: the Japanese design was cheaper because it was much, much
more primitive than the one they had come up with, and it would, in fact, also wear
out much more quickly. But exactly how much more quickly would it wear out? In other
words, what was the expected lifetime of this cheaper, inferior alternator belt design?
About the expected lifetime of the car.
Ahh. As far as I can tell, the Japanese design or variants thereof conquered the world. I can't
recall the last time I heard the screech of a worn out belt, engine compartments
these days are not designed with accessibility in mind and cars are still affordable,
although changing the belt if it does break will cost more in labor because of the
less accessible placement.
What do alternator belts have to do with software development? Probably nothing, but
to me at least, the situation reminds me of the one I write about in The Safyness of Static Typing. I am actually with those commenters who scoffed at the idea that the safety
benefit of static typing is only around 2%, because theoretically having a tight specification
of possible values checked at compile-time absolutely should bring a greater
benefit.
For example, when static typing and protocols were introduced to Objective-C, I absolutely
expected them to catch my errors, so I was quite surprised when it turned out that in practice they didn't: because
I could actually compile/run/test my code without having to specify static types, by the time
I added static types the code simply no longer had type errors, because the vast majority
of those were caught by running it. The dynamic safety also
helped, because instead of a random crash, I got a nice clean error message
"object abc doesn't understand message xyz".
My suspicion is that although dynamic typing and the practices that go with it may only be,
let's say, 50% as good at catching type errors as a good static type system, they are
actually 98% effective at catching real world type errors. So if static type systems
are twice as good, they would be 196% effective at catching real world type errors, which
just like the perfect, German-engineered alternator belts, is simply more than is
actually needed (96% more with my hypothetical numbers).
There are obviously other factors at play, but I think this may account for a good part of
the perceived discrepancy.
What do you think? Comments welcome here or on Hacker News.
The team had been ignoring it, because it just didn't make any sense and they had other
things to do. (Fortunately not too many other crashes, the app is pretty solid at this
point). When they turned to me, I was also initially puzzled, because all this should
do on x86 is stuff a zero into %eax and return. This cannot possibly crash[1],
so everyone just assumed that the stack traces were off, as they frequently are.
Fortunately I had just looked at the project settings and noticed that we were compiling
with -O0, so optimizations disabled, and my suspicion was that ARC was doing some
unnecessary retaining. That suspicion turned out to be on the money, otool -Vt revealed that
ARC had turned our innocuous return NO; into the following monstrosity:
Yikes! Of course, this is how ARC works: it generates an insane amount of retains and
releases (hidden inside objc_storeStrong()), then relies on a special optimization pass to remove the insanity and
leave behind the necessary retains/releases. Turn on the "standard" optimization
-Os and we get the following, much more reasonable result:
It isn't clear why those retains/releases were crashing; all the objects involved looked OK in the debugger. But at least we will no longer be puzzled by code that can't possibly crash...crashing, and therefore have a better chance of actually debugging it.
Another issue is performance. I just benchmarked the following equivalent program:
#import <Foundation/Foundation.h>

@interface Hi:NSObject {}
-(BOOL)doSomething:arg1 with:arg2;
@end

@implementation Hi
-(BOOL)doSomething:arg1 with:arg2
{
    return NO;
}
@end

int main( int argc, char *argv[] )
{
    Hi *hi=[Hi new];
    for (int i=0;i < 100000000; i++ ) {
        [hi doSomething:hi with:hi];
    }
    return 0;
}
On my 13" MBPR, it runs in roughly 0.5 seconds with ARC disabled and in 13 seconds with
ARC enabled. That's 26 times slower, meaning we now have a highly non-obvious performance
model, where performance is extremely hard to predict and control. The simple and obvious
performance model was one of the main reasons Objective-C code tended to actually be
quite fast if even minimal effort was expended on performance, despite the fact that
some parts of Objective-C aren't all that fast.
I find the approach of handing off all control and responsibility to the optimizer
writers worrying. My worries stem partly from the fact that I've never actually
had that work in the past. With ARC it also happens that
the optimizer can't figure out that a retain/release isn't needed, so you need to
sprinkle
a few __unsafe_unretained qualifiers throughout your code (not many, but you need to
figure out which).
Good optimization has always been something that
needed a human touch (with automatic assistance); the message "just trust
the compiler" doesn't resonate with me. Especially since, and this is
the other part I am worried about, compiler optimizations have been
getting crazier and crazier, clang for example thinks
there is nothing wrong with producing two different values when de-referencing
the same pointer (at the same time, with no
stores in between; source: http://blog.regehr.org/archives/767):
#include <stdio.h>
#include <stdlib.h>

int main() {
    int *p = (int*)malloc(sizeof(int));
    int *q = (int*)realloc(p, sizeof(int));
    *p = 1;
    *q = 2;
    if (p == q)
        printf("%d %d\n", *p, *q);
}
I tested this with clang-600.0.34.4 on my machine and it also gives
this non-sensical result: 1 2. There are more examples, which I also wrote about
in my post cc -Osmartass.
Of course, Swift moves further in this direction, with expensive default semantics
and reliance on the compiler to remove the resulting glaring inefficiencies.
In what I've seen reported and tested myself, this approach results in differences between normal
builds and -Ofast-optimized builds of more than a factor of 100. That's
not close to being OK, and it makes code much harder to understand and optimize. My
guess is that we will be looking at assembly a lot more when optimizing Swift than
we ever did in Objective-C, and then scratching our heads as to why the optimizer
didn't manage to optimize that particular piece of code.
I fondly remember the "Java optimization" WWDC sessions back when we were supposed to rewrite
all our code in that particular new hotness. In essence, we were given a model of the
capabilities of HotSpot's JIT optimizer, so in order to optimize code we had to know
what the resulting generated code would be, what the optimizer could handle (not a lot),
and then translate that information back into the original source code. At that point,
it's simpler to just write the assembly that you are trying to goad the
JIT into emitting yourself. Or portable Macro Assembler. Or object-oriented portable
Macro Assembler.
Apple's new Swift programming language has been heavily promoted as being a safer
alternative to Objective-C, with a much stronger emphasis on static typing, for
example. While I am dubious about the additional safety of static typing (I
argue that it produces far more safyness than actual safety), this post is
going to look at a different feature: overflow protection.
Overflow protection means that when an arithmetic
operation on an integer exceeds the maximum value for that integer type, the value
doesn't wrap around as it does on most CPU ALUs (and by extension in C). Instead,
the program signals an exception, and since Swift has no exception handling, the program
crashes.
While this looks a little like the James Bond anti-theft device in For Your Eyes Only,
which just blows up the car, the justification is that the program should be protected
from operating on values that have become bogus. While I understand the reasoning, I
am dubious that it really is safer to have every arithmetic operation on integers and
every conversion from higher precision to lower in the entire
program become a potential crash site, when before those operations could never
crash (except for division by zero).
While it would be interesting to see what evidence there is for this argument, I can
give at least one very prominent example against it. On June 4th 1996, ESA's brand
new Ariane 5 rocket blew up during launch, due to a software fault, with a total
loss of US $370 million, apparently one of the most expensive software faults in history.
What was that software fault? An overflow protection exception triggered by a floating point
to (short) integer conversion.
The resulting core-dump/diagnostics were then
interpreted by the next program in line as valid data, causing effectively random
steering inputs that caused the rocket to break up (and self destruct when it
detected it was breaking up).
What's interesting is that almost any other handling of the overflow apart from
raising an exception would have been OK and saved the mission and $370 million.
Silently truncating/clamping the value to the maximum permissible range (which
some in the static typing community incorrectly claim was the problem) would have worked perfectly
and was the actual solution used for other values.
Even wraparound might have worked, at least there would have been only one
bogus transition after which values would have been mostly monotonic again.
Certainly better than effectively random values.
Ideally, the integer would have just overflowed into a higher precision as in
a dynamic language such as Smalltalk, or even Postscript. Even JavaScript's somewhat
wonky idea that all numbers are floats (but some just don't know it yet) would
have been better in this particular case. Considering the limitations of
the hardware at the time, those languages weren't options, but nowadays the required
computational horsepower is there.
In Ada you at least could potentially trap the exception generated by overflow,
but in Swift the only protection is to manually trace back the inputs of every
arithmetic operation on integers and enforce ranges for all possible combinations
of inputs that do not result in that operation overflowing. For any program
with external inputs and even slightly complex data paths and arithmetic, I would
venture to say that that is next to impossible.
The only viable method for avoiding arithmetic overflow is to not use integer
arithmetic with any external input, ever. Hello JavaScript! The following Ada program, for example, compiles without complaint but raises a Constraint_Error at run time when the large float is converted to an integer:
with Ada.Text_IO,Ada.Integer_Text_IO;
use Ada.Text_IO,Ada.Integer_Text_IO;
procedure Hello is
   b : FLOAT;
   a : INTEGER;
begin
   b:=3123123.0;
   b:=b*b;
   a:=INTEGER(b);
   Put("a=");
   Put(a);
end Hello;
You can watch your Swift playground crash using the following code:
var a = 2
var b:Int16
for i in 1..100 {
    a=a*2
    println(a)
    b=Int16(a)
}
Note that neither the Ada nor Swift compilers have static checks that detect the overflow,
even when all the information is statically available, for example in the following Swift code:
var a:UInt8
a = 254
println(a)
a += 2
println(a)
What's even worse is that the -Ofast flag will remove the checks; the integer will just wrap around. Optimization flags in general should not change
visible program behavior, except for performance.
Or maybe this is good: since it looks like we need that flag to get decent performance at all,
we also remove the overflow crashers...
I like static (manifest) typing. This may come as a shock to those who
have read other posts of mine, but it is true. I certainly am more
comfortable with having a MPWType1FontInterpreter *interpreter
rather than id interpreter. Much more comfortable, in fact, and
this feeling extends to
Xcode saying "0 warnings" and the clang static analyzer agreeing.
Safety
The question though is: are those feelings actually justified? The rhetoric
on the subject is certainly strong, and very rigid/absolute. I recently had a Professor
of Computer Science state unequivocally that anyone who doesn't use static typing should
have their degree revoked. In a room full of Squeakers. And that's not an
extreme or isolated case. Just about any discussion on the subject seems to
quickly devolve into proponents of static typing claiming absolutely that
dynamic typing invariably leads to programs that are steaming piles of bugs and crash left
and right in production, whereas statically typed programs have their bugs
caught by the compiler and are therefore safe and sound. In fact, Milner has supposedly made the claim that "well typed programs cannot go wrong". Hmmm...
That the compiler is capable of catching (some) bugs using static type checks
is undeniably true. However, what is also obviously true is that not all bugs are type
errors (for example, most of the 25 top software errors don't look like type errors
to me, and neither goto fail; nor Heartbleed look like type errors either, and neither
do the top errors in my different projects),
so having the type-checker give our programs a clean bill of health does not
make them bug free, it eliminates a certain type or class of bugs.
With that, we can take the question from the realm of religious zealotry to the
realm of reasoned inquiry: how many bugs does static type checking catch?
Alas, this is not an easy question to answer, because we are looking for something
that is not there. However, we can invert the question: what is the incidence
of type-errors in dynamically typed programs, ones that do not benefit from the
bug-removal that the static type system gives us and should therefore be steaming
piles of those type errors?
With the advent of public source repositories, we now have a way of answering that
question, and Robert Smallshire did the grunt work to come up with an answer: 2%.
So all those nasty type errors were actually not
having any negative impact on debug times; in fact, the reverse was true. Which of
course makes sense if the incidence of type errors is even near 2%, because then other factors
are almost certain to dominate. Completely.
Some people are completely religious about type systems and as a mathematician I love the idea of type systems, but nobody has ever come up with one that has enough scope. If you combine Simula and Lisp—Lisp didn’t have data structures, it had instances of objects—you would have a dynamic type system that would give you the range of expression you need.
Even stringent advocates of strong typing such as Uncle Bob Martin, with whom I sparred
many a time on that and other subjects in comp.lang.object, have now come around to this
point of view: yeah, it's nice, maybe, but just not that important. In fact, he
has actually reversed his position, as seen in this video of him debating static typing with Chad Fowler.
Truthiness and Safyness
What I find interesting is not so much whether one or the other is right/wronger/better/whatever, but rather
the disparity between the vehemence of the rhetoric, at least on one side of the
debate ("revoke degrees!", "can't go wrong!") and both the complete lack of empirical evidence
for (there is some against) and the lack of magnitude of the effect.
Stephen Colbert coined the term "truthiness" for "a "truth" that a person making an argument or assertion claims to know intuitively 'from the gut' or because it 'feels right' without regard to evidence, logic, intellectual examination, or facts." [Wikipedia]
To me it looks like a similar effect is at play here: as I notice myself, it just feels
so much safer if the computer tells you that there are no type errors. Especially if it
is quite a bit of effort to get to that state, which it is. As I wrote, I notice that
effect myself, despite the fact that I actually know the evidence is not there,
and have been a long-time friendly skeptic.
So it looks like static typing is "safy": people just know intuitively that it
must be safe, without regard to evidence. And that makes the debate both so
heated and so impossible to decide rationally, just like the political debate on
"truth" subjects.
One of the things I find curious is how Apple's new Swift language rehashes mistakes that were made in other languages. Let's take construction or
initializers.
Objective-C/Smalltalk
These are the rules for initializers in Smalltalk and Objective-C:
An "initializer" is a normal method and a normal message send.
There is no second rule.
There's really nothing more to it, the rest follows organically and naturally
from this simple fact and various things you like to see happen. For example, is
there a rule that you have to send the initial initializer (alloc
or new) to the class?
No there isn't, it's just a convenient and obvious place to put it since we don't
have the instance yet and the class exists and is an obvious place to go to for
instances of that class. However, we could just
as well ask a different class to create the object for us.
The same goes with calling super. Yes, that's usually
a good idea, because usually you want the superclass's behavior,
but if you don't want the superclass's behavior, then don't call.
Again, this is not a special rule for initializers, it usually
follows from what you want to achieve. And sometimes it doesn't, just
like with any other method you override: sometimes you call super,
sometimes you do not.
The same goes for assigning the return value, doing the
self=[super init]; dance. Again, this is not
at all required by the language or the frameworks, although
apparently it is a common misconception that it is, a
misconception that is, IMHO, promoted by careless creation
of "best practices" as "immutable rules", something I wrote
about earlier when talking about the useless typing out of the
id type in method declarations.
However, returning self and using the returned value is a useful
convention, because it makes it possible for init
methods to return a different object than what they started
with (for example a specific subclass or a singleton).
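A sketch of that convention (hypothetical class, not from any framework): because init is just a normal method, it can discard the instance alloc handed it and return something else, for example a shared default instance.

#import <Foundation/Foundation.h>

// Hypothetical example of an init method substituting a different object (MRC-style).
@interface ColorProfile : NSObject
{
    NSString *name;
}
- (instancetype)initWithName:(NSString *)aName;
@end

@implementation ColorProfile

static ColorProfile *defaultProfile = nil;

- (instancetype)initWithName:(NSString *)aName
{
    if ( [aName isEqualToString:@"default"] ) {
        if ( !defaultProfile ) {
            defaultProfile = [[ColorProfile alloc] initWithName:@"sRGB"];
        }
        [self release];                  // discard the instance alloc created (omit under ARC)
        return [defaultProfile retain];  // hand back the shared instance instead
    }
    if ( (self = [super init]) ) {
        name = [aName copy];
    }
    return self;
}

- (void)dealloc
{
    [name release];
    [super dealloc];
}

@end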
Swift initializers
Apple's new Swift language has taken a page from the C++ and Java
playbooks and made initialization a special case. Well, lots
of special cases actually. The Swift book has 30 pages on
initialization, and they aren't just illustration and explanation,
they are dense with rules and special cases. For example:
You can set a default value of a property in the variable definition.
Or you can set the default value in an initializer.
Designated initializers are now a first class language construct.
Parameterized initializers have local and external parameter names, like methods.
Except that the first parameter name is treated differently, so Swift automatically provides an external parameter name for all arguments, which
it doesn't with methods.
Constant properties aren't constant in initializers.
Swift creates a default initializer for both classes and structs.
Swift also creates a default memberwise initializer, but only
for structs.
Initializers can (only) call other initializers, but there are special
rules for what is and is not allowed and these rules are different
for structs and classes.
Providing specialized initializers removes the automatically-provided
default initializers.
Initializers are different from other methods in that they are
not inherited, usually.
Except that there are specific circumstances where they are
inherited.
Confused yet? There's more!
If your subclass provides no initializers itself, it inherits
all the superclass's initializers
If your subclass overrides all the superclass's designated
initializers, it inherits all the convenience initializers (that's
also a language construct). How does this not break if the
superclass adds initializers? I think we've just re-invented
the fragile-base-class problem.
Oh, and you can initialize instance variables with the values
returned by closures or functions.
Well, that was easy, but that's probably only because I missed a few.
Having all these rules means that
this new way of initialization is less powerful than the one
before it, because all of these rules restrict the power that
a general method has.
Particularly, it is not possible to substitute a different value
or return nil to indicate failure to initialize, nor
is it possible to call other methods (as far as I can tell).
To actually provide these useful features, we need something else:
Use the Factory method pattern to actually
do the powerful stuff you need to do ...
...which gets you back to where we were at the beginning
with Objective-C or Smalltalk, namely sending a normal message.
Of course, we are familiar with this because both C++ and Java also have
special constructor language features, plagued by the same
problems. They are also the source of the Factory method pattern,
at least as a separate "pattern". Smalltalk and Objective-C simply
made that pattern the default for object creation, in fact Brad Cox
called classes "Factory Objects", long long before the GOF patterns book.
First rule of baking programming conventions into the language: Don't do it! The second rule of baking programming conventions into the language (experts only): Don't do it yet!
Well, it's impolite, isn't it? But seriously,
when I first heard about mock object testing, I was excited, because it
certainly sounded like The Right Thing™: message-based, checking
relationships instead of state, and the new hip thing.
However, when I looked at actual examples, they looked sophisticated
and obscure, the opposite of what I feel unit tests should be:
obvious and simple, simplistic to the point of stupidity. I couldn't figure
out at a glance what the expected behavior was, what was being
tested and what was environment.
So I never used mocks in practice, meaning my opinions could not
go beyond being superficial. Fortunately, I was given the
task of porting a fairly large Objective-C project to OS X
(yes, you read that right: "to OS X"), and it was heavily
mock-tested.
As far as I could tell, most of the vague premonitions I had
about mock testing were borne out in that project: obscure
mock tests, mock tests that didn't actually test anything except their
own expectations and mock tests that were deeply coupled to
implementation details.
Again, though, that could just be my misunderstandings, certainly
people for whom I have a great deal of respect advocate for
mock tests, but I was heartened when I heard in the recent
DHH/Fowler/Beck TDD death-matches (er, friendly conversations) that neither Kent nor Martin are
great fans of mocking, and certainly not of deeply nested mocks.
However, it was DHH's comments that finally made me realize that
what really bothered me was something more subtle, and
much more pervasive. The talk is about "mocking the database",
or mocking some other component. While not proof positive, this
kind of mocking seems indicative of not letting the tests drive
the design towards simplicity, because the design is already
set in stone.
As a result, you're going to have constant pain, because the
tests will continuously try to drive you towards simplifying
your design, which you resist by putting in mocks.
Instead of putting in mocks of presumed components, let the
tests tell you what counterparts they want. Then build those
counterparts, again in simplest way possible. You will likely
discover that a lot of your assumptions about the required
environment for your application turn out not to be true.
For example, when building SportStats v2 at the BBC we thought
we needed a database for persistence. But we didn't build it
in until we needed it, and we didn't mock it out either. We
waited until the code told us that we now needed a database.
It never did.
So we discovered that our problem was simpler than we had
originally thought, and therefore our architecture could
be as well. Mocking eliminates that feedback.
So don't mock. Because it's impolite to not listen to what
your code is trying to tell you.
Objective-Smalltalk is now getting into a very nice virtuous cycle of
being more useful, therefore being used more and therefore motivating changes
to make it even more useful. One of the recent additions was autocomplete,
for both the tty-based and the GUI based REPLs.
I modeled the autocomplete after the one in bash and other Unix shells:
it will insert partial completions without asking, up to the point where they
become ambiguous. If there is no unambiguous partial completion, it
displays the alternatives. So a usual sequence is <TAB> -> something
is inserted <TAB> again -> list is displayed, type one character to disambiguate, <TAB> again and so on. I find that I get to my
desired result much quicker and with fewer backtracks than with the
mechanism Xcode uses.
Fortunately, I was able to wrestle NSTextView's
completion mechanism (in a ShellView borrowed from
the excellent F-Script) into providing these semantics rather than the
built-in ones.
Another cool thing about the autocomplete is that it is very precise,
unlike, for example, F-Script, which as far as I can tell just offers all
possible symbols.
How can this be, when Objective-Smalltalk is (currently) dynamically
typed and we all know that good autocomplete requires static types?
The reason is simply that there is one thing that's even better
than having the static types available: having the actual objects
themselves available!
The two REPLs aren't just syntax-aware, they also evaluate the
expression as much as needed and possible to figure out what
a good completion might be. So instead of having to figure
out the type of the object, we can just ask the object what
messages it understands. This was very easy to implement,
almost comically trivial compared to a full blown static type-system.
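A sketch of the idea using the plain Objective-C runtime (hypothetical helper, not the actual Objective-Smalltalk completion code): given a live object, just ask its class for the selectors that match the prefix typed so far.

#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// Selectors of the object's class that start with prefix -- raw material for completion.
// (A real implementation would also walk the superclasses.)
static NSArray *completionsForObject( id obj, NSString *prefix )
{
    NSMutableArray *matches = [NSMutableArray array];
    unsigned int count = 0;
    Method *methods = class_copyMethodList( object_getClass(obj), &count );
    for ( unsigned int i = 0; i < count; i++ ) {
        NSString *name = NSStringFromSelector( method_getName( methods[i] ) );
        if ( [name hasPrefix:prefix] ) {
            [matches addObject:name];
        }
    }
    free( methods );
    return [matches sortedArrayUsingSelector:@selector(compare:)];
}

int main(void)
{
    id someObject = @"hello";
    NSLog(@"%@", completionsForObject( someObject, @"length" ));
    return 0;
}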
So while static types are good for this purpose, live objects are
even better! The Self team made a similar discovery when they
were working on their optimizing compiler, trying both static
type inference and dynamic type feedback. Type feedback was
both simpler and performed vastly better and is currently used
even for optimizing statically typed languages such as Java.
Finally, autocomplete also works with Polymorphic Identifiers, for
example file:./a<TAB> will autocomplete files
in the current directory starting with the letter 'a' (and just
fi<TAB> will autocomplete to the file:
scheme). Completion is scheme-specific, so any schemes you add
can provide their own completion logic.
Like all of Objective-Smalltalk, this is still a work in progress:
not all syntactic constructs support completions, for example
Polymorphic Identifiers don't support complex paths and there
is no bracket matching. However, just like Objective-Smalltalk,
what is there is quite useful and often already better than what else
is out there in small areas.
Let me explain: even though you might assume that all those objects are actually going to be DataPoint objects, there’s no actual guarantee that they will actual be DataPoint objects at runtime. Casting them only satisfies your hunger for type safety, but nothing else really.
More importantly, it only seems to satisfy your hunger for type safety,
it doesn't actually provide any. It's less nutritious than sugar water in
that respect, not even calories, never mind the protein, fiber, vitamins and
other goodness. More like a pacifier, really, or the product of a
cargo cult.
In my recent post on Cargo Cult Typing, I mentioned a
concept I called the id subset. Briefly, it is the subset of
Objective-C that deals only with object pointers, or id's.
There has been some misunderstanding that I am opposed to types. I am
not, but more on that another time.
One of the many nice properties of the (transitive) id subset is that it
is dynamically (memory) safe, just like Smalltalk. That is, as long as all arguments and return values
of your messages are objects, you can never dereference a pointer incorrectly;
the worst that can happen is that you get a "Message not understood" that can
be caught and handled by the object in question or raised as an exception.
The reason this is safe is that objc_msgSend() will make sure that methods
will only ever be invoked on objects of the correct class, no matter what the
(possibly incorrect, or unavailable) static type says.
So no de-referencing an incorrect pointer, no scribbling over random bits
of memory.
In fact, this is the vaunted "pointer safety" that John Siracusa says requires
ditching native compiled languages like Objective-C for VM based languages. The idea
that a VM with an interpreter or a JIT was required for pointer safety
was never true, of course, and it's interesting that both Google and
Microsoft are turning to Ahead of Time (AOT) compilation in their newest
SDKs, for performance reasons.
Did someone mention "performance"? :-)
Another nice aspect of the id subset is that it makes reflective code
a lot simpler. And simplicity usually also translates to speed. How
much speed? Apple's NSInvocation class has to deal with
interpreting C type information at runtime to then construct proper stack
frames dynamically for all possible C types. I think it uses libffi, though
it may be some equivalent library. This is slow, around 340.1ns
per message send on my 13" MBPR. By restricting itself to the id subset,
my own MPWFastInvocation class's dispatch is
much simpler, just a switch invoking objc_msgSend() with
a different number of arguments.
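The idea, in a hedged sketch (not the actual MPWFastInvocation source): when everything is an id, dispatch collapses into a switch on the number of arguments, with objc_msgSend() cast to the appropriate id-only signature.

#import <Foundation/Foundation.h>
#import <objc/message.h>

// id-only invocation: no runtime parsing of C type encodings, no dynamic stack frames.
static id invokeWithArgs( id target, SEL selector, id *args, int argCount )
{
    switch ( argCount ) {
        case 0:
            return ((id (*)(id, SEL))objc_msgSend)( target, selector );
        case 1:
            return ((id (*)(id, SEL, id))objc_msgSend)( target, selector, args[0] );
        case 2:
            return ((id (*)(id, SEL, id, id))objc_msgSend)( target, selector, args[0], args[1] );
        default:
            return nil;   // extend with more cases as needed
    }
}

int main(void)
{
    id arg = @"world";
    NSLog(@"%@", invokeWithArgs( @"hello ", @selector(stringByAppendingString:), &arg, 1 ));
    return 0;
}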
The simplicity of MPWFastInvocation also pays off in
speed: 6.2ns per message-send on the same machine. That's 50 times
faster than NSInvocation and only 2-3x slower than
a normal message send. In fact, once you're that close, things like
IMP-caching (4 ns) start to make sense, especially since they can
be hidden behind a nice interface. Using a C Macro and the IMP
stashed in a public instance var takes the time down to 3 ns, making
the reflective call via an object effectively as fast as the
non-reflective code emitted by the compiler. Which is nice, because
it makes reflective techniques much more feasible for wider varieties
of code, which would be a good thing.
The speed improvement is not because MPWFastInvocation is better
than NSInvocation (it is decidedly not), it is because it is solving
a much, much simpler problem. By sticking to the safe id subset.
I have to admit I am a bit startled to see people seriously (?) advocate exploitation of "undefined behavior" in the C standard to just eliminate the offending code altogether, arguing that
undefined means literally anything is OK. I've certainly seen it justified
many times. Apart from being awful, this idea smacks of hubris on part of the compiler writers.
The job of the compiler is to do the best job it can at turning the
programmer's intent into executable machine code, as expressed by
the program. It is not to
show how clever the optimizer writer is, how good at lawyering the language
standard, or to wring out a 0.1% performance
improvement on <benchmark-of-choice>, at least not when it
conflicts with the primary goal.
For let's not pretend that these optimizations are actually useful
or significant: Proebsting's law shows that all compiler optimizations
have been at best 1/10th as effective at improving performance as hardware
advances, and recent research suggests that even that may be optimistic.
That doesn't mean that I don't like my factor 2 or 3 improvement in
code performance for code where basic optimizations apply. But almost
all of those performance gains come at the lowest levels of optimization,
the more sophisticated stuff just doesn't bring much if any additional
benefit. (There's a reason Apple recommends -Os and not -O3 as default).
So don't get ahead of yourselves, other non-compiler optimizations can often
achieve 2-3 orders of magnitude improvement, and for a lot of
Objective-C code, for example,
the compiler's optimizations barely register at all. Again: perspective!
Furthermore, the purpose of "undefined behavior" was (not sure it still is)
to be inclusive, so for example compilers for machines with slightly odd
architectures could still be called ANSI-C without having to do unnatural
things on that architecture in order to conform to over-specification.
Sometimes, undefined behavior is needed for programs to work.
So when there is integer overflow, for example, that's not a license to
silently perform dead code elimination at certain optimization levels, it's
license to do the natural thing on the platform, which on most platforms
these days is let the integer overflow, because that is what a C programmer
is likely to expect. In addition, feel free to emit a warning. The
same goes for optimizing away an out of bounds array access that is
intended to terminate a loop. If you are smart enough to figure out
the out-of-bounds access, warn about it and then proceed to emit the
code. Eliminating the check and turning a terminating loop into an
infinite loop is never the right answer.
So please don't do this, you're not producing value: those optimizations
will cease to "help" when programmers "fix" their code. You are also
not producing value: any additional gains are extremely modest compared
to the cost. So please stop doing it, certainly stop doing it on purpose,
and please carefully evaluate the cost/benefit ratio when introducing optimizations that cause this to happen as a side effect...and then
don't. Or do, and label them appropriately.
This quote from Steve Jobs is one that's been an inspiration to me for some time:
[...] when you first attack a problem it seems really simple because you don't understand it. Then when you start to really understand it, you come up with these very complicated solutions because it's really hairy. Most people stop there. But a few people keep burning the midnight oil and finally understand the underlying principles of the problem and come up with an elegantly simple solution for it. But very few people go the distance to get there.
In other words:
1. Naive Simplicity
2. Sophisticated Complexity
3. Sophisticated Simplicity
It's from the February 1984 Byte Interview introducing the Macintosh.
UPDATE: Well, it seems that Heinlein got there first:
Every technology goes through three stages: first, a crudely simple and quite unsatisfactory gadget; second, an enormously complicated group of gadgets designed to overcome the shortcomings of the original and achieving thereby somewhat satisfactory performance through extremely complex compromise; third, a final stage of smooth simplicity and efficient performance [..]
I like bindings. I also like Key Value Observing. What they do is undeniably cool: you do some initial setup, and presto: magic! You change a value over here, and another
value over there changes as well. Action at a distance. Power.
What they do is also undeniably valuable. I'd venture that nobody actually
likes writing state
maintenance and update code such as the following: when the user clicks this button, or finishes entering
text in that textfield, take the value and put it over here. If the underlying
value changes, update the textfield. If I modify this value, notify
these clients that the value has changed so they can update themselves accordingly.
That's boring. There is no glory in state maintenance code, just potential for
failure when you screw up something this simple.
Finally, their implementation is also undeniably cool: observing an attribute
of a generic
object creates a private subclass for that object (who says we can't do
prototype-based programming in Objective-C?), swizzles the object's
class pointer to that private subclass and then replaces the attribute's
(KVO-compliant) accessor methods with new ones that hook into the
KVO system.
Despite these positives, I have actively removed bindings code from
projects I have worked on, don't use either KVO or bindings myself and
generally recommend staying away from them. Why on earth would I
do that?
Excursion: Constraint Solvers
Before I can answer that question, I have to go back a little and talk about
constraint solvers.
The idea of setting up relationships once and then having the system maintain them
without manually shoveling values back and forth is not exactly new, the first variant
I am aware of was Sketchpad,
Ivan Sutherland's PhD Thesis from 1961/63 (here with narration by Alan Kay):
I still love Ivan's answer to the question as to how he could invent computer graphics,
object orientation and constraint solving in one fell swoop: "I didn't know it was hard".
The first system I am aware of that integrated constraint solving with an object-oriented
programming language was ThingLab, implemented on top of Smalltalk by Alan Borning at Xerox PARC around 1978 (where else...):
While the definition
of a path is simple, the idea behind it has proved quite powerful and has been essential
in allowing constraint- and object-oriented metaphors to be integrated. [..] The notion
of a path helps strengthen [the distinction between inside and outside of an object] by
providing a protected way for an object to provide external reference to its parts and
subparts.
Yes, that's a better version of KVC. From 1981.
Alan Borning's group at the University of Washington continued working on constraint solvers
for many years, with the final result being the Cassowary linear constraint solver (based on the simplex
algorithm) that was picked up by Apple for Autolayout. The papers on Cassowary and
constraint hierarchies should help with understanding why Autolayout does what it does.
A simpler form of constraints are one-way dataflow constraints.
A one-way, dataflow constraint is an equation of the form y = f(x1,...,xn) in which the formula on the right side
is automatically re-evaluated and assigned to the variable y whenever any variable xi changes.
If y is modified from
outside the constraint, the equation is left temporarily unsatisfied, hence the attribute “one-way”. Dataflow constraints are recognized as a powerful programming methodology in a variety of contexts because of their versatility and simplicity. The most widespread application of dataflow constraints is perhaps embodied by spreadsheets.
The most important lessons they found were the following:
constraints should be allowed to contain arbitrary code that is written in the underlying toolkit language and does not require any annotations, such as parameter declarations
constraints are difficult to debug and better debugging tools are needed
programmers will readily use one-way constraints to specify the graphical layout of an application, but must be carefully and time-consumingly trained to use them for other purposes.
However, these really are just the headlines, and particularly for Cocoa programmers
the actual reports are well worth reading as they contain many useful pieces of
information that aren't included in the summaries.
Back to KVO and Cocoa Bindings
So what does this history lesson about constraint programming have to do with KVO
and Bindings? You probably already figured it out: bindings are one-way
dataflow constraints, specifically with the equation limited to y = x1;
more complex equations can be obtained by using NSValueTransformers. KVO
is more of an implicit invocation
mechanism that is used primarily to build ad-hoc dataflow constraints.
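As a concrete sketch (hypothetical classes, standard KVO API): the "constraint" y = f(x) ends up split between an addObserver:forKeyPath:options:context: call and an observeValueForKeyPath: method, with the formula buried inside the latter.

#import <Foundation/Foundation.h>

// Hypothetical model objects wired into an ad-hoc one-way dataflow constraint via KVO:
// whenever thermometer.celsius changes, display.text is recomputed (y = f(x)).
@interface Thermometer : NSObject
@property (nonatomic) double celsius;
@end
@implementation Thermometer
@end

@interface Display : NSObject
@property (nonatomic, copy) NSString *text;
@end

@implementation Display
- (void)observeValueForKeyPath:(NSString *)keyPath
                      ofObject:(id)object
                        change:(NSDictionary *)change
                       context:(void *)context
{
    // the "formula" of the constraint
    double celsius = [[change objectForKey:NSKeyValueChangeNewKey] doubleValue];
    self.text = [NSString stringWithFormat:@"%.1f F", celsius * 9.0 / 5.0 + 32.0];
}
@end

int main(void)
{
    Thermometer *thermometer = [[Thermometer alloc] init];
    Display *display = [[Display alloc] init];
    [thermometer addObserver:display
                  forKeyPath:@"celsius"
                     options:NSKeyValueObservingOptionNew
                     context:NULL];
    thermometer.celsius = 100.0;          // triggers the observer
    NSLog(@"%@", display.text);           // 212.0 F
    [thermometer removeObserver:display forKeyPath:@"celsius"];
    return 0;
}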
The specific problems of the API and the implementation have been documented
elsewhere, for example by Soroush Khanlou and Mike Ash, who not only suggested and
implemented improvements back in 2008, but even followed up on them in 2012. All
these problems and workarounds
demonstrate that KVO and Bindings are very sophisticated, complex and error prone
technologies for solving what is a simple and straightforward task: keeping
data in sync.
To these implementation problems, I would add performance: even
just adding the willChangeValueForKey: and didChangeValueForKey:
message sends in your setter (these are usually added automagically for you) without triggering any notifications makes that setter 30 times slower (from 5ns to
150ns on my computer) than a simple setter that just sets and retains the object.
Actually having that access trigger a notification takes the penalty to a factor of over 100
(5ns vs. over 540ns), even when there is only a single observer. I am pretty sure
it gets worse when there are lots of observers (there used to be an O(n^3)
algorithm in there, which was fortunately fixed a while ago). While 500ns may
not seem like a lot when dealing with UI code, KVO tends to be implemented at
the model layer in such a way that a significant number of model data accesses
incur at least the base penalties. For example, KVO notifications were one of the primary
reasons for NSOperationQueue's somewhat anemic performance back when
we measured it for the Leopard release.
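For reference, the comparison above is roughly between these two setter variants. This is just a sketch (manual retain/release, an NSString ivar called _title) to show where the extra message sends go; the numbers are the measurements quoted above:
// Variant 1: plain setter, roughly 5ns in the measurements above.
- (void)setTitle:(NSString *)newTitle
{
    if ( newTitle != _title ) {
        [newTitle retain];
        [_title release];
        _title = newTitle;
    }
}

// Variant 2: KVO-compatible setter. The two bracketing message sends alone
// take this to roughly 150ns, and an actual observer to over 540ns.
- (void)setTitle:(NSString *)newTitle
{
    [self willChangeValueForKey:@"title"];
    if ( newTitle != _title ) {
        [newTitle retain];
        [_title release];
        _title = newTitle;
    }
    [self didChangeValueForKey:@"title"];
}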
Not only is the constraint graph not available at run time, there is also no
direct representation at coding time. All there is is either code or IB settings
that construct such a graph indirectly, so the programmer has to infer the
graph from what is there and keep it in her head. There are also no formulae; the best
we can do are ValueTransformers and
keyPathsForValuesAffectingValueForKey.
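The closest KVO gets to a formula is letting you declare which inputs a derived key depends on, with the formula itself living in ordinary code. A sketch, assuming a class with firstName and lastName properties:
// The "formula" is plain code...
- (NSString *)fullName
{
    return [NSString stringWithFormat:@"%@ %@", self.firstName, self.lastName];
}

// ...while the dependency information lives in a separate, stringly-typed method
// that KVO picks up by naming convention (keyPathsForValuesAffecting<Key>).
+ (NSSet *)keyPathsForValuesAffectingFullName
{
    return [NSSet setWithObjects:@"firstName", @"lastName", nil];
}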
As best as I can tell, the reason for this state of affairs is that there simply
wasn't any awareness of the decades of
research and practical experience with constraint solvers at the time (How
do I know? I asked, the answer was "Huh?").
Anyway, when you add it all up, my conclusion is that while I would really,
really, really like a good constraint solving system (at least for spreadsheet
constraints), KVO and Bindings are not it. They are too simplistic, too
fragile and solve too little of the actual problem to be worth the trouble.
It is easier to just write that damn state maintenance code, and infinitely
easier to debug it.
I think one of the main communication problems between advocates for and
critics of KVO/Bindings is that the advocates are advocating more for
the concept of constraint solving, whereas critics are critical of the
implementation. How can these critics not see that, despite a few flaws,
this approach is obviously
The Right Thing™? How can the advocates not see the
obvious flaws?
Functional Reactive Programming
As far as I can tell, Functional Reactive Programming (FRP) in general and ReactiveCocoa
in particular are another way of scratching the same itch.
[..] is an integration of declarative [..] and imperative object-oriented programming. The primary goal of this integration is to use constraints to express relations among objects explicitly -- relations that were implicit in the code in previous languages.
Sounds like FRP, right? Well, the first "[..]" part is actually "Constraint Imperative Programming" and the second is "constraints",
from the abstract of a 1994 paper. Similarly, I've seen it stated that FRP is like a spreadsheet.
The connection between functional programming and constraint programming is also well
known and documented in the literature, for example the experience report above states the
following:
Since constraints are simply functional programming dressed up with syntactic sugar, it should not be surprising that 1) programmers do not think of using constraints for most programming tasks and, 2) programmers require extensive training to overcome their procedural instincts so that they will use constraints.
However, you wouldn't be able to tell that there's a relationship there from reading
the FRP literature, which focuses exclusively on the connection to functional
programming via functional reactive animations and Microsoft's Rx extensions. Explaining and particularly motivating FRP this way has the
fundamental problem that whereas functional programming, which is by definition
static/timeless/non-reactive, really needs something to become interactive,
reactivity is already inherent in OO. In fact, reactivity is the quintessence of
objects: all computation is modeled as objects reacting to messages.
So adding reactivity to an object-oriented language is, at first blush, nonsensical
and certainly causes confusion when explained this way.
I was certainly confused, because until I found this one
paper on reactive imperative programming,
which adds constraints to C++ in a very cool and general way,
none of the documentation, references or papers made the connection that seemed so
blindingly obvious to me. I was starting to question my own sanity.
Architecture
Additionally, one-way dataflow constraints creating relationships between program variables
can, as far as I can tell, always be replaced by a formulation where the dependent
variable is simply a method that computes the value on demand. So
instead of setting up a constraint between point1.x and point2.x,
you implement point2.x as a method that uses point1.x to
compute its value and never stores that value. Although this may compute the value more
often than necessary, rather than memoizing it and computing it just once, the
additional cost of managing constraint evaluation is such that the two probably
balance out.
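A minimal sketch of that reformulation, with made-up point classes: instead of a constraint keeping point2.x in sync with point1.x, the dependent value is simply computed when asked for:
@interface AnchorPoint : NSObject            // plays the role of point1
@property (nonatomic) double x;
@end
@implementation AnchorPoint
@end

@interface DependentPoint : NSObject         // plays the role of point2
@property (nonatomic, strong) AnchorPoint *anchor;
- (double)x;
@end

@implementation DependentPoint
- (double)x
{
    return self.anchor.x + 20;               // the "formula", re-evaluated on demand, never stored
}
@end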
However, such an implementation creates permanent coupling and requires dedicated
classes for each relationship. Constraints thus become more of an architectural
feature, allowing existing, usually stateful components to be used together without
having to adapt each component for each individual ensemble it is a part of.
Panta Rhei
Everything flows, so they say. As far as I can tell, two different
communities, the F(R)P people and the OO people, came up with very similar
solutions based on data flow. The FP people wanted to become more reactive/interactive,
and achieved this by modeling time as sequence numbers in streams of values, sort
of like Lucid or other dataflow languages.
The OO people wanted to be able to specify relationships declaratively and have
their system figure out the best way to satisfy those constraints, with
a large and useful subset of those constraints falling into the category of
the one-way dataflow constraints that, at least to my eye, are equivalent
to FRP. In fact, this sort of state maintenance and update-propagation
pops up in lots of different places, for example makefiles or other
build systems, web-server generators, publication workflows etc. ("this
OmniGraffle diagram embedded as a PDF into this LaTeX document that
in turn becomes a PDF document" -> the final PDF should update
automatically when I change the diagram, instead of me having to
save the diagram, export it to PDF and then re-run LaTeX).
What's kind of funny is that these two groups seem to have converged
on essentially the same space, but they seem to not be aware of
each other; maybe they are phase-shifted with respect to each other?
Part of that phase-shift is, again, communication. The FP guys
couch everything in must-destroy-all-humans, er, state rhetoric,
which doesn't do much to convince OO guys who know that for most
of their programs, state isn't an implementation detail but fundamental
to their applications. Also practical experience does not support the
idea that the FP approach is obvious:
Unfortunately, given the considerable amount of time required to train students to use constraints in a non-graphical manner, it does not seem reasonable to expect that constraints will ever be widely used for purposes other than graphical layout. In retrospect this result should not have been surprising. Business people readily use constraints in spreadsheets because constraints match their mental model of the world. Similarly, we have found that students readily use constraints for graphical layout since constraints match their mental model of the world, both because they use constraints, such as left align or center, to align objects in drawing editors, and because they use constraints to specify the layout of objects in precision paper sketches, such as blueprints. However, in their everyday lives, students are much more accustomed to accomplishing tasks using an imperative set of actions rather than using a declarative set of actions.
Of course there are other groups hanging out in this convergence zone, for example the
Unix folk with their pipes and filters. That is also not too surprising if
you look at the history:
So, we were all ready. Because it was so easy to compose processes with shell scripts. We were already doing that. But, when you have to decorate or invent the name of intermediate files and every function has to say put your file there. And the next one say get your input from there. The clarity of composition of function, which you perceived in your mind when you wrote the program, is lost in the program. Whereas the piping symbol keeps it. It's the old thing about notations are important.
I think the familiarity with Unix pipes also increases the itch: why can't I have
that sort of thing in my general purpose programming language? Especially when
it can lead to very concise programs, such as the Quartz-like graphics subsystem
Gezira written in
under 400 lines of code using the Nile dataflow language.
Moving Forward
I too have heard the siren sing.
I also think that a more spreadsheet-like programming model would not just make my life
as a developer easier, it might also make software more approachable for end-user adaptation and tinkering,
contributing to a more meaningful version of open source. But how do we get there?
Apart from a reasonable implementation and better debugging support, a new system would need much tighter
language integration. Preferably there would be a direct syntax for expressing constraints
such as that available in constraint imperative programming languages or constraint extensions to existing
languages like
Ruby or JavaScript.
This language support should be unified as much as
possible between different constraint systems, not one mechanism for Autolayout and a
completely different one for Bindings.
Supporting constraint programming has always been one of the goals of my Objective-Smalltalk project, and so far that has informed the
PolymorphicIdentifiers that support a uniform interface for data backed by different types of
stores, including one or more constraint stores supporting cooperating solvers, filesystems or web-sites. More needs
to be done, such as extending the data-flow connector hierarchy to conceptually integrate
constraints. The idea is to create a language that does not actually include constraints
in its core, but rather provides sufficient conceptual, expressive and implementation
flexibility to allow users to add such a facility in a non-ad-hoc way so that it is
fully integrated into the language once added. I am not there yet, but all the results
so far are very promising. The architectural focus of Objective-Smalltalk also ties
in well with the architectural interpretation of constraints.
There is a lot to do, but on the other hand I think the payback is huge, and there is
also a large body of existing theoretical,
practical and empirical groundwork to fall back on, so I think the task is doable.
Your feedback, help and pull requests would be very much appreciated!
After thinking about the id subset and being pointed to WebScript, Brent Simmons imagines a scripting language. I have to admit I have been imagining pretty much the same language...and at some
point decided to stop imagining and start building Objective-Smalltalk:
Peer of Objective-C: objects are Objective-C objects, methods are Objective-C methods,
added to the runtime and indistinguishable from the outside.
"You can subclass UIViewController, or write a category on it."
The example is from the site; it was copied
from an actual program. As you can see, interoperability with the C parts of
Objective-C is still necessary, but not bothersome.
This example was also copied from an actual small educational game that was
ported over from Flash.
You also get Higher Order Messaging, Polymorphic Identifiers, etc.
Works with the toolchain: this is a little more tricky, but I've made
some progress...part of that is an LLVM-based native compiler, part is
tooling that enables some level of integration with Xcode, part is
a separate toolset that has comparable or better capabilities.
While Objective-Smalltalk would not require shipping source code with your applications,
due to the native compiler, it would certainly allow it, and in fact my own
BookLightning imposition program
has been shipping with part of its Objective-Smalltalk source hidden in its Resources
folder for about a decade or so. Go ahead, download it, crack it open and have
a look! I'll wait here while you do.
Did you have a look?
The part that is in Smalltalk is the distilled (but very simple) imposition algorithm
shown here.
What this means is that any user of BookLightning could adapt it to suit their needs,
though I am pretty sure that none have done so to this date. This is partly due to
the fact that this imposition algorithm is too limited to allow for much variation,
and partly due to the fact that the feature is well hidden and completely unexpected.
There are two ideas behind this:
Open Source should be more about being able to tinker with well-made
apps in useful ways, rather than downloading and compiling gargantuan and
incomprehensible tarballs of C/C++ code.
There is no hard distinction between programming and scripting. A
higher level scripting/programming language would not just make developers'
jobs easier, it could also enable the sort of tinkering and adaptation that
Open Source should be about.
I don't think the code samples shown above are quite at the level needed to really
enable tinkering, but maybe they can be a useful contribution to the discussion.
The feedback was, effectively: "This code is incorrect, it is missing a return type". Of course, the code isn't incorrect in the least; the return type is id, because that is the default type, and in fact you will see this style in both Brad Cox's book:
and the early NeXTStep documentation:
Having a default type for objects isn't entirely surprising, because at that time id was not just the default type, it was the only type available for objects; the optional static typing for objects wasn't introduced into Objective-C until later. In addition, the template for Objective-C's object system was Smalltalk, which doesn't use static types at all: you just use variable names.
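In other words, the following two declarations mean exactly the same thing to the compiler; the first is simply the original style:
- objectAtIndex:(NSUInteger)anIndex;         // return type defaults to id
- (id)objectAtIndex:(NSUInteger)anIndex;     // identical method, extra ceremony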
Cargo-cult typing
So while it is possible (and apparently common) to write -(id)objectAtIndex:(NSUInteger)anIndex, it certainly isn't any more correct. In fact, it's
worse, because it is just syntactic noise [1][2], although it is arguably even worse than what Fowler describes, because it isn't actually mandated by
the language: the noise is inflicted needlessly.
And while we could debate as to whether it is better or not to write things that are redundant
syntactic noise, we could also not, as that was settled almost 800 years ago: entia non sunt multiplicanda praeter necessitatem. You could also say KISS or "when in doubt, leave it out", all of which just
say that the burden of proof is on whoever wants to add the redundant pieces.
What's really odd about this phenomenon is that we really don't gain anything from typing
out these explicit types, the code certainly doesn't become more readable. It's as if
we think that by following the ritual of explicitly typing out a type, we made the
proper sacrifice to the gods of type-safety and they will reward us with correctness.
But just like those Pacific islanders who built wooden planes, radios and control
towers, the ritual is empty, because it conveys no information to the type system,
or the reader.
The id subset
Now, I personally don't really care whether you put in a redundant (id)
or not, I certainly have been reading over it (and not even really noticing) for
my last two decades of Objective-C coding. However, the mistaken belief that it
has to be there, rather than that it is a personal choice you make, does worry me.
I think the problem goes a little deeper than just slightly odd coding styles, because it seems to be part and parcel of a drive towards making Objective-C look like an explicitly statically typed language along the lines of C++ or maybe Java,
with one of the types being id. That's not the case: Objective-C
is an optionally statically typed language. This means that you
may specify type information if you want to, but you generally
don't have to. I also want to emphasize that you can at best get Objective-C
to look like such a language, the holes in the type system are way too big for
this to actually gain much safety.
Properties started this trend, and now the ARC variant of the language needlessly turns what used to be warnings about unknown selectors into hard compiler errors.
Of course, there are some who plausibly argue that this always should have been an error,
or actually, that it always was an error, we just didn't know about it.
That's hogwash, of course. There is a subset of the language, which I'd like
to call the id subset, where all the arguments and returns are object
pointers, and for this it was always safe to not have additional type information,
to the point where the compiler didn't actually have that additional type information.
You could also call it the Smalltalk subset.
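A small sketch of what id-subset code looks like; the selector is invented for illustration, and note that neither the return type nor any of the parameter types need to be written, they all default to id:
// Everything here is an object pointer, so the compiler never needed any
// extra type annotations to generate correct code.
- valueFor:key from:aDictionary fallback:defaultValue
{
    id result = [aDictionary objectForKey:key];
    return result ? result : defaultValue;
}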
Another thing that's odd about this move to rigidify Objective-C in the face of the
success of more dynamic languages is that we actually have been moving in the
right direction at the language base-level (disregarding the type-system): in general programming style, with new syntax support
for object literals and subscripting, and SmallInteger-style NSNumbers, modern
Objective-C consists much more of pure objects than was traditionally the case.
And as long as we are dealing with pure objects, we are in the id subset.
A dynamic language
What's great about the id subset is that it makes incremental, explorative
programming very easy and lots of fun, much like other dynamic languages
such as Smalltalk, Python or Ruby.
(Not entirely like them, due to the need to compile to native code, but compilers are fast these
days and there are possible fixes such as Objective-Smalltalk.)
The newly enforced rigidity is starting to make explorative programming in Objective-C much
harder, and a lot less fun. In fact, it feels much more like C++ or Java and much less
like the dynamic language that it is, which in my opinion is the wrong direction: we should
be making our language more dynamic, and of course that's what I've been doing. So while I wouldn't agree with the tradeoff of dynamism for type safety even if
it were real, the fact is that we aren't actually
getting static type safety; we are just getting a wooden prop that will not fly.
Discussion on Hacker News.
UPDATE: Inserted a little clarification that I don't care about bike-shedding your code
with regard to (id). The problem is that people's mistaken beliefs about both whether and why it has to be there are symptomatic of the deeper trend I wrote about.
Just had a case of codesign telling me my app was fine, only for the same app to be rejected by Gatekeeper. The spctl tool fortunately was more truthful, but didn't really say where the problem was.
A little sleuthing determined that although I had signed all my frameworks with the Developer ID, two auxiliary executables were signed with my development certificate.
Lesson learned: don't trust codesign, use spctl to verify your binaries.
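For the record, the two checks look roughly like this (MyApp.app is a placeholder); only the second one reflects what Gatekeeper will actually do:
# What I had been relying on: the signature itself verifies fine.
codesign --verify --deep --verbose=2 MyApp.app

# What Gatekeeper actually checks: assessment against system policy.
spctl --assess --type execute --verbose MyApp.app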
Actually: no it isn't, Transact-SQL got the honors. Apart from the obvious question, "Transact-Who?", it really should have been Objective-C, because Tiobe readjusted the index mid-year in a way that resulted in a drop of 0.5% for the popular languages, which is fine, but without readjusting the historical data! Which is...not...especially if you make judgements based on relative performance.
In this case, Transact-SQL beat Objective-C by 0.17%, far less than the roughly 0.5% drop suffered by Objective-C mid-year. So Objective-C would have easily done the
hat-trick, but I guess Tiobe didn't want that and rigged the game to make sure
it doesn't happen.
Not that it matters...
UPDATE: I contacted Tiobe and they confirmed, both the lack of rebaselining and that Objective-C would likely have won an unprecedented third time in a row.