I've been working much more on
Objective-Smalltalk again, which is great,
but that also made me think more intently on the motivations, because a lot of what is there is probably
not quite obvious without knowing why and how it got there.
One of the obvious motivations has been Objective-C, and in particular the notion of Software-ICs,
components that are connected via dynamic messages. This is actually an instance of the
Scripted Components (pdf) pattern,
with the interesting twist that unlike most of the instances of this pattern, there is only a single programming
language for both the scripts and the components, that language being Objective-C.
In a sense, Objective-C is not really a language, it is middleware with some language features. And therein lies
much confusion, because people try to use it as "just" a language. In particular, they try
to use the "Objective" parts as an algorithmic programming language, something it is eminently unsuited
for. This is what the "C" part is for, which is somewhere around 90% of the actual language.
To back off for a bit, a Scripted Component system is one where you have components that are written in
one language, often C or C++, that are then glued together using a more dynamic scripting language,
for example Unix shell, Visual Basic. One of the more extreme examples of this pattern is computational
steering of supercomputer applications using Tcl(!). Which brings up the point of performance: one of
the interesting aspects of the Scripted Components pattern is that it combines the excellent performance
of the component language with the flexibility and interactivity of the scripting language.
Which brings me back to Objective-C, almost. No one at the supercomputer centers that used Tcl for steering
would have had the idea of implementing their actual computation in Tcl. No, implementing the
components is what you use C or C++ or maybe FORTRAN for. You then combine, parametrise and maybe
invoke those components from the scripting language.
Yet when it comes to Objective-C, that is exactly what people do: they implement the algorithmic
parts in the scripting/middleware part of Objective-C and then wonder why it comes out looking
like the following (taken from Vincent's post):
NSArray<NSNumber *> *sieve(NSArray<NSNumber *> *sorted) {
if ([sorted count] == 0) { return [NSArray new]; }
NSUInteger head = sorted[0].unsignedIntegerValue;
NSArray *tail = [sorted subarrayWithRange:NSMakeRange(1, sorted.count - 1)];
NSIndexSet *indexes = [tail indexesOfObjectsPassingTest:^BOOL(NSNumber *element, NSUInteger idx, BOOL *stop) {
return element.unsignedIntegerValue % head > 0;
}];
NSArray *sieved = [tail objectsAtIndexes:indexes];
return [@[@(head)] arrayByAddingObjectsFromArray:sieve(sieved)];
}
NSMutableArray *numbers = [NSMutableArray new];
for (NSUInteger i = 2; i <= 100; i++) {
[numbers addObject: @(i)];
}
NSArray *primes = sieve(numbers);
NSLog(@"%@ -> %@", numbers, primes);
Yes, this code doesn't make sense. Or it makes about as much sense as implementing your computational fluid dynamics
simulation in Tcl and then scripting it using C. And yes, the Swift code definitely looks a bit nicer:
func sieve(_ sorted: [Int]) -> [Int] {
guard !sorted.isEmpty else { return [] }
let (head, tail) = (sorted[0], sorted[1..&now lt;sorted.count])
return [head] + sieve(tail.filter { $0 % head > 0 })
}
let numbers = Array(2...100)
let primes = sieve(numbers)
print("\(numbers) -> \(primes)")
The first thing I noticed was that I couldn't even tell what this was doing, it took me a bit of close reading of the
code to realise that my confusion was due to the fact that although this does compute primes, it isn't actually
the Sieve of Eratosthenes!
The hallmark of the Sieve is that it computes primes while avoiding expensive division (and multiplication by anything other than 2, a simple shift left) operations. It does this
by allocating an array with the candidate numbers (it starts from 2, because all numbers are multiples of 1) and
then eliminating any known non-primes (multiples of some other number) by stepping through the array at a certain
step-size. To eliminate all multiples of 2, step through the array with a step size of 2. To eliminate all multiples
of 3, step through the array with a step size of 3, and so on. This was great on small microcomputers that did
not have multiplication or division operations in hardware.
The algorithm presented here just checks all remaining numbers if they are divisible by the current number using
a modulo operation.
But the fact that this is the wrong algorithm wasn't really what caught my eye, rather that was the
unidiomatic and to me, quite frankly, rather bizarre way of using Objective-C: this example uses
Objective-C Foundation objects to implement algorithms on integers.
It is not surprising that this code looks weird, because it is. Objective-C is a hybrid language
intended for the construction and connection of Software-ICs. You implement the algorithms inside
the Software-IC using C and then package it up (and connect) with the dynamic messaging provided
by the "Objective" part.
So let's do that! First, we grab a C implementation of the (real) Sieve of Eratosthenes algorithm, for
example from the first Duck Go Go search result here:
#include <stdio.h>
#define LIMIT 1500000 /*size of integers array*/
#define PRIMES 100000 /*size of primes array*/
int main(){
int i,j,numbers[LIMIT];
int primes[PRIMES];
/*fill the array with natural numbers*/
for (i=0;i<limit;i++){
numbers[i]=i+2;
}
/*sieve the non-primes*/
for (i=0;i<limit;i++){
if (numbers[i]!=-1){
for (j=2*numbers[i]-2;j<limit;j+=numbers[i])
numbers[j]=-1;
}
}
/*transfer the primes to their own array*/
j = 0;
for (i=0;i<limit&&j<primes;i++)
if (numbers[i]!=-1)
primes[j++] = numbers[i];
/*print*/
for (i=0;i<primes;i++)
printf("%dn",primes[i]);
return 0;
}
This somewhat longer than the previous examples, but it does have the distinct advantage of actually being a Sieve of Eratosthenes :-).
The way it works is that it first intialises an array of size N with the integers from 1 to N. The Sieve then knocks out multiples
of n for n in 1 to N. The "knocking out" is performed by setting the array element to -1. Finally, all the non-negative numbers
are retrieved from the array. These are the primes, which are then printed.
Sine Objective-C is a strict superset of C, we could simply use this code as-is. However, it does lack some conveniences, for example
the array sizes are fixed, the code isn't reusable and for example printing is very manual. So using the idea of the Software-IC,
we notice that an array of integers is the central data structure here. Apple's Foundation doesn't provide such a beast, but that
isn't a big deal, we can easily write one ourselves, or more precisely reuse the one provided in MPWFoundation.
As this is going to be running all over the array, we probably want to break open the encapsulation boundary of the Software-IC and
implement the core of the algorithm as an extension, er, category of MPWIntArray.
Another slight improvement is breaking the Sieve algorithm up into smaller, named pieces that tell you what is going on, and
using conveniences such as -select:
to filter the results. Printing of the contents is also something that
is provided.
@import MPWFoundation;
@implementation MPWIntArray(primes)
-(void)knockoutMultiplesOf:(int)n
{
int *numbers=data;
if (n > 0){
for (int j=2*n-2,max=[self count];j<max;j+=n) {
numbers[j]=-1;
}
}
}
-(void)sievePrimes
{
for (int i=0,max=[self count];i<max;i++){
[self knockoutMultiplesOf:data[i]];
}
}
@end
int main(int argc, char *argv[])
{
int n=argc > 1 ? atoi(argv[1]) : 100000;
MPWIntArray *numbers=[[@(2) to:@(n-2)] asArray];
[numbers sievePrimes];
MPWIntArray *primes = [numbers select:^(int potentialPrime) {
return (BOOL)(potentialPrime > 0);
}];
printf("primes: %s\n",[[primes description] UTF8String]);
printf("number of primes: %d\n",[primes count]);
}
One of the benefits of the Software-IC or Scripted Components style is performance, otherwise you wouldn't want to use it
in supercomputer settings. In this particular case I ran the Swift version against the Software-IC
version (I didn't bother with the original Objective-C version). The performance difference is so remarkable that it was actually
difficult to find values for the N parameter that gave at least somewhat reasonable/comparable results for both variants:
N
|
Swift Sieve
|
Software-IC Sieve
|
Ratio
|
100000
|
0,501
|
0,006
|
83,5
|
200000
|
1,78
|
0,007
|
254,3
|
300000
|
3,78
|
0,008
|
472,5
|
400000
|
6,43
|
0,009
|
714,4
|
500000
|
9,74
|
0,010
|
974,0
|
I couldn't go much higher than N=500,000 with the Swift version because at N=500,000 it was already using around 8GB of memory and rising
quickly. At
that point you start to see paging activity on a reasonably busy machine, particularly one with Xcode open and I didn't want to
get my machine "benchmark-clean". At those same N
values, the Software-IC version isn't really getting started yet, the execution time actually being dominated by startup costs and random fluctuations. In fact, launching it with N=1 also takes 6-8ms the same as for N=500,000.
So yes, using Swift to do computations is certainly a better choice than abusing the middleware part of Objective-C to do those
computations. On the other hand, used properly for constructing a Software-IC and then providing an interface to it, Objective-C
does quite well. Which doesn't mean we couldn't do better for both parts: yes, somewhat better algorithmics but more importantly
better middleware and less
muddle about which is which. It shouldn't come as a complete shock that that is the (rough) idea behind Objective-Smalltalk,
but more on that later.
UPDATE:
My little post triggered a small twitter discussion that was really much better than the post itself. Although my point was
really mostly about the scripted components pattern and how it relates to Objective-C and its Software-ICs, I focused
a bit too much on the performance aspect (occupational hazard). It also comes off a bit too much as "Swift vs. Objective-C",
so to make that clear: the Objective-C code initially presented is probably slowest, and I am reasonably certain that
you can write a C-like Sieve using Swift with roughly comparable performance.
For me, the bigger point was that this code was a good illustration of the misunderstanding and misapplication of
Objective-C. I am also decidedly not saying that this misapplication is due ignorance of the people doing
the misapplication, I think it has to do with the fact that the tension between scripted coomponents and unified
language is something we haven't figured out yet.
Anyway, Helge managed to put the point very succinctly:
Also insightful as alway, Joe Groff:
And a little bit of "hey, I didn't know that":
And last not least: apologies again to Vincent for picking on him, just a little bit :-)