Thursday, January 10, 2019

Objective-Smalltalk on Linux via GNUstep and Docker

Although Objective-Smalltalk's architectural orientation should be an ideal fit for The Cloud™, this has been largely theoretical in recent times as Objective-Smalltalk only ran on macOS and iOS.

Having used both GNUstep and Cocotron to port, for example, MusselWind to Linux in the past, I knew this wasn't an insurmountable probl...er opportunity. However, dealing with VMs and OSes, and configurations getting everything to compile was always painful, and usually not reproducible.

Enter Docker for Mac. I had obviously heard quite a bit about Docker, but so far never had an opportunity to actually try it out in anger. Well, maybe not "anger", more "mild chagrin", because Docker is not really meant for running Linux binaries on your Mac. However, it does that job really, really well. Instead of futzing around with VMs, dealing with special guest OS tools in order to get window sizes to match, Retina not working, displays not resizing etc. you just get a login onto a Linux "box" right in your Terminal window, no muss no fuss. Nice. And mounting directories from the Mac is also trivial. Niiiice.

The other part that makes this different is the whole idea of reproducibility via Dockerfiles. Because a container is at least conceptually ephemeral (practically you can keep it around for quite a bit), you don't capture your knowledge in the state of the VM you are working on. Instead, you accumulate the knowledge in the Dockerfile(s) used to construct your image.

The Dockerfile for creating a GNUstep image/container that is capable of compiling + running Objective-Smalltalk can be found here:

https://github.com/mpw/MPWFoundation/tree/master/GNUstep/gnustep-combined
The main Dockerfile loads first loads a bunch of packages via apt-get:
FROM ubuntu

ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV TZ 'Europe/Berlin'
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone


RUN apt-get update && apt-get install -y \
    git \
    make \
    ssh \
    sudo \
    curl \
    inetutils-ping \
    vim build-essential  \
    clang llvm libblocksruntime-dev libkqueue-dev libpthread-workqueue-dev libxml2-dev cmake \
    libffi-dev \
    libreadline6-dev \
    libedit-dev \
    libmicrohttpd-dev \
    gnutls-dev 


RUN useradd -ms /bin/bash gnustep

COPY install-gnustep-clang /home/gnustep/
RUN chmod u+x /home/gnustep/install-gnustep-clang
RUN /home/gnustep/install-gnustep-clang  

COPY bashrc /home/gnustep/.bashrc
COPY profile /home/gnustep/.profile
COPY bashrc /root/.bashrc
COPY profile /root/.profile
COPY bashrc /.bashrc
COPY profile /.profile

COPY build-gnustep-clang /home/gnustep/build-gnustep-clang
RUN  mkdir -p /home/gnustep/patches/libobjc2-1.8.1/
COPY patches/libobjc2-1.8.1/objcxx_eh.cc /home/gnustep/patches/libobjc2-1.8.1/objcxx_eh.cc

RUN chmod u+x /home/gnustep/build-gnustep-clang
RUN /home/gnustep/build-gnustep-clang  


COPY build-gnustep-clang /home/gnustep/build-gnustep-clang
RUN  mkdir -p /home/gnustep/patches/libobjc2-1.8.1/
COPY patches/libobjc2-1.8.1/objcxx_eh.cc /home/gnustep/patches/libobjc2-1.8.1/objcxx_eh.cc

RUN chmod u+x /home/gnustep/build-gnustep-clang
RUN /home/gnustep/build-gnustep-clang  

CMD ["bash"]

It adds a user "gnustep", then runs a script that will downloads specific version of the gnustep and libobjc2 sources using a script, patches one of those sources which wouldn't compile for me and finally builds and installs the whole thing using another script, both adapted from Tobias Lensing's post.


#!/bin/bash
#


cd /home/gnustep/

echo Installing gnustep-make
export CC=clang
echo compiler is $CC

tar zxf gnustep-make-2.7.0.tar.gz
cd gnustep-make-2.7.0
./configure
make install
cd ..

echo 
echo 
echo ======================
echo Installing libobjc2


tar zxf libobjc2-1.8.1.tar.gz
cp patches/libobjc2-1.8.1/* libobjc2-1.8.1/
cd libobjc2-1.8.1
mkdir Build
cd Build
# cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang -DTESTS=OFF -DLLVM_OPTS=OFF  ..
cmake -DTESTS=OFF .. 
make install
cd ../..


cd gnustep-make-2.7.0
make clean
./configure --with-library-combo=ng-gnu-gnu
make install
cd ..

source /usr/local/share/GNUstep/Makefiles/GNUstep.sh
tar zxf gnustep-base-1.25.1.tar.gz 
cd gnustep-base-1.25.1
./configure
make installn
cd ..

echo Installation script finished successfully

One of the tricky bits about this is that you need to install the gnustep-make package twice, first to get libobjc2 to install, the second time with libobjc2 installed so all the other packages find the right runtime. The apt-get packages did not work for me, because I needed newer compiler/runtime features.

Once you have the image, you can start it up and log into it. I then su -l to the gnustep account and work with the MPWFoundation, ObjectiveSmalltalk and ObjectiveHTTPD mounted from my local filesystem, rather than having separate copies inside the container.

Not everything is ported, but basic expressions work, as does messaging back and forth between Objective-Smalltalk and Objective-C, Higher Order Messaging, etc. Considering some of the runtime trickery involved, I was surprised how straightforward it was to get this far.

In the future, I expect to also build a runtime image that has the libraries and executable pre-built.

Friday, January 4, 2019

Swifter Sieving of Primes with Software ICs

I've been working much more on Objective-Smalltalk again, which is great, but that also made me think more intently on the motivations, because a lot of what is there is probably not quite obvious without knowing why and how it got there.

One of the obvious motivations has been Objective-C, and in particular the notion of Software-ICs, components that are connected via dynamic messages. This is actually an instance of the Scripted Components (pdf) pattern, with the interesting twist that unlike most of the instances of this pattern, there is only a single programming language for both the scripts and the components, that language being Objective-C.

In a sense, Objective-C is not really a language, it is middleware with some language features. And therein lies much confusion, because people try to use it as "just" a language. In particular, they try to use the "Objective" parts as an algorithmic programming language, something it is eminently unsuited for. This is what the "C" part is for, which is somewhere around 90% of the actual language.

To back off for a bit, a Scripted Component system is one where you have components that are written in one language, often C or C++, that are then glued together using a more dynamic scripting language, for example Unix shell, Visual Basic. One of the more extreme examples of this pattern is computational steering of supercomputer applications using Tcl(!). Which brings up the point of performance: one of the interesting aspects of the Scripted Components pattern is that it combines the excellent performance of the component language with the flexibility and interactivity of the scripting language.

Which brings me back to Objective-C, almost. No one at the supercomputer centers that used Tcl for steering would have had the idea of implementing their actual computation in Tcl. No, implementing the components is what you use C or C++ or maybe FORTRAN for. You then combine, parametrise and maybe invoke those components from the scripting language.

Yet when it comes to Objective-C, that is exactly what people do: they implement the algorithmic parts in the scripting/middleware part of Objective-C and then wonder why it comes out looking like the following (taken from Vincent's post):


NSArray<NSNumber *> *sieve(NSArray<NSNumber *> *sorted) {
  if ([sorted count] == 0) { return [NSArray new]; }
  NSUInteger head = sorted[0].unsignedIntegerValue;
  NSArray *tail = [sorted subarrayWithRange:NSMakeRange(1, sorted.count - 1)];
  NSIndexSet *indexes = [tail indexesOfObjectsPassingTest:^BOOL(NSNumber *element, NSUInteger idx, BOOL *stop) {
    return element.unsignedIntegerValue % head > 0;
  }];
  NSArray *sieved = [tail objectsAtIndexes:indexes];
  return [@[@(head)] arrayByAddingObjectsFromArray:sieve(sieved)];
}

NSMutableArray *numbers = [NSMutableArray new];
for (NSUInteger i = 2; i <= 100; i++) {
  [numbers addObject: @(i)];
}
NSArray *primes = sieve(numbers);
NSLog(@"%@ -> %@", numbers, primes);

Yes, this code doesn't make sense. Or it makes about as much sense as implementing your computational fluid dynamics simulation in Tcl and then scripting it using C. And yes, the Swift code definitely looks a bit nicer:


func sieve(_ sorted: [Int]) -> [Int] {
	guard !sorted.isEmpty else { return [] }
	let (head, tail) = (sorted[0], sorted[1..&now lt;sorted.count])
	return [head] + sieve(tail.filter { $0 % head > 0 })
}

let numbers = Array(2...100)
let primes = sieve(numbers)
print("\(numbers) -> \(primes)")

The first thing I noticed was that I couldn't even tell what this was doing, it took me a bit of close reading of the code to realise that my confusion was due to the fact that although this does compute primes, it isn't actually the Sieve of Eratosthenes!

The hallmark of the Sieve is that it computes primes while avoiding expensive division (and multiplication by anything other than 2, a simple shift left) operations. It does this by allocating an array with the candidate numbers (it starts from 2, because all numbers are multiples of 1) and then eliminating any known non-primes (multiples of some other number) by stepping through the array at a certain step-size. To eliminate all multiples of 2, step through the array with a step size of 2. To eliminate all multiples of 3, step through the array with a step size of 3, and so on. This was great on small microcomputers that did not have multiplication or division operations in hardware.

The algorithm presented here just checks all remaining numbers if they are divisible by the current number using a modulo operation.

But the fact that this is the wrong algorithm wasn't really what caught my eye, rather that was the unidiomatic and to me, quite frankly, rather bizarre way of using Objective-C: this example uses Objective-C Foundation objects to implement algorithms on integers.

It is not surprising that this code looks weird, because it is. Objective-C is a hybrid language intended for the construction and connection of Software-ICs. You implement the algorithms inside the Software-IC using C and then package it up (and connect) with the dynamic messaging provided by the "Objective" part.

So let's do that! First, we grab a C implementation of the (real) Sieve of Eratosthenes algorithm, for example from the first Duck Go Go search result here:


#include <stdio.h>

#define LIMIT 1500000 /*size of integers array*/
#define PRIMES 100000 /*size of primes array*/

int main(){
    int i,j,numbers[LIMIT];
    int primes[PRIMES];

    /*fill the array with natural numbers*/
    for (i=0;i<limit;i++){
        numbers[i]=i+2;
    }

    /*sieve the non-primes*/
    for (i=0;i<limit;i++){
        if (numbers[i]!=-1){
            for (j=2*numbers[i]-2;j<limit;j+=numbers[i])
                numbers[j]=-1;
        }
    }

    /*transfer the primes to their own array*/
    j = 0;
    for (i=0;i<limit&&j<primes;i++)
        if (numbers[i]!=-1)
            primes[j++] = numbers[i];

    /*print*/
    for (i=0;i<primes;i++)
        printf("%dn",primes[i]);

return 0;
}

This somewhat longer than the previous examples, but it does have the distinct advantage of actually being a Sieve of Eratosthenes :-). The way it works is that it first intialises an array of size N with the integers from 1 to N. The Sieve then knocks out multiples of n for n in 1 to N. The "knocking out" is performed by setting the array element to -1. Finally, all the non-negative numbers are retrieved from the array. These are the primes, which are then printed.

Sine Objective-C is a strict superset of C, we could simply use this code as-is. However, it does lack some conveniences, for example the array sizes are fixed, the code isn't reusable and for example printing is very manual. So using the idea of the Software-IC, we notice that an array of integers is the central data structure here. Apple's Foundation doesn't provide such a beast, but that isn't a big deal, we can easily write one ourselves, or more precisely reuse the one provided in MPWFoundation.

As this is going to be running all over the array, we probably want to break open the encapsulation boundary of the Software-IC and implement the core of the algorithm as an extension, er, category of MPWIntArray.

Another slight improvement is breaking the Sieve algorithm up into smaller, named pieces that tell you what is going on, and using conveniences such as -select: to filter the results. Printing of the contents is also something that is provided.


@import MPWFoundation;

@implementation MPWIntArray(primes)

-(void)knockoutMultiplesOf:(int)n
{
    int *numbers=data;
    if (n > 0){
       for (int j=2*n-2,max=[self count];j<max;j+=n) {
           numbers[j]=-1;
       }
    }
}

-(void)sievePrimes
{
    for (int i=0,max=[self count];i<max;i++){
        [self knockoutMultiplesOf:data[i]];
    }
}

@end

int main(int argc, char *argv[])
{
    int n=argc > 1 ? atoi(argv[1]) : 100000;
    MPWIntArray *numbers=[[@(2) to:@(n-2)] asArray];
    [numbers sievePrimes];

    MPWIntArray *primes = [numbers select:^(int potentialPrime) {
        return (BOOL)(potentialPrime > 0);
    }];
    printf("primes: %s\n",[[primes description] UTF8String]);
    printf("number of primes: %d\n",[primes count]);
}

One of the benefits of the Software-IC or Scripted Components style is performance, otherwise you wouldn't want to use it in supercomputer settings. In this particular case I ran the Swift version against the Software-IC version (I didn't bother with the original Objective-C version). The performance difference is so remarkable that it was actually difficult to find values for the N parameter that gave at least somewhat reasonable/comparable results for both variants:
N Swift Sieve Software-IC Sieve Ratio
100000 0,501 0,006 83,5
200000 1,78 0,007 254,3
300000 3,78 0,008 472,5
400000 6,43 0,009 714,4
500000 9,74 0,010 974,0
Sieve times
I couldn't go much higher than N=500,000 with the Swift version because at N=500,000 it was already using around 8GB of memory and rising quickly. At that point you start to see paging activity on a reasonably busy machine, particularly one with Xcode open and I didn't want to get my machine "benchmark-clean". At those same N values, the Software-IC version isn't really getting started yet, the execution time actually being dominated by startup costs and random fluctuations. In fact, launching it with N=1 also takes 6-8ms the same as for N=500,000.

So yes, using Swift to do computations is certainly a better choice than abusing the middleware part of Objective-C to do those computations. On the other hand, used properly for constructing a Software-IC and then providing an interface to it, Objective-C does quite well. Which doesn't mean we couldn't do better for both parts: yes, somewhat better algorithmics but more importantly better middleware and less muddle about which is which. It shouldn't come as a complete shock that that is the (rough) idea behind Objective-Smalltalk, but more on that later.

UPDATE:
My little post triggered a small twitter discussion that was really much better than the post itself. Although my point was really mostly about the scripted components pattern and how it relates to Objective-C and its Software-ICs, I focused a bit too much on the performance aspect (occupational hazard). It also comes off a bit too much as "Swift vs. Objective-C", so to make that clear: the Objective-C code initially presented is probably slowest, and I am reasonably certain that you can write a C-like Sieve using Swift with roughly comparable performance.

For me, the bigger point was that this code was a good illustration of the misunderstanding and misapplication of Objective-C. I am also decidedly not saying that this misapplication is due ignorance of the people doing the misapplication, I think it has to do with the fact that the tension between scripted coomponents and unified language is something we haven't figured out yet.

Anyway, Helge managed to put the point very succinctly:

Also insightful as alway, Joe Groff:

And a little bit of "hey, I didn't know that":

And last not least: apologies again to Vincent for picking on him, just a little bit :-)