Friday, April 17, 2020

Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman

After initially disappointing results trying to get to faster JSON processing (parsing, for now), we finally got parity with NSJSONSerialization, more or less, in the last instalment, with the help of MPWSmallStringTable to unique our strings before turning them into objects, string creation being surprisingly expensive even for tagged pointer strings.

Cutting out the Middleman: ObjectBuilder

In the first instalment of this series, we saw that we could fairly trivially create objects from the plist created by NSJSONSerialization.

MPWObjectBuilder (.h .m) is a subclass of MPWPlistBuilder that changes just a few things: instead of creating dictionaries, it creates objects, and instead of using -setObject:forKey: to set values in that dictionary, it uses the KVC message -setValue:forKey: (vive la petite différence!) to set values in that object.


@implementation MPWObjectBuilder

-(instancetype)initWithClass:(Class)theClass
{
    self=[super init];
    self.cache=[MPWObjectCache cacheWithCapacity:20 class:theClass];
    return self;
}

-(void)beginDictionary
{
    [self pushContainer:GETOBJECT(_cache) ];
}

-(void)writeObject:anObject forKey:aKey
{
    [*tos setValue:anObject forKey:aKey];
}

That's it! Well, all that need concern us for now, the actual class has some additional features that don't matter here. The _tos instance variable is the top of a stack that MPWPlistBuilder maintains while constructing the result. The MPWObjectCache is just a factory for creating objects.
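To make the shape of the idea concrete outside of Objective-C, here is a minimal sketch in plain C. It is not the MPWObjectBuilder code: `TestRecord`, `RecordBuilder` and the `builder_*` functions are hypothetical names, and the fixed-size array only stands in for a real object factory. The builder keeps a "top of stack" pointer to the record being filled and sets fields by key name, much as the KVC call does.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    long hi, there;
    const char *comment;
} TestRecord;

#define MAX_RECORDS 16

typedef struct {
    TestRecord records[MAX_RECORDS];
    int count;
    TestRecord *tos;      /* "top of stack": the record currently being filled */
} RecordBuilder;

/* A dictionary start creates the target record instead of a dictionary. */
static void builder_begin_dictionary(RecordBuilder *b)
{
    assert(b != NULL && b->count < MAX_RECORDS);
    b->tos = &b->records[b->count++];
    memset(b->tos, 0, sizeof *b->tos);
}

/* Key-based setters: the name is compared on every call, which is the
   by-name overhead that a KVC-style interface implies. Return 0 for
   unknown keys. */
static int builder_write_long(RecordBuilder *b, const char *key, long value)
{
    assert(b != NULL && b->tos != NULL);
    if (strcmp(key, "hi") == 0)    { b->tos->hi = value;    return 1; }
    if (strcmp(key, "there") == 0) { b->tos->there = value; return 1; }
    return 0;
}

static int builder_write_string(RecordBuilder *b, const char *key, const char *value)
{
    assert(b != NULL && b->tos != NULL);
    if (strcmp(key, "comment") == 0) { b->tos->comment = value; return 1; }
    return 0;
}
```

A caller zeroes a `RecordBuilder`, calls `builder_begin_dictionary()` for each `{` and then one `builder_write_*()` per key/value pair.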

So let's fire it up and see what it can do!


-(void)decodeMPWDirect:(NSData*)json
{
    NSArray *keys=@[ @"hi", @"there", @"comment"];
    MPWMASONParser *parser=[MPWMASONParser parser];
    MPWObjectBuilder *builder=[[MPWObjectBuilder alloc] initWithClass:[TestClass class]];
    [parser setBuilder:builder];
    [parser setFrequentStrings:keys];
    NSArray* objResult = [parser parsedData:json];
    NSLog(@"MPWMASON %@ with %ld elements",[objResult firstObject],[objResult count]);
}

Not the most elegant code in the universe, and not a complete parser by any stretch of the imagination, but workable.

Result: 621 ms.

Not too shabby: only 50% slower than base NSJSONSerialization on our non-representative 44MB JSON file, but creating the final objects instead of just the intermediate representation, and around 7x faster than Apple's JSONDecoder.

Although still below 100 MB/s and nowhere near 2.5 GB/s we're also starting to close in on the performance level that should be achievable given the context, with 140ms for basic object creation and 124ms for a mostly empty parse.

Analysis and next steps

Ignoring such trivialities as actually being useful for more than the most constrained situations (an array of a single kind of object), how can we improve this? Well, make it faster, of course, so let's have a look at the profile:

As expected, the KVC code is now the top contributor, with around 40% of total runtime. (The locking functions that show up as siblings of -setValue:forKey: are almost certainly part of that implementation; this slight misattribution of times is something you should generally expect and be aware of with Instruments. I am guessing it has to do with missing frame pointers (-fomit-frame-pointer) but don't really feel any deep urge to investigate, as it doesn't materially impact the outcome of the analysis.)

I guess that's another point: gather enough data to inform your next step, certainly no less, but also no more. I see both mistakes, the more common one definitely being making things "fast" without enough data. Or any, for that matter. If I had a €uro for every project that claims high performance without any (comparative) benchmarking, simply because they did something the authors think should be fast, well, you know, ....

The other extreme is both less common and typically less bad, as at least you don't get the complete nonsense of performance claims not backed by any performance testing, but running a huge battery of benchmarks on every step of an optimization process is probably going to get in the way of achieving results, and yes, I've seen this in practice.

So next we need to remove KVC.

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Thursday, April 16, 2020

Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion

In our last instalment, we started implementing our JSON parser with lots of good ideas, such as dematerialization via a property list protocol, but immediately fell flat on our face with our code being 50% slower than NSJSONSerialization. And what's worse, there wasn't an obvious way out, as the bulk of the time was spent in Apple code.

Nobody said this was going to be easy.

Analysis

Let's have another look at the profile:

The top 4 consumers of CPU are -setObject:forKey:, string creation, dictionary creation and message sending. I don't really know what to do about either creating those dictionaries we have to create or setting their contents, so what about string creation?

Although making string creation itself faster is unlikely, what we can do is reduce the number of strings we create: since most of our JSON payload consists of objects né dictionaries, the vast majority of our strings are actually going to be string keys. So they will come from a small set of known strings and be on the small-ish side. Particularly the former suggests that we should re-use keys, rather than creating multiple new copies.

The usual way to look up something with a known key is an NSDictionary, but alas that would require the keys we look up to already be objects, meaning we would have to create string objects to look up our string object values, rather defeating the purpose of the exercise.

What we would need is a way of looking up objects by raw C-String, an unadorned char*. Fortunately, I've been here before, so the required class has been in MPWFoundation for a little over 13 years. (What's the "Trump smug face" emoticon?)

MPWSmallStringTable

The MPWSmallStringTable (.h / .m ) class is exactly what it says on the tin: a table for looking up objects by (small) string keys. And it is accessible by char* (+length, don't want to require NUL termination) in addition to string objects.

Quite a bit of work went into making this fast, both the implementation and the interface. It is not a hash table, it compares chars directly, using indexing and bucketing to expend as little work as possible while discarding non-matching strings.
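To illustrate the general technique (direct comparison with bucketing instead of hashing), here is a minimal sketch in plain C. It is emphatically not the MPWSmallStringTable implementation: the type and function names, the fixed bucket sizes and the choice to bucket by string length are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEY_LEN 15           /* "small" strings only */
#define MAX_PER_LEN 8            /* entries per length bucket */

typedef struct {
    const char *key;             /* key bytes; the length is implied by the bucket */
    void *value;
} Entry;

typedef struct {
    Entry entries[MAX_KEY_LEN + 1][MAX_PER_LEN];
    int   count[MAX_KEY_LEN + 1];
} SmallStringTable;

/* Adds key -> value; returns 0 if the key is too long or its bucket is full. */
static int table_add(SmallStringTable *t, const char *key, void *value)
{
    size_t len = strlen(key);
    if (len > MAX_KEY_LEN || t->count[len] >= MAX_PER_LEN) {
        return 0;
    }
    Entry *e = &t->entries[len][t->count[len]++];
    e->key = key;
    e->value = value;
    return 1;
}

/* Looks up by raw bytes + length: no hashing, no NUL terminator required. */
static void *table_lookup(const SmallStringTable *t, const char *start, size_t len)
{
    assert(t != NULL && start != NULL);
    if (len > MAX_KEY_LEN) {
        return NULL;
    }
    for (int i = 0; i < t->count[len]; i++) {
        const Entry *e = &t->entries[len][i];
        /* a cheap first-character check discards most non-matches early */
        if (len == 0 || (e->key[0] == start[0] && memcmp(e->key, start, len) == 0)) {
            return e->value;
        }
    }
    return NULL;
}
```

The table must be zeroed before use (for example with `memset`). Because lookups take a pointer and a length, they can point straight into the JSON buffer without copying anything.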

In fact, since performance is its primary reason for existing, its unit tests include performance comparisons against an NSDictionary with NSString keys, which currently clock in at 5-8x faster.

The interface includes two macros: OBJECTFORSTRINGLENGTH() and OBJECTFORCONSTANTSTRING(). You need to give the former a length; the latter figures the size out at compile time using the sizeof operator, which for string constants really does reflect the length of the string (strictly, the length plus one for the terminating NUL). Don't use it with non-constant strings (so char*), as there sizeof will return the size of the pointer.
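The sizeof behaviour is easy to check in isolation. The following plain C sketch uses a hypothetical `CONST_STRLEN` macro (not one of the MPWFoundation macros) to show both sides: applied to a string constant, sizeof sees the whole array including the NUL; applied to a `char*`, it only sees the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of a string constant, computed at compile
   time. sizeof counts the terminating NUL, hence the -1. */
#define CONST_STRLEN(s) (sizeof(s) - 1)

/* With a pointer parameter, sizeof only sees the pointer itself. */
static size_t size_seen_through_pointer(const char *s)
{
    assert(s != NULL);
    return sizeof(s);   /* sizeof(const char *), whatever the string length */
}
```

So `CONST_STRLEN("comment")` is 7, while `size_seen_through_pointer("comment")` is the platform's pointer size, regardless of what the string contains.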

Avoiding Allocation of Frequent Strings

With MPWSmallStringTable at hand, we can now use it in MPWMASONParser to look up common strings like our keys without allocating them.

The -setFrequentStrings: method we saw declared in the interface takes an array of strings, which the parser turns into a string table mapping from the C-String versions of those to the NSString version.


-(void)setFrequentStrings:(NSArray*)strings
{
	[self setCommonStrings:[[[MPWSmallStringTable alloc] initWithKeys:strings values:strings] autorelease]];
}

The method that is supposed to create string objects from char*s starts as follows:

-(NSString*)makeRetainedJSONStringStart:(const char*)start length:(long)len
{
	NSString *curstr;
	if ( commonStrings  ) {
		NSString *res=OBJECTFORSTRINGLENGTH( commonStrings, start, len );
		if ( res ) {
			return [res retain];
		}
	}
    ...

So we first check the common strings table, and only if we don't find it there do we drop down to the code to allocate the string. (Yeah, the -retain is probably questionable, though currently necessary)
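The check-the-table-first pattern can be sketched in plain C as follows. Everything here is a stand-in: `StringMaker` and `make_string` are hypothetical names, a short linear array takes the place of the string table, and `malloc` takes the place of string object creation. The allocation counter is only there to make the effect visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the frequent-strings table: a short array of
   NUL-terminated keys that are shared rather than copied. */
typedef struct {
    const char **keys;
    size_t count;
    size_t allocations;      /* how often we had to fall back to malloc */
} StringMaker;

/* Returns a shared key when the bytes match a frequent string. Otherwise
   allocates a fresh NUL-terminated copy, which the caller must free().
   Shared keys must not be freed. */
static const char *make_string(StringMaker *m, const char *start, size_t len)
{
    assert(m != NULL && start != NULL);
    for (size_t i = 0; i < m->count; i++) {
        const char *k = m->keys[i];
        if (strlen(k) == len && memcmp(k, start, len) == 0) {
            return k;                       /* fast path: no allocation */
        }
    }
    char *copy = malloc(len + 1);           /* slow path */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, start, len);
    copy[len] = '\0';
    m->allocations++;
    return copy;
}
```

The awkward ownership rule (free some results but not others) is the plain C counterpart of the questionable -retain mentioned above.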

Trying it out

Now all we need to do is tell the parser about those common strings before we ask it to parse JSON.

-(void)decodeMPWDicts:(NSData*)json
{
    NSArray *keys=@[ @"hi", @"there", @"comment"];
    MPWMASONParser *parser=[MPWMASONParser parser];
    [parser setFrequentStrings:keys];
    NSArray* plistResult = [parser parsedData:json];
    NSLog(@"MPWMASON %@ with %ld dicts",[plistResult firstObject],[plistResult count]);
}

While this seems a bit tacky, telling a JSON parser what to expect beforehand at least a little seems par for the course, so whatever.

How does that fare? Well, 440ms, which is 180ms faster than before and anywhere from as fast as NSJSONSerialization to 5% slower. Good enough for now.

This result is actually a bit surprising, because the keys that are created by both NSJSONSerialization and MPWMASONParser happen to be instances of NSTaggedPointerString. These strings do not get allocated on the heap, the entire string contents are cleverly encoded in the object pointer itself. Creating these should only be a couple of shifts and ORs, but apparently that takes (significantly) longer than doing the lookup, or more likely CF adds other overhead. This was certainly the case with the original tagged CFNumber, where just doing the shift+OR yourself was massively faster than calling CFNumberCreate().
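For a feel of what "a couple of shifts and ORs" means, here is a sketch in plain C that packs a short string into a single 64-bit word. The bit layout is invented for this example and is not Apple's actual tagged pointer encoding; `make_tagged` and `read_tagged` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout only (NOT Apple's actual encoding):
   bit 0       tag bit, 1 = "the payload lives in the word itself"
   bits 4..7   length (0..7)
   bits 8..63  up to 7 bytes of string data */
#define TAG_BIT      UINT64_C(1)
#define MAX_INLINE   7

/* Packs up to 7 bytes into a 64-bit word; returns 0 if the string is too long. */
static uint64_t make_tagged(const char *start, size_t len)
{
    assert(start != NULL);
    if (len > MAX_INLINE) {
        return 0;
    }
    uint64_t word = TAG_BIT | ((uint64_t)len << 4);
    for (size_t i = 0; i < len; i++) {
        word |= (uint64_t)(unsigned char)start[i] << (8 * (i + 1));
    }
    return word;
}

/* Unpacks into `out` (at least 8 bytes) and returns the length. */
static size_t read_tagged(uint64_t word, char *out)
{
    size_t len = (size_t)((word >> 4) & 0xF);
    for (size_t i = 0; i < len; i++) {
        out[i] = (char)((word >> (8 * (i + 1))) & 0xFF);
    }
    out[len] = '\0';
    return len;
}
```

No heap allocation is involved at any point, which is why it is surprising that a table lookup can beat the real thing.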

What next?

Having MPWSmallStringTable immediately suggests ways of tackling the other expensive parts we identified in the profile, -setObject:forKey: and dictionary creation: use a string table with pre-computed key space, then set the objects via char* keys.

Another alternative is to use the MPWXmlAttributes class from MAX, which is optimized for the parsing and use-once case.

However, all this loses sight of the fact that we aren't actually interested in producing a plist. We want to create objects, ideally without creating that plist. This is a common pitfall I see in optimization work: getting so caught up in the details (because there is a lot of detail, and it tends to be important) that one loses sight of the context, the big picture so to speak.

Can this, creating objects from JSON, now be done more quickly? That will be in the next instalment. But as a taste of what's possible, we can just set the builder to nil, in order to see how the parser does when not having to create a plist.

The result: 160ms.

So yes, this can probably work, but it is work.


Tuesday, April 14, 2020

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization

In the previous instalments, we looked at and analysed the status quo for JSON parsing on Apple platforms in general and Swift in particular, and it wasn't all that promising: we know that parsing to an intermediate representation of Foundation plist types (dictionaries, arrays, strings, numbers) is one of the worst possible ideas, yet it is the fastest we have. We know that creating objects from JSON is, or at least should be, the slowest part of this, yet it is by far the fastest, and last, not least, we also know that KVC is the slowest possible way to transfer values to those objects, yet Swift Coding somehow manages to be several times slower.

So either we're wrong about all of these things we know, always a distinct possibility, or there is something fishy going on. My vote is on the latter, and while figuring out exactly what fishy thing is going on would probably be a fascinating investigation for an Apple performance engineer, I prefer proof by creation:

Just make something that doesn't have these problems. In that case you not only know where the problem is, you also have a better alternative to use.

MASON

Without much further ado, here is the definition of the MPWMASONParser class:

@class MPWSmallStringTable;
@protocol MPWPlistStreaming;

@interface MPWMASONParser : MPWXmlAppleProplistReader {
	BOOL inDict;
	BOOL inArray;
	MPWSmallStringTable *commonStrings;
}

@property (nonatomic, strong) id  builder;

-(void)setFrequentStrings:(NSArray*)strings;

@end

What it does is send messages of the MPWPlistStreaming protocol to its builder property. So a Message-oriented parser for JaSON, just like MAX is the Message oriented API for XML.

The implementation-history is also reflected in the fact that it is a subclass of MPWXmlAppleProplistReader, which itself is a subclass of MPWMAXParser. The core of the implementation is a loop that handles JSON syntax and sends one-way messages for the different elements to the builder. It looks very similar to loops in other simple parsers (and probably not at all like the crazy SIMD contortions of simdjson). When done, it returns whatever the builder constructed.


-parsedData:(NSData*)jsonData
{
	[self setData:jsonData];
	const char *curptr=[jsonData bytes];
	const char *endptr=curptr+[jsonData length];
	const char *stringstart=NULL;
	NSString *curstr=nil;
	while (curptr < endptr ) {
		switch (*curptr) {
			case '{':
				[_builder beginDictionary];
				inDict=YES;
				inArray=NO;
				curptr++;
				break;
			case '}':
				[_builder endDictionary];
				curptr++;
				break;
			case '[':
				[_builder beginArray];
				inDict=NO;
				inArray=YES;
				curptr++;
				break;
			case ']':
				[_builder endArray];
				curptr++;
				break;
			case '"':
                parsestring( curptr , endptr, &stringstart, &curptr  );
                curstr = [self makeRetainedJSONStringStart:stringstart length:curptr-stringstart];
				curptr++;
				if ( curptr < endptr && *curptr == ':' ) {
					[_builder writeKey:curstr];
					curptr++;
					
				} else {
					[_builder writeString:curstr];
				}
				break;
			case ',':
				curptr++;
				break;
			case '-':
			case '0':
			case '1':
			case '2':
			case '3':
			case '4':
			case '5':
			case '6':
			case '7':
			case '8':
			case '9':
			{
				BOOL isReal=NO;
				const char *numstart=curptr;
				id number=nil;
				if ( *curptr == '-' ) {
					curptr++;
				}
				while ( curptr < endptr && isdigit(*curptr) ) {
					curptr++;
				}
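				if ( curptr < endptr && *curptr == '.' ) {
					curptr++;
					while ( curptr < endptr && isdigit(*curptr) ) {
						curptr++;
					}
					isReal=YES;
				}
				if ( curptr < endptr && (*curptr=='e' || *curptr=='E') ) {
					curptr++;
					// JSON allows an optional sign after the exponent marker
					if ( curptr < endptr && (*curptr=='+' || *curptr=='-') ) {
						curptr++;
					}
					while ( curptr < endptr && isdigit(*curptr) ) {
						curptr++;
					}
					isReal=YES;
				}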
                number = isReal ?
                            [self realElement:numstart length:curptr-numstart] :
                            [self integerElementAtPtr:numstart length:curptr-numstart];

				[_builder writeString:number];
				break;
			}
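			case 't':
				if ( (endptr-curptr) >=4  && !strncmp(curptr, "true", 4)) {
					curptr+=4;
					[_builder pushObject:true_value];
				} else {
					// anything else starting with 't' would otherwise loop forever
					[NSException raise:@"invalidcharacter" format:@"JSON invalid literal at %td",curptr-(char*)[data bytes]];
				}
				break;
			case 'f':
				if ( (endptr-curptr) >=5  && !strncmp(curptr, "false", 5)) {
					curptr+=5;
					[_builder pushObject:false_value];
				} else {
					[NSException raise:@"invalidcharacter" format:@"JSON invalid literal at %td",curptr-(char*)[data bytes]];
				}
				break;
			case 'n':
				if ( (endptr-curptr) >=4  && !strncmp(curptr, "null", 4)) {
					[_builder pushObject:[NSNull null]];
					curptr+=4;
				} else {
					[NSException raise:@"invalidcharacter" format:@"JSON invalid literal at %td",curptr-(char*)[data bytes]];
				}
				break;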
			case ' ':
			case '\t':
			case '\r':
			case '\n':
				while (curptr < endptr && isspace(*curptr)) {
					curptr++;
				}
				break;

			default:
				[NSException raise:@"invalidcharacter" format:@"JSON invalid character %x/'%c' at %td",*curptr,*curptr,curptr-(char*)[data bytes]];
				break;
		}
	}
    return [_builder result];

}

It almost certainly doesn't correctly handle all edge-cases, but doing so is unlikely to impact overall performance.
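Numbers are one place where such edge cases hide: signed exponents, leading zeros, a dangling "." or "e". As an illustration of what stricter handling could look like, here is a standalone sketch in plain C that follows the number grammar of the JSON specification (RFC 8259). `scan_json_number` is a hypothetical helper, not part of MPWMASONParser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Scans one JSON number starting at `p` (`end` is one past the last byte).
   Returns the number of bytes consumed, or 0 if no valid number starts here.
   Sets *is_real when a fraction or an exponent is present. */
static size_t scan_json_number(const char *p, const char *end, int *is_real)
{
    const char *start = p;
    assert(p != NULL && end != NULL && is_real != NULL);
    *is_real = 0;
    if (p < end && *p == '-') p++;
    if (p >= end || !isdigit((unsigned char)*p)) return 0;
    if (*p == '0') {
        p++;                                  /* a leading zero stands alone */
    } else {
        while (p < end && isdigit((unsigned char)*p)) p++;
    }
    if (p < end && *p == '.') {
        const char *q = p + 1;
        if (q >= end || !isdigit((unsigned char)*q)) return 0;   /* "1." is invalid */
        while (q < end && isdigit((unsigned char)*q)) q++;
        p = q;
        *is_real = 1;
    }
    if (p < end && (*p == 'e' || *p == 'E')) {
        const char *q = p + 1;
        if (q < end && (*q == '+' || *q == '-')) q++;            /* optional sign */
        if (q >= end || !isdigit((unsigned char)*q)) return 0;   /* "1e" is invalid */
        while (q < end && isdigit((unsigned char)*q)) q++;
        p = q;
        *is_real = 1;
    }
    return (size_t)(p - start);
}
```

The extra checks are a handful of comparisons per number, so they should be cheap next to the cost of creating the resulting objects.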

Dematerializing Property Lists with MPWPlistStreaming

Above, I mentioned that MASON is message-oriented, and that its main purpose is sending messages of the MPWPlistStreaming protocol to its builder. Here is that protocol:


@protocol MPWPlistStreaming

-(void)beginArray;
-(void)endArray;
-(void)beginDictionary;
-(void)endDictionary;
-(void)writeKey:aKey;
-(void)writeString:aString;
-(void)writeNumber:aNumber;
-(void)writeObject:anObject forKey:aKey;
-(void)pushContainer:anObject;
-(void)pushObject:anObject;

@end

What this enables is using property lists as an intermediate format without actually instantiating them, instead sending the messages we would have sent if we had a property list. Protocol Oriented Programming, anyone? Oh, I forgot, you can only do that in Swift...
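The same idea works in any language with some form of indirect call. As a rough sketch in plain C, with all names hypothetical and the protocol cut down to four callbacks, a receiver can compute a result directly from the stream of events without any arrays or dictionaries ever existing. Here `emit_sample` plays the role of the parser and `HiSummer` is one possible "builder" that merely adds up the values stored under "hi".

```c
#include <assert.h>
#include <string.h>

/* Cut-down C analogue of a streaming protocol: a table of callbacks. */
typedef struct {
    void (*begin_dictionary)(void *ctx);
    void (*end_dictionary)(void *ctx);
    void (*write_key)(void *ctx, const char *key);
    void (*write_number)(void *ctx, long value);
    void *ctx;                       /* receiver state */
} PlistStream;

typedef struct {
    long sum;                        /* running total of all "hi" values */
    int  dictionaries;               /* how many dictionaries went by */
    int  key_is_hi;                  /* was the most recent key "hi"? */
} HiSummer;

static void summer_begin(void *ctx)  { ((HiSummer *)ctx)->dictionaries++; }
static void summer_end(void *ctx)    { ((HiSummer *)ctx)->key_is_hi = 0; }
static void summer_key(void *ctx, const char *key)
{
    ((HiSummer *)ctx)->key_is_hi = (strcmp(key, "hi") == 0);
}
static void summer_number(void *ctx, long value)
{
    HiSummer *s = ctx;
    if (s->key_is_hi) {
        s->sum += value;
    }
    s->key_is_hi = 0;
}

static PlistStream make_summer_stream(HiSummer *state)
{
    PlistStream s = { summer_begin, summer_end, summer_key, summer_number, state };
    return s;
}

/* Stand-in for a parser: the events for [{"hi":1,"there":2},{"hi":3,"there":4}]. */
static void emit_sample(const PlistStream *s)
{
    assert(s != NULL);
    long values[2][2] = { {1, 2}, {3, 4} };
    for (int i = 0; i < 2; i++) {
        s->begin_dictionary(s->ctx);
        s->write_key(s->ctx, "hi");
        s->write_number(s->ctx, values[i][0]);
        s->write_key(s->ctx, "there");
        s->write_number(s->ctx, values[i][1]);
        s->end_dictionary(s->ctx);
    }
}
```

Swapping in a different set of callbacks changes what gets built, or whether anything gets built at all, without touching the code that emits the events.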

The same protocol can also be used on the output side, then you get something like Standard Object Out.

Trying it out

By default, MPWMASONParser sets its builder to an instance of MPWPlistBuilder, which, as the name hints, builds property lists. Just like NSJSONSerialization.

So let's give it a whirl:


-(void)decodeMPWDicts:(NSData*)json
{
    MPWMASONParser *parser=[MPWMASONParser parser];
    NSArray* plistResult = [parser parsedData:json];
    NSLog(@"MPWMASON %@ with %ld dicts",[plistResult firstObject],[plistResult count]);
}

And the time is, drumroll, ... 0.621 seconds.

Hmm...that's disappointing. We didn't do anything wrong, yet almost 50% slower than NSJSONSerialization. Well, those dang Apple engineers do know what they're doing after all, and we should probably just give up.

Well, not so fast. Let's at least check out what we did wrong. Unleash the Cracken...er...Instruments!

So that's interesting: the vast majority of time is actually spent in Apple code building the plist. And we have to build the plist. So how does NSJSONSerialization get the same job done faster? Last I checked, with NSPropertyListSerialization, but close enough, they actually use specialised CoreFoundation-based dictionaries that are optimized for the case of having a lot of string keys and having them all in one place during initialization. These are not exposed, CoreFoundation being C-based means non-exposure is very effective and apparently Apple stopped open-sourcing CFLite a while ago.

So how can we do better? Tune in for the next exciting instalment :-)


Sunday, April 12, 2020

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis

In Part 1: The Status Quo, we saw that something isn't quite right with JSON processing in Apple land: while something like simdjson can accomplish the basic parsing task at a rate of 2.5 GB/s and creating objects happens at an equivalent rate of 310 MB/s, Swift's JSON Codable support manages a measly 10 MB/s, underperforming the MacBook Pro's built in SSD by at least 200x and a Gigabit network connection still by a factor of 10.

Some of the feedback I got indicated that the implications of the data presented in "Status Quo" were not as clear as they should have been, so a little analysis before we dive into code.

The MessagePack decode is the only "pure" Swift Codable decoder. As it is so slow as to make the rest of the graph almost unreadable and was only included for comparison, not actually being a JSON decoder, let's leave it out for now. In addition, let's show how much time of each result is the underlying parser and how much time is spent in object creation.

This chart immediately lays to rest two common hypotheses for the performance issues of Swift Codable:

  1. It's the object creation.

    No.

    That is, yes, object creation is slow compared to many other things, but here it represents only around 3% of the total runtime. Yes, finding a way to reduce that final 3% would also be cool (watch this space!), but how about tackling the 97% first?

  2. It's the fact that it is using NSJSONSerialization and therefore Objective-C under the hood that makes it slow.

    No.

    Again, yes, parsing something to a dictionary-based representation that is more expensive than the final representation is not ideal and should be avoided. This is one of the things we will be doing. However:

    • The NSJSONSerialization part of decoding makes up only 13% of the running time, the remaining 87% are in the Swift decoder part.
    • Turning the dictionaries into objects using Key-Value-Coding, which to me is just about the slowest imaginable mechanism for getting data into an object that's not deliberately adding Rube-Goldberg elements, "only" adds 740ms to the basic NSJSONSerialization's parse from JSON to dictionaries. While this is ~50% more time than the parse to dictionaries and 5x the pure object creation time, it is still 5x less than the Codable overhead.
    • All the pure Swift parsers are also this slow or slower.

It also shows that stjson is not a contender (not that it ever claimed to be), because it is slower than even Swift's JSONDecoder without actually going to full objects. JASON is significantly faster, but also doesn't go to objects, and for not going to objects is still significantly slower than NSJSONSerialization. That really only leaves the NSJSONSerialization variants as useful comparison points for what is to come, the rest is either too slow, doesn't do what we need it to do, or both.

Here we can see fairly clearly that creating objects instead of dictionaries would be better. Better than creating dictionaries and certainly much better than first creating dictionaries and then objects, as if that weren't obvious. It is also clear that the actual parsing of JSON text doesn't add all that much extra overhead relative to just creating the dictionaries. In fact, just adding the -copy to convert from mutable dictionaries to immutable dictionaries appears to take more time than the parse!

In truth, it's actually not quite that way, because as far as I know, NSJSONSerialization, like its companion NSPropertyListSerialization uses special dictionaries that are cheaper to create from a textual representation.

simdjson

With all that in mind, it should be clear that simdjson, although it would likely take the pure parse time for that down to around 17 ms, is not that interesting, at least at this stage. What it optimizes is the part that already takes the least time, and is already overwhelmed by even small changes in the way we create our objects.
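The arithmetic behind this can be written down in a few lines of plain C. The sketch assumes, as a simplification, that parsing and object creation run strictly one after the other so their times simply add up; the function names are made up for this example.

```c
#include <assert.h>

/* Time in milliseconds to process `megabytes` at `mb_per_s`. */
static double phase_ms(double megabytes, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return 1000.0 * megabytes / mb_per_s;
}

/* Overall MB/s when parsing and object creation run one after the other
   (an assumption of this sketch: the two phases simply add up). */
static double combined_rate(double megabytes, double parse_mb_per_s, double create_mb_per_s)
{
    double total_ms = phase_ms(megabytes, parse_mb_per_s)
                    + phase_ms(megabytes, create_mb_per_s);
    return 1000.0 * megabytes / total_ms;
}
```

With the figures from this series, `phase_ms(44.0, 2500.0)` gives about 17.6 ms for the parse, and `combined_rate(44.0, 2500.0, 312.0)` comes out at roughly 277 MB/s. Even an infinitely fast parser could not push that past the 312 MB/s of object creation alone.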

What this also means is that simdjson will only be useful if it doesn't make object creation slower. This is also a lesson I learned when creating the MAX XML parser: you can't just make the XML parser part as fast as possible, sometimes it makes sense to make the parser itself somewhat slower if that means other parts, such as object creation, become significantly faster. Or more generally: it's not enough to have fast components, they have to play well together. Optimization is about systems and architecture. If you want to do it well.

MASON

In the next instalment, we will start looking at the actual parser.


Friday, April 10, 2020

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo

I just finished watching Daniel Lemire's talk on the current iteration of simdjson, a JSON parser that clocks in at 2.5GB/s! I've been following Daniel's work for some time now and can't really recommend it highly enough.

This reminded me of a recent twitter conversation where I had offered to contribute a fast, Swift-compatible JSON parser loosely based on MAX, my fast and convenient XML parser. Due to various factors most of which are not under my control, I can't really offer anything that's fast when compared to simdjson, but I can manage something quite a bit less lethargic than what's currently on offer in the Apple and particularly the Swift world.

Environmental assumptions and constraints

My first assumption is that we are going to operate in the Apple ecosystem, and for simplicity's sake I am going to use macOS. Next, I will assume that what we want from our parse(r) are domain objects for further processing within our application (or structs, the difference is not important in this context).

We are going to use the following class with a mix of integer and string instance variables, in Swift:


@objc class TestClass: NSObject, Codable {
    let hi:Int
    let there:Int
    let comment:String
...
}

and the same in Objective-C:


@interface TestClass : NSObject

@property (nonatomic) long hi,there;
@property (nonatomic,strong) NSString *comment;

@end

To make it all easy to measure, we are going to use one million objects, which we are going to initialise with increasing integers and the constant string "comment". This yields the same 44MB JSON file with different serialisation methods, which can be correctly parsed by all the parsers tested. This is obviously a very simple class and file structure, but I think it gives a reasonable approximation for real-world use.

The first thing to check is how quickly we can create these objects straight in code, without any parsing.

That should give us a good upper bound for the performance we can achieve when parsing to domain objects.


#define COUNT 1000000
-(void)createObjects
{
    NSMutableArray *objResult=[NSMutableArray arrayWithCapacity:COUNT+20];
    for ( int i=0;i<COUNT;i++ ) {
        TestClass *cur=[TestClass new];
        cur.hi=i;
        cur.there=i;
        cur.comment=@"comment";
        [objResult addObject:cur];
    }
    NSLog(@"Created objects in code w/o parsing %@ with %ld objects",objResult[0],[objResult count]);
}

On my Quad Core, 2.7 GHz MBP '18, this runs in 0.141 seconds. Although we aren't actually parsing, just creating all the objects that would result from parsing our 44MB JSON file works out to a rate of 312 MB/s.
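All the "rates" quoted in this series are derived the same way: payload size divided by elapsed time. As a trivial sketch in plain C (the function name is made up for this example):

```c
#include <assert.h>

/* Throughput in MB/s for a payload of `megabytes` processed in `seconds`. */
static double rate_mb_per_s(double megabytes, double seconds)
{
    assert(seconds > 0.0);
    return megabytes / seconds;
}
```

For the 44 MB file, `rate_mb_per_s(44.0, 0.141)` is about 312 MB/s. Relating every timing to the same file size makes steps that don't read the file at all, such as pure object creation, directly comparable with actual parsers.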

Wait a second! 312MB/s is almost 10x slower than Daniel Lemire's parser, the one that actually parses JSON, and we are only creating the objects that would result if we were parsing, without doing any actual parsing.

This is one of the many unintuitive aspects of parsing performance: the actual low-level, character-level parsing is generally the least important part for overall performance. Unless you do something crazy like use NSScanner. Don't use NSScanner. Please.

One reason this is unintuitive is that we all learned that performance is dominated by the innermost loop, and character level processing is the innermost loop. But when you have orders of magnitude in performance differences and inner and outer loops differ by less than that amount, the stuff happening in the outer loop can dominate.

NSJSONSerialization

Apple's JSON story very much revolves around NSJSONSerialization, very much like most of the rest of its serialization story revolves around the very similar NSPropertyListSerialization class. It has a reasonably quick implementation, turning the 44 MB JSON file into an NSArray of NSDictionary instances in 0.421 seconds when called from Objective-C, for a rate of 105 MB/s. From Swift, it takes 0.562 seconds, for 78 MB/s.

Of course, that gets us to a property list (array of dicts, in this case), not to the domain objects we actually want.

If you read my book (did I mention my book? Oh, I think I did), you will know that this type of dictionary representation is fairly expensive: expensive to create, expensive in terms of memory consumption and expensive to access. Just creating dictionaries equivalent to the objects we created before takes 0.321 seconds, so around 2.5x the time for creating the equivalent objects and a "rate" of 137 MB/s relative to our 44 MB JSON file.


-(void)createDicts
{
    NSMutableArray *objResult=[NSMutableArray arrayWithCapacity:COUNT+20];
    for ( int i=0;i<COUNT;i++ ) {
        NSMutableDictionary *cur=[NSMutableDictionary dictionary];
        cur[@"hi"]=@(i);
        cur[@"there"]=@(i);
        cur[@"comment"]=@"comment";
        [objResult addObject:cur];
    }
    NSLog(@"Created dicts in code w/o parsing %@ with %ld objects",objResult[0],[objResult count]);
}

Creating the dict in a single step using a dictionary literal is not significantly faster, but creating an immutable copy of the mutable dict after we're done filling brings the time to half a second.

Getting from dicts to objects is typically straightforward, if tedious: just fetch the entry of the dictionary and call the corresponding setter with the value thus retrieved from the dictionary. As this isn't production code and we're just trying to get some bounds of what is possible, there is an easier way: just use Key Value Coding with the keys found in the dictionary. The combined code, parsing and then creating the objects is shown below:


-(void)decodeNSJSONAndKVC:(NSData*)json
{
    NSArray *keys=@[ @"hi", @"there", @"comment"];
    NSArray *plistResult=[NSJSONSerialization JSONObjectWithData:json options:0 error:nil];
    NSMutableArray *objResult=[NSMutableArray arrayWithCapacity:plistResult.count+20];
    for ( NSDictionary *d in plistResult) {
        TestClass *cur=[TestClass new];
        for (NSString *key in keys) {
            [cur setValue:d[key] forKey:key];
        }
        [objResult addObject:cur];
    }
    NSLog(@"NSJSON+KVC %@ with %ld objects",objResult[0],[objResult count]);
}

Note that KVC is slow. Really slow. Order-of-magnitude slower than just sending messages kind of slow, and so it has significant impact on the total time, which comes to a total of 1.142 seconds including parsing and object creation, or just shy of 38 MB/s.

Swift JSON Coding

For the first couple of releases of Swift, JSON support by Apple was limited to a wrapped NSJSONSerialization, with the slight performance penalty already noted. As I write in my book (see sidebar), many JSON "parsers" were published, but none of these, with the notable exception of the Big Nerd Ranch's Freddy, were actual parsers; they all just transformed the arrays and dictionaries returned by NSJSONSerialization into Swift objects. Performance was abysmal, with around 25x overhead in addition to the basic NSJSONSerialization parse.

Apple's Swift Codable promised to solve all that, and on the convenience front it certainly does a great job.


    func readJSONCoder(data:Data) -> [TestClass] {
        NSLog("Swift Decoding")
        let coder=JSONDecoder( )
        let array=try! coder.decode([TestClass].self, from: data)
        return array
    }

(All the forcing is because this is just test code, please don't do this in production!). Alas, performance is still not great: 4.39 seconds, or 10 MB/s. That's 10x slower than the basic NSJSONSerialization parse and 4x slower than our slow but simple complete parse via NSJSONSerialization and KVC.

However, it is significantly faster than the previous third-party JSON to Swift objects "parsers", to the tune of 3-4x. This is the old "first mark up 400% then discount 50%" sales trick applied to performance, except that the relative numbers are larger.
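As for the forced try! in the test code above: a non-forcing variant might look like the sketch below. The TestRecord struct is an assumed shape (the Swift declaration of TestClass is not shown in this post), and readJSONCoderThrowing is a hypothetical name; JSONDecoder and its decode(_:from:) call are the same ones used above.

```swift
import Foundation

// Assumed shape of the decoded type; field types are guesses for illustration.
struct TestRecord: Codable {
    let hi: Int
    let there: Int
    let comment: String
}

// Same decode call as in the test code, but errors propagate to the caller
// instead of trapping.
func readJSONCoderThrowing(data: Data) throws -> [TestRecord] {
    return try JSONDecoder().decode([TestRecord].self, from: data)
}

let sample = Data(#"[{"hi": 1, "there": 2, "comment": "hello"}]"#.utf8)
do {
    let records = try readJSONCoderThrowing(data: sample)
    print("decoded \(records.count) record(s)")
} catch {
    print("decoding failed: \(error)")
}
```

This changes nothing about the performance characteristics discussed here; it only moves the failure handling to the caller.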

Third Party JSON Parsers

I looked a little at third party JSON parsers, particularly JASON, STJSON and ZippyJSON.

STJSON does not make any claims to speed and manages to clock in at 5 seconds, or just under 10 MB/s. JASON bills itself as a "faster" JSON parser (they compare to SwiftyJSON), and does reasonably well at 0.75 seconds or 59 MB/s. However, both of these parse to their own internal representation, not to domain objects (or structs), and so should be compared to NSJSONSerialization, at which point they both disappoint.

Probably the most interesting of these is ZippyJSON, as it uses Daniel Lemire's simdjson and is Codable compatible. Alas, I couldn't get ZippyJSON to compile, so I don't have numbers, but I will keep trying. They claim around 3x faster than Apple's JSONDecoder, which would make it the only parser to be at least in the same ballpark as the trivial NSJSONSerialization + KVC method I showed above.

Another interesting tidbit comes from ZippyJSON's README, under the heading "Why is it so much faster".

"Apple's version first converts the JSON into an NSDictionary using NSJSONSerialization and then afterwards makes things Swifty. The creation of that intermediate dictionary is expensive."

This is true by itself: first converting to an intermediate representation is slow, particularly one that's as heavy-weight as property lists. However, it cannot be the primary reason, because creating that expensive representation only takes 1/8th of the total running time. The other 7/8ths is Codable apparently talking to itself. And speaking very s-l-o-w-l-y while doing that.

To corroborate, I also tried the Flight-School implementation of Codable for MessagePack, which obviously does not use NSJSONSerialization. It makes no performance claims and takes 18 seconds to decode the same objects we used in the JSON files, of course with a different file that's 34 MB in size. Normalized to our 44 MB file that would be 2.4 MB/s.
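The normalization is simple arithmetic: the objects are the same, so the JSON-equivalent rate divides the JSON file size by the MessagePack decode time. A small sketch using only the figures quoted above (the function and constant names are made up for illustration):

```swift
import Foundation

// Throughput in MB/s for a payload of `megabytes` processed in `seconds`.
func throughput(megabytes: Double, seconds: Double) -> Double {
    return megabytes / seconds
}

let jsonFileMB = 44.0         // size of the JSON test file
let messagePackFileMB = 34.0  // the same objects encoded as MessagePack
let decodeSeconds = 18.0      // measured MessagePack decode time

// Raw rate over the bytes actually read.
let raw = throughput(megabytes: messagePackFileMB, seconds: decodeSeconds)
// JSON-equivalent rate: same objects, so use the JSON size with the same time.
let normalized = throughput(megabytes: jsonFileMB, seconds: decodeSeconds)

print(String(format: "raw: %.1f MB/s, normalized: %.1f MB/s", raw, normalized))
// prints "raw: 1.9 MB/s, normalized: 2.4 MB/s"
```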

MAX and MASON

So where does that leave us? Considering what simdjson shows is theoretically possible with JSON parsing, we are not in a good place, to put it mildly. 2.5 GB/s vs. 10 MB/s with Apple's JSONDecoder, several times slower than NSJSONSerialization, which isn't exactly a speed demon, and around 30x slower than pure object creation. Comically bad might be another way of putting it. At least we're being entertained.

What can I contribute? Well, I've been through most of this once before with XML and the result was/is MAX (Messaging API for XML), a parser that is not just super-fast itself (though no SIMD), but also presents APIs that make it both super-convenient and also super-fast to go directly from the XML to an object-representation, either as a tree or a stream of domain objects while using mostly constant memory. Have I mentioned my book? Yeah, it's in the book, in gory detail.

Anyway, XML has sorta faded, so the question was whether the same techniques would work for a JSON parser. The answer is yes, roughly, though with some added complexity and less convenience because JSON is a less informative file format than XML. Open- and close-tags really give you a good heads-up as to what's coming that "{" just does not.

The goal will be to produce domain objects at as close to the theoretical maximum of slightly more than 300 MB/s as possible, while at the same time making the parser convenient to use, close to Swift Codable in convenience. It won't support Codable per default, as the overheads seem to be too high, but ZippyJSON suggests that an adapter wouldn't be too hard.

That parser is MPWMASONParser, and no, it isn't done yet. In its initial state, it parses JSON to dictionaries in 0.58 seconds, or 76 MB/s and slightly slower than NSJSONSerialization.

So we have a bit of a way to go; come join me on this little parsing performance journey!

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Wednesday, April 8, 2020

Swift Initialization, SwiftUI and Function Builders: Called It!

Back in 2014, I wrote a post titled Remove features for greater power, aka: Swift and Objective-C initializers. In this post, I compared the IMHO insane language rules for initialisation in Swift (at the time 14 pages in the Swift book) with the complete lack of such rules in Objective-C, or Smalltalk for that matter.

Chris was so kind to leave a comment stating that my desire for simplicity was incompatible with some specific goals they had for the language. My response was that maybe those goals were incompatible with simplicity. It's a matter of priorities.

A prediction I made was that these rules, despite or more likely because of their complexity, would not be sufficient. And that turned out to be correct: as predicted, people turned to workarounds, just like they did with C++ and Java constructors.

Well, turns out I was correct beyond my wildest dreams: what are SwiftUI Function Builders if not a way to create/initialize complex object structures?

So I'll just come out and say that I called it. :-)

And while I obviously agree that a way to write down complex object structures is useful and important, and the mechanism is once again very clever, I will go out on a limb and claim that the pain that people are encountering now due to weird interactions with the language and type-system is not just due to an immature implementation and growing pains. Of course things will get better, but the fundamental problems of complexity, restrictions, non-obvious interactions with the type-system etc. are essential, not accidental, and therefore can be expected to be with us for good.

UPDATE (2024)

I guess the Swift team finally cottoned on to it: "By formalizing Objective-C's initialization conventions, we've ended up with a tower of complexity where users find it easier to do the wrong thing..."

Sunday, April 5, 2020

Why any Fundamental Improvement in Software has to be a Generalisation

A dynamic I see playing out again and again when it comes to software is the tension between incrementalism and radical change. On the one hand, there is a justified sense, backed by a lot of experience, that just tweaking what we have really doesn't cut it, that it's just rearranging the deck chairs on the Titanic. We obviously need radical change.

On the other hand, radical change that assumes we need to throw away what we (think we) know doesn't really cut it either, and the problem of all that existing software and the techniques and technology we used to create it isn't just the pragmatics of the situation, with huge investments in code and know-how. The fact that we are actually capable of creating all this software means that the radical position of "throw it all away, it's wrong" isn't really tenable. Yes, there is something wrong with it, but it cannot actually be completely wrong.

So we are faced with a dilemma: incremental change and radical change are both obviously right and both obviously wrong. And so we get a lot of shouting at each other, a lot of "change", but not a whole lot of progress.

The only way out I see is that change has to be both radical while also including the status quo, and the only way I can see of achieving that is if it is a generalisation, sort of like quantum mechanics generalised classical mechanics, superseding classical mechanics but still including it as a special case. (Or how circles were generalised to ellipses etc.)

Saturday, December 14, 2019

The Four Stages of Objective-Smalltalk

One of the features that can be confusing about Objective-Smalltalk is that it actually has several parts that are each significant on their own, so people frequently will focus on just one of these (which is fine!), but without realising that the other parts also exist, which is unfortunate as they are all valuable and complement each other. In fact, they can be described as stages that are (logically) built on top of each other.

1. WebScript 2 / "Shasta"

Objective-C has always had great integration with other languages, particularly with a plethora of scripting languages, from Tcl to Python and Ruby to Lisp and Scheme and their variants etc. This is due not just to the fact that the runtime is dynamic, but also that it is simple and C-based not just in terms of being implemented in C, but being a peer to C.

However, all of these suffer from having two somewhat disparate languages, with competing object models, runtimes, storage strategies etc. One language that did not have these issues was WebScript, part of WebObjects and essentially Objective-C-Script. The language was interpreted, a peer in which you could even implement categories on existing Objective-C objects, and so syntactically compatible that often you could just copy-paste code between the two. So close to the ideal scripting language for that environment.

However, the fact that Objective-C is already a hybrid with some ugly compromises means that these compromises often no longer make sense at all in the WebScript environment. For example, Objective-C strings need an added "@" character because plain double quotes are already taken by C strings, but there are no C strings in WebScript. Primitive types like int can be declared, but are really objects; the declaration is a dummy, a NOP. Square brackets for message sends are needed in Objective-C to distinguish messages from the rest of the C syntax, but that's also irrelevant in WebScript. And so on.

So the first stage of Objective-Smalltalk was/is to have all the good aspects of WebScript, but without the syntactic weirdness needed to match the syntactic weirdness of Objective-C that was needed because Objective-C was jammed into C. I am not the only one who figured out the obvious fact that such a language is, essentially, a variant of Smalltalk, and I do believe this pretty much matches what Brent Simmons called Shasta.

Implementation-wise, this works very similarly to WebScript in that everything in the language is an object and gets converted to/from primitives when sending or receiving messages as needed.

This is great for a much more interactive programming model than what we have/had (and the one we have seems to be deteriorating as we speak):

And not just for isolated fragments, but for interacting with and tweaking full applications as they are running:

2. Objective-C without the C

Of course, getting rid of the (syntactic) weirdnesses of Objective-C in our scripting language means that it is no longer (syntactically) compatible with Objective-C. Which is a shame.

It is a shame because this syntactic equivalence between Objective-C and WebScript meant that you could easily move code between them. Have a script that has become stable and you want to reuse it? Copy and paste that code into an Objective-C file and you're good to go. Need it faster? Same. Have some Objective-C code that you want to explore, create variants of etc? Paste it into WebScript. Such a smooth integration between scripting and "programming" is rare and valuable.

The "obvious" solution is to have a native AOT-compiled version of this scripting language and use it to replace Objective-C. Many if not all other scripting languages have struggled mightily with becoming a compiled language, either not getting there at all or requiring JIT compilers of enormous size, complexity, engineering effort and attack surface.

Since the semantic model of our scripting language is just Objective-C, we know that we can AOT-compile this language with a fairly straightforward compiler, probably a lot simpler than even the C/Objective-C compilers currently used, and plugging into the existing toolchain. Which is nice.

The idea seems so obvious, but apparently it wasn't.

Everything so far would, taken together, make for a really nice replacement for Objective-C with a much more productive and, let's face it, fun developer experience. However, even given the advantages of a simpler language, smoothly integrated scripting/programming and instant builds, it's not really clear that yet another OO language is really sufficient; for example, the Etoilé project or the eero language never went anywhere, despite both being very nice.

3. Beyond just Objects: Architecture Oriented Programming

Ever since my Diplomarbeit, Approaches to Composition and Refinement in Object-Oriented Design back in 1997, I've been interested in Software Architecture and Architecture Description Languages (ADLs) as a way of overcoming the problems we have when constructing larger pieces of software.

One thing I noticed very early is that the elements of an ADL closely match up with and generalise the elements of a programming language, for example an object-oriented language: object generalises to component, message to connector. So it seemed that any specific programming language is just a specialisation or instantiation of a more general "architecture language".

To explore this idea, I needed a language that was amenable to experimentation, by being both malleable enough as to allow a metasystem that can abstract away from objects and messages and simple/small enough to make experimentation feasible. A simple variant of Smalltalk would do the trick. More mature variants tend to push you towards building with what is there, rather than abstracting from it, they "...eat their young" (Alan Kay).

So Objective-Smalltalk fits the bill perfectly as a substrate for architecture-oriented programming. In fact, its being built on/with Objective-C, which came into being largely to connect the C/Unix world with the Smalltalk world, means it is already off to a good start.

What to build? How about not reinventing the wheel and simply picking the (arguably) 3 most successful/popular architectural styles:

  • OO (subsuming the other call/return styles)
  • Unix Pipes and Filters
  • REST
Again, surprisingly, at least to me, even these specific styles appear to align reasonably well with the elements we have in a programming language. OO is already well-developed in (Objective-)Smalltalk, dataflow maps to Smalltalk's assignment operator, which needed to be made polymorphic anyway, and REST at least partially maps to non-message identifiers, which also are not polymorphic in Smalltalk.

Having now built all of these abstractions into Objective-Smalltalk, I have to admit again to my surprise how well they work and work together. Yes, it was my thesis, and yes, I can now see confirmation bias everywhere, but it was also a bit of a long-shot.

4. Architecture Oriented Metaprogramming

The architectural styles described above are implemented in frameworks and their interfaces hard-coded into the language implementation. However, with three examples, it should now be feasible to create linguistic support for defining the architectural styles in the language itself, allowing users to define and refine their own architectural styles. This is ongoing work.

What now?

One of the key takeaways from this is that each stage is already quite useful, and probably a worthy project all by itself; it just gets Even Better™ with the addition of later stages. Another is that I need to get back to getting stage 2 ready, as it wasn't actually needed for stage 3, at least not initially.

Thursday, November 14, 2019

Presenting (in) Objective-Smalltalk

2019 has been the year that I have started really talking about Objective-Smalltalk in earnest, because enough of the original vision is now in place.

My first talk was at the European Smalltalk User Group's (ESUG) annual conference in my old hometown of Cologne: (pdf)

This year's ESUG was my first since Essen in 2001, and it almost seemed like a bit of a timewarp. Although more than half the talks were about Pharo, the subjects seemed mostly the same as back when: a bit of TDD, a bit of trying to deal with native threads (exactly the same issues I struggled with when I was doing the CocoaSqueak VM), a bit of 3D graphics that weren't any better than 3D graphics in other environments, but in Smalltalk.

One big topic was getting large (and very profitable) Smalltalk code-bases running on mobile devices such as iPhones. The top method was transpiling to JavaScript, another translating the VM code to JavaScript and then having that run off-the-shelf images. Objective-Smalltalk can also be put in this class, with a mix of interpretation and native compilation.

My second talk was at Germany's oldest Mac conference, Macoun in Frankfurt. The videos from there usually take a while, but here was a reaction:

"Anyone who wants a glimpse at the future should have watched @mpweiher's talk"

Aww, shucks, thanks, but I'll take it. :-)

I also had two papers accepted at SPLASH '19, one was Standard Object Out: Streaming Objects with Polymorphic Write Streams at the Dynamic Languages Symposium, the other was Storage Combinators at Onward!.

Anyway, one aspect of those talks that I didn't dwell on is that the presentations themselves were implemented in Objective-Smalltalk, in fact the definitions were Objective-Smalltalk expressions, complex object literals to be precise.

What follows is an abridged version of the ESUG presentation:


controller := #ASCPresentationViewController{
    #Name : 'ESUG Demo'.
    #Slides : #(

      #ASCChapterSlide { 
               #text : 'Objective-SmallTalk'.
               #subtitle : 'Marcel Weiher (@mpweiher)'
         }  ,

        #ASCBulletSlide{ 
             #title : 'Objective-SmallTalk'.
             #bullets : #( 
                'Embeddable SmallTalk language (Mac, iOS, Linux, Windows)',
                'Objective-C framework (peer/interop)',
                'Generalizes Objects+Messages to Components+Connectors',
                'Enable composition by solving Architectural Mismatch',
             )
        } ,
      #ASCBulletSlide{ 
             #title : 'The Gentle Tyranny of Call/Return'.
             #bullets : #( 
                'Feynman: we name everything just a little wrong',
                'Multiparadigm: Procedural, OO and FP!',
                "Guy Steele: it's no longer about completion",
                "Oscar Nierstrasz: we were told we could just model the domain",
                "Andrew Black: good OO students anthropomorphise the objects",
             )
        } ,

         #ProgramVsSystem { 
              #lightIntensities : #( 0.2 , 0.7 )
              
         }  ,


       #ASCSlideWithFigure{ 
             #delayInSeconds : 5.0.
             #title : 'Objects and Messages'.
             #bullets : #( 
                'Objective-C compatible semantics',
                'Interpreted and native-compiled',
                '"C" using type annotations',
                'Higher Order Messaging',
                'Framework-oriented development',
                'Full platform integration',
             )
        } ,
  

       #ASCBulletSlide{ 
             #title : 'Pipes and Filters'.
             #bullets : #( 
                'Polymorphic Write Streams (DLS ''19)',
                '#writeObject:anObject',
                'Triple Dispatch + Message chaining',
                'Asynchrony-agnostic',
                'Streaming / de-materialized objects',
                'Serialisation, PDF/PS (Squeak), Wunderlist, MS , To Do',
                'Outlook: filters generalise methods?',
            )
        } ,
 
       #ASCBulletSlide{ 
             #title : 'In-Process REST'.
             #bullets : #( 
                'What real large-scale networks use',
                'Polymorphic Identifiers',
                'Stores',
                'Storage Combinators',
                'Used in a number of applications',
             )
        } ,


       #ASCBulletSlide{ 
             #title : 'Polymorphic Identifiers'.
             #bullets : #( 
                'All identifiers are URIs',
                "var:hello := 'World!'",
                'file:{env:HOME}/Downloads/site := http://objective.st',
                'slider setValueHolder: ref:var:celsius',
             )
        } ,

       #ASCBulletSlide{ 
             #title : 'Storage Combinators'.
             #bullets : #( 
                'Onward! ''19',
                'Combinator exposes + consumes REST interfaces',
                'Uniform interface (REST) enables pluggability',
                'Narrow, semantically tight interface enables intermediaries',
                '10x productivity/code improvements',
             )
        } ,


      #ImageSlide{ 
               #text : 'Simple Composed Store'.
               #imageURL : '/Users/marcel/Documents/Writing/Dissertation/Papers/StorageCombinators/disk-cache-json-aligned.png'.
               #xOffset : 2.0 .
               #imageScale : 0.8
         }  , 
      #ASCBulletSlide{ 
             #title : 'Outlook'.
             #bullets : #( 
                'Port Stores and Polymorphic Write Streams',
                'Documentation / Sample Code',
                'Improve native compiler',
                'Tooling (Debugger)',
                'You! (http://objective.st)',
             )
        }  ,


      #ASCChapterSlide { 
               #text : 'Q&A   http://objective.st'.
               #subtitle : 'Marcel Weiher (@mpweiher)'
         }  ,
      )
}. 


There are a number of things going on here:
  • Complex object literals
  • A 3D presentation framework
  • Custom behavior via custom classes
  • Framework-oriented programming
Let's look at these in turn.

Complex object literals

Objective-Smalltalk has literals for arrays (really: ordered collections) and dictionaries, like many other languages now. Array literals are taken from Smalltalk, with a hash and round braces: #(). Unlike other Smalltalks, entries are separated via commas, so #( 1,2,3) rather than #( 1 2 3 ). For dictionaries, I borrowed the curly braces from Objective-C, so #{}.

This gives us the ability to specify complex property lists directly in code. A common idiom in Mac/iOS development circles is to initialize objects from property lists, so something like the following:


presentation = [[MyPresentation alloc] initWithDictionary:aDictionary];

All complex object literals really do is add a little bit of syntactic support for this idiom, by noticing that the two respective characters at the start of array and dictionary literals give us a space to put a name, a class name, between those two characters:


presentation := #MyPresentation{ ... };

This will parse the text between the curly brackets as a dictionary and then initialize a MyPresentation object with that dictionary using the exact -initWithDictionary: message given above. This may seem like a very minor convenience, and it is, but it actually makes it possible to simply write down objects, rather than having to write code that constructs objects. The difference is subtle but significant.

The benefit becomes more obvious once you have nested structures. A normal plist contains no specific class information, just arrays, dictionaries, numbers and strings, and in the Objective-C example, that class information is provided externally, by passing the generic plist to a specific class instance.

(JSON has a similar problem, which is why I still prefer XML for object encoding.)

So either that knowledge must also be provided externally, for example by the implicit knowledge that all substructure is uniform, or custom mechanisms must be devised to encode that information inside the dictionaries or arrays. Ad hoc. Every single time.

Complex object literals create a common mechanism for this: each subdictionary or sub-array can be tagged with the class of the object to create, and there is a convenient and distinct syntax to do it.
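For contrast, here is a sketch of what such an ad hoc mechanism tends to look like without language support, written in Swift. Everything in it is hypothetical: the "class" key, the DictionaryInitializable protocol, the SlideFactory lookup table and the two slide types are inventions for illustration, loosely modelled on the -initWithDictionary: idiom above.

```swift
// Hypothetical protocol: a type that can be built from a plist-style dictionary.
protocol DictionaryInitializable {
    init(dictionary: [String: Any])
}

struct ChapterSlide: DictionaryInitializable {
    let text: String
    init(dictionary: [String: Any]) {
        text = dictionary["text"] as? String ?? ""
    }
}

struct BulletSlide: DictionaryInitializable {
    let title: String
    let bullets: [String]
    init(dictionary: [String: Any]) {
        title = dictionary["title"] as? String ?? ""
        bullets = dictionary["bullets"] as? [String] ?? []
    }
}

enum SlideFactory {
    // The ad hoc part: a made-up "class" key plus a hand-maintained lookup table.
    static let types: [String: DictionaryInitializable.Type] = [
        "ChapterSlide": ChapterSlide.self,
        "BulletSlide": BulletSlide.self,
    ]

    // Returns nil when the dictionary carries no known class tag.
    static func instantiate(_ plist: [String: Any]) -> DictionaryInitializable? {
        guard let name = plist["class"] as? String,
              let slideType = types[name] else { return nil }
        return slideType.init(dictionary: plist)
    }
}

let slidePlist: [[String: Any]] = [
    ["class": "ChapterSlide", "text": "Objective-SmallTalk"],
    ["class": "BulletSlide", "title": "Outlook", "bullets": ["Tooling (Debugger)"]],
]
let slides = slidePlist.compactMap { SlideFactory.instantiate($0) }
print(slides.count)   // prints "2"
```

The tagging convention and the lookup table have to be reinvented for every such format, which is the part that a class name in the literal syntax makes unnecessary.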

A 3D presentation framework

One of the really cool wow! effects of Alan Kay's Squeak demos is always when he breaks through the expected boundaries of a presentation with slides and starts live programming and interactive sketching on the slide. The effect is very similar to when characters break the "fourth wall", and tends to be strongest on the very jaded, who were previously dismissive of the whole presentation.

Alas, a drawback is that those presentations in Squeak tend to look a bit amateurish and cartoonish, not at all polished.

Along came the Apple SceneKit Team's presentations, which were done as Cocoa/SceneKit applications. Which is totally amazing, as it allows arbitrary programmability and integration with custom code, just like Alan's demos, but with a lot more polish.

Of course, an application like that isn't reusable, the effort is pretty high and interactivity low.

I wonder what we could do about that?

First: turn the presentation application into a framework (Slides3D). Second, drive that framework interactively with Objective-Smalltalk from my Workspace-like "Smalltalk" application: presentation.txt. After a bit of setup such as loading the framework (framework:Slides3D load.) and defining a few custom slide classes, it goes on to define the presentation using the literal shown above and then starts the presentation by telling the presentation controller to display itself in a window.


framework:Slides3D load.     
class ProgramVsSystem : ASCSlide {
   var code.
   var system.
   ...
}.
class ImageSlide : ASCSlide { 
     var text.
     var image.


}. 

controller := #ASCPresentationViewController{
    #Name : 'ESUG Demo'.
    #Slides : #(

      #ASCChapterSlide { 
               #text : 'Objective-SmallTalk'.
               #subtitle : 'Marcel Weiher (@mpweiher)'
         }  ,

       ...
      )
}. 
     
controller view openInWindow:'Objective-SmallTalk (ESUG 2019)'. 

Voilà: highly polished, programmatically driven presentations that I can edit interactively and with a somewhat convenient format. Of course, this is not a one-off for presentations: the same mechanism can be used to define other object hierarchies, including but not limited to interactive GUIs.

Framework-oriented programming

Which brings us to the method behind all this madness: the concept I call framework-oriented programming.

The concept is worth at least another article or two, but at its most basic boils down to: for goodness sake, put the bulk of your code in frameworks, not in an application. Even if all you are building is an application. One app that does this right is Xcode. On my machine, the entire app bundle is close to 10GB. But the actual Xcode binary in /Applications/Xcode.app/Contents/MacOS? 41KB. Yes, Kilobytes. And most of that is bookkeeping and boilerplate, it really just contains a C main() function, which I presume largely matches the one that Xcode generates.

Why?

Simple: an Apple framework (i.e.: a .framework bundle) is at least superficially composable, but a .app bundle is not. You can compose frameworks into bigger frameworks, and you can take a framework and use it in a different app. This is difficult to impossible with apps (and no, kludged-together AppleScript concoctions don't count).

And doing it is completely trivial: after you create an app project, just create a framework target alongside the app target, add that framework to the app and then add all code and resources to the framework target instead of to the app target. Except for the main() function. If you already have an app, just move the code to the framework target, making adjustments to bundle loading code (the relevant bundle is now the framework and no longer the app/main bundle). This is what I did to derive Slides3D from the WWDC 2013 SceneKit App.
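The bundle-loading adjustment usually amounts to asking for the bundle that contains a given class instead of the main bundle. A minimal Swift sketch, assuming a hypothetical SlideAssets class that lives in the framework target and PNG resources (neither is taken from Slides3D itself):

```swift
import Foundation

// Hypothetical class that lives in the framework target.
final class SlideAssets {
    // The bundle that contains this class: the framework once the code has
    // moved there. Bundle.main would still point at the host application.
    static var bundle: Bundle {
        return Bundle(for: SlideAssets.self)
    }

    // Looks up a PNG resource in that bundle; nil if it does not exist.
    static func imageURL(named name: String) -> URL? {
        return bundle.url(forResource: name, withExtension: "png")
    }
}

if let url = SlideAssets.imageURL(named: "title-background") {
    print("found \(url.lastPathComponent)")
} else {
    print("resource not found in \(SlideAssets.bundle.bundlePath)")
}
```

Code written this way keeps working unchanged whether the class ends up in the app or in a framework.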

What I've described so far is just code packaging. If you also organize the actual code as an object-oriented framework, you will notice that with time it will evolve into a black-box framework, with objects that are created, configured and composed. This is somewhat tedious to do in the base language (see: creating Views programmatically), so the final evolutionary step is considered a DSL (Hello, SwiftUI!). However, most of this DSL tends to be just creating, configuring and connecting objects. In other words: complex object literals.

Monday, November 11, 2019

What Alan Kay Got Wrong About Objects

One of the anonymous reviewers of my recently published Storage Combinators paper (pdf) complained that hiding disk-based, remote, and local abstractions behind a common interface was a bad idea, citing Jim Waldo's A Note on Distributed Computing.

Having read both this and the related 8 Fallacies of Distributed Computing a while back, I didn't see how this would apply, and re-reading confirmed my vague recollections: these are about the problems of scaling things up from the local case to the distributed case, whereas Storage Combinators and In-Process REST are about scaling things down from the distributed case to the local case. Particularly the Waldo paper is also very specifically about objects and messages; REST is a different beast.

And of course scaling things down happens to be a time-honored tradition with a pretty good track record:

In computer terms, Smalltalk is a recursion on the notion of computer itself. Instead of dividing "computer stuff" into things each less strong than the whole—like data structures, procedures, and functions which are the usual paraphernalia of programming languages—each Smalltalk object is a recursion on the entire possibilities of the computer. Thus its semantics are a bit like having thousands and thousands of computers all hooked together by a very fast network.

Mind you, I think this is absolutely brilliant: in order to get something that will scale up, you simply start with something large and then scale it down!

But of course, this actually did not happen. As we all experienced, scaling local objects and messaging up to the distributed case did not work (CORBA, SOAP, ...), and, as Waldo explains, cannot in fact work. What gives?

My guess is that the method described wasn't actually used: when Alan came up with his version of objects, there were no networks with thousands of computers. And so Alan could not actually look at how they communicated, he had to imagine it, it was a Gedankenexperiment. And thus objects and messages were not a scaled-down version of an actual larger thing, they were a scaled down version of an imagined larger thing.

Today, we do have a large network of computers, with not just thousands but billions of nodes. And they communicate via HTTP using the REST architectural style, not via distributed objects and messages.

So maybe if we took that communication model and scaled it down, we might be able to do even better than objects and messages, which already did pretty brilliantly. Hence In-Process REST, Polymorphic Identifiers and Storage Combinators, and yes, the results look pretty good so far!

The big idea is "messaging" -- that is what the kernal of Smalltalk/Squeak is all about (and it's something that was never quite completed in our Xerox PARC phase). The Japanese have a small word -- ma -- for "that which is in between" -- perhaps the nearest English equivalent is "interstitial". The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be. Think of the internet -- to live, it (a) has to allow many different kinds of ideas and realizations that are beyond any single standard and (b) to allow varying degrees of safe interoperability between these ideas.

So of course Alan is right after all, just not about objects and messages, which are too specific: "ma", or "interstitialness" or "connector" is the big idea, messaging is just one incarnation of that idea.

Thursday, November 7, 2019

Instant Builds

One of the goals I am aiming for in Objective-Smalltalk is instant builds and effective live programming.

A month ago, I got a package from an old school friend: my old Apple ][+, which I thought I had given as a gift, but he insisted had been a long-term loan. That machine featured 48KB of DRAM and a 1 MHz, 8 bit 6502 processor that took multiple cycles for even the simplest instructions, had no multiply instructions and almost no registers. Yet, when I turn it on it becomes interactive faster than the CRT warms up, and the programming experience remains fully interactive after that. I type something in, it executes. I change the program, type "RUN" and off it goes.

Of course, you can also get that experience with more complex systems, Smalltalk comes to mind, but the point is that it doesn't take the most advanced technology or heroic effort to make systems interactive, what it takes is making it a priority.


But here we are indeed.

Now Swift is only one example of this, it's a current trend, and of course these systems do claim that they provide benefits that are worth the wait. From optimizations to static type-checking with type-inference, so that "once it compiles, it works". This is deemed to be (a) 100% worthwhile despite the fact that there is no scientific evidence backing up these claims (a paper which claimed that it had the evidence was just shredded at this year's OOPSLA) and (b) essentially cost-free. But of course it isn't cost free:

So when everyone zigs, I zag, it's my contrarian nature. Where Swift's message was, essentially "there is too much Smalltalk in Objective-C", my contention is that there is too little Smalltalk in Objective-C (and also that there is too little "Objective" in Smalltalk, but that's a different topic).

Smalltalk was perfectly interactive in its own environment on high end late 70s and early 80s hardware. With today's monsters of computation, there is no good reason, or excuse for that matter, to not be interactive even when taken into the slightly more demanding Unix/macOS/iOS development world. That doesn't mean there aren't loads of reasons, they're just not any good.

So Objective-Smalltalk will be fast, it will be live or near-live at all times, and it will have instant builds. This isn't going to be rocket science, mostly. The ingredients are as follows:

  1. An interpreter
  2. Late binding
  3. Separate compilation
  4. A fast and simple native compiler

Let's look at these in detail.

An interpreter

The basic implementation of Objective-Smalltalk is an AST-walking interpreter. No JIT, not even a simple bytecode interpreter. Which is about as pessimal as possible, but our machines are so incredibly fast, and a lot of our tasks simple enough or computational steering enough that it actually does a decent enough job for many of those tasks. (For more on this dynamic, see The Death of Optimizing Compilers by Daniel J. Bernstein)

And because it is just an interpreter, it has no problems doing its thing on iOS:

(Yes, this is in the simulator, but it works the same on an actual device)

Late Binding

Late binding nicely decouples the parts of our software. This means that the compiler has very little information about what happens and can't help a lot in terms of optimization or checking, something that always drove the compiler folks a little nuts ("but we want to help and there's so much we could do"). It enables strong modularity and separate compilation. Objective-Smalltalk is as late-bound in its messaging as Objective-C or Smalltalk are, but goes beyond them by also late-binding identifiers, storage and dataflow with Polymorphic Identifiers (ACM, pdf), Storage Combinators (ACM, pdf) and Polymorphic Write Streams (ACM, pdf).

Allowing this level of flexibility while still not requiring a Graal-level Helden-JIT to burn away all the abstractions at runtime will require careful design of the meta-level boundaries, but I think the technically desirable boundaries align very well with the conceptually desirable boundaries: use meta-level facilities to define the language you want to program in, then write your program.

It's the failure to make these boundaries clear, and the free mixing of meta-level and base-level programming, that gets us not just into conceptual trouble, but also into the kinds of technical trouble that the Heldencompilers and Helden-JITs have to bail us out of.

Separate Compilation

When you have good module boundaries, you can get separate compilation, meaning a change in one file (or other code-containing entity if you don't like files) does not require changes to other files. Smalltalk had this. Unix-style C programming had this, along with the concept of binary libraries (with the generalization to frameworks on macOS etc.). For some reason, this has taken more and more of a back seat in macOS and iOS development, with full source inclusion and full builds becoming the norm in the community (see CocoaPods) and for a long time being enforced by Apple by not allowing user-defined dynamic libraries on iOS.

While Swift allows separate compilation, this can have such severe negative effects on both performance and compile times that compiling everything on any change has become a "best practice". In fact, we now have a build option "whole module optimization with optimizations turned off" for debugging. I kid you not.

Objective-Smalltalk is designed to enable "Framework-oriented-programming", so separate compilation is and will remain a top priority.

A fast and simple native compiler

However, even with an interpreter for interactive adjustments, separate compilation due to good modularity and late binding, you sometimes want to do a full build, or need to rebuild a large part of the codebase.

Even that shouldn't take forever, and in fact it doesn't need to. I am totally with Jonathan Blow on this subject when he says that compiling a medium-size project shouldn't really take more than a second or so.

My current approach for getting there is using TinyCC's backend as the starting point of the backend for Objective-Smalltalk. After all, the semantics are (mostly) Objective-C and Objective-C's semantics are just C. What I really like about tcc is that it goes so brutally directly to outputting CPU opcodes as binary bytes.


static void gcall_or_jmp(int is_jmp)
{
    int r;
    if ((vtop->r & (VT_VALMASK | VT_LVAL)) == VT_CONST &&
	((vtop->r & VT_SYM) && (vtop->c.i-4) == (int)(vtop->c.i-4))) {
        /* constant symbolic case -> simple relocation */
        greloca(cur_text_section, vtop->sym, ind + 1, R_X86_64_PLT32, (int)(vtop->c.i-4));
        oad(0xe8 + is_jmp, 0); /* call/jmp im */
    } else {
        /* otherwise, indirect call */
        r = TREG_R11;
        load(r, vtop);
        o(0x41); /* REX */
        o(0xff); /* call/jmp *r */
        o(0xd0 + REG_VALUE(r) + (is_jmp << 4));
    }
}

No layers of malloc()ed intermediate representations here! This aligns very nicely with the streaming/messaging approach to high-performance I've taken elsewhere with Polymorphic Write Streams (see above), so I am pretty confident I can make this (a) work and (b) simple/elegant while keeping it (c) fast.
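To make the "straight to bytes" style concrete, here is a minimal sketch in the same spirit. It is not tcc code: the `CodeBuffer` type and the `emit8`, `emit32le` and `emit_return_constant` names are made up for illustration, and a fixed-size buffer stands in for tcc's growable text section. It emits the x86-64 encoding of a function body that returns a constant (`mov eax, imm32` followed by `ret`) without building any intermediate representation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer; a real backend would use a growable section. */
typedef struct {
    uint8_t bytes[64];
    size_t  len;
} CodeBuffer;

/* Append one byte of machine code (silently drops bytes once full). */
void emit8(CodeBuffer *cb, uint8_t b)
{
    if (cb->len < sizeof cb->bytes)
        cb->bytes[cb->len++] = b;
}

/* Append a 32-bit value in little-endian order, as x86-64 expects. */
void emit32le(CodeBuffer *cb, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        emit8(cb, (uint8_t)(v >> (8 * i)));
}

/* Emit the body of `int f(void) { return value; }`:
   mov eax, imm32 (opcode 0xB8), then ret (opcode 0xC3).
   Returns the number of bytes emitted. */
size_t emit_return_constant(CodeBuffer *cb, int32_t value)
{
    size_t start = cb->len;
    emit8(cb, 0xB8);
    emit32le(cb, (uint32_t)value);
    emit8(cb, 0xC3);
    return cb->len - start;
}
```

Calling `emit_return_constant` on a zeroed `CodeBuffer` with the value 42 leaves the six bytes `B8 2A 00 00 00 C3` in the buffer. Making them executable (and linking them) is a separate step that this sketch leaves out.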

How fast? I obviously don't know yet, but tcc is a fantastic starting point. The following is the current (=wrong) ObjectiveTcc code to drive tcc to build a function that sends a single message:


-(void)generateMessageSendTestFunctionWithName:(char*)name
{
    SEL flagMsg=@selector(setMsgFlag);
    [self functionOnlyWithName:name returnType:VT_INT argTypes:"" body:^{
        [self pushFunctionPointer:objc_msgSend];
        [self pushObject:self];
        [self pushPointer:flagMsg];
        [self call:2];
    }];
}

How often can I do this in one second? On my 2018 high spec but 13" MBP: 300,000 times. That includes in-memory linking (though not much of that is happening in this example). It does not include Mach-O generation, as that's not implemented yet, or writing the whole shebang to disk. I don't anticipate either of these taking appreciable additional time.

If we consider this 2 "lines" of code, one for the function/method header and one for the message, then we can generate binary for 600KLOC/s. So having a medium size program compile and link in about a second or so seems eminently doable, even if I manage to slow the raw Tcc performance down by about an order of magnitude.

(For comparison: the Swift code base that motivated the Rome caching system for Carthage was clocking in at around 60 lines per second with the then Swift compiler. So even with an anticipated order of magnitude slowdown we'd still be 1000x faster. 1000x is good enough, it's the difference between 3 seconds and an hour.)
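The back-of-the-envelope numbers above can be checked with a few lines of C. This is only a sketch of the arithmetic; the function names are made up, and the "2 lines per function" figure is the same assumption the text makes.

```c
/* Lines of code generated per second, given how many functions are built
   per second and how many "lines" each one is assumed to represent. */
long lines_per_second(long builds_per_second, long lines_per_build)
{
    return builds_per_second * lines_per_build;
}

/* How many times faster one rate is than another. */
double speedup(double fast_rate, double slow_rate)
{
    return fast_rate / slow_rate;
}
```

With 300,000 builds per second at 2 lines each, `lines_per_second` gives 600,000. Dividing that by 10 for the anticipated slowdown and comparing against 60 lines per second gives a `speedup` of 1000. For reference, an hour versus 3 seconds is a factor of 1200, so "1000x" is the right order of magnitude.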

What's the downside? Tcc doesn't do a lot of optimization. But that's OK as (a) the sorts of optimizations C compilers and backends like LLVM do aren't much use for highly polymorphic and late-bound code, (b) the basics get you around 80% of the way, (c) most code doesn't need that much optimization (see above), and (d) machines have become really fast.

And it helps that we aren't doing crazy things like initially allocating function-local variables on the heap or doing function argument copying via vtables that require leaning on the optimizer to get adequate performance (as in: not 100x slower..).

Defense in Depth

While any of these techniques might be adequate some of the time, it's the combination that I think will make the Objective-Smalltalk tooling a refreshing, pleasant and highly productive alternative to existing toolchains, because it will be reliably fast under all circumstances.

And it doesn't really take (much) rocket science, just a willingness to make this aspect a priority.

Saturday, April 27, 2019

What's Going Down at the TIOBE Index? Swift, Surprisingly

Last month I expressed my surprise at the fact that Objective-C was recovering its rankings in the TIOBE index, not quite to the lofty #3 spot it enjoyed a while ago, but to a solid #10, once again surpassing Swift, which had dropped to #17.

This month, Swift has dropped to #19, almost looking like it's going to fall out of the top 20 altogether.

Strange times.

Wednesday, April 3, 2019

Accessors Have Message Obsession

Just came across an older post by Nat Pryce on Message Obsession, which he describes as the opposite end of a spectrum from Primitive Obsession.

The example is a set of commands for moving a robot:


-moveNorth.
-moveSouth.
-moveWest.
-moveEast.

Although the duplication is annoying, the bigger problem is that there are two things, the verb "move" and a direction argument, mushed together into the message name. And that can cause further problems down the road:
"It’s awkward to work with this kind of interface because you can’t pass around, store or perform calculations on the direction at all."

He argues, convincingly IMHO, that the combined messages should be replaced by a single move: message with a separate direction argument. The current fashion would be to make direction an enum, but he (wisely, IMHO) turns it into a class that can encode different directions:
-move:direction.

class Direction {
  ...
}
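To illustrate what reifying the direction buys you, here is a small sketch in C. The body of the Direction class is elided in the original, so everything here is a stand-in rather than Nat's design: the `Direction` struct with `dx`/`dy` fields, the `Robot` struct, and the function names are all hypothetical.

```c
/* Hypothetical direction value: a unit step on a grid. */
typedef struct { int dx, dy; } Direction;

/* Hypothetical robot: just a position. */
typedef struct { int x, y; } Robot;

const Direction NORTH = {  0,  1 };
const Direction SOUTH = {  0, -1 };
const Direction EAST  = {  1,  0 };
const Direction WEST  = { -1,  0 };

/* The single "move" message, taking the direction as an argument. */
void robot_move(Robot *r, Direction d)
{
    r->x += d.dx;
    r->y += d.dy;
}

/* Because a direction is a value, we can calculate with it. */
Direction direction_opposite(Direction d)
{
    Direction o = { -d.dx, -d.dy };
    return o;
}
```

A caller can now store a direction in a variable, pass it around, or undo a step with `robot_move(&r, direction_opposite(d))`, none of which is possible when the direction is baked into the message name.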

So far so good. However...

...we have this message obsession at a massively larger scale with accessors.


-attribute.
-setAttribute:newValue.

Every single attribute of every single class gets its own accessor or accessor pair, again with the action (get/set) mushed together with the name of the attribute to work on. The solution is the same as for the directions in Nat's example: there are only two actual messages, with reified identifiers.
-get:identifier.
-set:identifier to:value.

These, of course, correspond to the GET and PUT HTTP verbs. Properties, now available in a number of mainstream languages, are supposed to address this issue, but they only really address the 2:1 problem (getter and setter for an attribute). The much bigger N:2 problem (method pair for every attribute) remains unaddressed, and in particular you also cannot pass around, store or perform calculations on the identifier.

And it turns out that passing those identifiers around and performing calculations on them is tremendously powerful, even if you don't have language support. Without language support, the interface between the world of reified identifiers and objects can be a bit awkward.
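As a rough sketch of the two-message idea without any language support, here is a C version. Nothing in it comes from an existing library: the `Store` type, the use of plain strings as identifiers, and the `store_get`, `store_set` and `store_copy` names are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ATTRIBUTES 8

/* A hypothetical object whose attributes are reached by identifier
   rather than through one accessor pair per attribute. */
typedef struct {
    const char *names[MAX_ATTRIBUTES];
    const char *values[MAX_ATTRIBUTES];
    size_t      count;
} Store;

/* The single "get" message: returns NULL if the identifier is unknown. */
const char *store_get(const Store *s, const char *identifier)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], identifier) == 0)
            return s->values[i];
    return NULL;
}

/* The single "set" message: returns 0 on success, -1 if the store is full. */
int store_set(Store *s, const char *identifier, const char *value)
{
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->names[i], identifier) == 0) {
            s->values[i] = value;
            return 0;
        }
    }
    if (s->count == MAX_ATTRIBUTES)
        return -1;
    s->names[s->count] = identifier;
    s->values[s->count] = value;
    s->count++;
    return 0;
}

/* Identifiers are ordinary values, so generic code can take a list of them.
   Copies the named attributes from src to dst; returns how many were copied. */
size_t store_copy(Store *dst, const Store *src,
                  const char *const *identifiers, size_t n)
{
    size_t copied = 0;
    for (size_t i = 0; i < n; i++) {
        const char *v = store_get(src, identifiers[i]);
        if (v != NULL && store_set(dst, identifiers[i], v) == 0)
            copied++;
    }
    return copied;
}
```

`store_copy` is the kind of function that cannot be written generically against per-attribute accessors: it works for any set of attributes because the identifier is data rather than part of a method name. The awkwardness mentioned above shows up too, since everything is a string and nothing is checked at compile time.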

Saturday, March 23, 2019

What's up at the TIOBE Index? Surprisingly, Objective-C

When Apple introduced Swift, Objective-C quickly dropped down from its number 3 spot in the TIOBE index. Way down. And it certainly seemed obvious that from that day on, this was the only direction it would ever go.

Imagine my surprise when I looked earlier this March and found it back up, no, not in the lofty heights it used to occupy, but at least in tenth place (up from 14th a year earlier), and actually surpassing Swift again, which dropped by almost half in its percent rating and from 12th to 17th place in the rankings.

What's going on here?

Tuesday, March 19, 2019

LISP Macros, Delayed Evaluation and the Evolution of Smalltalk

At a recent Clojure Berlin Meetup, Veit Heller gave an interesting talk on Macros in Clojure. The meetup was very enjoyable, and the talk also brought me a little closer to understanding the relationship between functions and macros and a bit of Smalltalk evolution that had so far eluded me.

The question, which has been bugging me for some time, is when do we actually need metaprogramming facilities like macros, and why? After all, we already have functions and methods for capturing and extracting common functionality. A facile answer is that "Macros extend the language", but so do functions, in their way. Another answer is that you have to use Macros when you can't make progress any other way, but that doesn't really answer the question either.

The reason the question is relevant is, of course, that although it is fun to play around with powerful mechanisms, we should always use the least powerful mechanism that will accomplish our goal, as it will be easier to program with, easier to understand, easier to analyse and build tools for, and easier to maintain.

Anyway, the answer in this case seemed to be that macros were needed in order to "delay evaluation", to send unevaluated parameters to the macros. A quick question to the presenter confirmed that this was the case for most of the examples. Which raises the question: if we had a generic mechanism for delaying evaluation, could we have used plain functions (or methods) instead? And indeed, the answer was that we could.

One of the examples was a way to build your own if, which most languages have built in, but Smalltalk famously implements in the class library: there is an ifTrue:ifFalse: message that takes two blocks (closures) as parameters. The True class evaluates the first block parameter and ignores the second, the False class evaluates the second block parameter and ignores the first.
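The same idea can be sketched in a language with neither macros nor blocks, using function pointers as a crude stand-in for closures. This is only an illustration under that assumption; the `Thunk` type and the function names are made up, and a `void *` context argument plays the role of the block's captured state.

```c
/* A thunk: a computation packaged up so that it runs only when called. */
typedef int (*Thunk)(void *context);

/* A library-level "if": evaluates exactly one of the two thunks,
   in the spirit of Smalltalk's ifTrue:ifFalse:. */
int if_true_if_false(int condition, Thunk then_branch, Thunk else_branch,
                     void *context)
{
    return condition ? then_branch(context) : else_branch(context);
}

/* Example branches that record in the context that they were evaluated. */
int count_then(void *context)
{
    int *counters = context;
    counters[0]++;
    return 1;
}

int count_else(void *context)
{
    int *counters = context;
    counters[1]++;
    return 2;
}
```

Passing `count_then` and `count_else` hands over the branches unevaluated. `if_true_if_false` is an ordinary function, yet only the chosen branch's side effect happens, which is the property the macro version was needed for.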

The Clojure macro example worked almost exactly the same way, but where Smalltalk uses blocks to delay evaluation, the example used macros. So where LISP might use macros, Smalltalk uses blocks. That macros and blocks might be related was new to me, and took me a while to process. Once I had processed it, a bit of Smalltalk history that I had always struggled with, this bit about Smalltalk-76, suddenly made sense:



Why did it "have to" provide such a mechanism? It doesn't say. It says this mechanism was replaced by the equivalent blocks, but blocks/anonymous functions seem quite different from alternate argument-passing mechanisms. Huh?

With this new insight, it suddenly makes sense. Smalltalk-72 just had a token stream; there were no "arguments" as such, and the new method just took over parsing the token stream and picked up the parameters from there. In a sense, this was the ultimate macro system and ultimately powerful, but also quite unusable, incomprehensible, unmaintainable and not compilable. In that system, "arguments" are by definition unevaluated, and so you can do all the macro-like magic you want.

Dan's Smalltalk-76 effort was largely about compiling for better performance and having a stable, comprehensible and composable syntax. But there are times you still need unevaluated arguments, for example if you want to implement an if that only evaluates one of its branches, not both of them, without baking it into the language. Smalltalk did not have a macro mechanism, and it no longer had the Smalltalk-72 token-stream where un-evaluated "arguments" came for free, so yes, there "had" to be some sort of mechanism for unevaluated arguments.

Hence the open-colon syntax.

And we have a progression of: Smalltalk-72 token stream → Smalltalk-76 open colon parameters → Smalltalk-80 blocks.
All serving the purpose of enabling macro-like capabilities without actually having macros by providing a general language facility for passing un-evaluated parameters.

Aha!