Friday, April 24, 2020

Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser

A convenient setback

One thing that you may have noticed last time around was that we were getting the instance variable names from the class, but then also still manually setting the common keys manually. That's a bit of duplicated and needlessly manual effort, because the common keys are exactly those ivar names.

However, the two pieces of information are in different places, the ivar names in the builder and the common strings in the in the parse itself. One way of consolidating this information is by creating a convenience intializer for decoding to objects as follows:



-initWithClass:(Class)classToDecode
{
    self = [self initWithBuilder:[[[MPWObjectBuilder alloc] initWithClass:classToDecode] autorelease]];
    [self setFrequentStrings:(NSArray*)[[[classToDecode ivarNames] collect] substringFromIndex:1]];
    return self;
}

We still compute the ivar names twice, but that's not really such a big deal, so something we can fix later, just like the issue that we should probably be using property names instead of instance variable names that in the case of properties we have to post-process to get rid of the underscores added by ivar synthesis.

With that, the code to parse to objects simplifies to the following, very similar to what you would see in Swift with JSONDecoder.


-(void)decodeMPWDirect:(NSData*)json
{
    MPWMASONParser *parser=[[MPWMASONParser alloc] initWithClass:[TestClass class]];
    NSArray* objResult = [parser parsedData:json];
}

So, quickly verifying that performance is still the same (always do this!) and...oops! Performance dropped significantly, from 441ms to over 700ms. How could such an innocuous change lead to a 50% performance regression?

The profile shows that we are now spending significantly more time in MPWSmallStringTable's objectForKey: method, where it gets the bytes out of the NSString/CFString, but why that should be the case is a bit mysterious, since we changed virtually nothing.

A little further sleuthing revealed that the strings in question are now instances of NSTaggedPointerString, where previously they were instances of __NSCFConstantString. The latter has a pointer to its byte-oriented character orientation, which it can simply return, while the former cleverly encodes the characters in the pointer itself, so it first has to reconstruct that byte representation. The method of constructing that representation and computing the size of such a representation also appears to be fairly generic and slow via a stream.

This isn't really easy to solve, since the creation of NSTaggedPointerStrring instances is hardwired pretty deep in CoreFoundation with no way to disable this "optimization". Although it would be possible to create a new NSString subclass with a byte buffer, make sure to convert to that class before putting instances in the lookup table, that seems like a lot of work. Or we could just revert this convenience.

Damn the torpedoes and full speed ahead!

Alternatively, we really wanted to get rid of this whole process of packing character data into NSString instances just to immediately unpack them again, so let's leave the regression as is and do that instead.

Where previously the builder had a NSString *key instance vaiable, it now has a char *keyStr and a int keyLen. The string-handling case in the JSON parser is now split betweeen the key and the non-key casse, with the non-key case still doing the conversion, but the key-case directly sending the char* and length to the builder.


			case '"':
                parsestring( curptr , endptr, &stringstart, &curptr  );
				if ( curptr[1] == ':' ) {
                    [_builder writeKeyString:stringstart length:curptr-stringstart];
					curptr++;
					
				} else {
                    curstr = [self makeRetainedJSONStringStart:stringstart length:curptr-stringstart];
					[_builder writeString:curstr];
				}
                curptr++;
				break;

This means that at least temporarily, JSON escape handling is disabled for keys. It's straightforward to add back, makeRetainedJSONStringStart:length: does all its processing in a character buffer, only converting to a string object at the very end.


-(void)writeString:(NSString*)aString
{
    if ( keyStr ) {
        MPWValueAccessor *accesssor=OBJECTFORSTRINGLENGTH(self.accessorTable, keyStr, keyLen);
        [accesssor setValue:aString forTarget:*tos];
        keyStr=NULL;
    } else {
        [self pushObject:aString];
    }
}

If there is a key, we are in a dictionary, otherwise an array (or top-level). In the dictionary case, we can now fetch the ValueAccessor via the OBJECTFORSTRINGLENGTH() macro.

The results are encouraging: 299ms, or 147 MB/s.

The MPWPlistBuilder also needs to be adjusted: as it builds and NSDictionary and not an object, it actually needs the NSString key, but the parser no longer delivers those. So it just creates them on the fly:


-(NSString*)key
{
    NSString *key=nil;
    if ( keyStr) {
        if ( _commonStrings ) {
            key=OBJECTFORSTRINGLENGTH(_commonStrings, keyStr, keyLen);
        }
        if ( !key ) {
            key=[[[NSString alloc] initWithBytes:keyStr length:keyLen encoding:NSUTF8StringEncoding] autorelease];
        }
    }
    return key;
}

Surprisingly, this makes the dictionary parsing code slightly faster, bringing up to par with NSSJSSONSerialization at 421ms.

Eliminating NSNumber

Our use of NSNumber/CFNumber values is very similar to our use of NSString for keys: the parser wraps the parsed number in the object, the builder then unwraps it again.

Changing that, initially just for integers, is straightforward: add an integer-valued message to the builder protocol and implement it.


-(void)writeInteger:(long)number
{
    if ( keyStr ) {
        MPWValueAccessor *accesssor=OBJECTFORSTRINGLENGTH(_accessorTable, keyStr, keyLen);
        [accesssor setIntValue:number forTarget:*tos];
        keyStr=NULL;
    } else {
        [self pushObject:@(number)];
    }
}

The actual integer parsing code is not in MPWMASONParser but its superclasss, and as we don't want to touch that for now, let's just copy-paste that code, modifying it to return a C primitive type instead of an object.


-(long)longElementAtPtr:(const char*)start length:(long)len
{
    long val=0;
    int sign=1;
    const char *end=start+len;
    if ( start[0] =='-' ) {
        sign=-1;
        start++;
    } else if ( start[0]=='+' ) {
        start++;
    }
    while ( start < end && isdigit(*start)) {
        val=val*10+ (*start)-'0';
        start++;
    }
    val*=sign;
    return val;
}

I am sure there are better ways to turn a string into an int, but it will do for now. Similarly to the key/string distinction, we now special case integers.
                if ( isReal) {
                    number = [self realElement:numstart length:curptr-numstart];

                    [_builder writeString:number];
                } else {
                    long n=[self longElementAtPtr:numstart length:curptr-numstart];
                    [_builder writeInteger:n];
                }

Again, not pretty, but we can clean it up later.

Together with using direct instance variable access instead of properties to get to the accessorTable, this yields a very noticeable speed boost:

229 ms, or 195 MB/s.

Nice.

Discussion

What happened here? Just random hacking on the profile and replacing nice object-oriented programming with ugly but fast C?

Although there is obviously some truth in that, profiles were used and more C primitive types appeared, I would contend that what happened was a move away from objects, and particularly away from generic and expensive Foundation objects ("Foundation oriented programming"?) towards message oriented programming.

I'm sorry that I long ago coined the term "objects" for this topic because it gets many people to focus on the lesser idea.

The big idea is "messaging" -- that is what the kernal of Smalltalk/Squeak is all about (and it's something that was never quite completed in our Xerox PARC phase). The Japanese have a small word -- ma -- for "that which is in between" -- perhaps the nearest English equivalent is "interstitial". The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be.

It turns out that message oriented programming (or should we call it Protocol Oriented Programming?) is where Objective-C shines: coarse-grained objects, implemented in C, that exchange messages, with the messages also as primitive as you can get away with. That was the idea, and when you follow that idea, Objective-C just hums, you get not just fast, but also flexible and architecturally nicely decoupled objects: elegance.

The combination of objects + primitive messages is very similar to another architecturally elegant and productive style: Unix pipes and filters. The components are in C and can have as rich an internal structure as you want, but they have to talk to each other via byte-streams. This can also be made very fast, and also prevents or at least reduces coupling between the components.

Another aspect is the tension between an API for use and an API for reuse, particularly within the constraints of call/return. When you get tasked with "Create a component + API for parsing JSON", something like NSJSONSerialization is something you almost have to come up with: feed it JSON, out comes parsed JSON. Nothing could be more convenient to use for "parsing JSON".

MPWMASONParser on the other hand is not convenient at all when viewed in isolation, but it's much more capable of being smoothly integrated into a larger processing chain. And most of the work that NSJSONSerialization did in the name of convenience is now just wasted, it doesn't make further processing any easier but sucks up enormous amounts of time.

Anyway, let's look at the current profile:

First, times are now small enough that high-resolution (100┬Ás) sampling is now necessary to get meaningful results. Second, the NSNumber/CFNumber and NSString packing and unpacking is gone, with an even bigger chunk of the remaining time now going to object creation. objc_msgSend() is now starting to actually become noticeable, as is the (inefficient) character level parsing. The accessors of our test objects start to appear, if barely.

With the work we've done so far, we've improved speed around 5x from where we started, and at 195 MB/s are almost 20x faster than Swift's JSONDecoder.

Can we do better? Stay tuned.

Note

I can help not just Apple, but also you and your company/team with performance and agile coaching, workshops and consulting. Contact me at info at metaobject.com.

No comments: