Showing posts with label Open Source. Show all posts
Showing posts with label Open Source. Show all posts

Tuesday, April 14, 2020

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization

In the previous in instalments, we looked at and analysed the status quo for JSON parsing on Apple platforms in general and Swift in particular and it wasn't all that promising: we know that parsing to an intermediate representation of Foundation plist types (dictionaries, arrays, strings, numbers) is one of the worst possible ideas, yet it is the fastest we have. We know that creating objects from JSON is, or at least should be, the slowest part of this, yet it is by far the fastest, and last, not least, we also know is the slowest possible way to transfer values to those objects, yet Swift Coding somehow manages to be several times slower.

So either we're wrong about all of these things we know, always a distinct possibility, or there is something fishy going on. My vote is on the latter, and while figuring out exactly what fishy thing is going on would probably be a fascinating investigation for an Apple performance engineer, I prefer proof by creation:

Just make something that doesn't have these problems. In that case you not only know where the problem is, you also have a better alternative to use.

MASON

Without much further ado, here is the definition of the MPWMASONParser class:
@class MPWSmallStringTable;
@protocol MPWPlistStreaming;

@interface MPWMASONParser : MPWXmlAppleProplistReader {
	BOOL inDict;
	BOOL inArray;
	MPWSmallStringTable *commonStrings;
}

@property (nonatomic, strong) id  builder;

-(void)setFrequentStrings:(NSArray*)strings;

@end

What it does is send messages of the MPWPlistStreaming protocol to its builder property. So a Message-oriented parser for JaSON, just like MAX is the Message oriented API for XML.

The implementation-history is also reflected in the fact that it is a subclass of MPWXmlAppleProplistReader, which itself is a subclass of MPWMAXParser>. The core of the implementation is a loop that handles JSON syntax and sends one-way messages for the different elements to the builder. It looks very similar to loops in other simple parsers (and probably not at all like the crazy SIMD contortioins of simdjson). When done, it returns whatever the builder constructed.


-parsedData:(NSData*)jsonData
{
	[self setData:jsonData];
	const char *curptr=[jsonData bytes];
	const char *endptr=curptr+[jsonData length];
	const char *stringstart=NULL;
	NSString *curstr=nil;
	while (curptr < endptr ) {
		switch (*curptr) {
			case '{':
				[_builder beginDictionary];
				inDict=YES;
				inArray=NO;
				curptr++;
				break;
			case '}':
				[_builder endDictionary];
				curptr++;
				break;
			case '[':
				[_builder beginArray];
				inDict=NO;
				inArray=YES;
				curptr++;
				break;
			case ']':
				[_builder endArray];
				curptr++;
				break;
			case '"':
                parsestring( curptr , endptr, &stringstart, &curptr  );
                curstr = [self makeRetainedJSONStringStart:stringstart length:curptr-stringstart];
				curptr++;
				if ( *curptr == ':' ) {
					[_builder writeKey:curstr];
					curptr++;
					
				} else {
					[_builder writeString:curstr];
				}
				break;
			case ',':
				curptr++;
				break;
			case '-':
			case '0':
			case '1':
			case '2':
			case '3':
			case '4':
			case '5':
			case '6':
			case '7':
			case '8':
			case '9':
			{
				BOOL isReal=NO;
				const char *numstart=curptr;
				id number=nil;
				if ( *curptr == '-' ) {
					curptr++;
				}
				while ( curptr < endptr && isdigit(*curptr) ) {
					curptr++;
				}
				if ( *curptr == '.' ) {
					curptr++;
					while ( curptr < endptr && isdigit(*curptr) ) {
						curptr++;
					}
					isReal=YES;
				}
				if ( curptr < endptr && (*curptr=='e' | *curptr=='E') ) {
					curptr++;
					while ( curptr < endptr && isdigit(*curptr) ) {
						curptr++;
					}
					isReal=YES;
				}
                number = isReal ?
                            [self realElement:numstart length:curptr-numstart] :
                            [self integerElementAtPtr:numstart length:curptr-numstart];

				[_builder writeString:number];
				break;
			}
			case 't':
				if ( (endptr-curptr) >=4  && !strncmp(curptr, "true", 4)) {
					curptr+=4;
					[_builder pushObject:true_value];
				}
				break;
			case 'f':
				if ( (endptr-curptr) >=5  && !strncmp(curptr, "false", 5)) {
					// return false;
					curptr+=5;
					[_builder pushObject:false_value];

				}
				break;
			case 'n':
				if ( (endptr-curptr) >=4  && !strncmp(curptr, "null", 4)) {
					[_builder pushObject:[NSNull null]];
					curptr+=4;
				}
				break;
			case ' ':
			case '\n':
				while (curptr < endptr && isspace(*curptr)) {
					curptr++;
				}
				break;

			default:
				[NSException raise:@"invalidcharacter" format:@"JSON invalid character %x/'%c' at %td",*curptr,*curptr,curptr-(char*)[data bytes]];
				break;
		}
	}
    return [_builder result];

}

It almost certainly doesn't correctly handle all edge-cases, but doing so is unlikely to impact overall performance.

Dematerializing Property Lists with MPWPlistStreaming

Above, I mentioned that MASON is message-oriented, and that its main purpose is sending messages of the MPWPlistStreaming protocol to its builder. Here is that protocol:


@protocol MPWPlistStreaming

-(void)beginArray;
-(void)endArray;
-(void)beginDictionary;
-(void)endDictionary;
-(void)writeKey:aKey;
-(void)writeString:aString;
-(void)writeNumber:aNumber;
-(void)writeObject:anObject forKey:aKey;
-(void)pushContainer:anObject;
-(void)pushObject:anObject;

@end

What this enables is using property lists as an intermediate format without actually instantiating them, instead sending the messages we would have sent if we had a property list. Protocol Oriented Programming, anyone? Oh, I forgot, you can only do that in Swift...

The same protocol can also be used on the output side, then you get something like Standard Object Out.

Trying it out

By default, MPWMASONParser sets its builder to an instance of MPWPlistBuilder, which, as the name hints, builds property lists. Just like NSJSONSerialization.

So let's give it a whirl:


-(void)decodeMPWDicts:(NSData*)json
{
    MPWMASONParser *parser=[MPWMASONParser parser];
    NSArray* plistResult = [parser parsedData:json];
    NSLog(@"MPWMASON %@ with %ld dicts",[plistResult firstObject],[plistResult count]);
}

And the time is, drumroll, ... 0.621 seconds.

Hmm...that's disappointing. We didn't do anything wrong, yet almost 50% slower than NSJSONSerialization. Well, those dang Apple engineers do know what they're doing after all, and we should probably just give up.

Well, not so fast. Let's at least check out what we did wrong. Unleash the Cracken...er...Instruments!

So that's interesting: the vast majority of time is actually spent in Apple code building the plist. And we have to build the plist. So how does NSJSONSerialization get the same job done faster? Last I checked, with NSPropertyListSerialization, but close enough, they actually use specialised CoreFoundation-based dictionaries that are optimized for the case of having a lot of string keys and having them all in one place during initialization. These are not exposed, CoreFoundation being C-based means non-exposure is very effective and apparently Apple stopped open-sourcing CFLite a while ago.

So how can we do better? Tune in for the next exciting instalment :-)

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Wednesday, March 12, 2014

Looking at a scripting language...and imagining the consequences

After thinking about the id subset and being pointed to WebScript, Brent Simmons imagines a scripting language. I have to admit I have been imagining pretty much the same language...and at some time decided to stop imagining and start building Objective-Smalltalk:

  • Peer of Objective-C: objects are Objective-C objects, methods are Objective-C methods, added to the runtime and indistinguishable from the outside. "You can subclass UIViewController, or write a category on it."
    <void>alertView:alertView clickedButtonAtIndex:<int>buttonIndex
       self newGame.
       self view setNeedsDisplay.
    
    The example is from the site, it was copied from an actual program. As you can see, interoperability with the C parts of Objective-C is still necessary, but not bothersome.
  • It has blocks:
    <void>deleteFile:filename
       thumbs := self thumbsView subviews.
       viewsToRemove := thumbs selectWhereValueForKey:'filename' isEqual:filename.
       aView := viewsToRemove firstObject.
    
       UIView animateWithDuration:0.4
              animations: [ aView setAlpha: 0.0. ]
              completion: [ aView removeFromSuperview. 
                            UIView animateWithDuration: 0.2
                                   animations: [ self thumbsView layoutSubviews. ]
                                   completion: [ 3  ]. 
                          ].
       url := self urlForFile:aFilename.
       NSFileManager defaultManager removeItemAtURL:url  error:nil.
       (self thumbsView afterDelay:0.4) setNeedsLayout.
    
    This example was also copied from an actual small educational game that was ported over from Flash.

    You also get Higher Order Messaging, Polymorpic Identifiers etc.

  • Works with the toolchain: this is a a little more tricky, but I've made some progress...part of that is an llvm based native compiler, part is tooling that enables some level of integration with Xcode, part is a separate toolset that has comparable or better capabilities.

While Objective-Smalltalk would not require shipping source code with your applications, due to the native compiler, it would certainly allow it, and in fact my own BookLightning imposition program has been shipping with part of its Objective-Smalltalk source hidden its Resources folder for about a decade or so. Go ahead, download it, crack it open and have a look! I'll wait here while you do.

Did you have a look? The part that is in Smalltalk is the distilled (but very simple) imposition algorithm shown here.


	drawOutputPage:<int>pageNo
		isBackPage := (( pageNo / 2 ) intValue ) isEqual: (pageNo / 2 ).

		pages:=self pageMap objectAtIndex:pageNo.
		page1:=pages integerAtIndex:0.
		page2:=pages integerAtIndex:1.

		self drawPage:page1 andPage:page2 flipped:(self shouldFlipPage:pageNo).

	drawPage:<int>page1 andPage:<int>page2 flipped:<int>flipped
		drawingStream := self drawingStream.
		base := MPWPSMatrix matrixRotate:-90.

		drawingStream saveGraphicsState.
		flipped ifTrue: [ drawingStream concat:self flipMatrix ].
		width := self inRect x.

		self drawPage:page1 transformedBy:(base matrixTranslatedBy: (width * -2) y:0). 
		self drawPage:page2 transformedBy:(base matrixTranslatedBy: (width * -1) y:0). 
		
		drawingStream restoreGraphicsState.
Page imposition code

What this means is that any user of BookLightning could adapt it to suit their needs, though I am pretty sure that none have done so to this date. This is partly due to the fact that this imposition algorithm is too limited to allow for much variation, and partly due to the fact that the feature is well hidden and completely unexpected.

There are two ideas behind this:

  1. Open Source should be more about being able to tinker with well-made apps in useful ways, rather than downloading and compiling gargantuan and incomprehensible tarballs of C/C++ code.
  2. There is no hard distinction between programming and scripting. A higher level scripting/programming language would not just make developer's jobs easier, it could also enable the sort of tinkering and adaptation that Open Source should be about.
I don't think the code samples shown above are quite at the level needed to really enable tinkering, but maybe they can be a useful contribution to the discussion.