
Monday, April 20, 2020

Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop

Last time, we actually made some significant headway by taking advantage of the dematerialisation of the plist intermediate representation. So instead of first producing an array of dictionaries, we went directly from JSON to the final object representation.

This got us down from around 1.1 seconds to a little over 600 milliseconds.

It was accomplished by using the Key Value Coding method setValue:forKey: to directly set the attributes of the objects from the parsed JSON. Oh, and instantiating those objects in the first place, instead of dictionaries.
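For reference, the KVC-based hookup from the previous installment presumably boiled down to something like the following sketch (not the exact MPWObjectBuilder code; *tos is the builder's current top-of-stack object, as in the listing further down):


-(void)writeObject:anObject forKey:aKey
{
    // generic, but slow: the default KVC machinery re-resolves the setter on every call
    [*tos setValue:anObject forKey:aKey];
}
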

That this should be so much faster than most other methods, for example beating Swift's JSONDecoder() by a cool 7x, is a little surprising, given that KVC is, as I mentioned in the first article of the series, the slowest mechanism for getting data in and out of objects short of deliberate Rube Goldberg mechanisms.

What is KVC and why is it slow?

Key Value Coding was a core part of NeXT's Enterprise Object Framework, introduced in 1994.
Key-value coding is a data access mechanism in which the properties of an object are accessed indirectly by key or name, rather than directly as fields or by invocation of accessor methods. It is used throughout Enterprise Objects but is perhaps most useful to you when accessing data in relationships between enterprise objects.

Key-value coding enables the use of keypaths to traverse relationships. For example, if a Person entity has a relationship called toPhoto whose destination entity (called PersonPhoto) contains an attribute called photo, you could access that data by traversing the keypath toPhoto.photo from within a Person object.

Keypaths are just one way key-value coding is an invaluable feature of Enterprise Objects. In general, though, it is most useful in providing a consistent way to access an object's data. Rather than needing to know if an object's data members have accessor methods, what the names of those accessor methods are, or if the data is accessible through fields, all you need to know are the keys that represent an object’s data. Key-value coding automatically finds the data, regardless of how the object provides its data. In this context, key-value coding satisfies the classic design pattern principle to “encapsulate the thing that varies.”

It still is an extremely powerful programming technique that lets us write algorithms that work generically with any object properties, and is currently the basis for CoreData, AppleScript support, Key Value Observing and Bindings. (Though I am somewhat skeptical of some of these, not least for performance reasons, see The Siren Call of KVO and (Cocoa) Bindings). It was also part of the inspiration for Polymorphic Identifiers.

The core of KVC are the valueForKey: and setValue:forKey: messages, which have default implementations in NSObject. These default implementations take the NSString key, derive an accessor message from that key and then send the message, either setting or returning a value. If the value that the underlying message takes/returns is a non-object type, then KVC wraps/unwraps as necessary.

If this sounds expensive, then that's because it is. To derive the set accessor from the key, the first character of the key has to be capitalized, then the string "set" prepended and the result converted to an Objective-C selector (SEL). In theory, this has to be done on every call to one of the KVC methods, and it has to be done with NSString objects, which do a fantastic job of representing human-visible text, but are a bit heavy-weight for low-level work.
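
As a rough illustration of that per-call work, here is a sketch of the setter-selector derivation (the mechanism as described above, not Apple's actual implementation):


#import <Foundation/Foundation.h>

// Sketch only: derive "setFoo:" from the key "foo" the way the default
// KVC machinery has to, using NSString operations and a string->SEL conversion.
static SEL setterSelectorForKey(NSString *key) {
    NSString *head = [[key substringToIndex:1] uppercaseString];       // capitalize first character
    NSString *name = [NSString stringWithFormat:@"set%@%@:",
                         head, [key substringFromIndex:1]];            // prepend "set", append ":"
    return NSSelectorFromString(name);                                 // NSString -> SEL
}
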

Doing the full computation on every invocation would be way too expensive, so Apple caches some of the intermediate results. As there is no obvious place to put those intermediate results, they are placed in global hash tables, keyed by class and property/key name. However, even those lookups are still significantly more expensive than the final set or get property access, and we have to do multiple lookups. Since these tables have to be global, locking is also required.

ValueAccessor

All this expense could be avoided if we had a custom object to mediate the access, rather than a naked NSString. That object could store those computed values, and then provide fast and generic access to arbitrary properties. Enter MPWValueAccessor (.h .m).

A word of warning: unlike MPWSmallStringTable, MPWValueAccessor is mostly experimental code. It does have tests and largely works, but it is incomplete in many ways and also contains a bunch of extra and probably extraneous ideas. It is sufficient for our current purpose.

The core of this class is the AccessPathComponent struct.


typedef struct {
    Class   targetClass;
    int     targetOffset;
    SEL     getSelector,putSelector;
    IMP     getIMP;
    IMP     putIMP;
    id      additionalArg;
    char    objcType;
} AccessPathComponent;

This struct contains a number of different ways of getting/setting the data:
  1. the integer offset into the object where the ivar is located
  2. a pair of Objective-C selectors/message names, one for getting, one for setting.
  3. a pair of function pointers to the Objective-C methods that the respective selectors resolve to
  4. the additional arg is the key, to be used for keyed access
The getIMP and putIMP are initialized to objc_msgSend(), so they can always be used. If we bind the ValueAccessor to a class, those function pointers get resolved to the actual getter/setter methods. In addition, the objcType gets set to the type of the instance variable, so we can do automatic conversions like KVC. (This was some code I actually had to add between the last installment and the current one.)
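
A minimal sketch of what that binding step could look like, assuming the struct fields above (the real MPWValueAccessor code differs in the details):


#import <objc/runtime.h>

// Resolve the selectors to concrete IMPs once, so later accesses skip the dynamic lookup.
static void bindComponentToClass(AccessPathComponent *c, Class theClass) {
    c->targetClass = theClass;
    Method getter = class_getInstanceMethod(theClass, c->getSelector);
    Method setter = class_getInstanceMethod(theClass, c->putSelector);
    if (getter) c->getIMP = method_getImplementation(getter);   // otherwise stays objc_msgSend()
    if (setter) c->putIMP = method_getImplementation(setter);
    // targetOffset and objcType would similarly be resolved once here, via
    // class_getInstanceVariable() / ivar_getOffset() / ivar_getTypeEncoding().
}
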

The key takeaway is that all the string processing and lookup that KVC needs to do on every call is done once during initialization, after that it's just a few messages and/or pre-resolved function calls.
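
After binding, setting a value is then roughly a single pre-resolved function call, something like this sketch (the function-pointer cast assumes a plain object-valued setter; non-object types would additionally be unwrapped according to objcType):


typedef void (*ObjectSetterIMP)(id receiver, SEL _cmd, id value);

// Illustrative fast path once bound: no string processing, no global table lookups.
static inline void setObjectFast(AccessPathComponent *c, id target, id value) {
    ((ObjectSetterIMP)c->putIMP)(target, c->putSelector, value);
}
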

Hooking up the ValueAccessor

Adapting the MPWObjectBuilder (.h .m) to use MPWValueAccessor was much easier than I had expected. The following shows the changes made:
@property (nonatomic, strong) MPWSmallStringTable *accessorTable;

...

-(void)setupAccessors:(Class)theClass
{
    NSArray *ivars=[theClass ivarNames];
    ivars=[[ivars collect] substringFromIndex:1];
    NSMutableArray *accessors=[NSMutableArray arrayWithCapacity:ivars.count];
    for (NSString *ivar in ivars) {
        MPWValueAccessor *accessor=[MPWValueAccessor valueForName:ivar];
        [accessor bindToClass:theClass];
        [accessors addObject:accessor];
    }
    MPWSmallStringTable *table=[[[MPWSmallStringTable alloc] initWithKeys:ivars values:accessors] autorelease];
    self.accessorTable=table;
}

-(void)writeObject:anObject forKey:aKey
{
    MPWValueAccessor *accessor=[self.accessorTable objectForKey:aKey];
    [accessor setValue:anObject forTarget:*tos];
}



The bulk of the changes come as part of the new -setupAccessors: method. It first asks the class what its instance variables are, creates a value accessor for each instance variable (name), binds the accessor to the class and finally puts the accessors in a lookup table keyed by name.

The -writeObject:forKey: method is modified to look up and use a value accessor instead of using KVC.

Results

The parsing driver code didn't have to be changed; re-running it on our non-representative 44 MB JSON file yields the following time:

441 ms.

Now we're really starting to get somewhere! This is just shy of 100 MB/s, 10x faster than Swift's JSONDecoder, and within 5% of raw NSJSONSerialization.

Analysis and next steps

Can we do better? Why yes, glad you asked. Let's have a look at the profile.

First thing to note is that object-creation (beginDictionary) is now the #1 entry under the parse, as it should be. This is another indicator that we are not just moving in the right direction, but also closing in on the endgame.

However, there is still room for improvement. For example, although actually searching the SmallStringTable for the ValueAccessor (offsetOfCStringWithLengthInTableOfLength()) takes only 2.7% of the time, about the same as getting the internal char* out of a CFString via the fast path (CFStringGetCStringPtr()), the total time for the -objectForKey: is a multiple of that, at 13%. This means that unwrapping the NSString takes more time than doing the actual work. Wrapping the char* and length into an NSString also takes significant time, and all of this work is redundant...we would be better off just passing along the char* and length.
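
The direction this hints at, as a sketch with a hypothetical selector (objectForCString:length: is not necessarily the real MPWSmallStringTable API; the point is the shape of the call, not the name):


// Hypothetical: look the accessor up straight from the parser's byte range,
// so no NSString has to be created or unwrapped at all.
-(MPWValueAccessor*)accessorForCString:(const char*)bytes length:(int)len
{
    return [self.accessorTable objectForCString:bytes length:len];
}
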

A similar wrap/unwrap situation occurs with integers, which we first turn into NSNumbers, only to immediately get the integer out again so we can set it.

objc_msgSend() also starts getting noticeable, so looking at a bit of IMP-caching and just eliminating unnecessary indirection also seems like a good idea.

That's another aspect of optimization work: while the occasional big win is welcome, getting to truly outstanding performance means not being satisfied with that, but slogging through all the small-ish seeming details.

Note

I can help not just Apple, but also you and your company with performance and agile coaching, workshops and consulting. Contact me at info at metaobject.com.

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Sunday, April 12, 2020

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis

In Part 1: The Status Quo, we saw that something isn't quite right with JSON processing in Apple land: while something like simdjson can accomplish the basic parsing task at a rate of 2.5 GB/s and creating objects happens at an equivalent rate of 310 MB/s, Swift's JSON Codable support manages a measly 10 MB/s, underperforming the MacBook Pro's built-in SSD by at least 200x and a Gigabit network connection still by a factor of 10.

Some of the feedback I got indicated that the implications of the data presented in "Status Quo" were not as clear as they should have been, so a little analysis before we dive into code.

The MessagePack decoder is the only "pure" Swift Codable decoder. As it is so slow as to make the rest of the graph almost unreadable, and was only included for comparison, not actually being a JSON decoder, let's leave it out for now. In addition, let's show how much of each result is the underlying parser and how much time is spent in object creation.

This chart immediately lays to rest two common hypotheses for the performance issues of Swift Codable:

  1. It's the object creation.

    No.

    That is, yes, object creation is slow compared to many other things, but here it represents only around 3% of the total runtime. Yes, finding a way to reduce that final 3% would also be cool (watch this space!), but how about tackling the 97% first?

  2. It's the fact that it is using NSJSONSerialization and therefore Objective-C under the hood that makes it slow.

    No.

    Again, yes, parsing something to a dictionary-based representation that is more expensive than the final representation is not ideal and should be avoided. This is one of the things we will be doing. However:

    • The NSJSONSerialization part of decoding makes up only 13% of the running time, the remaining 87% are in the Swift decoder part.
    • Turning the dictionaries into objects using Key-Value-Coding, which to me is just about the slowest imaginable mechanism for getting data into an object that's not deliberately adding Rube-Goldberg elements, "only" adds 740ms to the basic NSJSONSerialization's parse from JSON to dictionaries. While this is ~50% more time than the parse to dictionaries and 5x the pure object creation time, it is still 5x less than the Codable overhead.
    • All the pure Swift parsers are also this slow or slower.
It also shows that STJSON is not a contender (not that it ever claimed to be), because it is slower than even Swift's JSONDecoder without actually going to full objects. JASON is significantly faster, but also doesn't go to objects, and for not going to objects is still significantly slower than NSJSONSerialization. That really only leaves the NSJSONSerialization variants as useful comparison points for what is to come; the rest is either too slow, doesn't do what we need it to do, or both.

Here we can see fairly clearly that creating objects instead of dictionaries would be better. Better than creating dictionaries and certainly much better than first creating dictionaries and then objects, as if that weren't obvious. It is also clear that the actual parsing of JSON text doesn't add all that much extra overhead relative to just creating the dictionaries. In fact, just adding the -copy to convert from mutable dictionaries to immutable dictionaries appears to take more time than the parse!

In truth, it's actually not quite that way, because as far as I know, NSJSONSerialization, like its companion NSPropertyListSerialization, uses special dictionaries that are cheaper to create from a textual representation.

simdjson

With all that in mind, it should be clear that simdjson, although it would likely take the pure parse time down to around 17 ms, is not that interesting, at least at this stage. What it optimizes is the part that already takes the least time, and is already overwhelmed by even small changes in the way we create our objects.

What this also means is that simdjson will only be useful if it doesn't make object creation slower. This is also a lesson I learned when creating the MAX XML parser: you can't just make the XML parsing part as fast as possible; sometimes it makes sense to make the parser itself somewhat slower if that makes other parts, such as object creation, significantly faster. Or more generally: it's not enough to have fast components, they have to play well together. Optimization is about systems and architecture. If you want to do it well.

MASON

In the next installment, we will start looking at the actual parser.

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Friday, April 10, 2020

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo

I just finished watching Daniel Lemire's talk on the current iteration of simdjson, a JSON parser that clocks in at 2.5GB/s! I've been following Daniel's work for some time now and can't really recommend it highly enough.

This reminded me of a recent twitter conversation where I had offered to contribute a fast, Swift-compatible JSON parser loosely based on MAX, my fast and convenient XML parser. Due to various factors most of which are not under my control, I can't really offer anything that's fast when compared to simdjson, but I can manage something quite a bit less lethargic than what's currently on offer in the Apple and particularly the Swift world.

Environmental assumptions and constraints

My first assumption is that we are going to operate in the Apple ecosystem, and for simplicity's sake I am going to use macOS. Next, I will assume that what we want from our parse(r) are domain objects for further processing within our application (or structs, the difference is not important in this context).

We are going to use the following class with a mix of integer and string instance variables, in Swift:


@objc class TestClass: NSObject, Codable {
    let hi:Int
    let there:Int
    let comment:String
...
}

and the same in Objective-C:


@interface TestClass : NSObject

@property (nonatomic) long hi,there;
@property (nonatomic,strong) NSString *comment;

@end

To make it all easy to measure, we are going to use one million objects, which we are going to initialise with increasing integers and the constant string "comment". The different serialisation methods all yield the same 44MB JSON file, which can be correctly parsed by all the parsers tested. This is obviously a very simple class and file structure, but I think it gives a reasonable approximation for real-world use.

The first thing to check is how quickly we can create these objects straight in code, without any parsing.

That should give us a good upper bound for the performance we can achieve when parsing to domain objects.


#define COUNT 1000000
-(void)createObjects
{
    NSMutableArray *objResult=[NSMutableArray arrayWithCapacity:COUNT+20];
    for ( int i=0;i<COUNT;i++ ) {
        TestClass *cur=[TestClass new];
        cur.hi=i;
        cur.there=i;
        cur.comment=@"comment";
        [objResult addObject:cur];
    }
    NSLog(@"Created objects in code w/o parsing %@ with %ld objects",objResult[0],[objResult count]);
}

On my Quad Core, 2.7GHz MBP '18, this runs in 0.141 seconds. Although we aren't actually parsing, this means that just creating the objects that would result from parsing our 44MB JSON file corresponds to a rate of 312 MB/s.

Wait a second! 312MB/s is almost 10x slower than Daniel Lemire's parser, the one that actually parses JSON, and we are only creating the objects that would result if we were parsing, without doing any actual parsing.

This is one of the many unintuitive aspects of parsing performance: the actual low-level, character-level parsing is generally the least important part for overall performance. Unless you do something crazy like use NSScanner. Don't use NSScanner. Please.

One reason this is unintuitive is that we all learned that performance is dominated by the innermost loop, and character-level processing is the innermost loop. But when there are order-of-magnitude differences in overall performance and the inner and outer loops differ by less than that, the stuff happening in the outer loop can dominate.

NSJSONSerialization

Apple's JSON story very much revolves around NSJSONSerialization, very much like most of the rest of its serialization story revolves around the very similar NSPropertyListSerialization class. It has a reasonably quick implementation, turning the 44 MB JSON file into an NSArray of NSDictionary instances in 0.421 seconds when called from Objective-C, for a rate of 105 MB/s. From Swift, it takes 0.562 seconds, for 78 MB/s.

Of course, that gets us to a property list (array of dicts, in this case), not to the domain objects we actually want.

If you read my book (did I mention my book? Oh, I think I did), you will know that this type of dictionary representation is fairly expensive: expensive to create, expensive in terms of memory consumption and expensive to access. Just creating dictionaries equivalent to the objects we created before takes 0.321 seconds, so around 2.3x the time for creating the equivalent objects and a "rate" of 137 MB/s relative to our 44 MB JSON file.


-(void)createDicts
{
    NSMutableArray *objResult=[NSMutableArray arrayWithCapacity:COUNT+20];
    for ( int i=0;i<COUNT;i++ ) {
        NSMutableDictionary *cur=[NSMutableDictionary dictionary];
        cur[@"hi"]=@(i);
        cur[@"there"]=@(i);
        cur[@"comment"]=@"comment";
        [objResult addObject:cur];
    }
    NSLog(@"Created dicts in code w/o parsing %@ with %ld objects",objResult[0],[objResult count]);
}

Creating the dict in a single step using a dictionary literal is not significantly faster, but creating an immutable copy of each mutable dict after we're done filling it brings the time to half a second.
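
For concreteness, the immutable-copy variant might look like this (a sketch along the lines of -createDicts above, not the article's exact harness):


-(void)createDictsWithCopy
{
    NSMutableArray *objResult=[NSMutableArray arrayWithCapacity:COUNT+20];
    for ( int i=0;i<COUNT;i++ ) {
        NSMutableDictionary *cur=[NSMutableDictionary dictionary];
        cur[@"hi"]=@(i);
        cur[@"there"]=@(i);
        cur[@"comment"]=@"comment";
        [objResult addObject:[cur copy]];   // the extra -copy is what pushes the time to ~0.5s
    }
    NSLog(@"Created immutable dict copies %@ with %ld objects",objResult[0],[objResult count]);
}
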

Getting from dicts to objects is typically straightforward, if tedious: just fetch the entry of the dictionary and call the corresponding setter with the value thus retrieved from the dictionary. As this isn't production code and we're just trying to get some bounds of what is possible, there is an easier way: just use Key Value Coding with the keys found in the dictionary. The combined code, parsing and then creating the objects is shown below:


-(void)decodeNSJSONAndKVC:(NSData*)json
{
    NSArray *keys=@[ @"hi", @"there", @"comment"];
    NSArray *plistResult=[NSJSONSerialization JSONObjectWithData:json options:0 error:nil];
    NSMutableArray *objResult=[NSMutableArray arrayWithCapacity:plistResult.count+20];
    for ( NSDictionary *d in plistResult) {
        TestClass *cur=[TestClass new];
        for (NSString *key in keys) {
            [cur setValue:d[key] forKey:key];
        }
        [objResult addObject:cur];
    }
    NSLog(@"NSJSON+KVC %@ with %ld objects",objResult[0],[objResult count]);
}

Note that KVC is slow. Really slow. Order-of-magnitude slower than just sending messages kind of slow, and so it has significant impact on the total time, which comes to a total of 1.142 seconds including parsing and object creation, or just shy of 38 MB/s.
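
For contrast, the "tedious" explicit-setter version mentioned earlier would look roughly like this (a sketch; the method name and unboxing choices are mine, not the article's):


-(void)decodeNSJSONWithSetters:(NSData*)json
{
    NSArray *plistResult=[NSJSONSerialization JSONObjectWithData:json options:0 error:nil];
    NSMutableArray *objResult=[NSMutableArray arrayWithCapacity:plistResult.count+20];
    for ( NSDictionary *d in plistResult) {
        TestClass *cur=[TestClass new];
        cur.hi=[d[@"hi"] longValue];         // explicit fetch + unbox + typed setter, no KVC
        cur.there=[d[@"there"] longValue];
        cur.comment=d[@"comment"];
        [objResult addObject:cur];
    }
    NSLog(@"NSJSON+setters %@ with %ld objects",objResult[0],[objResult count]);
}
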

Swift JSON Coding

For the first couple of releases of Swift, JSON support by Apple was limited to a wrapped NSJSONSerialization, with the slight performance penalty already noted. As I write in my book (see sidebar), many JSON "parsers" were published, but none of these, with the notable exception of the Big Nerd Ranch's Freddy, were actual parsers; they all just transformed the arrays and dictionaries returned by NSJSONSerialization into Swift objects. Performance was abysmal, with around 25x overhead in addition to the basic NSJSONSerialization parse.

Apple's Swift Codable promised to solve all that, and on the convenience front it certainly does a great job.


func readJSONCoder(data:Data) -> [TestClass] {
    NSLog("Swift Decoding")
    let coder=JSONDecoder()
    let array=try! coder.decode([TestClass].self, from: data)
    return array
}

(All the forcing is because this is just test code, please don't do this in production!). Alas, performance is still not great: 4.39 seconds, or 10 MB/s. That's 10x slower than the basic NSJSONSerialization parse and 4x slower than our slow but simple complete parse via NSJSONSerialization and KVC.

However, it is significantly faster than the previous third-party JSON to Swift objects "parsers", to the tune of 3-4x. This is the old "first mark up 400% then discount 50%" sales trick applied to performance, except that the relative numbers are larger.

Third Party JSON Parsers

I looked a little at third party JSON parsers, particularly JASON, STJSON and ZippyJSON.

STJSON does not make any claims to speed and manages to clock in at 5 seconds, or just under 10 MB/s. JASON bills itself as a "faster" JSON parser (they compare to SwiftyJSON), and does reasonably well at 0.75 seconds or 59 MB/s. However, both of these parse to their own internal representation, not to domain objects (or structs), and so should be compared to NSJSONSerialization, at which point they both disappoint.

Probably the most interesting of these is ZippyJSON, as it uses Daniel Lemire's simdjson and is Codable compatible. Alas, I couldn't get ZippyJSON to compile, so I don't have numbers, but I will keep trying. They claim around 3x faster than Apple's JSONDecoder, which would make it the only parser to be at least in the same ballpark as the trivial NSJSONSerialization + KVC method I showed above.

Another interesting tidbit comes from ZippyJSON's README, under the heading "Why is it so much faster".

Apple's version first converts the JSON into an NSDictionary using NSJSONSerialization and then afterwards makes things Swifty. The creation of that intermediate dictionary is expensive.

This is true by itself: first converting to an intermediate representation is slow, particularly one that's as heavy-weight as property lists. However, it cannot be the primary reason, because creating that expensive representation only takes 1/8th of the total running time. The other 7/8ths is Codable apparently talking to itself. And speaking very s-l-o-w-l-y while doing that.

To corroborate, I also tried the Flight-School implementation of Codable for MessagePack, which obviously does not use NSJSONSerialization. It makes no performance claims and takes 18 seconds to decode the same objects we used in the JSON files, of course with a different file that's 34 MB in size. Normalized to our 44 MB file that would be 2.4 MB/s.

MAX and MASON

So where does that leave us? Considering what simdjson shows is theoretically possible with JSON parsing, we are not in a good place, to put it mildly. 2.5 GB/s vs. 10 MB/s with Apple's JSONDecoder, which is several times slower than NSJSONSerialization (itself not exactly a speed daemon) and around 30x slower than pure object creation. Comically bad might be another way of putting it. At least we're being entertained.

What can I contribute? Well, I've been through most of this once before with XML and the result was/is MAX (Messaging API for XML), a parser that is not just super-fast itself (though no SIMD), but also presents APIs that make it both super-convenient and also super-fast to go directly from the XML to an object-representation, either as a tree or a stream of domain objects while using mostly constant memory. Have I mentioned my book? Yeah, it's in the book, in gory detail.

Anyway, XML has sorta faded, so the question was whether the same techniques would work for a JSON parser. The answer is yes, roughly, though with some added complexity and less convenience because JSON is a less informative file format than XML. Open- and close-tags really give you a good heads-up as to what's coming that "{" just does not.

The goal will be to produce domain objects at as close to the theoretical maximum of slightly more than 300 MB/s as possible, while at the same time making the parser convenient to use, close to Swift Codable in convenience. It won't support Codable by default, as the overheads seem to be too high, but ZippyJSON suggests that an adapter wouldn't be too hard.

That parser is MPWMASONParser, and no, it isn't done yet. In its initial state, it parses JSON to dictionaries in 0.58 seconds, or 76 MB/s, slightly slower than NSJSONSerialization.

So we have a bit of way to go, come join me on this little parsing performance journey!

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite