Monday, January 12, 2009

iPhone XML performance

Shortly after becoming an iPhone developer, I found a clever little piece of example code called XML Performance (login required). Having done some high performance XML processing code that works on the iPhone, I was naturally intrigued.

The example pits Cocoa's NSXMLParser against a custom parser based on libxml2; the benchmark downloads a top-300 list of songs from iTunes.

More responsiveness using libxml2 instead of NSXMLParser

Based on my previous experience, I was expecting libxml2 to be noticeably faster, but I also expected that advantage in processing speed to matter less and less at lower I/O data rates (WiFi to 3G to EDGE), as I/O time would come to completely overwhelm processing time. Was I ever wrong!

While my expectations were technically correct for overall performance, I had completely failed to take responsiveness into account. Depending on the network selected, the NSXMLParser sample would appear to hang for 3 to 50 seconds before starting to show results. Needless to say, that is an awful user experience. The libxml2 example, on the other hand, would start displaying some results almost immediately. While it was also a bit faster in total time taken, this effect seemed pretty insignificant compared to the fact that results kept arriving throughout pretty much the entire download.

The difference, of course, is incremental processing. Whereas NSXMLParser's -initWithContentsOfURL: method apparently downloads the entire document first and then begins processing, the libxml2-based code in the sample downloads the XML in small chunks and processes those chunks immediately.
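To make the chunk-at-a-time idea concrete, here is a minimal sketch of feeding libxml2's push parser piece by piece. It is not the code from the XML Performance sample: the function name, the callback name and the chunk size are made up for illustration, the data comes from memory rather than from the network, and ARC plus a system libxml2 are assumed (something like `xml2-config --cflags --libs` should supply the build flags).

```objc
#import <Foundation/Foundation.h>
#include <string.h>
#include <libxml/parser.h>

// SAX2 callback: libxml2 invokes this once it has seen a complete start tag,
// which can happen long before the rest of the document has arrived.
static void StartElementNs(void *ctx, const xmlChar *localname,
                           const xmlChar *prefix, const xmlChar *URI,
                           int nb_namespaces, const xmlChar **namespaces,
                           int nb_attributes, int nb_defaulted,
                           const xmlChar **attributes)
{
    NSMutableArray *seenTags = (__bridge NSMutableArray *)ctx;
    [seenTags addObject:[NSString stringWithUTF8String:(const char *)localname]];
}

// Hypothetical helper: hands `xml` to the parser in pieces of `chunkSize`
// bytes, the way a download delegate would for each block of received data,
// and returns the names of all start tags in document order.
NSArray *StartTagsParsedInChunks(NSData *xml, NSUInteger chunkSize)
{
    NSMutableArray *seenTags = [NSMutableArray array];

    xmlSAXHandler handler;
    memset(&handler, 0, sizeof(handler));
    handler.initialized = XML_SAX2_MAGIC;      // use the SAX2 (...Ns) callbacks
    handler.startElementNs = StartElementNs;

    // No initial data: everything is supplied later via xmlParseChunk().
    xmlParserCtxtPtr context =
        xmlCreatePushParserCtxt(&handler, (__bridge void *)seenTags, NULL, 0, NULL);

    const char *bytes = [xml bytes];
    for (NSUInteger offset = 0; offset < [xml length]; offset += chunkSize) {
        NSUInteger size = MIN(chunkSize, [xml length] - offset);
        xmlParseChunk(context, bytes + offset, (int)size, 0);
        // Tags completed so far are already available to the application here.
        NSLog(@"after %lu bytes: %lu start tags seen",
              (unsigned long)(offset + size), (unsigned long)[seenTags count]);
    }
    xmlParseChunk(context, NULL, 0, 1);        // final call: end of input
    xmlFreeParserCtxt(context);
    return seenTags;
}
```

In a real download, the body of the loop would sit in whatever callback delivers each block of received bytes, so the user interface can be updated between blocks instead of after the last one.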

Alas, going with libxml2 has clear and significant disadvantages, with the code that uses libxml2 being around twice the size of the NSXMLParser-based code, at around 150 lines (non-comment, non-whitespace). If you have worked with NSXMLParser before, you will know that that is already pretty painful, so just imagine that particular brand of joy doubled, with the 150 lines of code giving you the simplest of parsers, with just 5 tags processed. Fortunately, there is a simpler way.

A simpler way: Objective-XML's SAX

Assuming you have already written a Cocoa-(Touch-)based parser using NSXMLParser, all you need to do is include Objective-XML in your project and replace the reference to NSXMLParser with a reference to MPWSAXParser; everything else will work just as before. Well, the same except for being significantly faster (even faster than libxml2) and now also more responsive on slow connections due to incremental processing.

I have to admit that not having incremental processing was a "feature" Objective-XML shared with NSXMLParser until very recently, due to my not taking into account the fact that latency lags bandwidth. This silly oversight has now been fixed, with both MPWMAXParser and MPWSAXParser sporting URL-based parsing methods that do incremental processing.

So that's all there is to it: Objective-XML provides a drop-in replacement for NSXMLParser that has all the performance and responsiveness benefits of a libxml2-based solution without the coding horror.
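For reference, here is roughly what "a parser using NSXMLParser" means in this context. This is a minimal sketch, not code from the sample: the class and function names are hypothetical, it handles a single tag, and it assumes ARC and an SDK that declares the NSXMLParserDelegate protocol. It also shows the string comparisons on tag names and the partial state that a SAX-style client has to manage itself.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the text of every <title> element.
@interface TitleCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *titles;
@property (nonatomic, strong) NSMutableString *currentText; // non-nil only inside <title>
@end

@implementation TitleCollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"title"]) {
        self.currentText = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // May be called several times per text node; outside <title> this is a
    // message to nil and therefore does nothing.
    [self.currentText appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"title"]) {
        [self.titles addObject:[self.currentText copy]];
        self.currentText = nil;
    }
}

@end

// Hypothetical entry point: parses in-memory XML and returns the titles found.
NSArray *TitlesInXML(NSData *xml)
{
    TitleCollector *collector = [[TitleCollector alloc] init];
    collector.titles = [NSMutableArray array];

    // This is the reference to NSXMLParser that the text above talks about
    // replacing; the initializer shown is NSXMLParser's own.
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xml];
    parser.delegate = collector;
    [parser parse];
    return collector.titles;
}
```

The sketch itself does not depend on Objective-XML; the delegate methods are the part that stays untouched when the parser class is swapped.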

Even simpler: Messaging API for XML (MAX)

However, even a Cocoa version of the SAX API represents a pretty low bar in terms of ease of coding. With MAX, Objective-XML provides an API that can do the same job much more simply. MAX naturally integrates XML processing with Objective-C messaging using the following two main features:
  • Clients get sent element-specific messages for processing
  • The parser handles nesting, controlled by the client

The following code for building Song objects out of iTunes <item> elements illustrates these two features:

-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
{
  Song *song=[[Song alloc] init];
  [song setArtist:[children objectForTag:artist_tag]];
  [song setAlbum:[children objectForTag:album_tag]];
  [song setTitle:[children objectForTag:title_tag]];
  [song setCategory:[children objectForTag:category_tag]];
  [song setReleaseDate:[parseFormatter dateFromString:[children objectForTag:releasedate_tag]]];
  [self parsedSong:song];
  [song release];
  return nil;
}

MAX sends the -itemElement:attributes:parser: message to its client whenever it has encountered a complete <item> element, so there is no need for the client to perform string processing on tag names or manage partial state as in a SAX parser. The method constructs a Song object using data from the <item> element's child elements, which it then passes directly to the rest of the app via the parsedSong: message. It returns nil rather than an object, so MAX will not build a tree at this level.

Artist, album, title and category are the values of nested child elements of the <item> element. The (common) code shared by all these child-elements gets the character content of the respective elements and is shown below:

-defaultElement:children attributes:atrs parser:parser
{
	return [[children combinedText] retain];
}

Unlike the <item> processing code, which returned nil, this method does return an object. MAX uses this return value to build a DOM-like structure which is then consumed by the next higher level, in this case the -itemElement:attributes:parser: method shown above. Unlike a traditional DOM, the MAX tree structure is built out of domain-specific objects returned incrementally by the client.

These two pieces of sample code demonstrate how MAX can act like either a DOM parser or a SAX parser, controlled simply by whether the processing methods return objects (DOM) or not (SAX). They also demonstrate both element-specific and generic processing.
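The mechanism behind these two features can be imitated in a few dozen lines on top of NSXMLParser. The sketch below is emphatically not how Objective-XML is implemented and does not use its API: the ElementClient protocol, the simplified `...Element:text:` method signature, and all class and function names are invented for illustration, and ARC is assumed. It only shows the idea of turning tag names into messages and letting the return value decide whether a result is passed up to the parent.

```objc
#import <Foundation/Foundation.h>

// Hypothetical client protocol; element-specific methods such as
// -itemElement:text: are optional and discovered at runtime.
@protocol ElementClient <NSObject>
- (id)defaultElement:(NSDictionary *)children text:(NSString *)text;
@end

@interface ElementDispatcher : NSObject <NSXMLParserDelegate>
- (instancetype)initWithClient:(NSObject<ElementClient> *)client;
@property (nonatomic, strong) NSObject<ElementClient> *client;
@property (nonatomic, strong) NSMutableArray *childrenStack; // one dictionary per open element
@property (nonatomic, strong) NSMutableArray *textStack;     // one string per open element
@end

@implementation ElementDispatcher

- (instancetype)initWithClient:(NSObject<ElementClient> *)client
{
    if ((self = [super init])) {
        _client = client;
        _childrenStack = [NSMutableArray array];
        _textStack = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)name
  namespaceURI:(NSString *)uri qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributes
{
    [self.childrenStack addObject:[NSMutableDictionary dictionary]];
    [self.textStack addObject:[NSMutableString string]];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    [[self.textStack lastObject] appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)name
  namespaceURI:(NSString *)uri qualifiedName:(NSString *)qName
{
    NSDictionary *children = [self.childrenStack lastObject];
    NSString *text = [self.textStack lastObject];
    [self.childrenStack removeLastObject];
    [self.textStack removeLastObject];

    // Element-specific message if the client implements one, generic otherwise.
    SEL selector = NSSelectorFromString([name stringByAppendingString:@"Element:text:"]);
    if (![self.client respondsToSelector:selector]) {
        selector = @selector(defaultElement:text:);
    }
    id (*send)(id, SEL, id, id) =
        (id (*)(id, SEL, id, id))[self.client methodForSelector:selector];
    id result = send(self.client, selector, children, text);

    // A returned object becomes a child of the enclosing element (DOM-like);
    // nil means the client has consumed the element itself (SAX-like).
    if (result != nil) {
        [[self.childrenStack lastObject] setObject:result forKey:name];
    }
}

@end

// Hypothetical client: leaf elements return their text, <item> consumes it.
@interface TitleClient : NSObject <ElementClient>
@property (nonatomic, strong) NSMutableArray *titles;
@end

@implementation TitleClient
- (id)defaultElement:(NSDictionary *)children text:(NSString *)text
{
    return text;
}
- (id)itemElement:(NSDictionary *)children text:(NSString *)text
{
    NSString *title = [children objectForKey:@"title"];
    if (title != nil) {
        [self.titles addObject:title];
    }
    return nil;   // nothing is passed up above <item>
}
@end

NSArray *ItemTitlesInXML(NSData *xml)
{
    TitleClient *client = [[TitleClient alloc] init];
    client.titles = [NSMutableArray array];
    ElementDispatcher *dispatcher = [[ElementDispatcher alloc] initWithClient:client];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xml];
    parser.delegate = dispatcher;
    [parser parse];
    return client.titles;
}
```

The client contains no tag-name comparisons and no "am I currently inside a title?" state, which is where the code savings come from. This toy version keys children by element name, so repeated children of the same name overwrite each other, and it ignores attributes entirely.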

In the iTunes Song parsing example, I was able to build a MAX parser using about half the code required for the NSXMLParser-based example, a ratio that I have also encountered in larger projects. What about performance? It is slightly better than MPWSAXParser, so also somewhat better than libxml2 and significantly better than NSXMLParser.

Summary and Conclusion

The slightly misnamed XML Performance sample code for the iPhone demonstrates how important managing latency is for perceived end-user performance, while showing very little in terms of actual XML processing performance.

While ably demonstrating the performance problems of NSXMLParser, the sample code's solution of using libxml2 is really not a solution, due to the significant increase in code complexity. Objective-XML provides both a drop-in replacement for NSXMLParser with all the performance and latency benefits of the libxml2 solution, as well as a new API that is not just faster, but also much more straightforward than either NSXMLParser or libxml2.

13 comments:

Jens Ayton said...

These two pieces of sample code demonstrate how MAX can act like both a SAX parser or a MAX parser, controlled simply by wether the processing methods return objects (DOM) or not (SAX).

I believe the second MAX should read DOM.

Marcel Weiher said...

Thanks, fixed.

Mohan Vaze said...

I downloaded the Objective-XML-5.0.tar file and built mpwxmlkit. This project is giving compilation errors. Is there is any documentation on how to use these frameworks in projects?

Seth said...

Hi -

I just came across this posting and found it very interesting and informative - thank you!

I was wondering:
Is this still true in light of the iPhone 3.0 SDK? In other words, have they made performance improvements in NSXMLPARSER in 3.0?

Thanks,
Seth

etienne said...

i am not really a newbie but on some fronts I am. Where can i download the MAX parser files. The keywords objective, max, xml, download do not narrow google down enough

Unknown said...

The post scenarios are obviously dealing with uncompressed XML which would allow for incremental loading, however, this is only useful for very simple schemas that do not have complex nesting or referential characteristics. Does anyone have any experience with gzipped XML payloads or Binary XML on the IPhone?

stonemonk said...

just ran across this project while looking for a quicker drop in replacement for the SAX style NSXMLParser (using it for both iPhone and Mac)

got it working, but am a little disappointed with the performance after reading the website. you claim order of magnitude improvement for both speed and memory. I found it to be only 40% faster, and to utilize same memory.. It seems you are only doing streaming for networked files. I am reading huge local files. the biggest of which even cause my mac to run out of memory. would be nice to see streaming option for local files also.

also, oddly, using the debug version of your Core framework seems to be about 20% faster than the release build.. seems a little odd.

-a

Anonymous said...

Check out http://www.TBXML.co.uk for a super-fast, lightweight, easy to use XML parser!

Marcel Weiher said...

@Seth: as far as I know, NSXMLParser and friends have not been significantly improved.

@etienne: http://www.metaobject.com/downloads/Objective-C/Objective-XML-5.0.1.tgz

@Charles: MAX handles complex schemas just fine...how would schema-depth be related to compression?

@stonemonk: this looks very much like you are I/O bound. Alas, Objective-XML can't yet make your hard-disk run faster :-( I would be interested in finding out more about the files you are parsing and how you are parsing them. You will also likely see a lot of variance in the runs as soon as you start swapping to disk.

@Anonymous: TBXML looks interesting, though the fact that it is a DOM-style parser puts some limits on performance.

stonemonk said...

@Marcel: I ended up using libxml2 directly, found a 4x improvement over O-XML which makes sense I suppose. Those dictionaries of attributes getting pushed to me in startElement were killing me. I wrote a thin ObjC wrapper to libxml2 just to forward the callbacks. Now I really am IO bound, but the speed is acceptable, and even my largest files load.
