Shortly after becoming an iPhone developer, I found a clever little piece of example code called
XML Performance (login required). Having done some
high performance XML processing code that works on the iPhone, I was naturally intrigued.
The example pits Cocoa's NSXMLParser against a custom parser based on libxml2, the benchmark is downloading a top 300 list of songs from iTunes.
More responsiveness using libxml2 instead of NSXMLParser
Based on my previous experience, I was expecting libxml2 to be noticeably faster, but with the advantage in processing speed being less and less important with lower and lower I/O data rates (WiFi to 3G to Edge), as I/O would start to completely overwhelm processing. Was I ever wrong!
While my expectations were technically correct for overall performance, I had completely failed to take responsiveness into account. Depending on the network selected, the NSXMLParser sample would appear to hang for 3 to 50 seconds before starting to show results. Needless to say, that is an awful user experience. The libxml example, on the other hand, would start displaying some results almost immediately. While it also was a bit faster in the total time taken, this effect seemed pretty insignificant compared to the fact that results were arriving continually pretty much during the entire time.
The difference, of course, is incremental processing. Whereas NSXMLParser's -initWithContentsOfURL: method apparently downloads the entire document first and then begins processing, the libxml2-based code in the sample downloads the XML in small chunks and processes those chunks immediately.
Alas, going with libxml2 has clear and significant disadvantages, with the code that uses libxml2 being around twice the size of the NSXMLParser-based code, at around 150 lines (non-comment, non-whitespace). If you have worked with NSXMLParser before, you will know that that is already pretty painful, so just imagine that particular brand of joy doubled, with the 150 lines of code giving you the simplest of parsers, with just 5 tags processed. Fortunately, there is a simpler way.
A simpler way: Objective-XML's SAX
Assuming you have already written a Cocoa-(Touch-)based parser using NSXMLParser, all you need to do is include Objective-XML in your projects and replace the reference to NSXMLParser with a reference to MPWSAXParser, everything else will work just as before. Well, the same except for being significantly faster (even faster than libxml2) and now also more responsive on slow connections due to incremental processing.
I have to admit that not having incremental processing was a "feature" Objective-XML shared with NSXMLParser until very recently, due to my not taking into account the fact that latency lags bandwidth. This silly oversight has now been fixed, with both MPWMAXParser and MPWSAXParser sporting URL-based parsing methods that do incremental processing.
So that's all there is to it, Objective-XML provides a drop-in replacement for NSXMLParser that has all the performance and responsiveness-benefits of a libxml2-based solution without the coding horror.
Even simpler: Messaging API for XML (MAX)
However, even a Cocoa version of the SAX API represents a pretty low-bar in terms of ease of coding. With MAX, Objective-XML provides an API that can do the same job much more simply. MAX naturally integrates XML processing with Objective-C messaging using the following two main features:
- Clients get sent element-specific messages for processing
- The parser handles nesting, controlled by the client
The following code for building Song objects out of iTunes <item> elements illustrates these two features:
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
{
Song *song=[[Song alloc] init];
[song setArtist:[children objectForTag:artist_tag]];
[song setAlbum:[children objectForTag:album_tag]];
[song setTitle:[children objectForTag:title_tag]];
[song setCategory:[children objectForTag:category_tag]];
[song setReleaseDate:[parseFormatter dateFromString:[children objectForTag:releasedate_tag]]];
[self parsedSong:song];
[song release];
return nil;
}
MAX sends the -itemElement:attributes:parser: message to its client whenever it has encountered a complete <item> element, so there is no need for the client to perform string processing on tag names or
manage partial state as in a SAX parser.
The method constructs a song object using data from the <item> element's child elements which it then passes directly to the rest of the app via the parsedSong: message. It does not return an value, so MAX will not build a tree at this level.
Artist, album, title and category are the values of nested child elements of the <item> element. The (common) code shared by all these child-elements gets the character content of the respective elements and is shown below:
-defaultElement:children attributes:atrs parser:parser
{
return [[children combinedText] retain];
}
Unlike the <item> processing code, which did not return a value, this method does return a value. MAX uses this return value to build
a DOM-like structure which is then consumed by the next higher-level, in this case the -itemElement:attributes:parser: method shown above. Unlike a traditional DOM, the MAX tree structure is built out of domain-specific objects returned incrementally by the client.
These two pieces of sample code demonstrate how MAX can act like both a DOM parser or a SAX parser, controlled simply by wether the processing methods return objects (DOM) or not (SAX). They also demonstrated both element-specific and generic processing.
In the iTunes Song parsing example, I was able to build a MAX parser using about half the code required for the NSXMLParser-based example, a ratio that I have also encountered in larger projects. What about performance? It is slightly better than MPWSAXParser, so also somewhat better than libxml2 and significantly better than NSXMLParser.
Summary and Conclusion
The slightly misnamed XML Performance sample code for the iPhone demonstrates how important managing latency is for perceived end user performance, while showing only very little in terms of actual XML processing performance.
While ably demonstrating the performance problems of NSXMLParser, the sample code's solution of using libxml2 is really not a solution, due to the significant increase in code complexity. Objective-XML provides both a drop-in replacement for NSXMLParser with all the performance and latency benefits of the libxml2 solution, as well as a new API that is not just faster, but also much more straightforward than either NSXMLParser or libxml2.