Sunday, May 9, 2010

iPhone XML: from DOM to incremental SAX

In my last post, I extended Ray Wenderlich's XML parser comparison to include MAX. Performance seemed to be about 2x better than the nearest competitor, TBXML, and around 7x better than NSXMLParser, Cocoa Touch's built-in XML parser.

While the resulting code has been posted here, I haven't yet explained how I got there. First, I downloaded both Ray's project and Objective-XML 5.3. iPhone OS requires a bit of trickiness to get a reusable library, partly because frameworks and dynamic libraries are not supported, and partly because simulator and device are not just different Xcode architectures of a single SDK, and not even just different SDKs, but actually different platforms. If anyone can tell me how to create a normal target that compiles and links for both the simulator and the device, I'd love to hear about it!

So, in order to compile for iPhoneOS, you'll need to select the iXmlKit target:


You'll need to build twice, changing the Active SDK setting once to the device and once to the simulator. You will then have two copies of the libiXmlKit.a library, one in the directory Release-iphoneos, the other in Release-iphonesimulator (both relative to your base build directory):


marcel@mpwls[LD]ls -la ~/programming/Build/Release-iphoneos/ 
-rw-r--r--   1 marcel  staff   521640 May  9 19:52 libiXmlKit.a


These two copies then need to be joined together using the lipo command to create a single library that can be used both with the simulator and the device.

lipo -create Release-iphoneos/libiXmlKit.a Release-iphonesimulator/libiXmlKit.a -output  Release/libiXmlKit.a

(Newer Objective-XML versions will have a shell-script target that automates this process).  Once I had the fat libiXmlKit.a library, I created a new MAX group in the XMLPerformance project, and copied both the library and the MAX header file into that group:



I then created a new MAX Song parser class:

#import "iTunesRSSParser.h"

@interface MAXSongParser : iTunesRSSParser {
  id parser;
  NSDateFormatter *parseFormatter;
}
@end


The implementation is also fairly straightforward, with your basic init method:


#import "MAXSongParser.h"
#import "MPWMAXParser.h"
#import "Song.h"

@implementation MAXSongParser

-init {
    self=[super init];
    parseFormatter = [[NSDateFormatter alloc] init];
    [parseFormatter setDateStyle:NSDateFormatterLongStyle];
    [parseFormatter setTimeStyle:NSDateFormatterNoStyle];
    MPWMAXParser *newParser=[MPWMAXParser parser];
    [newParser setUndefinedTagAction:MAX_ACTION_NONE];
    [newParser setHandler:self forElements:[NSArray arrayWithObjects:
                 @"item",@"album",@"title",@"channel",@"rss",@"category",nil]
               inNamespace:nil prefix:@"" map:nil];
    [newParser setHandler:self forElements:[NSArray arrayWithObjects:
                 @"releasedate",@"artist",@"album",nil]
               // ITMS_NAMESPACE_URI is a placeholder: the itms namespace URI is not shown in this listing
               inNamespace:ITMS_NAMESPACE_URI
                    prefix:@"" map:nil];
    parser=[newParser retain];
    return self;
}

The MAX parser is initialized in this init method. We define the elements we care about using the two "setHandler:forElements:inNamespace:prefix:map:" messages, one for each namespace we will be dealing with. In the default (RSS) namespace, we are interested in the "item", "album", "title", "channel", "rss" and "category" elements. In Apple's special "itms" namespace, we will handle "releasedate", "artist" and "album". Setting MAX_ACTION_NONE as the undefined tag action means that the parser will ignore elements not listed as interesting, along with all their sub-elements.

Songs are created in the -itemElement:... method, which turns the relevant child-elements of the item element into Song attributes:

-itemElement:children attributes:attributes parser:parser {
   Song *song=[[Song alloc] init];
   [song setAlbum:[children objectForUniqueKey:@"album"]];
   [song setTitle:[children objectForUniqueKey:@"title"]];
   [song setArtist:[children objectForUniqueKey:@"artist"]];
   [song setCategory:[children objectForUniqueKey:@"category"]];
   [song setReleaseDate:[parseFormatter dateFromString:[children objectForUniqueKey:@"releasedate"]] ];
   return song;
}

Two more methods complete the actual parse process. <channel> elements have one or more <item> elements, so we want to return all of them, using "objectsForKey:":

-channelElement:children attributes:attributes parser:parser {
  return [[children objectsForKey:@"item"] retain];
}

Finally, there are a bunch of elements that we have declared interest in but treat identically. These can be handled using the "default" element handler:


-defaultElement:children attributes:attributes parser:parser {
  return [[children lastObject] retain];
}
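The three outcomes described so far (a dedicated handler, the default handler, or ignoring the element) can be pictured as a simple lookup by element name. The following is only a sketch in plain C of that decision, not MAX's actual implementation; the tables and the names dispatch_kind, contains and dispatch_for are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* What to do with an element once its name is known. */
typedef enum { HANDLE_SPECIFIC, HANDLE_DEFAULT, HANDLE_IGNORE } dispatch_kind;

/* Hypothetical tables mirroring the RSS-namespace setup in this post:
   names that have their own handler method, and all registered names. */
static const char *const specific_names[] = { "item", "channel" };
static const char *const registered_names[] = {
    "item", "album", "title", "channel", "rss", "category"
};

/* Linear search is enough for a handful of names. */
static int contains(const char *const *names, size_t count, const char *name) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

static dispatch_kind dispatch_for(const char *name) {
    if (contains(specific_names,
                 sizeof specific_names / sizeof *specific_names, name)) {
        return HANDLE_SPECIFIC;  /* like -itemElement:... or -channelElement:... */
    }
    if (contains(registered_names,
                 sizeof registered_names / sizeof *registered_names, name)) {
        return HANDLE_DEFAULT;   /* like -defaultElement:... */
    }
    return HANDLE_IGNORE;        /* unregistered: skipped with its children */
}
```

With these tables, "item" gets its own handler, "title" falls through to the default handler, and an unregistered element such as "link" is ignored.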


That concludes the routines that actually parse the XML into objects. Now for kicking off the parser. With the timing code removed, the method is fairly straightforward:


- (void)downloadAndParse:(NSURL *)url {
   id pool=[NSAutoreleasePool new];
   [parser parse: [NSData dataWithContentsOfURL:url]];
   for ( id song in [parser parseResult] ) {
       [self performSelectorOnMainThread:@selector(parsedSong:)
             withObject:song waitUntilDone:NO];
   }
   [pool release];
}

With the timing code, it all gets a bit messier:


- (void)downloadAndParse:(NSURL *)url {
   id pool=[NSAutoreleasePool new];
   [self performSelectorOnMainThread:@selector(downloadStarted) withObject:nil waitUntilDone:NO];
   NSData *data=[NSData dataWithContentsOfURL:url];
   [self performSelectorOnMainThread:@selector(downloadEnded) withObject:nil waitUntilDone:NO];
   NSTimeInterval start = [NSDate timeIntervalSinceReferenceDate];
   [parser parse:data];
   for ( id song in [parser parseResult] ) {
      [self performSelectorOnMainThread:@selector(parsedSong:) withObject:song waitUntilDone:NO];
   }
   NSTimeInterval duration = [NSDate timeIntervalSinceReferenceDate] - start;
   [self performSelectorOnMainThread:@selector(addToParseDuration:) withObject:[NSNumber numberWithDouble:duration] waitUntilDone:NO];
   [self performSelectorOnMainThread:@selector(parseEnded) withObject:nil waitUntilDone:NO];
   [[NSURLCache sharedURLCache] removeAllCachedResponses];
   [pool release];
}



This produces a non-incremental DOM-style parser: we first download the data, then process it into a DOM, and finally transfer the processed objects to the main thread for display. It differs from other DOM-style XML parsers in that it actually produces domain objects (as a Domain Object Model parser arguably should), rather than a generic XML DOM that must then be converted to objects.

Turning the DOM-style parser into a SAX-style parser is almost completely trivial. Instead of returning the Song objects at the end of itemElement:...


   [song setReleaseDate:[parseFormatter dateFromString:[children objectForUniqueKey:@"releasedate"]] ];
   return song;



we instead pass them to the delegate there and return nil so no tree is constructed:

   [song setReleaseDate:[parseFormatter dateFromString:[children objectForUniqueKey:@"releasedate"]] ];
   [self performSelectorOnMainThread:@selector(parsedSong:)
                          withObject:song waitUntilDone:NO];
   [song release];
   return nil;



This means we can also remove the "channelElement" method and the for loop in downloadAndParse: that passed the Song objects to the main thread.  This is a SAX-style parser (though it doesn't use the SAX methods and does produce domain objects), but it is still non-incremental because it first downloads all the data and then parses it.  If we want to turn the SAX parser into an incremental parser that overlaps processing with downloading, there is one final tweak that fortunately further simplifies the downloadAndParse method (again with timing code removed):


- (void)downloadAndParse:(NSURL *)url {
   id pool=[NSAutoreleasePool new];
   [[NSURLCache sharedURLCache] removeAllCachedResponses];
   [pool release];
}

While this is probably best not only in terms of performance and responsiveness, but also in terms of code size, it doesn't play well with the XMLPerformance example, because there are no external measurement hooks that allow us to separate downloading from parsing for performance measurement purposes.
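To make "incremental" a little more concrete: the essential property is that the parser keeps its state between chunks, so an element that is split across two network reads is still recognized. The toy scanner below is a minimal sketch of that idea in plain C. It is not how MAX works internally, and chunk_scanner, scanner_feed and count_item are hypothetical names. It only watches for a closing tag, and its simple restart rule assumes a non-empty tag whose first character (the '<') does not occur again inside the tag.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *closing_tag;         /* e.g. "</item>", must be non-empty */
    size_t matched;                  /* bytes matched so far, kept across chunks */
    void (*on_item)(void *context);  /* called once per completed element */
    void *context;
} chunk_scanner;

/* Feed one chunk of bytes as it arrives; may be called any number of times. */
static void scanner_feed(chunk_scanner *s, const char *bytes, size_t length) {
    size_t tag_length = strlen(s->closing_tag);
    for (size_t i = 0; i < length; i++) {
        if (bytes[i] == s->closing_tag[s->matched]) {
            s->matched++;
        } else {
            /* Mismatch: restart, possibly with this byte as a new first byte. */
            s->matched = (bytes[i] == s->closing_tag[0]) ? 1 : 0;
        }
        if (s->matched == tag_length) {
            s->on_item(s->context);  /* deliver immediately, before the next chunk */
            s->matched = 0;
        }
    }
}

/* Example callback: count completed items through an int pointer. */
static void count_item(void *context) {
    ++*(int *)context;
}
```

Feeding the chunks "<item>a</it" and "em><item>b</item>" one after the other reports nothing after the first chunk and two items after the second, even though the first closing tag straddles the chunk boundary.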

In addition, the XMLPerformance example is odd in that it is both multi-threaded and measures real time rather than CPU time when measuring parse performance. This is odd because, when both parsing and display are active, the scheduler can take the CPU away from the XML parsing thread at any time and switch to the display thread, yet that time is still counted as XML parsing time by the measurement routines. This is obviously incorrect and penalizes all the incremental parsers, which is why Ray's comparison showed all the DOM parsers as performing better than the SAX parsers.
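The difference between the two kinds of measurement can be shown with a small sketch in plain C. It assumes a platform that provides the POSIX clock_gettime function with CLOCK_THREAD_CPUTIME_ID, which is not a given on every iPhone OS version; work_timing, time_work and sleep_50ms are hypothetical names, and the 50 ms sleep merely stands in for a parsing thread that loses the CPU to another thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and nanosleep */
#endif
#include <stddef.h>
#include <time.h>

typedef struct {
    double wall;  /* real time: includes time the thread spent descheduled */
    double cpu;   /* CPU time actually consumed by the calling thread */
} work_timing;

/* Current reading of the given POSIX clock, in seconds. */
static double seconds_on(clockid_t clock_id) {
    struct timespec ts;
    clock_gettime(clock_id, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run work(context) on the calling thread and report both durations. */
static work_timing time_work(void (*work)(void *), void *context) {
    work_timing t;
    double wall_start = seconds_on(CLOCK_MONOTONIC);
    double cpu_start = seconds_on(CLOCK_THREAD_CPUTIME_ID);
    work(context);
    t.cpu = seconds_on(CLOCK_THREAD_CPUTIME_ID) - cpu_start;
    t.wall = seconds_on(CLOCK_MONOTONIC) - wall_start;
    return t;
}

/* Stand-in for a parser thread that is off the CPU for 50 ms. */
static void sleep_50ms(void *unused) {
    struct timespec pause = { 0, 50 * 1000 * 1000 };
    (void)unused;
    nanosleep(&pause, NULL);
}
```

Calling time_work(sleep_50ms, NULL) reports a wall time of at least 50 ms but only a tiny CPU time, which is the gap that a real-time measurement wrongly charges to the parser.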

I hope these explanations show how to create different styles of parsers using MAX.




