Showing posts with label performance. Show all posts
Showing posts with label performance. Show all posts

Wednesday, November 7, 2012

Cocoa / Objective-C performance course at Kloster Eberbach

Stefanie Seitz is organizing a Cocoa / Objective-C performance course at Kloster Eberbach, held by yours truly, December 9-12th.

I am sure it'll be fun and hope it will be instructive. If you're interested, give Stefanie a shout!

Update: December was a bit optimistic for all involved, so we're tentatively moving this out to the beginning of March. I'll post the exact date once it has settled.

Thursday, November 1, 2012

CGLayer performance and quality

The CoreGraphics CGLayer object seems to keep causing confusion. I hope I can clear that up a bit.

The intent of CGLayer was to have some sort of object for optimized repeated drawing, similar to how you would call a drawing procedure multiple times, have multiple instances of a view or simply draw the same PDF or bitmap image multiple times. The difference was supposed to be that the drawing layer could do secret magic "stuff" to optimize the repeated instance drawings. Not giving away the specific representation or optimizations performed means that whatever is done can be adapted both to different contexts and with changing technology.

Currently, that primarily means a texture stored on the graphics card, and that is and was one of the possible optimizations. However, the reasoning for such an opaque optimized object goes back further, at least to NeXTStep/Rhapsody: DisplayPostscript was implemented as a server process, and despite Mach's fast memory-remapping messages, shipping images back and forth between the client and server was not an optimization, and so you couldn't use (client side) bitmaps for optimized drawing. Well you could, but it wouldn't be optimized.

With Quartz becoming a client-side drawing library in Mac OS X, some of the reasons for these distinctions disappeared, but the distinctions (such as NSBitmapImageRep vs. NSCachedImageRep) remained. I think there were plans for CGLayer to become a kind of NSCachedImageRep on steroids, for example maintaining OpenGL display lists, but as far as I could tell, those plans never really panned out: in my tests drawing a CGLayer always looked the same as drawing a cached bitmap, and also performed similarly.

In 10.5, most of the accumulated cruft was cleaned up, NSBitmapImageRep for examples is now only a fairly thin wrapper around its underlying CGImage, and NSCachedImageRep was deprecated completely in 10.6, as it no longer has any advantages. With these changes, drawing a CGImage or a NSBitmapImageRep became as fast as possible, with CGImages stored on the graphics card. At this point CGLayer was essentially abandoned, and for drawing to the screen, you can just as easily use a NSBitmapImageRep, CGImage, UIImage etc.

So is CGLayer now completely useless? Not quite! It turns out that when drawing to a PDF context, CGLayer will generate a PDF Form object, which stores the original drawing commands (vectors, text, images). This is obviously much better than drawing a bitmap, both in terms of quality and in terms of file size and performance.

So if your plans include printing or generating a PDF, then I would recommend taking a look at CGLayer for repeated drawing of (vector) content.

Monday, October 15, 2012

CoreGraphics, patterns and resolution independence (not just) for retina displays

In a recent post with followup, Mark Granoff demonstrates how to intelligently deal with the need for higher resolution backgrounds by using CoreGraphics pattern images, particularly using the [UIColor colorWithPatternImage:] method. However, he does wonder why he still has to deal with retina resolution issues at some points in the code, when "…the docs say that CoreGraphics handles scaling issues automatically."

That's a good question, and the answer lies in the fact that the example uses pattern images and mask images, rather than CoreGraphics patterns and geometric primitives. Once you explicitly ask for bitmap representations, you will be dealing with pixels and different resolution. The clue is to avoid going to pixels as much and as long as possible. The doughnut shape, for example, can easily be achieved using basic geometry and a little knowledge of the Postscript/PDF fill rules.

Using the standard "nonzero-winding-number" rule, a doughnut effect can be achieved by having the two arcs that are nested inside each other drawn in opposite directions. That's one of the reasons the extra "clockwise" parameter exists.


  NSPoint centerPoint = NSMakePoint([view frame].size.width/2, 150);
  [context arcWithCenter:centerPoint
           radius:50 
           startDegrees:0
           endDegrees:360  
           clockwise:YES];
  [context arcWithCenter:centerPoint
           radius:100
           startDegrees:0
           endDegrees:360  
           clockwise:NO];
  [context fill];

(The code examples here use MPWDrawingContext for convenience, pure CoreGraphics code tends to be two to three times more verbose). The second way to achieve the doughnut would be to just use the even/odd fill rule, in which case the direction doesn't matter. matter.

Patterns can also be specified geometrically, or rather with callbacks to draw the pattern shape. Objective-C Blocks are really a perfect fit for specifying these sorts of callbacks, but were only introduced much later than the CoreGraphics pattern callback API. The following code shows how to specify the diamond pattern via an Objective-C block, courtesy of some glue API provided by MPWDrawingContext.


        NSSize patternSize=NSMakeSize(16,16);
        id diamond = [context laterWithSize:patternSize
                              content:^(id  context){
            id red = [context colorRed:1.0 green:0.0 blue:0.0 alpha:1.0];
            [context setFillColor:red];
            [[context moveto:patternSize.width/2 :2] 
				lineto:patternSize.width-2 :patternSize.height/2];
            [[context lineto:patternSize.width/2 :patternSize.height-2]
				lineto:2 :patternSize.height/2];
            [[context closepath] fill];
        }];
        [context setFillColor:diamond];
        [[context nsrect:[[self view] frame]] fill];

The "laterWithSize:content:" message creates a callback object that not only encapsulates the block, but also implements a -CGColor method so the callback can be used directly as a color in -setFillColor:.

With all the graphics specified using pure geometry, CoreGraphics can now do its thing and automatically handle varying device resolutions, wether it's a retina display or a zoomable interface or even print, all without ever having to deal with the different resolutions in code. Although I haven't tested it, the code should also use less memory, because it doesn't create potentially large temporary bitmaps, and for the cherry on top it's also a fraction of the code. CoreGraphics rules!

Forked project on github.

Sunday, April 8, 2012

Why Testing Matters for Performance

While performance certainly matters for testing, due to fast tests not breaking the feedback cycle that's so important in being agile (rather than "Doing Agile", which just isn't the same thing), the reverse is also true: with a trusted set of unit tests, you are much more likely to show the courage needed to make bold changes to working code in order to make it go faster.

So performance and testing are mutually dependent.

Tuesday, March 27, 2012

CocoaHeads Berlin Performance Talk

Last Wednesday, March 21st 2012, I held a talk on Cocoa Performance at the monthly Berlin CocoaHeads meeting. Thanks to everyone for the kind reception and benignly overlooking the fact that the end of the talk was a bit hurried.

Although the slides were primarily supportive and do not necessarily stand on their own, they are now available online by popular request (2.3MB, Keynote).

And a PDF version (6.3MB).

30k requests/s, aka wrk is fast

I just discovered wrk, a small and very fast http load testing tool. My previous experiments with µhttp-based MPWSideWeb first maxed out at around 5K requests per second, and after switching to httperf, I got up to around 10K per second. Using wrk on the same machine as previously, I now get this:
marcel@nomad[~]wrk  -c 100 -r 300000 http://localhost:8082/hi
Making 300000 requests to http://localhost:8082/hi
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.60ms    2.51ms  11.47ms   62.40%
    Req/Sec    14.98k     0.99k   16.00k    52.80%
  300002 requests in 9.71s, 21.46MB read
Requests/sec:  30881.96
Transfer/sec:      2.21MB
marcel@nomad[~]curl http://localhost:8082/hi
Hello World!


So a nice 30K requests per second on a MacBook Air, and wrk was only running at around 60% CPU, whereas httperf tended to be pegged at 100%. The web-server in question is a minimal server like sinatra.rb set up to return a simple "Hello world!".
marcel@localhost[scripts]cat memhttpserver.stsh 
#!/usr/local/bin/stsh
context loadFramework:'MPWSideWeb'
server := MPWHTTPServer new.
server setPort: 8082.
stdout println: 'memhttpserver listening on port ',server port stringValue.
server start:nil.

scheme:base := MPWSiteMap scheme.
base:/hi := 'Hello World!'.
server setDelegate: scheme:base .

shell runInteractiveLoop

marcel@localhost[scripts]./memhttpserver.stsh 
memhttpserver listening on port 8082
>

So I definitely have a new favorite http performance tester!

30k entries (II), aka computers have RAM, and they can do I/O, too...

Let's assume a document storage system with an assumed maximum working set of 30K documents. Let's also assume we want to store some tags, maybe 10 per document, encoded as 32 bit integers (8 bits tag type, 24 bits tag value used as an index). That would be:
    30K documents x 10 tags/document x 4 bytes/tag = 
                           300K tags x 4 bytes/tag =
                                      1200 K bytes = 1.2 MB
Even assuming 2:1 bloat due to to overhead gives us 2.4 MB, which should not just fit comfortably into the RAM of a modern computer or a cellphone, it actually fits comfortably into the L3 cache of an Intel Core i7 with 8-10MB to spare.

What about getting that data into RAM? The slowest hard drives (non-SSD) I could find using a quick web search had a transfer rate of better than 48MB/s and a seek time of around 10ms, so the 2.4MB in question should be in memory in around:

 10ms + 2.4MB / (48MB/s) = 
           10ms + 0.05 s =
           10ms +  50 ms =  60 ms
So less than 1/10th of a second to read it in, and a moderately fast SSD reduces that to 10ms.

EDIT: fixed embarrassing typo (L1 -> L3 cache).

30k entries (I), aka computers are fast

Let's assume a document storage system with an assumed maximum working set of 30K documents. Let's say we want to assign tags to documents and search based on those tags, with an average of 10 tags per document. Using the most brain-dead algorithm available, linear scan of the document entries and string comparison on the tags, what would it take to search through those documents? Could we maintain immediate feedback?

Measuring quickly on my laptop reveals that strcmp() takes around 8ns for a long matching string and 2ns for a non-match in the first character (with first character optimization). Splitting the difference and thus not taking into account that non-matches tend to be more common than matches, let's assume 5ns to compare each tag.

 5ns /tag x 10 tags / document x 30k documents = 
               50ns / document x 30K documents = 
                                      1500K ns =  
                                       1500 µs = 1.5 ms
So an approach that takes longer than, say, 2ms to do such a search can probably be improved.

Of course, we could do something slightly less thoroughly braindead and represent tags using integer, er, tags. A simple integer comparison should be less then one nanosecond, so that would drop the time to below 300 µs. With that, we could do 3000 queries per second, or 300 queries every tenth of second (the generally accepted threshold for interactive performance).

In theory, we could actually start optimizing ever so slightly by storing lists of document ids with each tag and then simply doing set operations on the document lists stored with each tag. But we don't really have to.

Friday, December 9, 2011

Ruby (and Rails) scalability?

Recently I wrote about Node.jsperformance, comparing it to my (still non-public, sorry!) Objective-C based libµhttp wrapper and Apache on my main development machine of the time, a MacBook Pro.

Node.js did really well on tasks that have lots of concurrent requests that are mostly waiting, did OK on basic static serving tasks and not so well on compute-intensive tasks.

Having developed an interest in minimal web-servers, I wondered how Sinatra and, by association, Ruby on Rails would do.

For Sinatra I used the scanty blog engine and the basic "Hello World" example:


require 'sinatra'

get '/hi' do
  "Hello World!"
end

For Ruby on Rails, I used the blog tutorial "out of the box", invoking it with "rails s" to start the server. In addition I also had RoR just serving a static file instead of the database-backed blog. All this on my new dev machine, a 2011 MacBook Air with 1.8 GHz Intel Core i7 and 4 GB of DRAM. I also discovered that httperf is a much better benchmark program for my needs than ab. I used it with 100 requests per connection, a burst length of 100 and a sufficient number of connections to get stable results without taking all day.

Platform# requests/sec
Apache5884
Sinatra Hello World357
Ruby on Rails static page312
Sinatra scanty blog101
Ruby on Rails blog17
This seems really slow, even when doing virtually nothing, and absolutely abysmal when using the typical RoR setup, serving records from an SQL database via templates. I distinctly remember my G5 serving in the thousands of requests/s using WebObjects 6-7 years ago.

Is this considered normal or am I doing something wrong?

Tuesday, February 15, 2011

Only 1 GHz

Tim Bray wonders (well, wondered a while ago) about the excellent perceived performance of the iPad at 'only' 1Ghz.

Hmm

1 GHz actually seems like quite a lot to me. 1000 times an Apple ][, 40 times a NeXT, and the latter was driving a megapixel display. I guess we have gotten used to wasted cycles.

Monday, January 10, 2011

Little Message Dispatch

Brent Simmons's recent notes on threading show a great, limited approach to threading that appears to work well in practice. If you haven't read it and are at all interested in threading on OS X or iOS, I suggest you head over there right now.

I feel much the same way, that is although I think Grand Central Dispatch is awesome, I simply haven't been able to justify spending much time with it, because it usually turns out that my own threading needs so far have been far more modest than what GCD provides. In fact, I find that an approach that's even more constrained than the one based on NSOperationQueue that Brent describes has been working really well in a number of projects.

Instead of queueing up operations and letting them unwind however, I just spawn a single I/O thread (at most a few) and then have that perform the I/O deterministically. This is paired with a downloader that uses the NSURL loading system to download any number of requests in parallel.


- (void)downloadNewsContent
{       
        id pool=[NSAutoreleasePool new];
        
        [[self downloader] downloadRequests:[self thumbnailRequests]];
        [[self downloader] downloadRequests:[self contentRequests]];
        [[self downloader] downloadOnlyRequests:[self imageRequests]];
        [pool release];
}


This loads 3 types of objects: first the thumbnails, then article content, then images associated with the articles. The sequencing is both deliberate (thumbs first, article images cannot be loaded before the article content is present) and simply expressed in the code by the well-known means of just writing the actions one after the other, rather than having those dependencies expressed in call-backs, completion blocks or NSOperation subclasses.

So work is done semi-sequentially in the background, while coordination is done on the main thread, with liberal use of performSelectorOnMainThread. Of course, I make that a little simpler with a couple of HOMs that dispatch messages to threads:

  • async runs the message on a new thread, I use it for long-running, intrinsically self contained work. It is equivalent to performSelectorInBackground: except for being able to take an arbitrary message.
  • asyncOnMainThread and syncOnMainThread are the equivalents of performSelectorOnMainThread, with the waitUntilDone flag set to YES or NO
  • afterDelay: sends he message after the specified delay
Here is a bit of code that shows how to have a dispatch a long-running thread and have it communicate status to the main thread.

-(void)loadSections {
	[[self asyncOnMainThread] showSyncing];
	[[[self sections] do] downloadNewsContent];
	[[self asyncOnMainThread] showDoneSyncing];
}
 ...
 -(IBAction)syncButtonClicked {
	[[self async] loadSections];
}


Brent sums it up quite well in his post:
Here’s the thing about code: the better it is, the more it looks and reads like a children’s book.
Yep.

Sunday, May 9, 2010

iPhone XML: from DOM to incremental SAX

In my last post, I extended Ray Wenderlich's XML parser comparison to MAX, and performance seemed to be about 2x better than the nearest competitor, TBXML and around 7x faster than NSXMLParser, Cocoa Touch's built-in XML parser.

While the resulting code has been posted here, I haven't yet explained how I got there.  First, I downloaded both Ray's project and Objective-XML 5.3.   iPhone OS requires a bit of trickiness to get a reusable library, partly due to the fact that frameworks or dynamic libraries are not supported, partly due to the fact that simulator and device are not just different Xcode architectures of a single SDK, and not even just different SDKs, but actually different platforms.  If anyone can tell me how to create a normal target that can compile/links for both the simulator and the device, I'd love to hear about it!

So, in order to compile for iPhoneOS, you'll need to select the iXmlKit target:

selector.png

You'll need to build twice, changing Active SDK settings once to the the device and once to the simulator.  You will then have two copies of the libiXmlKit.a library, one in the directory Release-iphoneos, the other in Release-iphonsimulator (both relative to your base build directory):

 

marcel@mpwls[LD]ls -la ~/programming/Build/Release-iphoneos/ 
-rw-r--r--   1 marcel  staff   521640 May  9 19:52 libiXmlKit.a

 

These two copies then need to be joined together using the  lipo command to create a single library that can be used both with the simulator and the device.

lipo -create Release-iphoneos/libiXmlKit.a Release-iphonesimulator/libiXmlKit.a -output  Release/libiXmlKit.a

(Newer Objective-XML versions will have a shell-script target that automates this process).  Once I had the fat libiXmlKit.a library, I created a new MAX group in the XMLPerformance project, and copied both the library and the MAX header file into that group:

MAX-in-XMLPerf.png

 

I then created a new MAX Song parser class:

#import "iTunesRSSParser.h"

 

@interface MAXSongParser : iTunesRSSParser {

  id parser;

  NSDateFormatter *parseFormatter;

}

@end

The implementation is also fairly straightforward, with your basic init method:

 

#import "MAXSongParser.h"

#import "MPWMAXParser.h"

#import "Song.h"

 

@implementation MAXSongParser

 

-init

{

 self=[super init];

    parseFormatter = [[NSDateFormatter alloc] init];

    [parseFormatter setDateStyle:NSDateFormatterLongStyle];

    [parseFormatter setTimeStyle:NSDateFormatterNoStyle];

    MPWMAXParser *newParser=[MPWMAXParser parser];

    [newParser setUndefinedTagAction:MAX_ACTION_NONE];

    [newParser setHandler:self forElements:[NSArray arrayWithObjects:                 @"item",@"album",@"title",@"channel",@"rss",@"category",nil]

               inNamespace:nil prefix:@"" map:nil];

    [newParser setHandler:self forElements:[NSArray arrayWithObjects:

                 @"releasedate",@"artist",@"album",nil]

             inNamespace:@"http://phobos.apple.com/rss/1.0/modules/itms/" 

             prefix:@"" map:nil];

    parser=[newParser retain];

   return self;

}

The MAX parser is initialized in this init method.  We define the elements we care about using the two "setHandler:forElements:inNamespace:prefix:map:" messages, one for each namespace we will be dealing with.  In the default (RSS) namespace, we are interested in the "item", "album", "title", "channel", "rss" and "category" elements.  In Apple's special "itms" namespace, we will handle "releasdate", "artist" and "album".  Setting MAX_ACTION_NONE as the undefined tag action means that the parser will ignore elements not listed as interesting and all their sub-elements.

Songs are created in the -itemElement:... method, which turns the relevant child-elements of the item element into Song attributes:

-itemElement:children attributes:attributes parser:parser

{
   Song *song=[[Songalloc] init];
   [song setAlbum:[children objectForUniqueKey:@"album"]];
   [song setTitle:[children objectForUniqueKey:@"title"]];
   [song setArtist:[children objectForUniqueKey:@"artist"]];
   [song setAlbum:[children objectForUniqueKey:@"album"]];
   [song setCategory:[children objectForUniqueKey:@"category"]];
   [song setReleaseDate:[parseFormatter dateFromString:[children objectForUniqueKey:@"releasedate"]] ];
   return song;
}

Two more methods make the actual parse process complete:  <channel> elements have one or more <item> elements, so we want to return all of them, using "objectsForKey:":

-channelElement:children attributes:attributes parser:parser
{
  return [[children objectsForKey:@"item"] retain];
}

Finally, there are a bunch of elements that we have defined interest in but treat identically...these can be handled using the "default" element handler:

 

-defaultElement:children attributes:attributes parser:parser
{
  return [[children lastObject] retain];
}

 

That concludes the routines that actually parse the XML into objects, now for kicking off the parser. With the timing code removed, the method is fairly straightforward:

 

- (void)downloadAndParse:(NSURL *)url {
   id pool=[NSAutoreleasePool new];
   [parserparse: [NSData dataWithContentsOfURL:url]];
   for ( id song in [parserparseResult] ) {
       [self performSelectorOnMainThread:@selector(parsedSong:)
             withObject:song waitUntilDone:NO];
   }
   [pool release];
}
With the timing code, it all gets a bit messier:

 

- (void)downloadAndParse:(NSURL *)url {
   id pool=[NSAutoreleasePool new];
   [self performSelectorOnMainThread:@selector(downloadStarted) withObject:nil waitUntilDone:NO];
   NSData *data=[NSData dataWithContentsOfURL:url];
   [self performSelectorOnMainThread:@selector(downloadEnded) withObject:nil waitUntilDone:NO];
   NSTimeInterval start = [NSDate timeIntervalSinceReferenceDate];
   [parserparse:data];
   for ( id song in [parserparseResult] ) {
      [selfperformSelectorOnMainThread:@selector(parsedSong:) withObject:song waitUntilDone:NO];
   }
   NSTimeInterval duration = [NSDatetimeIntervalSinceReferenceDate] - start;
   [selfperformSelectorOnMainThread:@selector(addToParseDuration:) withObject:[NSNumbernumberWithDouble:duration] waitUntilDone:NO];
   [selfperformSelectorOnMainThread:@selector(parseEnded) withObject:nilwaitUntilDone:NO];

[[NSURLCachesharedURLCache] removeAllCachedResponses];

 [pool release];

}

 

This produces a non-incremental DOM-style parser, so we first download the data, then process it into a DOM and finally transfer the processed objects to the main thread for display.  It differs from other DOM-syle XML parsers in that it actually produces domain objects (as a Domain Object Model parser arguably should), rather than a generic XML DOM that must then be converted to objects.

Turning the DOM-style parser into a SAX-stye parser is almost completely trivial.  Instead of returning the Song objects at the end of itemElement:..

 

   [song setReleaseDate:[parseFormatterdateFromString:[children objectForUniqueKey:@"releasedate"]] ];
  return song;
}

 

 

we instead pass them to the delegate there and return nil so no tree is constructed:

   [song setReleaseDate:[parseFormatterdateFromString:[children objectForUniqueKey:@"releasedate"]] ];

[self performSelectorOnMainThread:@selector(parsedSong:)

withObject:song waitUntilDone:NO];

[song release];

return nil;

}

 

This means we can also remove the "channelElement" method and the for loop in downloadAndParse: that passed the Song objects to the main thread.  This is a SAX-style parser (though it doesn't use the SAX methods and does produce domain objects), but it is still non-incremental because it first downloads all the data and then parses it.  If we want to turn the SAX parser into an incremental parser that overlaps processing with downloading, there is one final tweak that fortunately further simplifies the downloadAndParse method (again with timing code removed):

 

- (void)downloadAndParse:(NSURL *)url {
   id pool=[NSAutoreleasePoolnew];
  [parserparseDataFromURL:url];
   [[NSURLCachesharedURLCache] removeAllCachedResponses];
   [pool release];
}

While this is probably best not only in terms of performance and responsiveness, but also in terms of code size, it doesn't play well with the XMLPerformance example, because there are no external measurement hooks that allow us to separate downloading from parsing for performance measurement purposes.

In addition, the XMLPerformance example is odd in that is both multi-threaded and measure real time rather than CPU time when measuring parse performance.  The reason this is odd is that when both parsing and display are active, the scheduler can take the CPU away from the XML parsing thread at any time and switch to the display thread, but this time will be counted as XML parsing time by the measurement routines.  This is obviously incorrect and penalizes all the incremental parsers, which is why Ray's comparison showed all the DOM parsers as performing better than the SAX parsers.

I hope these explanations show how to create different styles of parsers using MAX.

 

 

 

 

Sunday, May 2, 2010

iPhone XML Performance Revisited

Ray Wenderlich has done a great comparison of iPhone XML parsers, using the same sample I had looked at earlier in the context of responsiveness.

 

As Ray was comparing performance, my hobby-horse, I was obviously curious as to how MAX stacked up against all the upstart competition. Quite well, as it turns out (average of 5 runs on an iPad with 3.2):


Figure 1: total times (seconds)


MAX was about 50% faster than the closest competition, TBXML, at 0.43s vs. 0.61s.

 

However, the XMLPerformance sample is a bit weird in that it measures elapsed time, not CPU time, and is multi-threaded, updating the display as results come in.

 

In order to account for this overhead that has nothing to do with the XML parsers, I added a "Dummy" parser that doesn't actually parse anything, but rather just generates dummy Song entries as quickly as possible. This takes around 0.2 seconds all by itself. Subtracting this non-XML overhead from the total times yields the following results:


Figure 2: XML parse times (seconds)


When measuring just the XML parsers themselves, MAX is around twice as fast as the closest competition and seven times as fast as the built in NSXMLParser.

 

Sweet.

[Update]  I forgot the link to where I had uploaded the source: XMLPerformance.tgz at http://www.metaobject.com/downloads/Objective-C/

Monday, January 25, 2010

Objective-XML 5.3

New in this release:
  • Cocotron targets for Windows support.
  • XMLRPC support.
  • No longer uses 'private' API that was causing AppStore rejections for some iPhone apps using Objective-XML.
  • Support for numeric entitites.
http://www.metaobject.com/downloads/Objective-C/Objective-XML-5.3.tgz.

Friday, November 6, 2009

Why not Objective-C?

Patrick Logan can't understand why projects use C++ rather than Ojective-C. Neither can I.

For the 95% (or more) of code that isn't performance sensitive, it gives you expressiveness very close to Smalltalk, and for the 5% or less that need high performance, it gets you the performance and predictability of C.

Sunday, February 8, 2009

Objective-XML-5.0.1

Just pushed out a minor bugfix release to Objective-XML-5.0:
  • Re-enabled character-set conversion code that had gotten disabled
  • Fixed a compile-error for some targets
  • Other minor improvements
Download here: http://www.metaobject.com/downloads/Objective-C/Objective-XML-5.0.1.tgz

Sunday, January 25, 2009

Objective-XML 5.0

I've just pushed out a new release of Objective-XML, with some pretty significant new features.

Incremental parsing

This feature, which was already discussed a little in an earlier post, is now available in an official release. In short, Objective-XML will now stream data from network data sources (specified by URL) and produce results incrementally, rather than reading all of the data first and then parsing it. This can make a huge difference in responsiveness and perceived performance for slow networks. CPU and memory consumption will be slightly higher because of extra buffering and buffer stitching required, so this should only be used when necessary.

Static iPhone library

Although Objective-XML has always been compatible with the iPhone, previous releases required copying the pre-requisite files into your project. This burden has now been eased by the inclusion of a static library target. You still need to copy the headers, either MPWMAXParser.h or MPWXmlParser.h (or both).

Unique keys

Previous releases of Objective-XML had an -objectForTag:(int)tag method for quickly retrieving attribute or element values.


enum songtags {
  item_tag=10, title_tag, category_tag	
};
...
  [parser setHandler:self forElements:[NSArray arrayWithObjects:@"item",@"title",@"category",nil]
          inNamespace:nil prefix:@"" map:nil tagBase:item_tag];
...
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
{
   ...
   [song setTitle:[children objectForTag:title_tag]];
   ...

Objective-XML adds an -objectForUniqueKey:aKey method that removes the need for these additional integer tags.
...
  [parser setHandler:self forElements:[NSArray arrayWithObjects:@"item",@"title",@"category",nil]
          inNamespace:nil prefix:@"" map:nil];
...
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
{
   ...
   [song setTitle:[children objectForUniqueKey:@"title"]];
   ...


In addition to providing faster access, the integer tags also served to disambiguate tag names that might occur in multiple namespaces. To handle these conflicts, there now is a -objectForUniqueKey:aKey namespace:aNamespace method. The namespace objects required for this disambiguation process are now returned by the -setHandler:... and -declareAttributes:... methods, which were previously void.

Default methods

One of the attractive features of DOM parsers is that they do something useful "out of the box": point a DOM parser at some XML and you get back a generic in-memory representation of that XML that you can then start taking apart. However, once you go down that road, you are stuck with the substantial CPU and memory overheads of that generic representation.

Streaming parser like SAX or MAX can be a lot more efficient, but it takes a lot more time and effort until achieving a first useful result. Default methods overcome this hurdle by also delivering an immediately useful generic representation without any extra work. Unlike a DOM, however, this generic representation can be incrementally replaced by more specialized and efficient processing later on.

Tuesday, January 20, 2009

Cocoa HTML parsing with Objective-XML

Although Objective-XML's MPWSAXParser mostly provides NSXMLParser compatibility it also provides a number of useful additional features. Among these features is the ability to parse HTML files via the settings of two flags: enforceTagNesting and ignoreCase. By default, these are on and off, respectively, which gives you strict XML behavior. However, by setting enforceTagNesting to NO and ignoreCase to YES, you get a SAX parser that will happily and speedily process HTML.

Monday, January 12, 2009

iPhone XML performance

Shortly after becoming an iPhone developer, I found a clever little piece of example code called XML Performance (login required). Having done some high performance XML processing code that works on the iPhone, I was naturally intrigued.

The example pits Cocoa's NSXMLParser against a custom parser based on libxml2, the benchmark is downloading a top 300 list of songs from iTunes.

More responsiveness using libxml2 instead of NSXMLParser

Based on my previous experience, I was expecting libxml2 to be noticeably faster, but with the advantage in processing speed being less and less important with lower and lower I/O data rates (WiFi to 3G to Edge), as I/O would start to completely overwhelm processing. Was I ever wrong!

While my expectations were technically correct for overall performance, I had completely failed to take responsiveness into account. Depending on the network selected, the NSXMLParser sample would appear to hang for 3 to 50 seconds before starting to show results. Needless to say, that is an awful user experience. The libxml example, on the other hand, would start displaying some results almost immediately. While it also was a bit faster in the total time taken, this effect seemed pretty insignificant compared to the fact that results were arriving continually pretty much during the entire time.

The difference, of course, is incremental processing. Whereas NSXMLParser's -initWithContentsOfURL: method apparently downloads the entire document first and then begins processing, the libxml2-based code in the sample downloads the XML in small chunks and processes those chunks immediately.

Alas, going with libxml2 has clear and significant disadvantages, with the code that uses libxml2 being around twice the size of the NSXMLParser-based code, at around 150 lines (non-comment, non-whitespace). If you have worked with NSXMLParser before, you will know that that is already pretty painful, so just imagine that particular brand of joy doubled, with the 150 lines of code giving you the simplest of parsers, with just 5 tags processed. Fortunately, there is a simpler way.

A simpler way: Objective-XML's SAX

Assuming you have already written a Cocoa-(Touch-)based parser using NSXMLParser, all you need to do is include Objective-XML in your projects and replace the reference to NSXMLParser with a reference to MPWSAXParser, everything else will work just as before. Well, the same except for being significantly faster (even faster than libxml2) and now also more responsive on slow connections due to incremental processing.

I have to admit that not having incremental processing was a "feature" Objective-XML shared with NSXMLParser until very recently, due to my not taking into account the fact that latency lags bandwidth. This silly oversight has now been fixed, with both MPWMAXParser and MPWSAXParser sporting URL-based parsing methods that do incremental processing.

So that's all there is to it, Objective-XML provides a drop-in replacement for NSXMLParser that has all the performance and responsiveness-benefits of a libxml2-based solution without the coding horror.

Even simpler: Messaging API for XML (MAX)

However, even a Cocoa version of the SAX API represents a pretty low-bar in terms of ease of coding. With MAX, Objective-XML provides an API that can do the same job much more simply. MAX naturally integrates XML processing with Objective-C messaging using the following two main features:
  • Clients get sent element-specific messages for processing
  • The parser handles nesting, controlled by the client
The following code for building Song objects out of iTunes <item> elements illustrates these two features:
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
{
  Song *song=[[Song alloc] init];
  [song setArtist:[children objectForTag:artist_tag]];
  [song setAlbum:[children objectForTag:album_tag]];
  [song setTitle:[children objectForTag:title_tag]];
  [song setCategory:[children objectForTag:category_tag]];
  [song setReleaseDate:[parseFormatter dateFromString:[children objectForTag:releasedate_tag]]];
  [self parsedSong:song];
  [song release];
  return nil;
}
MAX sends the -itemElement:attributes:parser: message to its client whenever it has encountered a complete <item> element, so there is no need for the client to perform string processing on tag names or manage partial state as in a SAX parser. The method constructs a song object using data from the <item> element's child elements which it then passes directly to the rest of the app via the parsedSong: message. It does not return an value, so MAX will not build a tree at this level.

Artist, album, title and category are the values of nested child elements of the <item> element. The (common) code shared by all these child-elements gets the character content of the respective elements and is shown below:

-defaultElement:children attributes:atrs parser:parser
{
	return [[children combinedText] retain];
}
Unlike the <item> processing code, which did not return a value, this method does return a value. MAX uses this return value to build a DOM-like structure which is then consumed by the next higher-level, in this case the -itemElement:attributes:parser: method shown above. Unlike a traditional DOM, the MAX tree structure is built out of domain-specific objects returned incrementally by the client.

These two pieces of sample code demonstrate how MAX can act like both a DOM parser or a SAX parser, controlled simply by wether the processing methods return objects (DOM) or not (SAX). They also demonstrated both element-specific and generic processing.

In the iTunes Song parsing example, I was able to build a MAX parser using about half the code required for the NSXMLParser-based example, a ratio that I have also encountered in larger projects. What about performance? It is slightly better than MPWSAXParser, so also somewhat better than libxml2 and significantly better than NSXMLParser.

Summary and Conclusion

The slightly misnamed XML Performance sample code for the iPhone demonstrates how important managing latency is for perceived end user performance, while showing only very little in terms of actual XML processing performance.

While ably demonstrating the performance problems of NSXMLParser, the sample code's solution of using libxml2 is really not a solution, due to the significant increase in code complexity. Objective-XML provides both a drop-in replacement for NSXMLParser with all the performance and latency benefits of the libxml2 solution, as well as a new API that is not just faster, but also much more straightforward than either NSXMLParser or libxml2.

Sunday, October 12, 2008

Binary XML

Jimmy Zhang hits the nail on the head when he notes that parsing ASCII text is not the primary problem in XML performance, object allocation is. I was surprised by the same finding when I started working on Objective-XML around a decade ago.

Sean McGrath claims that Binary XML solves the wrong problem.

Yes and no: it doesn't help much with existing structures and parsing methods, but with the right methods, it can be extremely helpful!

Also: "...how weird is it that we have not moved on from the DOM and SAX in terms of "standard" APIs for XML processing?"