Thursday, January 15, 2009

Simple HOM

While it is good to see that Higher Order Messaging is still inspiring new work, I feel a bit guilty that part of that inspiration comes from sentiments such as the following:

"Still I have yet to find a simple implementation that I like and that does not use private methods. The last thing I want is a relying on classes which can break at any time."
Mea culpa.

While I did explain why the current HOM implementation is a bit gnarly, code probably speaks louder than repeated mea culpas.

So, without further ado, a really simple HOM implementation. An NSArray category provides the interface and does the actual processing:

@interface NSArray(hom)

-collect;

@end

@implementation NSArray(hom)

-(NSArray* )collect:(NSInvocation*)anInvocation
{
  NSMutableArray *resultArray=[NSMutableArray array];
  for (id obj in self ) {
    id resultObject;
    // Send the captured message to each element in turn...
    [anInvocation invokeWithTarget:obj];
    // ...and fetch its result, which NSInvocation only hands out via a pointer.
    [anInvocation getReturnValue:&resultObject];
    [resultArray addObject:resultObject];
  }
  return resultArray;
}

-collect {
  return [HOM homWithTarget:self selector:@selector(collect:)];
}

@end
The fact that NSInvocation deals with pointers to values rather than values makes this a bit longer than it needs to be, but the gist is simple enough: iterate over the array, invoke the invocation, return the result.

That leaves the actual trampoline, which is really just an implementation detail for conveniently creating NSInvocation objects.

@interface HOM : NSProxy {
  id xxTarget;
  SEL xxSelector;
}

@end

@implementation HOM

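-(void)forwardInvocation:(NSInvocation*)anInvocation
{
  // Pass the captured message to the target, then hand the collected
  // result back to the original sender as the message's return value.
  id result=[xxTarget performSelector:xxSelector withObject:anInvocation];
  [anInvocation setReturnValue:&result];
}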

-methodSignatureForSelector:(SEL)aSelector
{
  return [[xxTarget objectAtIndex:0] methodSignatureForSelector:aSelector];
}

-xxinitWithTarget:aTarget selector:(SEL)newSelector
{
  xxTarget=aTarget;
  xxSelector=newSelector;
  return self;
}

+homWithTarget:aTarget selector:(SEL)newSelector
{
  return [[[self alloc] xxinitWithTarget:aTarget selector:newSelector] autorelease];
}

@end
This code compiles without warnings, does not use any private API, and runs on both Leopard and the iPhone. GitHub: https://github.com/mpw/HOM/.
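
The listings above do not show a call site. The following single-file sketch condenses both listings into an order that builds on its own and adds a hypothetical helper, uppercased(), that sends -uppercaseString to every element of an array. It assumes manual reference counting (no ARC), a non-empty array whose elements all return objects, and a trampoline that hands the collected array back as the return value of the forwarded message.

```objc
#import <Foundation/Foundation.h>

// Condensed from the two listings in this post so the file builds on its own
// (manual reference counting, no ARC).
@interface HOM : NSProxy {
  id xxTarget;
  SEL xxSelector;
}
+homWithTarget:aTarget selector:(SEL)newSelector;
@end

@interface NSArray(hom)
-collect;
-(NSArray*)collect:(NSInvocation*)anInvocation;
@end

@implementation HOM

-xxinitWithTarget:aTarget selector:(SEL)newSelector
{
  xxTarget=aTarget;
  xxSelector=newSelector;
  return self;
}

+homWithTarget:aTarget selector:(SEL)newSelector
{
  return [[[self alloc] xxinitWithTarget:aTarget selector:newSelector] autorelease];
}

-(NSMethodSignature*)methodSignatureForSelector:(SEL)aSelector
{
  // Borrow the signature from the first element (assumes a non-empty array).
  return [[xxTarget objectAtIndex:0] methodSignatureForSelector:aSelector];
}

-(void)forwardInvocation:(NSInvocation*)anInvocation
{
  // Assumption: the collected array becomes the return value of the forwarded message.
  id result=[xxTarget performSelector:xxSelector withObject:anInvocation];
  [anInvocation setReturnValue:&result];
}

@end

@implementation NSArray(hom)

-(NSArray*)collect:(NSInvocation*)anInvocation
{
  NSMutableArray *resultArray=[NSMutableArray array];
  for (id obj in self) {
    id resultObject;
    [anInvocation invokeWithTarget:obj];
    [anInvocation getReturnValue:&resultObject];
    [resultArray addObject:resultObject];
  }
  return resultArray;
}

-collect
{
  return [HOM homWithTarget:self selector:@selector(collect:)];
}

@end

// Hypothetical helper showing the call site.
static NSArray *uppercased(NSArray *names)
{
  // The message after -collect is captured by the trampoline and sent to each element.
  return (id)[[names collect] uppercaseString];
}
```

Called with an array containing @"ab" and @"cd", uppercased() should yield an array containing @"AB" and @"CD". The (id) cast is only there because the compiler sees -uppercaseString as returning an NSString, while the trampoline actually returns the collected array.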


EDIT (Aug 15 2015): Changed SimpleHOM download link to github repo.

Monday, January 12, 2009

iPhone XML performance

Shortly after becoming an iPhone developer, I found a clever little piece of example code called XML Performance (login required). Having done some high performance XML processing code that works on the iPhone, I was naturally intrigued.

The example pits Cocoa's NSXMLParser against a custom parser based on libxml2; the benchmark is downloading a top 300 list of songs from iTunes.

More responsiveness using libxml2 instead of NSXMLParser

Based on my previous experience, I was expecting libxml2 to be noticeably faster, but with the advantage in processing speed mattering less and less at lower and lower I/O data rates (WiFi to 3G to EDGE), as I/O would start to completely overwhelm processing. Was I ever wrong!

While my expectations were technically correct for overall performance, I had completely failed to take responsiveness into account. Depending on the network selected, the NSXMLParser sample would appear to hang for 3 to 50 seconds before starting to show results. Needless to say, that is an awful user experience. The libxml2 example, on the other hand, would start displaying some results almost immediately. While it was also a bit faster in total time taken, this effect seemed pretty insignificant compared to the fact that results kept arriving during pretty much the entire download.

The difference, of course, is incremental processing. Whereas NSXMLParser's -initWithContentsOfURL: method apparently downloads the entire document first and then begins processing, the libxml2-based code in the sample downloads the XML in small chunks and processes those chunks immediately.

Alas, going with libxml2 has clear and significant disadvantages: the code that uses libxml2 is around twice the size of the NSXMLParser-based code, at around 150 lines (non-comment, non-whitespace). If you have worked with NSXMLParser before, you will know that it is already pretty painful, so just imagine that particular brand of joy doubled, with the 150 lines of code giving you only the simplest of parsers, with just 5 tags processed. Fortunately, there is a simpler way.

A simpler way: Objective-XML's SAX

Assuming you have already written a Cocoa-(Touch-)based parser using NSXMLParser, all you need to do is include Objective-XML in your project and replace the reference to NSXMLParser with a reference to MPWSAXParser. Everything else will work just as before. Well, the same except for being significantly faster (even faster than libxml2) and now also more responsive on slow connections due to incremental processing.

I have to admit that not having incremental processing was a "feature" Objective-XML shared with NSXMLParser until very recently, due to my not taking into account the fact that latency lags bandwidth. This silly oversight has now been fixed, with both MPWMAXParser and MPWSAXParser sporting URL-based parsing methods that do incremental processing.

So that's all there is to it: Objective-XML provides a drop-in replacement for NSXMLParser that has all the performance and responsiveness benefits of a libxml2-based solution without the coding horror.

Even simpler: Messaging API for XML (MAX)

However, even a Cocoa version of the SAX API represents a pretty low bar in terms of ease of coding. With MAX, Objective-XML provides an API that can do the same job much more simply. MAX naturally integrates XML processing with Objective-C messaging using the following two main features:
  • Clients get sent element-specific messages for processing
  • The parser handles nesting, controlled by the client
The following code for building Song objects out of iTunes <item> elements illustrates these two features:
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
{
  Song *song=[[Song alloc] init];
  [song setArtist:[children objectForTag:artist_tag]];
  [song setAlbum:[children objectForTag:album_tag]];
  [song setTitle:[children objectForTag:title_tag]];
  [song setCategory:[children objectForTag:category_tag]];
  [song setReleaseDate:[parseFormatter dateFromString:[children objectForTag:releasedate_tag]]];
  [self parsedSong:song];
  [song release];
  return nil;
}
MAX sends the -itemElement:attributes:parser: message to its client whenever it has encountered a complete <item> element, so there is no need for the client to perform string processing on tag names or manage partial state as in a SAX parser. The method constructs a song object using data from the <item> element's child elements, which it then passes directly to the rest of the app via the parsedSong: message. It does not return a value, so MAX will not build a tree at this level.

Artist, album, title and category are the values of nested child elements of the <item> element. The (common) code shared by all these child-elements gets the character content of the respective elements and is shown below:

-defaultElement:children attributes:atrs parser:parser
{
	return [[children combinedText] retain];
}
Unlike the <item> processing code, which did not return a value, this method does return a value. MAX uses this return value to build a DOM-like structure which is then consumed by the next higher-level, in this case the -itemElement:attributes:parser: method shown above. Unlike a traditional DOM, the MAX tree structure is built out of domain-specific objects returned incrementally by the client.

These two pieces of sample code demonstrate how MAX can act like either a DOM parser or a SAX parser, controlled simply by whether the processing methods return objects (DOM) or not (SAX). They also demonstrate both element-specific and generic processing.

In the iTunes Song parsing example, I was able to build a MAX parser using about half the code required for the NSXMLParser-based example, a ratio that I have also encountered in larger projects. What about performance? It is slightly better than MPWSAXParser, so also somewhat better than libxml2 and significantly better than NSXMLParser.

Summary and Conclusion

The slightly misnamed XML Performance sample code for the iPhone demonstrates how important managing latency is for perceived end user performance, while showing only very little in terms of actual XML processing performance.

While ably demonstrating the performance problems of NSXMLParser, the sample code's solution of using libxml2 is really not a solution, due to the significant increase in code complexity. Objective-XML provides both a drop-in replacement for NSXMLParser with all the performance and latency benefits of the libxml2 solution, as well as a new API that is not just faster, but also much more straightforward than either NSXMLParser or libxml2.

Sunday, January 11, 2009

Best of Show, MacWorld 2009

Since I recently became the Mac tech lead for Livescribe, responsible for delivering the Mac desktop software, I am happy to report that not only did we meet all of our target dates, we also won Best of Show at MacWorld 2009.

Spending 3 days at the booth was both exhausting and rewarding; the enthusiasm exhibited by customers was absolutely mind-blowing.

Sunday, December 21, 2008

Unit test the class

Travis Griggs comes to the conclusion that unit test objects should map 1:1 to classes under test.

I agree.

In fact, I would go a bit further: tests should be an integral part of a class. While this helps avoid negative outcomes such as parallel class hierarchies or having code and tests diverge, it more importantly simplifies the test/code relationship and drives home the point that code is incomplete without its tests.

While I was working with JUnit on a reasonably large Java system, both finding a good place for a particular test and finding the tests for a specific class became quite burdensome after a while.

For this reason MPWTest simply asks classes to test themselves. Furthermore, only frameworks are tested, so the test tool simply loads each framework to test, enumerates the classes within that particular framework and then runs the tests it finds. TestCases and TestSuites are implicitly created from this structure, removing most of the administrative burdens of unit testing, and also any explicit dependence of the tests on the testing framework.

Having no dependencies on the testing framework makes it easier to ship tests in production code without having to also ship the testing framework. While this may sound odd at first, it avoids potential issues with code compiled for testing being different than code destined to be shipped, and further reinforces the idea that tests are an integral part of each class, rather than an optional add-on.
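
As a rough illustration of classes testing themselves, here is a minimal sketch. The Counter class, the +testSelectors convention as written here, and the runTestsOfClass() function are hypothetical stand-ins rather than MPWTest's actual interface, and the sketch skips the framework loading and class enumeration that the real tool performs. It assumes manual reference counting (no ARC).

```objc
#import <Foundation/Foundation.h>

// Hypothetical class under test.
@interface Counter : NSObject {
  int count;
}
-(void)increment;
-(int)count;
@end

@implementation Counter
-(void)increment { count++; }
-(int)count { return count; }
@end

// The tests are part of the class itself (here in a category) and use only
// Foundation, so they can ship without any testing framework.
@interface Counter(testing)
+(NSArray*)testSelectors;
+(void)testIncrement;
@end

@implementation Counter(testing)

+(NSArray*)testSelectors
{
  return [NSArray arrayWithObject:@"testIncrement"];
}

+(void)testIncrement
{
  Counter *counter=[[[Counter alloc] init] autorelease];
  [counter increment];
  if ([counter count]!=1) {
    [NSException raise:@"TestFailure" format:@"expected 1, got %d", [counter count]];
  }
}

@end

// Minimal stand-in for the test tool: ask a class for its tests and run them.
// Returns the number of failed tests.
static int runTestsOfClass(Class aClass)
{
  int failures=0;
  if (![aClass respondsToSelector:@selector(testSelectors)]) {
    return 0;   // this class does not test itself
  }
  for (NSString *name in [aClass testSelectors]) {
    @try {
      [aClass performSelector:NSSelectorFromString(name)];
    } @catch (NSException *exception) {
      failures++;
      NSLog(@"%@ %@ failed: %@", aClass, name, [exception reason]);
    }
  }
  return failures;
}
```

Because the runner discovers tests by asking each class, the class never imports or subclasses anything from the testing tool, which is what allows the tests to stay in shipping code.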

Sunday, October 12, 2008

Binary XML

Jimmy Zhang hits the nail on the head when he notes that parsing ASCII text is not the primary problem in XML performance; object allocation is. I was surprised by the same finding when I started working on Objective-XML around a decade ago.

Sean McGrath claims that Binary XML solves the wrong problem.

Yes and no: it doesn't help much with existing structures and parsing methods, but with the right methods, it can be extremely helpful!

Also: "...how weird is it that we have not moved on from the DOM and SAX in terms of "standard" APIs for XML processing?"

Sunday, August 10, 2008

Code is not an asset

Michael Feathers wonders how to go beyond technical debt. I have been wondering about this for some time, and I think the answer is to account for code as a liability, not an asset.

The functionality that the code has, the value it delivers, is an asset, but the code itself is a liability. This easily explains how refactoring and removing code add value, as long as functionality is preserved.