Tuesday, February 1, 2011
Objective-XML and MPWFoundation now available on github
Thanks to Todd Blanchard for providing the necessary impetus to learn git.
Tuesday, January 18, 2011
On switching away from CoreData
Rather, the issues we have had with CoreData were additional complexity and, more importantly, gratuitous dependencies that, at least for our application, were not offset by noticeable benefits.
One of the most significant structural dependencies is that CoreData requires all your model classes to be subclasses of NSManagedObject, a class provided by CoreData. This may not seem like a big problem at first, but it gets in the way of defining a proper DomainModel, which should always be independent. The Java community actually figured this out a while ago, which is why there was a recent move to persistence frameworks supporting POJOs. (Of course, POOO doesn't have quite the same ring to it, and the Java frameworks were also a lot more heavy-handed than CoreData.) The model is where your value is; it should be unencumbered. For example, when we started looking at the iPhone, there was no CoreData there, so we faced the prospect of duplicating all our model code.
In addition to initially not having CoreData, the iPhone app also used (and still uses) a completely different persistence mechanism (more feed oriented), and there were other applications where yet a third persistence mechanism was used (more document centric than DB-centric, with an externally defined file format). A proper class hierarchy would have had an abstract superclass without any reference to a specific persistence mechanism, but capturing the domain knowledge of our model. With CoreData, this hierarchy was impossible.
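To make the point concrete, here is a minimal sketch (with hypothetical class names, not our actual model) of the kind of persistence-independent domain class we wanted at the root of that hierarchy:

#import <Foundation/Foundation.h>

// Hypothetical sketch: all domain knowledge lives in a plain NSObject
// subclass with no reference to CoreData or any other persistence mechanism.
@interface Article : NSObject {
    NSString *title;
    NSDate   *publicationDate;
}
@property (nonatomic, copy)   NSString *title;
@property (nonatomic, retain) NSDate   *publicationDate;
- (BOOL)isRecent;   // domain logic, reusable with any store
@end

@implementation Article
@synthesize title, publicationDate;

- (BOOL)isRecent
{
    // "recent" = published within the last week (arbitrary example)
    return [publicationDate timeIntervalSinceNow] > -7.0 * 24 * 60 * 60;
}

- (void)dealloc
{
    [title release];
    [publicationDate release];
    [super dealloc];
}
@end

Persistence-specific subclasses or mappers (CoreData-backed, feed-backed, document-backed) could then be layered on top of such a class where needed; with CoreData, the root of the hierarchy has to be NSManagedObject instead.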
Since we had externally defined file formats in every case, we had to write an Atomic Store adapter and thus also couldn't really benefit from CoreData's change management. When we did the move, it turned out that the Atomic Store adapter we had written was significantly more code than just serializing and de-serializing the XML ourselves.
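For comparison, writing out an externally defined XML format directly can be quite compact. A rough sketch (hypothetical format and method, not our actual code, and without the XML escaping a real implementation would need):

// Hypothetical sketch: serializing an Article (see above) to a simple,
// externally defined XML format without going through an Atomic Store.
- (NSString *)xmlForArticle:(Article *)article
{
    NSDateFormatter *formatter = [[[NSDateFormatter alloc] init] autorelease];
    [formatter setDateFormat:@"yyyy-MM-dd"];
    return [NSString stringWithFormat:
              @"<article><title>%@</title><published>%@</published></article>",
              [article title],
              [formatter stringFromDate:[article publicationDate]]];
}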
Another benefit of CoreData is its integration with Bindings, but that also turned out to be of little use to us. The code we managed to save with Bindings was small and trivial, whereas the time and effort to debug bindings when they went wrong or to customize them for slightly specialized needs was very, very large. So we actually ditched Bindings a long time before we got rid of CoreData.
So why was CoreData chosen in the first place? Since I wasn't around for that decision, I don't know 100%, but as far as I can tell it was mostly "Shiny Object Syndrome". CoreData and Bindings were new Apple technologies at the time, therefore they had to be used.
So are there any lessons here? The first would be to avoid Shiny Object Syndrome. By all means have fun and play around, but not in production code. Second and related is to really examine your needs. CoreData is probably highly appropriate in many contexts, it just wasn't in ours. Finally, it would be a huge improvement if CoreData were to support Plain Old Objective-C Objects. In fact, if that were the case we probably would not have to ditch it.
Monday, January 10, 2011
Little Message Dispatch
I feel much the same way: although I think Grand Central Dispatch is awesome, I simply haven't been able to justify spending much time with it, because my own threading needs so far have been far more modest than what GCD provides. In fact, I find that an approach that's even more constrained than the NSOperationQueue-based one Brent describes has been working really well in a number of projects.
Instead of queueing up operations and letting them unwind, however, I just spawn a single I/O thread (at most a few) and have it perform the I/O deterministically. This is paired with a downloader that uses the NSURL loading system to download any number of requests in parallel (a sketch of such a downloader follows the code below).
The following method loads three types of objects: first the thumbnails, then the article content, then the images associated with the articles. The sequencing is both deliberate (thumbnails first; article images cannot be loaded before the article content is present) and simply expressed in the code by the well-known means of just writing the actions one after the other, rather than having those dependencies expressed in callbacks, completion blocks or NSOperation subclasses.
- (void)downloadNewsContent
{
    id pool=[NSAutoreleasePool new];
    [[self downloader] downloadRequests:[self thumbnailRequests]];
    [[self downloader] downloadRequests:[self contentRequests]];
    [[self downloader] downloadOnlyRequests:[self imageRequests]];
    [pool release];
}
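The downloader itself isn't shown in this post; as a rough sketch (hypothetical class, not the actual implementation), a downloadRequests: that blocks its background thread while still issuing the whole batch in parallel could look something like this:

#import <Foundation/Foundation.h>

// Sketch: issue all requests via NSURLConnection in parallel, then pump this
// thread's run loop until every request has reported back. The calling
// (background) thread blocks; the main thread is unaffected.
@interface BlockingDownloader : NSObject {
    NSUInteger outstanding;
}
- (void)downloadRequests:(NSArray *)requests;
@end

@implementation BlockingDownloader

- (void)downloadRequests:(NSArray *)requests
{
    outstanding = [requests count];
    for (NSURLRequest *request in requests) {
        [NSURLConnection connectionWithRequest:request delegate:self];
    }
    while (outstanding > 0) {
        [[NSRunLoop currentRunLoop] runMode:NSDefaultRunLoopMode
                                 beforeDate:[NSDate dateWithTimeIntervalSinceNow:0.25]];
    }
}

- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
{
    // a real implementation would accumulate the data per request here
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection
{
    outstanding--;
}

- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error
{
    outstanding--;
}

@end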
So work is done semi-sequentially in the background, while coordination is done on the main thread, with liberal use of performSelectorOnMainThread:. Of course, I make that a little simpler with a couple of HOMs that dispatch messages to threads (a sketch of how such a dispatching HOM could be built follows the list):
- async runs the message on a new thread; I use it for long-running, intrinsically self-contained work. It is equivalent to performSelectorInBackground:, except for being able to take an arbitrary message.
- asyncOnMainThread and syncOnMainThread are the equivalents of performSelectorOnMainThread:, with the waitUntilDone: flag set to NO or YES, respectively
- afterDelay: sends the message after the specified delay
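To illustrate the general idea behind these HOMs (a minimal sketch, not the actual MPWFoundation implementation), an -async trampoline can be built from NSProxy and forwardInvocation::

#import <Foundation/Foundation.h>

// Sketch: a trampoline that captures an arbitrary message and replays it on
// a background thread. Only useful for messages whose return value you
// ignore, which is the typical case for -async.
@interface AsyncTrampoline : NSProxy {
    id target;
}
+ (id)trampolineWithTarget:(id)aTarget;
@end

@implementation AsyncTrampoline

+ (id)trampolineWithTarget:(id)aTarget
{
    AsyncTrampoline *trampoline = [self alloc];
    trampoline->target = [aTarget retain];
    return [trampoline autorelease];
}

- (NSMethodSignature *)methodSignatureForSelector:(SEL)sel
{
    return [target methodSignatureForSelector:sel];
}

- (void)forwardInvocation:(NSInvocation *)invocation
{
    // redirect the captured message at the real target and invoke it
    // on a freshly spawned background thread
    [invocation setTarget:target];
    [invocation retainArguments];
    [invocation performSelectorInBackground:@selector(invoke) withObject:nil];
}

- (void)dealloc
{
    [target release];
    [super dealloc];
}

@end

@interface NSObject (AsyncHOM)
- (id)async;
@end

@implementation NSObject (AsyncHOM)

- (id)async
{
    return [AsyncTrampoline trampolineWithTarget:self];
}

@end

The asyncOnMainThread and syncOnMainThread variants would differ only in forwardInvocation:, using performSelectorOnMainThread:withObject:waitUntilDone: with NO or YES instead.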
Using these HOMs results in code like the following:

-(void)loadSections
{
    [[self asyncOnMainThread] showSyncing];
    [[[self sections] do] downloadNewsContent];
    [[self asyncOnMainThread] showDoneSyncing];
}
...
-(IBAction)syncButtonClicked
{
    [[self async] loadSections];
}

Brent sums it up quite well in his post:
"Here's the thing about code: the better it is, the more it looks and reads like a children's book."
Yep.
Tuesday, January 4, 2011
Node.js performance? µhttpd performance!
Of course, there is also a significant body of research on this topic, showing for example that user-level thread implementations tend to get very similar performance to event-based servers. There is also the issue that the purity of "no blocking APIs" is somewhat naive on a modern Unix, because blocking on I/O can happen in lots of different non-obvious places. At the very least, you may encounter a page-fault, and this may even be desirable in order to use memory mapped files.
In those cases, the fact that you have purified all your APIs makes no difference: you are still blocked on I/O, and if you've completely forgone kernel threads, as node.js appears to do, then your entire server is now blocked!
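As an illustration (not from the original benchmarks; the file path is made up), here is a fragment where no nominally blocking API is called, yet the thread can still stall on disk I/O:

#import <Foundation/Foundation.h>

// Sketch: reading a memory-mapped file. There is no read() or recv() here,
// but the first access to each page can fault and block the thread on disk.
unsigned char touchEveryPage(NSString *path)
{
    NSData *data = [NSData dataWithContentsOfFile:path
                                          options:NSDataReadingMapped
                                            error:NULL];
    const unsigned char *bytes = [data bytes];
    unsigned char checksum = 0;
    for (NSUInteger i = 0; i < [data length]; i += 4096) {
        checksum ^= bytes[i];   // touching a fresh page may page-fault
    }
    return checksum;
}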
Anyway, having seen some interesting node.js benchmarking, I was obviously curious to see how my little embedded Objective-C HTTP server, based on the awesome GNU libmicrohttpd, stacked up.
The baseline is a typical static serving test, where Apache (out-of-the box configuration on Mac OS X client) serves a small static file and the two app servers serve a small static string.
| Platform | # requests/sec |
|----------|----------------|
| Static (via Apache) | 6651.58 |
| Node.js | 5793.44 |
| MPWHttp | 8557.83 |
| Platform | # requests/sec |
|----------|----------------|
| Static (via Apache) | - |
| Node.js | 88.48 |
| MPWHttp | 47.04 |
| Platform | # requests/sec |
|----------|----------------|
| Static (via Apache) | - |
| Node.js | 9.62 |
| MPWHttp | 7698.65 |
To make the comparison a little bit more fair, I added an xor with a randomly initialized value so that the optimizer could not remove the loop (verified by varying the loop count).
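The compute kernel was roughly of this shape (a sketch, not the exact benchmark code); without xor-ing against a value the compiler cannot predict, and without returning the result, the loop would be dead code:

#include <stdlib.h>

// Sketch: busy-work loop that the optimizer cannot remove, because the
// result depends on a randomly initialized value and is returned.
unsigned long busyWork(unsigned long iterations)
{
    unsigned long result = random();
    for (unsigned long i = 0; i < iterations; i++) {
        result ^= i;
    }
    return result;
}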
| Platform | # requests/sec |
|----------|----------------|
| Static (via Apache) | - |
| Node.js | 9.62 |
| MPWHttp | 222.9 |
Cross-checking on my 8 core Mac Pro gave the following results:
| Platform | # requests/sec |
|----------|----------------|
| Static (via Apache) | - |
| Node.js | 10.72 |
| MPWHttp | 1011.86 |
In conclusion, I think it is fair to say that node.js succeeds admirably in a certain category of tasks: lots of concurrency, lots of blocked I/O, very little computation, very little memory use (so no page faults). In more typical mixes with some concurrency, some computation, some I/O and a bit of memory use (so a chance of paging), a more balanced approach may be better.
Thursday, December 2, 2010
This is not LISP
(apply + (take 1000 (iterate inc 1)))

Hmm... I find the following preferable:
(1 to: 1000 ) reduce + 0
Sunday, May 9, 2010
iPhone XML: from DOM to incremental SAX
In my last post, I extended Ray Wenderlich's XML parser comparison to include MAX, and performance seemed to be about 2x better than the nearest competitor, TBXML, and around 7x faster than NSXMLParser, Cocoa Touch's built-in XML parser.
While the resulting code has been posted here, I haven't yet explained how I got there. First, I downloaded both Ray's project and Objective-XML 5.3. iPhone OS requires a bit of trickiness to get a reusable library, partly because frameworks and dynamic libraries are not supported, and partly because simulator and device are not just different Xcode architectures of a single SDK, and not even just different SDKs, but actually different platforms. If anyone can tell me how to create a normal target that compiles and links for both the simulator and the device, I'd love to hear about it!
So, in order to compile for iPhoneOS, you'll need to select the iXmlKit target:
You'll need to build twice, changing the Active SDK setting once to the device and once to the simulator. You will then have two copies of the libiXmlKit.a library, one in the directory Release-iphoneos, the other in Release-iphonesimulator (both relative to your base build directory):
marcel@mpwls[LD]ls -la ~/programming/Build/Release-iphoneos/
-rw-r--r-- 1 marcel staff 521640 May 9 19:52 libiXmlKit.a
These two copies then need to be joined together using the lipo command to create a single library that can be used both with the simulator and the device.
lipo -create Release-iphoneos/libiXmlKit.a Release-iphonesimulator/libiXmlKit.a -output Release/libiXmlKit.a
(Newer Objective-XML versions will have a shell-script target that automates this process). Once I had the fat libiXmlKit.a library, I created a new MAX group in the XMLPerformance project, and copied both the library and the MAX header file into that group:
I then created a new MAX Song parser class:
#import "iTunesRSSParser.h"
@interface MAXSongParser : iTunesRSSParser {
id parser;
NSDateFormatter *parseFormatter;
}
@end
The implementation is also fairly straightforward, with your basic init method:
#import "MAXSongParser.h"
#import "MPWMAXParser.h"
#import "Song.h"
@implementation MAXSongParser
-init
{
self=[super init];
parseFormatter = [[NSDateFormatter alloc] init];
[parseFormatter setDateStyle:NSDateFormatterLongStyle];
[parseFormatter setTimeStyle:NSDateFormatterNoStyle];
MPWMAXParser *newParser=[MPWMAXParser parser];
[newParser setUndefinedTagAction:MAX_ACTION_NONE];
[newParser setHandler:self forElements:[NSArray arrayWithObjects: @"item",@"album",@"title",@"channel",@"rss",@"category",nil]
inNamespace:nil prefix:@"" map:nil];
[newParser setHandler:self forElements:[NSArray arrayWithObjects:
@"releasedate",@"artist",@"album",nil]
inNamespace:@"http://phobos.apple.com/rss/1.0/modules/itms/"
prefix:@"" map:nil];
parser=[newParser retain];
return self;
}
The MAX parser is initialized in this init method. We define the elements we care about using the two "setHandler:forElements:inNamespace:prefix:map:" messages, one for each namespace we will be dealing with. In the default (RSS) namespace, we are interested in the "item", "album", "title", "channel", "rss" and "category" elements. In Apple's special "itms" namespace, we will handle "releasedate", "artist" and "album". Setting MAX_ACTION_NONE as the undefined tag action means that the parser will ignore elements not listed as interesting and all their sub-elements.
Songs are created in the -itemElement:... method, which turns the relevant child-elements of the item element into Song attributes:
-itemElement:children attributes:attributes parser:parser
{
    Song *song=[[Song alloc] init];
    [song setAlbum:[children objectForUniqueKey:@"album"]];
    [song setTitle:[children objectForUniqueKey:@"title"]];
    [song setArtist:[children objectForUniqueKey:@"artist"]];
    [song setCategory:[children objectForUniqueKey:@"category"]];
    [song setReleaseDate:[parseFormatter dateFromString:[children objectForUniqueKey:@"releasedate"]]];
    return song;
}
Two more methods make the actual parse process complete: <channel> elements have one or more <item> elements, so we want to return all of them, using "objectsForKey:":
-channelElement:children attributes:attributes parser:parser
{
    return [[children objectsForKey:@"item"] retain];
}
Finally, there are a bunch of elements that we have declared an interest in but treat identically; these can be handled using the "default" element handler:
-defaultElement:children attributes:attributes parser:parser
{
    return [[children lastObject] retain];
}
That concludes the routines that actually parse the XML into objects; now for kicking off the parser. With the timing code removed, the method is fairly straightforward:
- (void)downloadAndParse:(NSURL *)url {
    id pool=[NSAutoreleasePool new];
    [parser parse:[NSData dataWithContentsOfURL:url]];
    for ( id song in [parser parseResult] ) {
        [self performSelectorOnMainThread:@selector(parsedSong:)
              withObject:song waitUntilDone:NO];
    }
    [pool release];
}
With the timing code, it all gets a bit messier:
- (void)downloadAndParse:(NSURL *)url {
    id pool=[NSAutoreleasePool new];
    [self performSelectorOnMainThread:@selector(downloadStarted) withObject:nil waitUntilDone:NO];
    NSData *data=[NSData dataWithContentsOfURL:url];
    [self performSelectorOnMainThread:@selector(downloadEnded) withObject:nil waitUntilDone:NO];
    NSTimeInterval start = [NSDate timeIntervalSinceReferenceDate];
    [parser parse:data];
    for ( id song in [parser parseResult] ) {
        [self performSelectorOnMainThread:@selector(parsedSong:) withObject:song waitUntilDone:NO];
    }
    NSTimeInterval duration = [NSDate timeIntervalSinceReferenceDate] - start;
    [self performSelectorOnMainThread:@selector(addToParseDuration:) withObject:[NSNumber numberWithDouble:duration] waitUntilDone:NO];
    [self performSelectorOnMainThread:@selector(parseEnded) withObject:nil waitUntilDone:NO];
    [[NSURLCache sharedURLCache] removeAllCachedResponses];
    [pool release];
}
This produces a non-incremental DOM-style parser, so we first download the data, then process it into a DOM and finally transfer the processed objects to the main thread for display. It differs from other DOM-style XML parsers in that it actually produces domain objects (as a Domain Object Model parser arguably should), rather than a generic XML DOM that must then be converted to objects.
Turning the DOM-style parser into a SAX-style parser is almost completely trivial. Instead of returning the Song objects at the end of itemElement:...
    [song setReleaseDate:[parseFormatter dateFromString:[children objectForUniqueKey:@"releasedate"]]];
    return song;
}
we instead pass them to the delegate there and return nil so no tree is constructed:
    [song setReleaseDate:[parseFormatter dateFromString:[children objectForUniqueKey:@"releasedate"]]];
[self performSelectorOnMainThread:@selector(parsedSong:)
withObject:song waitUntilDone:NO];
[song release];
return nil;
}
This means we can also remove the "channelElement" method and the for loop in downloadAndParse: that passed the Song objects to the main thread. This is a SAX-style parser (though it doesn't use the SAX methods and does produce domain objects), but it is still non-incremental because it first downloads all the data and then parses it. If we want to turn the SAX parser into an incremental parser that overlaps processing with downloading, there is one final tweak that fortunately further simplifies the downloadAndParse method (again with timing code removed):
- (void)downloadAndParse:(NSURL *)url {
    id pool=[NSAutoreleasePool new];
    [parser parseDataFromURL:url];
    [[NSURLCache sharedURLCache] removeAllCachedResponses];
    [pool release];
}
While this is probably best not only in terms of performance and responsiveness, but also in terms of code size, it doesn't play well with the XMLPerformance example, because there are no external measurement hooks that allow us to separate downloading from parsing for performance measurement purposes.
In addition, the XMLPerformance example is odd in that it is both multi-threaded and measures real time rather than CPU time when measuring parse performance. The reason this is odd is that when both parsing and display are active, the scheduler can take the CPU away from the XML parsing thread at any time and switch to the display thread, but this time will be counted as XML parsing time by the measurement routines. This is obviously incorrect and penalizes all the incremental parsers, which is why Ray's comparison showed all the DOM parsers as performing better than the SAX parsers.
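If one did want to charge only CPU time to the parsing thread, something like the following could replace the NSDate-based wall-clock measurement (a sketch using the Mach thread_info() call; it is not part of the XMLPerformance sample):

#import <mach/mach.h>

// Sketch: CPU time (user + system) consumed so far by the calling thread,
// in seconds. Unlike wall-clock time, this does not keep growing while the
// scheduler is running the display thread instead of the parser.
static double currentThreadCPUSeconds(void)
{
    thread_basic_info_data_t info;
    mach_msg_type_number_t count = THREAD_BASIC_INFO_COUNT;
    thread_info(mach_thread_self(), THREAD_BASIC_INFO,
                (thread_info_t)&info, &count);
    return info.user_time.seconds   + info.user_time.microseconds   / 1e6 +
           info.system_time.seconds + info.system_time.microseconds / 1e6;
}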
I hope these explanations show how to create different styles of parsers using MAX.
Sunday, May 2, 2010
iPhone XML Performance Revisited
Ray Wenderlich has done a great comparison of iPhone XML parsers, using the same sample I had looked at earlier in the context of responsiveness.
As Ray was comparing performance, my hobby-horse, I was obviously curious as to how MAX stacked up against all the upstart competition. Quite well, as it turns out (average of 5 runs on an iPad with 3.2):
Figure 1: total times (seconds)
MAX was about 40% faster than the closest competition, TBXML, at 0.43s vs. 0.61s.
However, the XMLPerformance sample is a bit weird in that it measures elapsed time, not CPU time, and is multi-threaded, updating the display as results come in.
In order to account for this overhead that has nothing to do with the XML parsers, I added a "Dummy" parser that doesn't actually parse anything, but rather just generates dummy Song entries as quickly as possible. This takes around 0.2 seconds all by itself. Subtracting this non-XML overhead from the total times yields the following results:
Figure 2: XML parse times (seconds)
When measuring just the XML parsers themselves, MAX is around twice as fast as the closest competition and seven times as fast as the built in NSXMLParser.
Sweet.
[Update] I forgot the link to where I had uploaded the source: XMLPerformance.tgz at http://www.metaobject.com/downloads/Objective-C/
