At its core, Postscript is a stack-oriented, dynamically typed and highly polymorphic interpreted programming language. So implementing Postscript with Objective-C objects is actually not just convenient when you want to get Objective-C objects out, it is also a good match for the semantics of the language.
So all is good, right? Well, we also need to make sure that performance is competitive, otherwise there really isn't much of a point. How do we find out if performance is competitive? Fortunately, we have the gold standard handily available: Adobe's interpreter was not just used in NeXT's DisplayPostscript, but is also available as the PS Normalizer on Mac OS X . So let's test performance with a little Postscript program:
%! usertime 0 1 1000000 { 4 mul pop } bind for usertime exch sub dup == 20 20 moveto /Times-Roman 24 selectfont 100 string cvs show ( ms) show showpageThe program times a loop that multiplies some numbers one million times. It exercises a good deal of the basic execution machinery in the Postscript language: stack manipulation, procedure invocation, array access (a procedure is just an array with the executable bit set), looping and arithmetic. The loop is timed with the usertime command, which returns CPU time used in milliseconds.
This test clocks in at 513 ms (513 ns per iteration) in Preview, which isn't too shabby.
1. The problem
As proof of concept, let's code up some Objective-C equivalent of what the Postscript interpreter has to do in this loop. That should give us a good lower bound for the time taken (lower bound because there will be additional interpretation overhead, and Postscript semantics are slightly more complicated). We need a stack, some number objects and a bit of arithmetic. Easy:id startcounter=[NSNumber numberWithInt:0]; id endcounter=[NSNumber numberWithInt:1000000]; id counter=startcounter; id four=[NSNumber numberWithInt:4]; while ( [counter intValue] < [endcounter intValue] ) { int intResult; id result; [stack addObject:counter]; [stack addObject:four]; intResult = [[stack lastObject] intValue] * [[stack objectAtIndex:[stack count]-2] intValue]; result=[NSNumber numberWithInt:intResult]; [stack removeLastObject]; [stack removeLastObject]; [stack addObject:result]; [stack removeLastObject]; counter=[NSNumber numberWithInt:[counter intValue]+1]; }Sadly, this takes 4.8 µs per iteration, so our 'lower' bound is almost 10 times slower than our target, and that's without accounting for interpretation. Clearly not good enough. What if we get rid of all that silly stack manipulation code and use a plain C loop?
id b=[NSNumber numberWithInt:4]; for (i=0;i < 10000000;i++) { id a=[NSNumber numberWithInt:i]; id c=[NSNumber numberWithInt:[a intValue] * [b intValue]]; }
2. Mutable State
Objective-C is an imperative object oriented language, meaning objects can change state. However, we have treated numbers as immutable value objects, requiring them to be recreated from scratch. Allocating objects tends to be around 25x more costly than an Objective-C message send, so what if we don't allocate new integer objects, but instead reuse an existing one and just change its value? It turns out we can't use NSNumber for this as it doesn't allow its value to be set, so we need a (trivial) wrapper class for a single integer.id b=[MPWInteger numberWithInt:4]; id a=[MPWInteger numberWithInt:0]; id c=[MPWInteger numberWithInt:0]; for (i=0;i <10000000;i++) { [a setIntValue:i]; [c setIntValue:[a intValue] * [b intValue]]; }That's more like it: 50ns per iteration is 100x better than our first attempt and also 10x better than the target we're aiming for. So taking advantage of mutable state makes our basic plan possible, at least in principle. Of course, we now have to reintroduce the stack and add interpretation.
3. Save the planet
Alas, it turns out that the interpreter really does need fresh instances. While it will discard them quickly in most cases, it sometimes stores them away meaning we can't statically reuse objects the way we did above.Instead, we need to figure out a way to recycle temporary objects so we can reuse them without spending a lot of time. The common way to do this is to keep a pool of objects from which requests for new MPWInteger instances are satisfied. However, due to the unpredictable nature of the interpreted code, we cannot use the explicit checkin/checkout policy such pools usually require.
Instead we make the pool a circular buffer and use the retain count to verify that an object can be reused. When we get to a position in the pool that has an object, we can reuse that object if the retain count is one, meaning that only the pool has a valid reference. If the retain count of the object is greater than one, someone other than the pool is holding on to the object and it cannot be reused (yet), so we need to get another instance.
This logic is encapsulated in the class MPWObjectCache, which can be used very similarly to a class (factory object) in creating new instances.
MPWObjectCache* intCache=[[MPWObjectCache alloc] initWithCapacity:20 class:[MPWInteger class]]; id b=[MPWInteger integer:5]; for (i=0;i < 1000000;i++) { id a=GETOBJECT(intCache); id d=GETOBJECT(intCache); [a setIntValue:i]; [d setIntValue:[a intValue] * [b intValue]];This code runs in 100ns per iteration, so we now have a solution that gives us new or safely recycled objects quickly enough to build on with the confidence the end result will perform acceptably.
3 comments:
Thank you for a very interesting post.
Is there some code available for your object cache?
If not could you expand on the implementation of the object cache?
Do you have a favourite resource for programming techniques to effectively use the runtime? I feel I am missing out on performance and effectiveness because I have never had a runtime to program to and do not yet know how to use it well.
The code is available as part of MPWFoundation, downloadable from the site. See my next post for more details.
Post a Comment