
Wednesday, October 7, 2015

Jitterdämmerung

So, Windows 10 has just been released, and with it the Ahead Of Time (AOT) compilation feature .NET Native. Google also just recently introduced ART for Android, and I just discovered that Oracle is planning an AOT compiler for mainstream Java.

With Apple doggedly sticking to Ahead of Time compilation for Objective-C and now their new Swift, JavaScript is pretty much the last mainstream hold-out for JIT technology. And even in JavaScript, the state of the art for achieving maximum performance appears to be asm.js, which largely eschews JIT techniques by acting as a kind of object code, represented in JavaScript, that other languages are AOT-compiled into.

I think this shift away from JITs is not a fluke but was inevitable; in fact, the big question is why it has taken so long (probably industry inertia). The benefits were always less than advertised, the costs higher than anticipated. More importantly though, the inherent performance characteristics of JIT compilers don't match up well with most real world systems, and the shift to mobile has only made that discrepancy worse. Although JITs are not going to go away completely, they are fading into the sunset of a well-deserved retirement.

Advantages of JITs less than promised

I remember reading the issue of the IBM Systems Journal on Java Technology back in 2000, I think. It had a bunch of research articles describing super amazing VM technology with world-beating performance numbers. It also had a single real-world report from IBM's San Francisco project. In the real world, it turned out, performance was a bit more "mixed", as they say. In other words: it was terrible, and they had to do an incredible amount of work for the system to be even remotely usable.

There was also the experience of the New Typesetting System (NTS), a rewrite of TeX in Java. Performance was atrocious, the team took it with humor and chose a snail as their logo.

(Image caption: "NTS at full speed")

One of the reasons for this less than stellar performance was that JITs were invented for highly dynamic languages such as Smalltalk and Self. In fact, the Java Hotspot VM can be traced in a direct line to Self via the Strongtalk system, whose creator, Animorphic Systems, was purchased by Sun in order to acquire the VM technology.

However, it turns out that one of the biggest benefits of JIT compilers for dynamic languages is figuring out the actual types of variables. This is a problem that is theoretically intractable (equivalent to the halting problem) and practically fiendishly difficult to do at compile time for a dynamic language. It is trivial to do at runtime: all you need to do is record the actual types as they fly by. If you are doing Polymorphic Inline Caching, just look at the contents of the caches after a while. It is also largely trivial to do for a statically typed language at compile time, because the types are right there in the source code!
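As a minimal sketch of what "looking at the contents of the caches" means (plain C on top of the Objective-C runtime; the struct and function names are invented for illustration), a polymorphic inline cache at a call site simply records the receiver classes it actually sees:

    #import <objc/runtime.h>

    typedef struct {
        Class seenClass;   // receiver class observed at this call site
        IMP   target;      // method implementation cached for that class
    } CacheEntry;

    typedef struct {
        CacheEntry entries[4];   // a handful of entries per call site
        int        count;
    } InlineCache;

    static IMP lookupWithCache(InlineCache *cache, id receiver, SEL selector) {
        Class cls = object_getClass(receiver);
        for (int i = 0; i < cache->count; i++) {
            if (cache->entries[i].seenClass == cls) {
                return cache->entries[i].target;                 // fast path: type seen before
            }
        }
        IMP imp = class_getMethodImplementation(cls, selector);  // slow path: full lookup
        if (cache->count < 4) {
            cache->entries[cache->count++] = (CacheEntry){ cls, imp };  // record the observed type
        }
        return imp;
    }

After the program has run for a while, the entries array is the type information: the observed receiver classes can be read straight out of the cache.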

So gathering information at runtime simply isn't as much of a benefit for languages such as C# and Java as it was for Self and Smalltalk.

Significant Costs

The runtime costs of a JIT are significant. The obvious cost is that the compiler has to be run alongside the program to be executed, so time spent compiling is not available for executing. Apart from the direct costs, this also means that your compiler is limited in the kinds of analyses and optimizations it can do. The impact is particularly severe on startup, so short-lived programs such as TeX/NTS are hit hardest and can often run slower overall than interpreted byte-code.

In order to mitigate this, you end up needing multiple compilers and heuristics for deciding when to use which compiler. In other words: complexity increases dramatically, and you have only mitigated the problem somewhat, not solved it.

A less obvious cost is an increase in virtual memory pressure, because the code pages created by the JIT are "dirty", whereas executables paged in from disk are clean. Dirty pages have to be written to disk when memory is required, clean pages can simply be unmapped. On devices without a swap file, like most smartphones, dirty vs. clean can mean the difference between a few unmapped pages that can be paged back in later and a process getting killed by the OS.

Virtual memory and cache pressure is generally considered a much more severe performance problem than a little extra CPU use, and often even than a lot of extra CPU use. Most CPUs today can multiply numbers in a single cycle, yet a single main memory access stalls the CPU for a hundred cycles or more.

In fact, it could very well be that keeping non-performance-critical code as compact interpreted byte-code may actually be better than turning it into native code, as long as the code-density is higher.

Security risks

Having memory that is both writable and executable is a security risk. And forbidden on iOS, for example. The only exception is Apple's own JavaScript engine, so on iOS you simply can't run your own JITs.

Machines got faster

At the low end of performance, machines have gotten so fast that pure interpreters are often fast enough for many tasks. Python is used for many tasks as-is, and PyPy isn't really taking the Python world by storm. Why? I am guessing it's because on today's machines, plain old interpreted Python is often fast enough. Same goes for Ruby: it's almost comically slow (in my measurements, serving HTTP via Sinatra was almost 100 times slower than using libµhttp), yet even that is still 400 requests per second, exceeding the needs of the vast majority of web-sites, including my own blog, which until recently didn't see 400 visitors per day.

The first JIT I am aware of was Peter Deutsch's PS (Portable Smalltalk), but only about a decade later Smalltalk was fine doing multi-media with just a byte-code interpreter. And native primitives.

Successful hybrids

The technique used by Squeak, an interpreter plus C primitives for the heavy lifting (for example for multi-media or cryptography), has been applied successfully in many different cases. This hybrid approach was described in detail by John Ousterhout in Scripting: Higher-Level Programming for the 21st Century: high level "scripting" languages are used to glue together high performance code written in "systems" languages. Examples include Numpy, but the ones I found most impressive were "computational steering" systems apparently used in supercomputing facilities such as Oak Ridge National Laboratories. Written in Tcl.

What's interesting with these hybrids is that JITs are being squeezed out at both ends: at the "scripting" level they are superfluous, at the "systems" level they are not sufficient. And I don't believe that this idea is only applicable to specialized domains, though there it is most noticeable. In fact, it seems to be an almost direct manifestation of the observations in Knuth's famous(ly misquoted) quip about "Premature Optimization":

Experience has shown (see [46], [51]) that most of the running time in non-IO-bound programs is concentrated in about 3 % of the source text.

[..] The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12 % improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies.

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3 %. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail. After working with such tools for seven years, I've become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.

[..]

(Most programs are probably only run once; and I suppose in such cases we needn't be too fussy about even the structure, much less the efficiency, as long as we are happy with the answers.) When efficiencies do matter, however, the good news is that usually only a very small fraction of the code is significantly involved.

For the 97%, a scripting language is often sufficient, whereas the critical 3% are both critical enough as well as small and isolated enough that hand-tuning is possible and worthwhile.

I agree with Ousterhout's critics who say that the split into scripting languages and systems languages is arbitrary; Objective-C, for example, combines both approaches in a single language, though one that is very much a hybrid itself. The "Objective" part is very similar to a scripting language, in both performance and ease/speed of development, despite being compiled ahead of time; the C part does the heavy lifting of a systems language. Alas, Apple has worked continuously and fairly successfully at destroying both of these aspects and turning the language into a bad caricature of Java. However, although the split is arbitrary, the competing and diverging requirements are real; see Erlang's split into a functional language in the small and an object-oriented language in the large.
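To make that hybrid nature concrete, here is a small sketch (the class and its methods are invented for illustration): the outside is a message-based, scripting-like Objective-C interface, while the hot inner loop is plain C over a raw buffer, with no message sends inside:

    #import <Foundation/Foundation.h>
    #include <stdlib.h>

    @interface SampleBuffer : NSObject
    - (instancetype)initWithCount:(NSUInteger)count;
    - (void)scaleBy:(float)factor;
    @end

    @implementation SampleBuffer {
        float      *_samples;
        NSUInteger  _count;
    }

    - (instancetype)initWithCount:(NSUInteger)count {
        if ((self = [super init])) {
            _count   = count;
            _samples = calloc(count, sizeof(float));
        }
        return self;
    }

    - (void)dealloc {
        free(_samples);   // (ARC assumed; under MRC you would also call [super dealloc])
    }

    // The "C part": a tight loop over raw floats does the heavy lifting.
    - (void)scaleBy:(float)factor {
        float *samples = _samples;
        for (NSUInteger i = 0; i < _count; i++) {
            samples[i] *= factor;
        }
    }
    @end

Client code stays at the "scripting" level, sending messages like [buffer scaleBy:0.5], while the performance-critical inner loop lives in C.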

Unpredictable performance model

The biggest problem I have with JITs is that their performance model is extremely unpredictable. First, you don't know when optimizations are going to kick in, or when extra compilation is going to make you slower. Second, predicting which bits of code will actually be optimized well is also hard and a moving target. Combine these two factors, and you get a performance model that is somewhere between unpredictable and intractable, and therefore at best statistical: on average, your code will be faster. Probably.

While there may be domains where this is acceptable, most of the domains where performance matters at all are not of this kind; they tend to be (soft) real time. In real time systems average performance matters not at all, predictably meeting your deadline does. As an example, delivering 80 frames in 1 ms each and 20 frames in 20 ms each, for 480 ms total, means failure (you missed your 60 fps target 20% of the time), whereas delivering 100 frames in 10 ms each, for 1000 ms total, means success (you met your 60 fps target 100% of the time), despite the fact that the first scenario is more than twice as fast on average.

I really learned this in the 90s, when I was doing pre-press work and delivering highly optimized RIP and Postscript processing software. I was stunned when I heard about daily newspapers switching to pre-rendered, pre-screened bitmap images for their workflows. This is the most inefficient format imaginable for pre-press work, with each page typically taking around 140 MB of storage uncompressed, whereas the Postscript source would typically be between 1/10th and 1/1000th of that size. (And at the time, 140 MB was a lot even for disk storage, never mind RAM or network capacity.)

The advantage of pre-rendered bitmaps is that your average case is also your worst case. Once you have provisioned your infrastructure to handle this case, you know that your tech stack will be able to deliver your newspaper on time, no matter what the content. With Postscript (and later PDF) workflows, your average case is much better (and your best case ridiculously so), but you simply don't get any bonus points for delivering your newspaper early. You just get problems if it's late, and you are not allowed to average the two.

Eve could survive and be useful even if it were never faster than, say, Excel. The Eve IDE, on the other hand, can't afford to miss a frame paint. That means Imp must be not just fast but predictable - the nemesis of the SufficientlySmartCompiler.

I also saw this effect in play with Objective-C and C++ projects: despite the fact that Objective-C's primitive operations are generally more expensive, projects written in Objective-C often had better performance than comparable C++ projects, because Objective-C's performance model was so much simpler, more obvious and more predictable.

When Apple was still pushing the Java bridge, Sun engineers did a stint at a WWDC to explain how to optimize Java code for the Hotspot JIT. It was comical. In order to write fast Java code, you effectively had to think of the assembler code that you wanted to get, then write the Java code that you thought might net that particular bit of machine code, taking into account the various limitations of the JIT. At that point, it is a lot easier to just write the damn assembly code. And vastly more predictable: what you write is what you get.

Modern JITs are capable of much more sophisticated transformations, but what the creators of these advanced optimizers don't realize is that they are making the problem worse rather than better. The more they do, the less predictable the code becomes.

The same, incidentally, applies to SufficientlySmart AOT compilers such as the one for the Swift language, though the problem is not quite as severe as with JITs because you don't have the dynamic component. All these things are well-intentioned but all-in-all counter-productive.

Conclusion

Although the idea of Just in Time Compilers was very good, their area of applicability, which was always smaller than imagined and/or claimed, has shrunk ever further due to advances in technology, changing performance requirements and the realization that for most performance critical tasks, predictability is more important than average speed. They are therefore slowly being phased out in favor of simpler, generally faster and more predictable AOT compilers. Although they are unlikely to go away completely, their significance will be drastically diminished.

Alas, the idea that writing high-level code without any concessions to performance (often justified by misinterpreting or simply misquoting Knuth) and then letting a sufficiently smart compiler fix it lives on. I don't think this approach to performance is viable; more predictability is needed, and a language with a hybrid nature and the ability for the programmer to specify behavior-preserving transformations that alter the performance characteristics of code is probably the way to go for high-performance, high-productivity systems. More on that another time.

What do you think? Are JITs on the way out or am I on crack? Should we have a more manual way of influencing performance without completely rewriting code or just trusting the SmartCompiler?

Update: Nov. 13th 2017

The Mono Project has just announced that they are adding a byte-code interpreter: "We found that certain programs can run faster by being interpreted than being executed with the JIT engine."

Update: Mar. 25th 2021

Facebook has created an AOT compiler for JavaScript called Hermes. They report significant performance improvements and reductions in memory consumption (with of course some trade-offs for specific benchmarks).

Speaking of JavaScript, Google launched Ignition in 2017, a JS bytecode interpreter for faster startup and better performance on low-memory devices. It also did much better in terms of pure performance than they anticipated.

So in 2019, Google introduced their JIT-Less JS engine. Not even an AOT compiler, just a pure interpreter. While seeing slowdowns in the 20-80% range on synthetic benchmarks, they report real-world performance only a few percent lower than their high-end, full-throttle TurboFan JIT.

Oh, how could I forget WebAssembly? "One of the reasons applications target WebAssembly is to execute on the web with predictable high performance."

I've also been told by People Who Know™ that the super-amazing Graal JIT VM for Java is primarily used for its AOT capabilities by customers. This is ironic, because the aptly named Graal pretty much is the Sufficiently Smart Compiler that was always more of a joke than a real goal...but they achieved it nonetheless, and as a JIT no less! And now that we have it, it turns out customers mostly don't care. Instead, they care about predictability, start-up performance and memory consumption.

Last but not least, Apple's Rosetta 2 translator for running Intel binaries on ARM is an AOT binary-to-binary translator. And it is much faster, relatively speaking, than the original Rosetta or comparable JIT-based binary translators, with translated Intel binaries clocking in at around 80% of the speed of native ARM binaries. Combined with the amazing performance of the M1, this makes M1 Macs frequently faster than Intel Macs at running Intel Mac binaries. (Of course it also includes a (much slower) JIT component as a fallback for when the AOT compiler can't figure things out, for example when the target program is itself a JIT...)

Update: Oct. 3rd 2021

It appears that 45% of CVEs for V8 and more than half of "in the wild" Chrome exploits abused a JIT bug, which is why Microsoft is experimenting with what they call SDSM (Super Duper Secure Mode). What's SDSM? JavaScript without the JIT.

Unsurprisingly at this point, performance is affected far less than one might have guessed, even on the benchmarks, and even less in real-world tasks: "Anecdotally, we find that users with JIT disabled rarely notice a difference in their daily browsing."

Update: Mar. 30th 2023

To my eyes, this looks like startup costs, much of which will be JIT startup costs.

HN user jigawatts appears to confirm my suspicion:

Teams is an entire web server written in an interpreted language compiled on the fly using a "Just in Time" optimiser (node.js + V8) -- coupled to a web browser that is basically an entire operating system. You're not "starting Teams.exe", you are deploying a server and booting an operating system. Seen in those terms, 9 seconds is actually pretty good. Seen in more... sane terms, showing 1 KB of text after 9 seconds is about a billion CPU instructions needed per character.
It also looks to be the root problem, well, one of the root problems, with Visual Studio's launch time.

Update: Apr. 1st 2023

"One of the most annoying features of Julia is its latency: The laggy unresponsiveness of Julia after starting up and loading packages." -- Julia's latency: Past, present and future

I find it interesting that the term "JIT" is never mentioned in the post, I guess it's just taken as a given.

I am pretty sure I've missed some developments.

Monday, January 21, 2013

More Objective-C Drawing Context Pleasantries

It's been a little over half a year since I first made my pleasant Objective-C drawing context public, and I haven't been idle. In the process of retrofitting my own code to use MPWDrawingContext and adding more and more graphics (for example, I now do my icons in code), I've discovered a lot about making drawing with code a more pleasant experience.

Blocks

Blocks seem to be a really wonderful match for a graphics context, and most of the changes involve blocks in some way.

Bracketing operations such as gsave/grestore now have block versions, so the Objective-C block structure reflects the nesting:

    [context ingsave:^(Drawable c ){
        [c translate:@[ @130 ,@140]];
        [c setFont:[context fontWithName:@"ArialMT" size:345]];
        [c setTextPosition:NSMakePoint(0, 0)];
        [c show:@"\u2766"];
    }];
This is somewhat more compact than the plain code, which for correctness should also have a @try/@finally block wrapped around the basic drawing so exceptions don't mess up the graphics state stack.
    [context gsave];
    [context translate:@[ @130 ,@140]];
    [context setFont:[context fontWithName:@"ArialMT" size:345]];
    [context setTextPosition:NSMakePoint(0, 0)];
    [context show:@"\u2766"];
    [context grestore];
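For reference, a sketch of what the exception-safe version of the plain code would look like without the block-based message:
    [context gsave];
    @try {
        [context translate:@[ @130 ,@140]];
        [context setFont:[context fontWithName:@"ArialMT" size:345]];
        [context setTextPosition:NSMakePoint(0, 0)];
        [context show:@"\u2766"];
    } @finally {
        [context grestore];
    }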
Similar for drawing shadows:
    [context withShadowOffset:NSMakeSize(0, -8 * scale) blur:12 * scale  color:[context  colorGray:0 alpha: 0.75] draw:^(Drawable c ){
        [[[c setFillColorGray:0.9 alpha:1.0] ellipseInRect:ellipseRect] fill];
    }];
Again, this seems a little clearer than having to explicitly set and unset, makes it harder to miss the end of the bracket when moving code around and remains exception-safe.
    [context setShadowOffset:NSMakeSize(0, -8 * scale) blur:12 * scale  color:[context  colorGray:0 alpha: 0.75]];
    [[[context setFillColorGray:0.9 alpha:1.0] ellipseInRect:ellipseRect] fill];
    [context clearShadow];

Stored, delayed and repeated drawing

You can create an object for later drawing by sending the -laterWithSize:(NSSize)size content:(DrawingBlock)commands message. For example, here is a simple diamond shape:
    NSSize diamondSize=NSMakeSize(16,16);
        id diamond = [context laterWithSize:diamondSize
                              content:^(id  context){
            id red = [context colorRed:1.0 green:0.0 blue:0.0 alpha:1.0];
            [context setFillColor:red];
            [[context moveto:diamondSize.width/2 :2] 
				lineto:diamondSize.width-2 :diamondSize.height/2];
            [[context lineto:diamondSize.width/2 :diamondSize.height-2]
				lineto:2 :diamondSize.height/2];
            [[context closepath] fill];
        }];
We can now draw this anywhere we want, and at any scale or orientation, using the -drawImage: message.
    [context drawImage:diamond];
You also have layerWithSize:content: and bitmapWithSize:content: messages if you want to specifically use a CGLayer or CGImage instead, but using laterWithSize:content: preserves maximum quality, and it will automatically switch to a CGLayer when rendering to a PDF context in order to minimize PDF file size.

Patterns

I talked about patterns earlier. What I didn't mention then was that this is just the ability to use a stored set of drawing commands (see previous section) as a color:
    [context setColor:diamond];
I am not going to post the comparison to plain CG here, you can read it in the original Apple documentation.

I should note that this currently works for colored patterns, not for uncolored patterns, due to the fact that I haven't yet exposed color spaces. The basic process will be very similar.

Polymorphic object arguments

Path construction and graphics state messages with point arguments are now available in a version that takes a single object argument, in addition to the format with anonymous float arguments (moveto:(float)x :(float)y).
  • moveto:
  • lineto:
  • translate:
  • scale:
The single argument can be an Objective-C array:
    [context moveto:@[ @10, @20]];
Alternatively, any custom object that responds to count and either objectAtIndex:, (float)realAtIndex: or getReals:(float*)buffer length:(int)maxLen can be used. The scale: message can also take a single NSNumber and will treat that as uniform x and y scale.
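As an illustration, here is a minimal, hypothetical point class that satisfies the count / realAtIndex: convention and can therefore be passed directly to moveto:, lineto: and friends (the exact parameter and return types are assumptions on my part):
    @interface TinyPoint : NSObject
    - (instancetype)initWithX:(float)x y:(float)y;
    @end

    @implementation TinyPoint {
        float _coords[2];
    }
    - (instancetype)initWithX:(float)x y:(float)y {
        if ((self = [super init])) {
            _coords[0] = x;
            _coords[1] = y;
        }
        return self;
    }
    - (unsigned)count { return 2; }                             // number of coordinates
    - (float)realAtIndex:(int)index { return _coords[index]; }  // coordinate access
    @end
Used like any other single-object argument:
    [context moveto:[[TinyPoint alloc] initWithX:10 y:20]];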

Linecap parameters

Linecap parameters can now be set using distinct messages:
  • setlinecapRound
  • setlinecapButt
  • setlinecapSquare
Having multiple messages rather than a single message with a parameter probably seems odd, but it actually reduces the number of names involved, and these names are nicely scoped to this context. The constant strings or enums that are typically used have global scope and therefore tend to need ugly and hard-to-remember prefixes:
    [context setlinecapRound];
vs.
    [context setLinecap: kCGLineCapRound];

Future

Another reason to be as purely message-based as possible is that it makes bridging to other languages easier, for example for interactive drawing environments: Creating a badge (youtube).

I've also started experimenting with other outputs, for example creating a version of the same badge composed of CALayer objects using the same drawing commands. Other output should follow, for example web with SVG or HTML 5 Canvas or direct OpenGL textures.

I also want to finally add image processing operations both stand-alone and as chained drawing contexts, as well as getting more complex text layout options in there.

p.s.: now on hacker news

Thursday, November 1, 2012

CGLayer performance and quality

The CoreGraphics CGLayer object seems to keep causing confusion. I hope I can clear that up a bit.

The intent of CGLayer was to have some sort of object for optimized repeated drawing, similar to how you would call a drawing procedure multiple times, have multiple instances of a view or simply draw the same PDF or bitmap image multiple times. The difference was supposed to be that the drawing layer could do secret magic "stuff" to optimize the repeated instance drawings. Not giving away the specific representation or optimizations performed means that whatever is done can be adapted both to different contexts and with changing technology.

Currently, that primarily means a texture stored on the graphics card, and that is and was one of the possible optimizations. However, the reasoning for such an opaque optimized object goes back further, at least to NeXTStep/Rhapsody: DisplayPostscript was implemented as a server process, and despite Mach's fast memory-remapping messages, shipping images back and forth between the client and server was not an optimization, and so you couldn't use (client side) bitmaps for optimized drawing. Well you could, but it wouldn't be optimized.

With Quartz becoming a client-side drawing library in Mac OS X, some of the reasons for these distinctions disappeared, but the distinctions (such as NSBitmapImageRep vs. NSCachedImageRep) remained. I think there were plans for CGLayer to become a kind of NSCachedImageRep on steroids, for example maintaining OpenGL display lists, but as far as I could tell, those plans never really panned out: in my tests drawing a CGLayer always looked the same as drawing a cached bitmap, and also performed similarly.

In 10.5, most of the accumulated cruft was cleaned up: NSBitmapImageRep, for example, is now only a fairly thin wrapper around its underlying CGImage, and NSCachedImageRep was deprecated completely in 10.6, as it no longer has any advantages. With these changes, drawing a CGImage or an NSBitmapImageRep became as fast as possible, with CGImages stored on the graphics card. At this point CGLayer was essentially abandoned, and for drawing to the screen you can just as easily use an NSBitmapImageRep, CGImage, UIImage etc.

So is CGLayer now completely useless? Not quite! It turns out that when drawing to a PDF context, CGLayer will generate a PDF Form object, which stores the original drawing commands (vectors, text, images). This is obviously much better than drawing a bitmap, both in terms of quality and in terms of file size and performance.
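Here is a small sketch of that PDF case in plain CoreGraphics (standard API names, error handling omitted): the artwork is recorded once into the layer's own context, and each repeated draw becomes a cheap reference to the resulting Form object:

    #include <CoreGraphics/CoreGraphics.h>

    void drawRepeatedStamp(CFURLRef pdfURL) {
        CGRect mediaBox = CGRectMake(0, 0, 612, 792);
        CGContextRef pdf = CGPDFContextCreateWithURL(pdfURL, &mediaBox, NULL);
        CGPDFContextBeginPage(pdf, NULL);

        // Record the repeated artwork once, into the layer's own context.
        CGLayerRef stamp = CGLayerCreateWithContext(pdf, CGSizeMake(50, 50), NULL);
        CGContextRef layerContext = CGLayerGetContext(stamp);
        CGContextSetRGBFillColor(layerContext, 1, 0, 0, 1);
        CGContextFillEllipseInRect(layerContext, CGRectMake(5, 5, 40, 40));

        // Draw it repeatedly; the vectors are stored only once in the PDF.
        for (int i = 0; i < 10; i++) {
            CGContextDrawLayerAtPoint(pdf, CGPointMake(50 + i * 55, 100), stamp);
        }

        CGLayerRelease(stamp);
        CGPDFContextEndPage(pdf);
        CGPDFContextClose(pdf);
        CGContextRelease(pdf);
    }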

So if your plans include printing or generating a PDF, then I would recommend taking a look at CGLayer for repeated drawing of (vector) content.

Monday, October 15, 2012

CoreGraphics, patterns and resolution independence (not just) for retina displays

In a recent post with followup, Mark Granoff demonstrates how to intelligently deal with the need for higher resolution backgrounds by using CoreGraphics pattern images, particularly using the [UIColor colorWithPatternImage:] method. However, he does wonder why he still has to deal with retina resolution issues at some points in the code, when "…the docs say that CoreGraphics handles scaling issues automatically."

That's a good question, and the answer lies in the fact that the example uses pattern images and mask images, rather than CoreGraphics patterns and geometric primitives. Once you explicitly ask for bitmap representations, you will be dealing with pixels and different resolutions. The trick is to avoid going to pixels as much and as long as possible. The doughnut shape, for example, can easily be achieved using basic geometry and a little knowledge of the Postscript/PDF fill rules.

Using the standard "nonzero-winding-number" rule, a doughnut effect can be achieved by having the two arcs that are nested inside each other drawn in opposite directions. That's one of the reasons the extra "clockwise" parameter exists.


  NSPoint centerPoint = NSMakePoint([view frame].size.width/2, 150);
  [context arcWithCenter:centerPoint
           radius:50 
           startDegrees:0
           endDegrees:360  
           clockwise:YES];
  [context arcWithCenter:centerPoint
           radius:100
           startDegrees:0
           endDegrees:360  
           clockwise:NO];
  [context fill];

(The code examples here use MPWDrawingContext for convenience; pure CoreGraphics code tends to be two to three times more verbose.) The second way to achieve the doughnut would be to just use the even/odd fill rule, in which case the direction doesn't matter.
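For comparison, here is a sketch of the even/odd variant in plain CoreGraphics (assuming cg is the underlying CGContextRef and centerPoint is the same as above); both circles can now be drawn in the same direction:

  CGContextAddArc(cg, centerPoint.x, centerPoint.y,  50, 0, 2 * M_PI, 0);
  CGContextClosePath(cg);
  CGContextAddArc(cg, centerPoint.x, centerPoint.y, 100, 0, 2 * M_PI, 0);
  CGContextClosePath(cg);
  CGContextEOFillPath(cg);   // even/odd rule: the doubly-covered inner disc stays unfilled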

Patterns can also be specified geometrically, or rather with callbacks to draw the pattern shape. Objective-C Blocks are really a perfect fit for specifying these sorts of callbacks, but were only introduced much later than the CoreGraphics pattern callback API. The following code shows how to specify the diamond pattern via an Objective-C block, courtesy of some glue API provided by MPWDrawingContext.


        NSSize patternSize=NSMakeSize(16,16);
        id diamond = [context laterWithSize:patternSize
                              content:^(id  context){
            id red = [context colorRed:1.0 green:0.0 blue:0.0 alpha:1.0];
            [context setFillColor:red];
            [[context moveto:patternSize.width/2 :2] 
				lineto:patternSize.width-2 :patternSize.height/2];
            [[context lineto:patternSize.width/2 :patternSize.height-2]
				lineto:2 :patternSize.height/2];
            [[context closepath] fill];
        }];
        [context setFillColor:diamond];
        [[context nsrect:[[self view] frame]] fill];

The "laterWithSize:content:" message creates a callback object that not only encapsulates the block, but also implements a -CGColor method so the callback can be used directly as a color in -setFillColor:.

With all the graphics specified using pure geometry, CoreGraphics can now do its thing and automatically handle varying device resolutions, whether it's a retina display or a zoomable interface or even print, all without ever having to deal with the different resolutions in code. Although I haven't tested it, the code should also use less memory, because it doesn't create potentially large temporary bitmaps, and for the cherry on top it's also a fraction of the code. CoreGraphics rules!

Forked project on github.