Showing posts with label Memory management. Show all posts

Friday, November 13, 2020

M1 Memory and Performance

The M1 Macs are out now, and not only does Apple claim they're absolutely smokin', early benchmarks seem to confirm those claims. I don't find this surprising: Apple has been highly focused on performance ever since Tiger, and as far as I can tell hasn't let up.

One maybe somewhat surprising aspect of the M1s is the limitation to "only" 16 Gigabytes of memory. As someone who bought a 16 Kilobyte language card to run the Merlin 6502 assembler on his Apple ][+ and expanded his NeXT cube, which isn't that different from a modern Mac, to a whopping 16 Megabytes, this doesn't actually seem that much of a limitation, but it did cause a bit of consternation.

I have a bit of a theory as to how this "limitation" might tie in to how Apple's outside-the-box approach to memory and performance has contributed to the remarkable achievement that is the M1.

The M1 is apparently a multi-die package that contains both the actual processor die and the DRAM. As such, it has a very high-speed interface between the DRAM and the processors. This high-speed interface, in addition to the absolutely humongous caches, is key to keeping the various functional units fed. Memory bandwidth and latency are probably the determining factors for many of today's workloads, with a single access to main memory taking easily hundreds of clock cycles and the CPU capable of doing a good number of operations in each of these clock cycles. As Andrew Black wrote: "[..] computation is essentially free, because it happens 'in the cracks' between data fetch and data store; ..".

The tradeoff is that you can only fit so much DRAM in that package for now, but if it fits, it's going to be super fast.

So how do we make sure it all fits? Well, where Apple might have been "focused" on performance for the last 15 years or so, they have been completely anal about memory consumption. When I was there, we were fixing 32-byte memory leaks. Leaks that happened once. So not an ongoing consumption of 32 bytes again and again, but a one-time leak of 32 bytes.

That dedication verging on the obsessive is one of the reasons iPhones have been besting top-of-the-line Android phones that have twice the memory. And not by a little, either.

Another reason is the iOS team's steadfast refusal to adopt tracing garbage collection as most of the rest of the industry did, and macOS's later abandonment of that technology in favor of the reference counting (RC) they've been using since NeXTStep 4.0. With increased automation of those reference counting operations and the addition of weak references, the convenience level for developers is essentially indistinguishable from a tracing GC now.

The benefit of sticking to RC is much-reduced memory consumption. It turns out that for a tracing GC to achieve performance comparable with manual allocation, it needs several times the memory (different studies find different overheads, but at least 4x is a conservative lower bound). While I haven't seen a study comparing RC, my personal experience is that the overhead is much lower, much more predictable, and can usually be driven down with little additional effort if needed.

So Apple can afford to live with more "limited" total memory because they need much less memory for the system to be fast. And so they can do a system design that imposes this limitation, but allows them to make that memory wicked fast. Nice.

Another "well-known" limitation of RC that has made it the second choice compared to tracing GC is the fact that updating those reference counts all the time is expensive, particularly in a multi-threaded environment where those updates need to be atomic. Well...

How? Problem solved. I guess it helps if you can make your own Silicon ;-)
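For anyone who hasn't looked at what those atomic updates involve, here is a minimal sketch in plain C11 (which is also valid Objective-C). It is not how the Objective-C runtime implements retain and release: the `Object` type and the `object_*` functions are made up for illustration, and the memory orderings are just one common choice. The point is only that every retain and release is an atomic read-modify-write on shared memory, which is the cost being talked about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object: a payload plus an atomic reference count. */
typedef struct {
    atomic_uint refcount;
    int payload;
} Object;

/* Creates an object owned by the caller (reference count 1). */
static Object *object_new(int payload) {
    Object *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    atomic_init(&o->refcount, 1);
    o->payload = payload;
    return o;
}

/* Retain: one atomic increment. Relaxed ordering is enough here,
   because the caller already holds a reference. */
static Object *object_retain(Object *o) {
    assert(o != NULL);
    atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
    return o;
}

/* Release: one atomic decrement. Returns true if this call dropped
   the last reference and freed the object. */
static bool object_release(Object *o) {
    assert(o != NULL);
    if (atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_acq_rel) == 1) {
        free(o);
        return true;
    }
    return false;
}
```

Creating an object, retaining it once and releasing it twice frees it on the second release. On a single thread the atomics buy you nothing; they only matter once several threads retain and release the same object.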

So Apple's focus on keeping memory consumption under control, which includes but is not limited to going all-in on reference counting where pretty much the rest of the industry has adopted tracing garbage collection, is now paying off in a major way ("bigly"? Too soon?). They can get away with putting less memory in the system, which makes it possible to make that memory really fast. And that locks in an advantage that'll be hard to duplicate.

It also means that native development will have a bigger advantage compared to web technologies, because native apps benefit from the speed and don't have a problem with the memory limitations, whereas web/Electron apps will fill up that memory much more quickly.

Monday, October 15, 2012

CoreGraphics, patterns and resolution independence (not just) for retina displays

In a recent post with followup, Mark Granoff demonstrates how to intelligently deal with the need for higher resolution backgrounds by using CoreGraphics pattern images, particularly using the [UIColor colorWithPatternImage:] method. However, he does wonder why he still has to deal with retina resolution issues at some points in the code, when "…the docs say that CoreGraphics handles scaling issues automatically."

That's a good question, and the answer lies in the fact that the example uses pattern images and mask images, rather than CoreGraphics patterns and geometric primitives. Once you explicitly ask for bitmap representations, you will be dealing with pixels and different resolutions. The key is to avoid going to pixels as much and as long as possible. The doughnut shape, for example, can easily be achieved using basic geometry and a little knowledge of the PostScript/PDF fill rules.

Using the standard "nonzero-winding-number" rule, a doughnut effect can be achieved by having the two arcs that are nested inside each other drawn in opposite directions. That's one of the reasons the extra "clockwise" parameter exists.


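  // Both circles share the same center.
  NSPoint centerPoint = NSMakePoint([view frame].size.width/2, 150);
  // Inner circle, drawn clockwise.
  [context arcWithCenter:centerPoint
           radius:50
           startDegrees:0
           endDegrees:360
           clockwise:YES];
  // Outer circle, drawn in the opposite direction, so the inner circle becomes a hole.
  [context arcWithCenter:centerPoint
           radius:100
           startDegrees:0
           endDegrees:360
           clockwise:NO];
  [context fill];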

(The code examples here use MPWDrawingContext for convenience; pure CoreGraphics code tends to be two to three times more verbose.) The second way to achieve the doughnut would be to just use the even/odd fill rule, in which case the direction doesn't matter.
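To see why the direction matters for one rule and not the other, here is a small sketch in plain C that applies both fill rules to concentric circles. It doesn't touch CoreGraphics at all; the `Circle` type and the function names are invented for the example, and the "+1 for counterclockwise, -1 for clockwise" sign convention is an assumption (only the fact that the two directions differ matters).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a full circle around a shared center.
   direction is +1 for counterclockwise, -1 for clockwise. */
typedef struct {
    double radius;
    int direction;
} Circle;

/* Winding number of a point at the given distance from the center:
   the sum of the directions of all circles that enclose it. */
static int winding_number(const Circle *circles, size_t n, double distance) {
    int winding = 0;
    for (size_t i = 0; i < n; i++) {
        assert(circles[i].direction == 1 || circles[i].direction == -1);
        if (distance < circles[i].radius) winding += circles[i].direction;
    }
    return winding;
}

/* Nonzero rule: the point is filled if the winding number is not zero. */
static bool filled_nonzero(const Circle *circles, size_t n, double distance) {
    return winding_number(circles, n, distance) != 0;
}

/* Even/odd rule: the point is filled if an odd number of circles
   enclose it. Direction is ignored. */
static bool filled_even_odd(const Circle *circles, size_t n, double distance) {
    size_t enclosing = 0;
    for (size_t i = 0; i < n; i++) {
        if (distance < circles[i].radius) enclosing++;
    }
    return enclosing % 2 == 1;
}
```

With an inner circle of radius 50 and an outer circle of radius 100 drawn in opposite directions, both rules leave a point 25 units from the center unfilled and fill a point 75 units out. Draw both circles in the same direction and the nonzero rule fills the hole, while the even/odd rule still gives you a doughnut.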

Patterns can also be specified geometrically, or rather with callbacks to draw the pattern shape. Objective-C Blocks are really a perfect fit for specifying these sorts of callbacks, but were only introduced much later than the CoreGraphics pattern callback API. The following code shows how to specify the diamond pattern via an Objective-C block, courtesy of some glue API provided by MPWDrawingContext.


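        // Size of one pattern cell.
        NSSize patternSize=NSMakeSize(16,16);
        // The block draws a single cell; it runs later, whenever the pattern is rendered.
        id diamond = [context laterWithSize:patternSize
                              content:^(id context){
            id red = [context colorRed:1.0 green:0.0 blue:0.0 alpha:1.0];
            [context setFillColor:red];
            // Diamond outline: top, right, bottom, left, inset by 2 from the cell edges.
            [[context moveto:patternSize.width/2 :2]
                lineto:patternSize.width-2 :patternSize.height/2];
            [[context lineto:patternSize.width/2 :patternSize.height-2]
                lineto:2 :patternSize.height/2];
            [[context closepath] fill];
        }];
        // Use the pattern as the fill color and fill the whole view with it.
        [context setFillColor:diamond];
        [[context nsrect:[[self view] frame]] fill];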

The "laterWithSize:content:" message creates a callback object that not only encapsulates the block, but also implements a -CGColor method so the callback can be used directly as a color in -setFillColor:.

With all the graphics specified using pure geometry, CoreGraphics can now do its thing and automatically handle varying device resolutions, whether it's a retina display or a zoomable interface or even print, all without ever having to deal with the different resolutions in code. Although I haven't tested it, the code should also use less memory, because it doesn't create potentially large temporary bitmaps, and for the cherry on top it's also a fraction of the code. CoreGraphics rules!

Forked project on github.

Sunday, September 20, 2009

Cocoa(touch) memory management is as easy as 1-2-3

There is a common misconception that Cocoa memory management is hard. It's not.

  1. Use auto-generated accessors religiously
  2. Release your instance variables in dealloc
  3. Always use convenience methods to create objects
Wow, that wasn't too hard!