Showing posts with label Threading. Show all posts

Wednesday, August 26, 2015

What Happens to OO When Processors Are Free?

A while ago, I presented as a crazy thought experiment the idea of using Montecito's transistor budget for creating a chip with tens of thousands of ARM cores. Well, it seems the idea wasn't so crazy after all: The SpiNNaker project is trying to build a system with a million ARM CPUs, and it is designing a custom chip with lots of ARM cores on it.

Of course, they only have 1/6th the die area of the Montecito and are using a conservative 130nm process rather than the 90nm of the Montecito or the 14nm that is state of the art, so they have a much lower transistor budget. They also use the later ARM 9 core and add 54 SRAM banks with 32KB each (from the die picture, 3 per core), so in the end they "only" put 18 cores on the chip, rather than many thousands. Using a state-of-the-art 14nm process would mean roughly 100 times more transistors, and a Montecito-sized die another factor of six. At that point, we would be at 10000 cores per chip, rather than 18.
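As a rough check of that estimate, here is the arithmetic using only the round factors quoted above (18 cores, roughly 100x from the process shrink, 6x from the larger die). It assumes, as a simplification, that the core count scales linearly with the transistor budget:

```latex
\begin{align*}
\text{cores per chip} &\approx 18 \times 100 \times 6 = 10\,800 \approx 10^{4} \\
\text{SRAM per core}  &= \frac{54 \times 32\,\text{KB}}{18} = 3 \times 32\,\text{KB} = 96\,\text{KB}
\end{align*}
```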

One of the many interesting features of the SpiNNaker project is that "the micro-architecture assumes that processors are ‘free’: the real cost of computing is energy." This has interesting consequences for potentially simplifying object- or actor-oriented programming. Alan Kay's original idea of objects was to scale down the concept of "computer", so every object is essentially a self-contained computer with CPU and storage, communicating with its peers via messages. (Erlang is probably the closest implementation of this concept).

In our core-scarce computing environments, this had to be simulated by multiplexing all (or most) of the objects onto a single von Neumann computer, usually with a shared address space. If cores are free and we have them in the tens of thousands, we can start entertaining the idea of no longer simulating object-oriented computing, but rather of implementing it directly by giving each object its own core and attached memory. Yes, utilization of these cores would probably be abysmal, but with free cores low utilization doesn't matter, and low utilization (hopefully) means low power consumption.

Even at 1% utilization, 10000 cores would still mean throughput equivalent to 100 ARM 9 cores running full tilt, and I am guessing pretty low power consumption if the transistors not being used are actually off. More important than 100 core-equivalents running is probably the equivalent of 100 bus interfaces running at full tilt. The aggregate on-chip memory bandwidth would be staggering.

You could probably also run the whole thing at lower clock frequencies, further reducing power. With each object having around 96KB of private memory to itself, we would probably be looking at coarser-grained objects, with pure data being passed between the objects (Objective-C or Erlang style) and possibly APL-like array extensions (see OOPAL). Overall, that would lead to a de-emphasis of expression-oriented programming models, and a more architectural focus.

This sort of idea isn't new: the Transputer got there in the late 80s. But it was conceived when Moore's law didn't just increase transistor counts but also clock frequencies, so Intel could always bulldoze away more intelligent architectures with better fabs. This has stopped: clock frequencies have been stagnant for a while and even geometries are starting to stutter. So maybe now the time for intelligent CPU architectures has finally come, and with it the impetus for examining our assumptions about programming models.

As always, comments welcome here or on Hacker News.

UPDATE: The kilo-cores are here:

  • Kilocore: 1000 processors, 1.78 Trillion ops/sec, and at 1.78pJ/Op super power-efficient, so at 150 GOps/s only uses 0.7 watts. On a 32nm process, so not yet maxed out.
  • GRVI Phalanx joins The Kilocore Club: 1680 cores.
No reports of any of them running actors, but ensembles might work :-)

Thursday, October 18, 2012

Little Message Dispatch, aka "Sending Primitives to the Main Thread"

Just ran across a Stack Overflow question on using primitives with performSelectorOnMainThread:. The original poster asks how he can send the message [myButton setEnabled:YES] from a background thread so it will execute on the main thread.

Alas, the obvious [myButton performSelectorOnMainThread:@selector(setEnabled:) withObject:(BOOL)YES waitUntilDone:YES]; is not only ugly, but also doesn't work. It used to kinda sorta work for scalar integer/pointer parameters that fit in a register, but it certainly wasn't a good idea and started breaking when Apple started to retain those parameters. Casting a BOOL to a pointer and back might work at times; sending it a retain definitely will not.

What to do? Well, I would suggest the following:



[[myButton onMainThread] setEnabled:YES];


Not only does it handle the primitives without breaking a sweat, it is also succinct and readable. It is obviously implemented using Higher Order Messaging (now with Wikipedia page), and I actually have a number of these HOMs in MPWFoundation that cover the common use-cases:

@interface NSObject(asyncMessaging)

-async;
-asyncPrio;
-asyncBackground;
-asyncOnMainThread;
-onMainThread;
-asyncOn:(dispatch_queue_t)queue;
-asyncOnOperationQueue:(NSOperationQueue*)aQueue;
-afterDelay:(NSTimeInterval)delay;


@end

There is a little HOM_METHOD() Macro that generates both the trampoline method and the worker method, so the following code defines the -(void)onMainThread method that then uses performSelectorOnMainThread to send the NSInvocation to the main thread:
HOM_METHOD(onMainThread)
        [invocation performSelectorOnMainThread:@selector(invokeWithTarget:) withObject:self waitUntilDone:YES];
}

You can use MPWFoundation as is or take the above code and combine it with Simple HOM.
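For readers without MPWFoundation at hand, the mechanism can be sketched with plain Foundation. This is not the MPWFoundation code: the names SketchMainThreadTrampoline, sketchOnMainThread and SketchSwitch are made up for illustration, and ARC is assumed. The trampoline captures the message as an NSInvocation, so a BOOL argument travels inside the invocation and never has to be cast to an object pointer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical names throughout; a sketch assuming ARC, not MPWFoundation's code.

// Trampoline: turns whatever message it receives into an NSInvocation.
@interface SketchMainThreadTrampoline : NSProxy
- (instancetype)initWithTarget:(id)target;
@end

@implementation SketchMainThreadTrampoline {
    id _target;
}

- (instancetype)initWithTarget:(id)target {
    _target = target;          // NSProxy has no -init to call
    return self;
}

// The runtime needs the real receiver's signature to build the invocation.
- (NSMethodSignature *)methodSignatureForSelector:(SEL)selector {
    return [_target methodSignatureForSelector:selector];
}

// Worker: hand the captured invocation to the main thread and wait for it.
- (void)forwardInvocation:(NSInvocation *)invocation {
    [invocation performSelectorOnMainThread:@selector(invokeWithTarget:)
                                 withObject:_target
                              waitUntilDone:YES];
}
@end

@interface NSObject (SketchMainThread)
- (id)sketchOnMainThread;
@end

@implementation NSObject (SketchMainThread)
- (id)sketchOnMainThread {
    return [[SketchMainThreadTrampoline alloc] initWithTarget:self];
}
@end

// Stand-in for a UI control with a BOOL setter.
@interface SketchSwitch : NSObject
@property (nonatomic) BOOL enabled;
@end

@implementation SketchSwitch
@end
```

With these definitions, [[aSwitch sketchOnMainThread] setEnabled:YES] sets the flag on the main thread. When sent from a background thread, the main thread must be running its run loop for the message to be delivered.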

Monday, January 10, 2011

Little Message Dispatch

Brent Simmons's recent notes on threading show a great, limited approach to threading that appears to work well in practice. If you haven't read it and are at all interested in threading on OS X or iOS, I suggest you head over there right now.

I feel much the same way. Although I think Grand Central Dispatch is awesome, I simply haven't been able to justify spending much time with it, because my own threading needs so far have turned out to be far more modest than what GCD provides. In fact, I find that an approach that's even more constrained than the NSOperationQueue-based one Brent describes has been working really well in a number of projects.

Instead of queueing up operations and letting them unwind, however, I just spawn a single I/O thread (at most a few) and have it perform the I/O deterministically. This is paired with a downloader that uses the NSURL loading system to download any number of requests in parallel.


- (void)downloadNewsContent
{       
        id pool=[NSAutoreleasePool new];
        
        [[self downloader] downloadRequests:[self thumbnailRequests]];
        [[self downloader] downloadRequests:[self contentRequests]];
        [[self downloader] downloadOnlyRequests:[self imageRequests]];
        [pool release];
}


This loads 3 types of objects: first the thumbnails, then article content, then images associated with the articles. The sequencing is both deliberate (thumbs first, article images cannot be loaded before the article content is present) and simply expressed in the code by the well-known means of just writing the actions one after the other, rather than having those dependencies expressed in call-backs, completion blocks or NSOperation subclasses.

So work is done semi-sequentially in the background, while coordination is done on the main thread, with liberal use of performSelectorOnMainThread. Of course, I make that a little simpler with a couple of HOMs that dispatch messages to threads:

  • async runs the message on a new thread. I use it for long-running, intrinsically self-contained work. It is equivalent to performSelectorInBackground: except for being able to take an arbitrary message.
  • asyncOnMainThread and syncOnMainThread are the equivalents of performSelectorOnMainThread, with the waitUntilDone flag set to NO and YES, respectively.
  • afterDelay: sends the message after the specified delay.
Here is a bit of code that shows how to dispatch a long-running thread and have it communicate status to the main thread.

-(void)loadSections {
	[[self asyncOnMainThread] showSyncing];
	[[[self sections] do] downloadNewsContent];
	[[self asyncOnMainThread] showDoneSyncing];
}
 ...
 -(IBAction)syncButtonClicked {
	[[self async] loadSections];
}
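The dispatching HOMs differ only in how the captured message gets delivered, which a single trampoline parameterized by a delivery block can illustrate. The following is a minimal sketch assuming ARC. SketchTrampoline, sketchAsync and sketchAsyncOnMainThread are hypothetical names, not the MPWFoundation implementation, and the sketch is only meant for messages that return void.

```objc
#import <Foundation/Foundation.h>

// Hypothetical names; a sketch assuming ARC, not the MPWFoundation code.
typedef void (^SketchDelivery)(NSInvocation *invocation, id target);

@interface SketchTrampoline : NSProxy
+ (id)trampolineWithTarget:(id)target delivery:(SketchDelivery)delivery;
@end

@implementation SketchTrampoline {
    id _target;
    SketchDelivery _delivery;
}

+ (id)trampolineWithTarget:(id)target delivery:(SketchDelivery)delivery {
    SketchTrampoline *trampoline = [self alloc];   // NSProxy has no -init
    trampoline->_target = target;
    trampoline->_delivery = [delivery copy];
    return trampoline;
}

- (NSMethodSignature *)methodSignatureForSelector:(SEL)selector {
    return [_target methodSignatureForSelector:selector];
}

- (void)forwardInvocation:(NSInvocation *)invocation {
    // The message outlives this call, so its object arguments must be retained.
    [invocation retainArguments];
    _delivery(invocation, _target);
}
@end

@interface NSObject (SketchDispatch)
- (id)sketchAsync;
- (id)sketchAsyncOnMainThread;
@end

@implementation NSObject (SketchDispatch)

// New thread per message, like performSelectorInBackground:withObject:.
- (id)sketchAsync {
    return [SketchTrampoline trampolineWithTarget:self
                                         delivery:^(NSInvocation *invocation, id target) {
        [invocation performSelectorInBackground:@selector(invokeWithTarget:)
                                     withObject:target];
    }];
}

// Queued for the main thread's run loop without blocking the sender.
- (id)sketchAsyncOnMainThread {
    return [SketchTrampoline trampolineWithTarget:self
                                         delivery:^(NSInvocation *invocation, id target) {
        [invocation performSelectorOnMainThread:@selector(invokeWithTarget:)
                                     withObject:target
                                  waitUntilDone:NO];
    }];
}
@end
```

Because the sender does not wait, any return value of the forwarded message is lost. That is why the sketch, like the status updates in loadSections, sticks to void messages.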


Brent sums it up quite well in his post:
Here’s the thing about code: the better it is, the more it looks and reads like a children’s book.
Yep.

Sunday, May 2, 2010

iPhone XML Performance Revisited

Ray Wenderlich has done a great comparison of iPhone XML parsers, using the same sample I had looked at earlier in the context of responsiveness.

 

As Ray was comparing performance, my hobby-horse, I was obviously curious as to how MAX stacked up against all the upstart competition. Quite well, as it turns out (average of 5 runs on an iPad with 3.2):


Figure 1: total times (seconds)


MAX was about 50% faster than the closest competition, TBXML, at 0.43s vs. 0.61s.

 

However, the XMLPerformance sample is a bit weird in that it measures elapsed time, not CPU time, and is multi-threaded, updating the display as results come in.

 

In order to account for this overhead that has nothing to do with the XML parsers, I added a "Dummy" parser that doesn't actually parse anything, but rather just generates dummy Song entries as quickly as possible. This takes around 0.2 seconds all by itself. Subtracting this non-XML overhead from the total times yields the following results:


Figure 2: XML parse times (seconds)


When measuring just the XML parsers themselves, MAX is around twice as fast as the closest competition and seven times as fast as the built-in NSXMLParser.
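Spelled out with the rounded figures quoted above (totals of 0.43s and 0.61s, about 0.2s of non-XML overhead), the comparison with TBXML works out as follows. The exact ratio depends on the unrounded measurements:

```latex
\begin{align*}
t_{\text{MAX}}   &\approx 0.43 - 0.2 = 0.23\,\text{s} \\
t_{\text{TBXML}} &\approx 0.61 - 0.2 = 0.41\,\text{s} \\
\frac{t_{\text{TBXML}}}{t_{\text{MAX}}} &\approx \frac{0.41}{0.23} \approx 1.8
\end{align*}
```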

 

Sweet.

[Update]  I forgot the link to where I had uploaded the source: XMLPerformance.tgz at http://www.metaobject.com/downloads/Objective-C/

Wednesday, September 26, 2007

Objective-C future(s)

Via LtU, I got alerted to the fact that the Etoile project now has an implementation of futures. Cool.

However, their implementation has specific objects reacting asynchronously to messages, making it more similar to the actor model, which as they mention is also very much Alan Kay's original conceptual model for Smalltalk:

Bob Barton, the main designer of the B5000 and a professor at Utah had said in one of his talks a few days earlier: "The basic principle of recursive design is to make the parts have the same power as the whole." For the first time I thought of the whole as the entire computer and wondered why anyone would want to divide it up into weaker things called data structures and procedures. Why not divide it up into little computers, as time sharing was starting to? But not in dozens. Why not thousands of them, each simulating a useful structure? [Emphasis mine]
Actors are inherently asynchronous: each actor runs in a separate process/thread and messages are also asynchronous, with the sender not waiting for the message to be delivered or ever getting a return value. Of course the actor model also makes all objects active, so the Etoile model, which only makes objects of specific classes active, is somewhere in between.

Futures, on the other hand, as introduced in MULTILISP (pdf), try to integrate asynchronous execution into a traditional call/return control- and data-flow. So messages (or functions in MULTILISP) appear to have normal synchronous semantics and immediately yield a return value, but when annotated with the future keyword, computation of that return value is done in a background thread and the immediate return value is just a proxy for the value that is still being computed.

In the HOM paper (pdf) presented at OOPSLA 2005, I also describe a Future implementation based on Higher Order Messaging that comes very close to the way it was done in MULTILISP. A -future HOM is all that is needed to indicate that you would like a result computed in a background thread:

  result = [anObject lengthyOperation:parameter];           //  synchronous
  result = [[anObject future] lengthyOperation:parameter];  //  asynchronous with future
I am probably biased, but this seems about as easy-to-use as possible, with all the nasty machinery (worker-queues, lockless FIFOs, etc.) hidden behind a single -future message.
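To make the "proxy for the value that is still being computed" concrete, here is a minimal sketch of just the proxy half. It is not the implementation from the HOM paper: it assumes ARC, takes a block instead of a HOM trampoline, and uses a GCD dispatch group in place of worker-queues and lockless FIFOs. SketchFuture, futureWithBlock: and resolvedValue are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical sketch assuming ARC; not the implementation from the HOM paper.
// The future is returned immediately and blocks only when it is first used.
@interface SketchFuture : NSProxy
+ (id)futureWithBlock:(id (^)(void))compute;
@end

@implementation SketchFuture {
    id _value;
    dispatch_group_t _pending;
}

+ (id)futureWithBlock:(id (^)(void))compute {
    SketchFuture *future = [self alloc];           // NSProxy has no -init
    future->_pending = dispatch_group_create();
    // Start computing in the background right away.
    dispatch_group_async(future->_pending,
                         dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        future->_value = compute();
    });
    return future;
}

// Waits until the background computation has finished, then returns its result.
- (id)resolvedValue {
    dispatch_group_wait(_pending, DISPATCH_TIME_FOREVER);
    return _value;
}

// Any message NSProxy does not implement itself is forwarded to the result.
// A nil result is not handled in this sketch.
- (NSMethodSignature *)methodSignatureForSelector:(SEL)selector {
    return [[self resolvedValue] methodSignatureForSelector:selector];
}

- (void)forwardInvocation:(NSInvocation *)invocation {
    [invocation invokeWithTarget:[self resolvedValue]];
}
@end
```

A caller can write NSString *text = [SketchFuture futureWithBlock:^id{ return [anObject lengthyOperation:parameter]; }]; and carry on. The first message sent to text, such as -length, waits for the result if it is not ready yet.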