Friday, November 13, 2020

M1 Memory and Performance

The M1 Macs are out now, and not only does Apple claim they're absolutely smokin', early benchmarks seem to confirm those claims. I don't find this surprising, Apple has been highly focused on performance ever since Tiger, and as far as I can tell hasn't let up since.

One maybe somewhat surprising aspect of the M1s is the limitation to "only" 16 Gigabytes of memory. As someone who bought a 16 Kilobyte language card to run the Merlin 6502 assembler on his Apple ][+ and expanded his NeXT cube, which isn't that different from a modern Mac, to a whopping 16 Megabytes, this doesn't actually seem that much of a limitation, but it did cause a bit of consternation.

I have a bit of a theory as to how this "limitation" might tie in to how Apple's outside-the-box approach to memory and performance has contributed to the remarkable achievement that is the M1.

The M1 is apparently a multi-die package that contains both the actual processor die and the DRAM. As such, it has a very high-speed interface between the DRAM and the processors. This high-speed interface, in addition to the absolutely humongous caches, is key to keeping the various functional units fed. Memory bandwidth and latency are probably the determining factors for many of today's workloads, with a single access to main memory taking easily hundreds of clock cycles and the CPU capable of doing a good number of operations in each of these clock cycles. As Andrew Black wrote: "[..] computation is essentially free, because it happens 'in the cracks' between data fetch and data store; ..".

The tradeoff is that you can only fit so much DRAM in that package for now, but if it fits, it's going to be super fast.

So how do we make sure it all fits? Well, where Apple might have been "focused" on performance for the last 15 years or so, they have been completely anal about memory consumption. When I was there, we were fixing 32 byte memory leaks. Leaks that happened once. So not an ongoing consumption of 32 bytes again and again, but a one-time leak of 32 bytes.

That dedication verging on the obsessive is one of the reasons iPhones have been besting top-of-the-line Android phone that have twice the memory. And not by a little, either.

Another reason is the iOS team's steadfast refusal to adopt tracing garbage collection as most of the rest of the industry did, and macOS's later abandonment of that technology in favor of the reference counting (RC) they've been using since NeXTStep 4.0. With increased automation of those reference counting operations and the addition of weak references, the convenience level for developers is essentially indistinguishable from a tracing GC now.

The benefit of sticking to RC is much-reduced memory consumption. It turns out that for a tracing GC to achieve performance comparable with manual allocation, it needs several times the memory (different studies find different overheads, but at least 4x is a conservative lower bound). While I haven't seen a study comparing RC, my personal experience is that the overhead is much lower, much more predictable, and can usually be driven down with little additional effort if needed.

So Apple can afford to live with more "limited" total memory because they need much less memory for the system to be fast. And so they can do a system design that imposes this limitation, but allows them to make that memory wicked fast. Nice.

Another "well-known" limitation of RC that has made it the second choice compared to tracing GC is the fact that updating those reference counts all the time is expensive, particularly in a multi-threaded environment where those updates need to be atomic. Well...

How? Problem solved. I guess it helps if you can make your own Silicon ;-)

So Apple's focus on keeping memory consumption under control, which includes but is not limited to going all-in on reference counting where pretty much the rest of the industry has adopted tracing garbage collection, is now paying off in a majory way ("bigly"? Too soon?). They can get away with putting less memory in the system, which makes it possible to make that memory really fast. And that locks in an advantage that'll be hard to duplicate.

It also means that native development will have a bigger advantage compared to web technologies, because native apps benefit from the speed and don't have a problem with the memory limitations, whereas web-/electron apps will fill up that memory much more quickly.

Tuesday, September 15, 2020

Pointers are Easy, Optimization is Complicated

Just recently came across Ralf Jung's 2018 post titled Pointers are Complicated. The central thesis is that the model that most C (and assembly language) programmers have that a pointer is just an integer that happens to be a machine address is wrong, in fact the author flat out states: "Pointers are definitely not integers."

That's a strong statement. I like strong statements, because they make a discussion possible. So let's respond in kind: the claim that pointers are definitely not integers is wrong.

The example

The example the author uses to show that pointers are definitely not integers is the following:
int test() {
    auto x = new int[8];
    auto y = new int[8];
    y[0] = 42;
    int i = /* some side-effect-free computation */;
    auto x_ptr = &x[i];
    *x_ptr = 23;
    return y[0];

And this is the crux of the reasoning:
It would be beneficial to be able to optimize the final read of y[0] to just return 42. The justification for this optimization is that writing to x_ptr, which points into x, cannot change y.
So pointers are "hard" and "not integers" because they conflict with this optimization that "would be beneficial".

I find this fascinating: a "nice to have" optimzation is so obviously more important than a simple and obvious pointer model that it doesn't even need to be explained as a possible tradeoff, never mind justified as to why the tradeoff is resolved in favor of the nice-to-have optimization.

I prefer the simple and obvious pointer model. Vastly.

This way of placing the optimizer's concerns far ahead of the programmer's is not unique, if you check out Chris Lattner's What Every C Programmer Should Know About Undefined Behavior, you will note the frequent occurrence of the phrase "enables ... optimizations". It's pretty much the only justification ever given.

I call this now industry-dominating style of programming Compiler Optimizer Creator Oriented Programming (COCOP). It was thoroughly critiqued in What every compiler writer should know about programmers or “Optimization” based on undefined behaviour hurts performance (pdf).

Pointers as Integers

There are certainly machines where pointers are not integers, the most prominent being 8086/80286 16 bit segmented mode, where a (far) pointer consists of a segment and an offset. On 8086, the segment is simply shifted left 4 bits and added to the offset, on 80286 the segment can be located anywhere in memory or not be resident, implementing a segmented virtual memory. AFAIK, these modes are simplified variants of the iAPX 432 object memory model.

What's important to note in this context is that the iAPX 432 and its memory model failed horribly, and industry actively and happily moved away from the x86 segmented model to what is called a "flat address space", common on other architectures and finally also adopted by Intel with the 386.

The salient feature of a "flat address space" is that a pointer is an integer, and in fact this eqivalence is also rather influential on CPU architecture, with address-space almost universally tied to the CPU's integer size. So although the 68K was billed as a 16 bit CPU (or 16/32), its registers were actually 32 bits, and IIRC its address ALUs were fully 32 bit, so if you wanted to do some kinds of 32 bit aritmetic, the LEA (Load Effective Address) instruction was your friend. The reason for the segmented architecturee on the 8086 was that it was a true 16 bit machine, with 16 bit registers, but Intel wanted to have a 20 bit address space.

So not only was and is there an equivalence of pointers and integers, this state of affairs was one that was actively sought and joyously received once we achieved it again. Giving it up for nice-to-have optimizations seems at best debatable, but at the very least it is something that should be discussed/debated, rather than simply assumed away.

Sunday, June 21, 2020

Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

When looking at the MPWPlistStreaming protocol that I've been using for my JSON parsing series, one thing that was probably noticeable is that it isn't particularly JSON-focused. In fact, it wasn't even initially designed for parsing, but for generating.

So could we use this for other de-serialization tasks? Glad you asked!

CSV parsing

One of the examples in my performance book involves parsing Comma Separated Values quickly, within the context of getting the time to convert a 139Mb GTFS file to something usable on the phone down from 20 minutes using using CoreData/SQLite to slightly less than a second using custom in-memory data structures that are also several orders of magnitude faster to query on-device.

The original project's CVS parser took around 18 seconds, which wasn't a significant part of the 20 minutes, but when the rest only took a couple of hundred milliseconds, it was time to make that part faster as well. The result, slightly generalized, is MPWDelimitedTable ( .h .m ).

The basic interface is block-based, with the block being called for every row in the table, called with a dictionary composed of the header row as keys and the contents of the row as values.

-(void)do:(void(^)(NSDictionary* theDict, int anIndex))block;

Adapting this to the MPWPlistStreaming protocol is straightforward:
-(void)writeOnBuilder:(id )builder
    [builder beginArray];
    [self do:^(NSDictionary* theDict, int anIndex){
        [builder beginDictionary];
        for (NSString *key in self.headerKeys) {
            [builder writeObject:theDict[key] forKey:key];
        [builder endDictionary];
    [builder endArray];

This is a quick-and-dirty implementation based on the existing API that is clearly sub-optimal: the API we call first constructs a dictionary from the row and the header keys and then we iterate over it. However, it works with our existing set of builders and doesn't build an in-memory representation of the entire CSV.

It will also be relatively straightforward to invert this API usage, modifying the low-level API to use MPWPlistStreaming and then creating a higher-level block- and dictionay-based API on top of that, in a way that will also work with other MPWPlistStreaming clients.


Another tabular data format is SQL data bases. On macOS/iOS, one very common database is SQLite, usually accessed via CoreData or the excellent and much more light-weight fmdb.

Having used fmdb myself before, and bing quite delighted with it, my first impulse was to write a MPWPlistStreaming adapter for it, but after looking at the code a bit more closely, it seemed that it was doing quite a bit that I would not need for MPWPlistStreaming.

I also think I saw the same trade-off between a convenient and slow convenience based on NSDictionary and a much more complex but potentially faster API based on pulling individual type values.

So Instead I decided to try and do something ultra simple that sits directly on top of the SQLite C-API, and the implementation is really quite simple and compact:

@interface MPWStreamQLite()

@property (nonatomic, strong) NSString *databasePath;


@implementation MPWStreamQLite
    sqlite3 *db;

    self=[super init];
    self.databasePath = newpath;
    return self;

    sqlite3_stmt *res;
    int rc = sqlite3_prepare_v2(db, [sql UTF8String], -1, &res, 0);
    @autoreleasepool {
        [self.builder beginArray];
        int step;
        int numCols=sqlite3_column_count(res);
        NSString* keys[numCols];
        for (int i=0;i < numCols; i++) {
            keys[i]=@(sqlite3_column_name(res, i));
        while ( SQLITE_ROW == (step = sqlite3_step(res))) {
            @autoreleasepool {
                [self.builder beginDictionary];
                for (int i=0; i < numCols; i++) {
                    const char *text=(const char*)sqlite3_column_text(res, i);
                    if (text) {
                        [self.builder writeObject:@(text) forKey:keys[i]];
                [self.builder endDictionary];
        [self.builder endArray];
    return rc;

    return sqlite3_open([self.databasePath UTF8String], &db);

    if (db) {

Of course, this doesn't do a lot, chiefly it only reads, no updates, inserts or deletes. However, the code is striking in its brevity and simplicity, while at the same time being both convenient and fast, though with still some room for improvement.

In my experience, you tend to not get all three of these properties at the same time: code that is simple and convenient tends to be slow, code that is convenient and fast tends to be rather tricky and code that's simple and fast tends to be inconvenient to use.

How easy to use is it? The following code turns a table into an array of dictionaries:

#import <MPWFoundation/MPWFoundation.h>

int main(int argc, char* argv[]) {
   MPWStreamQLite *db=[[MPWStreamQLite alloc] initWithPath:@"chinook.db"];
   db.builder = [MPWPListBuilder new];
   if( [db open] == 0 ) {
       [db exec:@"select * from artists;"];
       NSLog(@"results: %@",[db.builder result]);
       [db close];
   } else {
       NSLog(@"Can't open database: %s\n", [db error]);

This is pretty good, but probably roughly par for the course for returning a generic data structure such as array of dictionaries, which is not going to be particularly efficient. (One of my first clues that CoreData's predecessor EOF wasn't particularly fast was when I read that fetching raw dictionaries was an optimization, much faster than fetching objects.)

What if we want to get objects instead? Easy, just replace the MPWPListBuilder with an MPWObjectBuilder, parametrized with the class to create. Well, and define the class, but presumably you already havee that if the task is to convert to objects of that class. And it cold obviously also be automated.

#import <MPWFoundation/MPWFoundation.h>

@interface Artist : NSObject { }

@property (assign) long ArtistId;
@property (nonatomic,strong) NSString *Name;


@implementation Artist

	return [NSString stringWithFormat:@"<%@:%p id: %ld name: %@>",[self class],self,self.ArtistId,self.Name];


int main(int argc, char* argv[]) {
   MPWStreamQLite *db=[[MPWStreamQLite alloc] initWithPath:@"chinook.db"];
   db.builder = [[MPWObjectBuilder alloc] initWithClass:[Artist class]];
   if( [db open] == 0) {
       [db exec:@"select * from artists"];
       NSLog(@"results: %@",[db.builder result]);
       [db close];
   } else {
       NSLog(@"Can't open database: %s\n", [db error]);

Note that this does not generate a plist representation as an intermediate step, it goes straight from database result sets to objects. The generic intermediate "format" is the MPWPlistStreaming protocol, which is a dematerialized representation, both plist and objects are peers.

Sunday, June 14, 2020

The Curious Case of Swift's Adoption of Smalltalk Keyword Syntax

I was really surprised to learn that Swift recently adopted Smalltalk keyword syntax: [Accepted] SE-0279: Multiple Trailing Closures. That is: a keyword terminated by a colon, followed by an argument and without any surrounding braces.

The mind boggles.

A little.

Of course, Swift wouldn't be Swift if this weren't a special case of a special case, specifically the case of multiple trailing closures, which is a special case of trailing closures, which are weird and special-casey enough by themselves. Below is an example:

UIView.animate(withDuration: 0.3) {
  self.view.alpha = 0
} completion: { _ in

Note how the arguments to animate() would seem to terminate at the closing parenthesis, but that's actually not the case. The curly braces after the closing paren start a closure that is actually also an argument to the method, a so-called trailing closure. I have a little bit of sympathy for this construct, because closures inside of the parentheses look really, really awkward. (Of course, all params apart from a sole x inside f(x) look awkward, but let's not quibble. For now.).

Another thing this enables is methods that reasonably resemble control structures, which I heard is a really great idea.

The problem is that sometimes you have more than one closure argument, and then just stacking them up behind what appears to be end of the function/method call gets really, really awkward, and you can't tell which block is which argument, because the trailing closure doesn't get a keyword.

Well, now it does. And we now have 4 different method syntaxes in one!

  1. Traditional C/Pascal/C++/Java function call syntax x.f()
  2. The already weird-ish addition of Smalltalk/Objective-C keywords inside the f(x) syntax: f(arg:x)
  3. Original trailing-closure syntax, which is just its own thing, for the first closure
  4. Smalltalk non-brackted keyword syntax for the 2nd and subsequent closures.
That is impressive, in a scary kind of way.
Swift is a crescendo of special cases stopping just short of the general; the result is complexity in the semantics, complexity in the behaviour (i.e. bugs), and complexity in use (i.e. workarounds).
In understand that this proposal was quite controversial, with heated discussion between opponents and proponents. I understand and sympathize with both sides. On the one hand, this is markedly better than alternatives. On the other hand it is a special case of a special case that is difficult to justify as an addition of all that is already there.

Special cases beget special cases beget special cases.

Of course the answer was always there: Smalltalk keyword syntax is not just the only reasonable solution in this case, it also solves all the other cases. It is the general solution. Here's how this could look in Objective-Smalltalk (which uses curly braces instead for closures instead of Smalltalk-80's square brackets):

UIView animate:{ self.view.alpha ← 0. } withDuration:0.3 completion:{ self view removeFromSuperview. }.

No special cases, every argument is labeled, no syntax mush of brackets inside parentheses etc. And yes, this also handles user-defined control structures, to:do: is just a method on NSNumber:

1 to:10 do:{:i | stdout println:"I will not introduce {i} special cases willy nilly.".}.

And since keywords naturally go between their arguments, there is no need for "operators", as a very different and special syntax form. You just allow some "binary" keywords to look a little different, so instead of 2 multiply:3 you can write 2 * 3. And when you have 2 raisedTo:3 instead of pow(2,3) (with the signature: func pow(_ x: Decimal, _ y: Int) -> Decimal), do you really neeed to go to the trouble of defining an "operator"?

Or Swift's a as b, another special kind of syntax. How about a as:b? (Yes I know there are details, but those are ... details.). And so on and so forth.

But of course, it's too late now. When I chose Smalltalk as the base syntax for the language that has turned into Objective-Smalltalk, it wasn't just because I just like it or have gotten used to it via Objective-C. Smalltalk's syntax is surprisingly flexible and general, Smalltalk APIs look a lot like DSLs, without any of the tooling or other overheads.

And that's the frustrating part: this stuff was and is available and well-known. At least if you bother to look and/or ask. But instead, we just choose these things willy-nilly and everybody has to suffer the consequences.


I guess what I am trying to get at is that if you'd thought things through just a little bit, you could have had almost the entire syntax of your language for the cost (complexity, implementation size and brittleness, cognitive load, etc.) of this one special case of a special case. And it would have been overall better to boot.

Monday, June 1, 2020

MPWTest Only Tests Frameworks

It should be noted, if it wasn't obvious, that MPWTest is opinionated software, meaning it achieves some of its smoothness by gleefully embracing constraints that some might view as potentially crippling limitations.

Maybe the biggest of these constraints, mentioned in the previous post, is that MPWTest only tests frameworks. This means that the following workflow is not supported out of the box:

The point being that this is a workflow I not just somewhat indifferently do not want, but rather emphatically and actively want to avoid. Tests that are run (only?) when launching the app are application tests. My perspective is that unit tests are an integral part of the class. This may seem a subtle distinction, but subtle differences in something you do constantly can have huge impacts. "Steter Tropfen hรถhlt den Stein."

Another aspect is that launching the app for testing as a permanent and fixed part of your build process seems highly annoying at best. Linker finishes, app pops up, runs for a couple of seconds, shuts down again. I don't see that as viable. For testing to be integral and pervasive, it has to be invisible when the tests succeed.

The testing pyramid is helpful here: my contention is that you want to be at the bottom of that pyramid, ideally all of the time. Realistically, you're probably not going to get there, but you should push really, really hard, even making sacrifices that appear to be unreasonable to achieve that goal.

Framework-oriented programming

Only testing frameworks begs the question as to how to test those parts of the application not in frameworks. For me the answer is simple: there isn't any production code outside of frameworks.

None. Not the UI, not the application delegate. Only the auto-generated main().

The benefits of this approach are plentiful, the effort minimal. And if you think this is an, er, eccentric position to take, the program you almost certainly use to create apps for iOS/macOS etc. takes the same eccentric position: Xcode's main executable is 45K in size and only contains a main() function and some Swift boilerplate.

If all your code is in frameworks, only testing frameworks is not a problem. That may seem like a somewhat extreme case of sour grapes, with the arbitrary limitations of a one-off unit testing framework driving major architectural decisions, but the causality is the other way around: I embraced framework-oriented programming before and independently of MPWTest.


Another issue is iOS. Running a command-line tool that dynamically loads and tests frameworks is at least tricky and may be impossible, so that approach currently does not work. My current approach is that I view on-device and on-simulator tests as higher-up in the testing hierarchy: they are more costly, less numerous and run less frequently.

The vast majority of code lives in cross-platform frameworks (see: Ports and Adapters) and is developed and tested primarily on macOS. I have found this to be much faster than using the simulator or a device in day-to-day programming, and have used this "mac-first" technique even on projects where we were using XCTest.

Although not testing on the target platform may be seen as a problem, I have found discrepancies to be between exceedingly rare and non-existent, with "normal" code trending towards the latter. One of the few exceptions in the not-quite-so-normal code that I sometimes create was the change of calling conventions on arm64, which meant that plain method pointers (IMPs) no longer worked, but had to be cast to the "correct" pointer type, only on device. Neither macOS nor the simulator would show the problem.

For that purpose, I hacked together a small iOS app that runs the tests specified in a plist in the app bundle. There is almost certainly a better way to handle this, but I haven't had the cycles or motivation to look into it.

How to approximate

So you can't or don't want to adopt MPWTest. That doesn't mean you can't get at least some of the benefits of the approach. As a start, instead of using Cmd-B in Xcode to build, just use Cmd-U instead. That's what I did when working on Wunderlist, where we used XCTest.

Second, adopt framework-oriented programming and the Ports and Adapters style as much as possible. Put all your code in frameworks, and as much as possible in cross-platform frameworks that you can test/run on macOS, and even if you are developing exclusively for iOS, create a macOS target for that framework. This makes using Cmd-U to build much less painful.

Third, adhere to a strict 1:1 mapping between production classes and test classes, and place your test classes in the same file as the class they are testing.

My practical experience with both JUnit and XCTest on medium-sized projects does not square with the assertion that the difference is not that big: you still have to create these additional classes, they have to communicate with the class under tests (self in MPWTest), you have to track changes etc. And of course, you have to know to configure und use the framework differently from the way it was built, intended and documented. And what I've seen of OCUnit use was that the tests were not co-located with the class, but in a separate part of the project.

A final note is that the trick of interchangeably using the class as the test fixture is only really possible in a language like Objective-C where classes are first class objects. It simply wouldn't be possible in Java. This is how the class can test itself, and the tests become an integral part of the class, rather than something that's added somewhere else.

Saturday, May 30, 2020

MPWTest: Reducing Test Friction by Going Beyond the xUnit Model

By popular demand, a quick rundown of MPWTest (“The Simplest Testing Framework That Could Possibly Work”), my own personal unit testing framework, and how it makes TDD fast, fun, and frictionless.

I created MPWTest because once I had been bitten by the TDD bug, I definitely did not want to write software without TDD ever again, if I could help it. This was long before XCTest, and even its precursor SenTestKit was in at best in parallel development, I certainly wasn't aware of it.

It is a bit different, and the differences make it sufficiently better that I much prefer it to the xUnit variants that I've worked with (JUnit, some SUnit, XCTest). All of these are vastly better than not doing TDD, but they introduce significant amounts of overhead, friction, that make the testing experience much more cumbersome than it needs to be, and to me at least partly explains some of the antipathy I see towards unit testing from developers.

The attitude I see is that testing is like eating your vegetables, you know it's supposed to be good for you and you do it, grudgingly, but it really is rather annoying and the benefits are more something you know intellectually.

For me with MPWTest, TDD is also still intellectually a Good Thing™, but also viscerally fun, less like vegetables and more like tasty snacks, except that those snacks are not just yummy, but also healthy. It helps me stay in the flow and get things done.

What it does is let me change code quickly and safely, the key to agile:

Here is how it works.


First you need to build the testlogger binary of the MPWTest project. I put mine in /usr/local/bin and forget about it. You can put it anywhere you like, but will have to adjust the paths in what follows.

Next, add a "Script" build phase to your (framework) project. MPWTest currently only tests frameworks.


if  [ -f ${tester}  ]  ; then
    $tester ${framework}
    echo "projectfile:0:1: warning: $tester or  $framework  not found, tests not run"

The bottom of the Build Phases pane of your project should then look something roughly like the following:

There is no separate test bundle, no extra targets, nada. This may not seem such a big deal when you have just a single target, but once you start getting having a few frameworks, having an additional test target for each really starts to add up. And adds a decision-point: should I really create an additional test bundle for this project? Maybe I can just repurpose this existing one?


In the class to be tested, add the +(NSArray*)testSelectors method, returning the list of tests to run/test methods to execute. Here is an example from the JSON parser I've been writing about:

  return @[

You could also determine these names automagically, but I prefer the explicit list as part of the specification: these are the tests that should be run. Otherwise it is too easy to just lose a test to editing mistakes and be none the wiser for it.

Then just implement a test, for example testUnicodeEscapes:

	MPWMASONParser *parser=[MPWMASONParser parser];
	NSData *json=[self frameworkResource:@"unicodeescapes" category:@"json"];
	NSArray *array=[parser parsedData:json];
	NSString *first = [array objectAtIndex:0];
	INTEXPECT([first length],1,@"length of parsed unicode escaped string");
	INTEXPECT([first characterAtIndex:0], 0x1234, @"expected value");
	IDEXPECT([array objectAtIndex:1], @"\n", @"second is newline");

Yes, this is mostly old code. The macros do what you, er, expect: INTEXPECT() expects integer equality (or other scalars, to be honest), IDEXPECT() expects object equality. There are also some conveniences for nil, not nil, true and false, as well as a specialized one for floats that sets an acceptable range.

In theory, you can put these methods anywhere, but I tend to place them in a testing category at the bottom of the file.


#import "DebugMacros.h"

@implementation MPWMASONParser(testing)

The DebugMacros.h header has the various EXPCECT() macros. The header is the only dependency in your code, you do not need to link anything.

Even more than not having a separate test bundle, not having a separate test class (-hierarchy) really simplifies things. A lot.

First, there is no question as to where to find the tests for a particular class: at the bottom of the file, just scroll down. Same for the class for some tests: scroll up. I find this incredibly useful, because the tests serve as specification, documentation and example code for class.

There is also no need to maintain parallel class hierarchies, which are widely regarded as a fairly serious code-smell, for the obvious reasons: the need to keep those hierarchies in sync along with the problems once they do get out of sync, which they will, etc.


After the setup, you just build your projects, the tests will be run automatically as part of the build. If there are test failures, they are reported by Xcode as you would expect:

My steps tend to be:

  1. add name of test to +testSelectors,
  2. hit build to ensure tests are red,
  3. while Xcode builds, add empty test method,
  4. hit build again to ensure tests are now green,
  5. either add an actual EXPECT() for the test,
  6. or an EXPECTTRUE(false,@"impelemented") as placeholder
This may seem like a lot of steps, but it's really mostly just letting Xcode check things while I am doing the edits that need to be done anyhow. Hitting Cmd-B a couple of times while editing doesn't hurt.

The fact that tests run as part of every build, because you cannot build without running the tests, gives you a completely different level of confidence in your code, which translates to courage.

Running the tests all the time is also splendid motivation to keep those tests green, because if the tests fail, the build fails. And if the build fails, you cannot run the program. Last not least, running the tests on every build also is strong motivation to keep those tests fast. Testing just isn't this separate activity, it's as integral a part of the development process as writing code and compiling it.


There are some drawbacks to this approach, one that the pretty Xcode unit test integration doesn't work, as when this was done Apple had already left the platform idea behind and was only focused on making an integrated solution.

As noted above, displaying test failures as errors and jumping to the line of the failed test-expectation does work. This hooks into the mechanism Xcode uses to get that information from compilers, which simply output the line number and error message on stdout. Any tool that formats its output the same way will work wth Xcode.

In the end, while I do enojoy the blinkenlights of Xcode's unit test integration, and being able to run tests individually with simple mouse-click, all this bling really just reinforces that idea of tests as a separate entity. If my tests are always run and are always green, and are always fast, then I don't need or even want UI for them, the UI is a distraction, the tests should fade into the background.

Another slightly more annoying issue is debugging: as the tests are run as part of the build, a test failure is a build failure and will block any executables from running. However, Xcode only debugs executables, so you can't actually get to a debuggable run session.

As I don't use debuggers all that much, and failure in TDD usually manifests itself in test failure rather than something you need the debugger to track, this hasn't been much of a problem. In the past, I would then just revert to the command line, for example with lldb testlogger MPWFoundation to debug my foundation framework, as you can't actually run a framewework. Or so I thought. Only receently did I find out that you can set an executable parameter in your target's build scheme. I now set that to testlogger and can debug the framework to my heart's content.

Leaving the problem of Xcode not actually letting me run the executable due to the build failing, and as far as I know having no facility for debugging build phases.

The workaround for that is temporarily disabling the Test build phase, which can be accomplished by misusing the "Run script only when installing" flag.

While these issues aren't actually all the significant, they are somewhat more jarring than you might expect because the experience is so buttery smooth the rest of the time.

Of course, if you want a pure test class, you can do that: just create a class that only has tests. Furthermore, each class is actually asked for a test fixture object for each test. The default is just to return the class object itself, but you can also return an instance, which can have setup and teardown methods the way you expect from xUnit.

The code to enumerate and probe all classes in the system in order to find tests is also interesting, if straightforward, and needs to be updated from time to time, as there are a few class in the system that do not like to be probed.


I'd obviously be happy if people try out MPWTest and find it useful. Or find it not so useful and provide good feedback. I currently have no specific plans for Swift support. Objective-C compatible classes should probably work, the rest of the language probably isn't dynamic enough to support this kind of transparent integration, certainly not without more compiler work. But I am currently investigating Swift interop. more generally, and now that I am no longer restricted to C/Objective-C, more might be possible.

I will almost certainly use the lessons learned here to create linguistically integrated testing in Objective-Smalltalk. As with many other aspects of Objective-Smalltalk, the gap to be bridged for super-smooth is actually not that large.

Another takeaway is that unit testing is really, really simple. In fact, when I asked Kent Beck about it, his response was that everyone should build their own. So go and build wonderful things!

Wednesday, May 13, 2020

Embedding Objective-Smalltalk

Ilja just asked for embedded scripting-language suggestions, presumably for his GarageSale E-Bay listings manager, and so of course I suggested Objective-Smalltalk.

Unironically :-)

This is a bit scary. On the one hand, Objective-Smalltalk has been in use in my own applications for well over a decade and runs the site, both without a hitch and the latter shrugging of a Hacker News "Hug of Death" without even the hint of glitch. On the other hand, well, it's scary.

As for usability, you include two frameworks in your application bundle, and the code to start up and interact with the interpreter or interpreters is also fairly minimal, not least of all because I've been doing so in quite a number of applications now, so inconvenience gets whittled away over time.

In terms of suitability, I of course can't answer that except for saying it is absolutely the best ever. I can also add that another macOS embeddable Smalltalk, FScript, was used successfully in a number of products. Anyway, Ilja was kind enough to at least pretend to take my suggestion seriously, and responded with the following question as to how code would look in practice:

I am only too happy to answer that question, but the answer is a bit beyond the scope of twitter, hence this blog post.

First, we can keep things very close to the original, just replacing the loop with a -select: and of course changing the syntax to Objective-Smalltalk.

runningListings := context getAllRunningListings.
listingsToRelist := runningListings select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
ebay endListings:listingsToRelist ended:{ :ended | 
     ebay relistListings:ended relisted: { :relisted |
         ui alert:"Relisted: {relisted}".

Note the use of "and:" instead of "&&" and the general reduction of sigils. Although I personally don't like the pyramid of doom, the keyword message syntax makes it significantly less odious.

So much in fact, that Swift recently adopted open keyword syntax for the special case of multiple trailing closures. Of course the mind boggles a bit, but that's a topic for a separate post.

So how else can we simplify? Well, the context seems a little unspecific, and getAllRunningListings a bit specialized, it probably has lots of friends that result from mapping a website with lots of resources onto a procedural interface.

Let's instead use URLs for this, so an ebay: scheme that encapsulates the resources that EBay lets us play with.

listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
ebay endListings:listingsToRelist ended:{ :ended | 
     ebay relistListings:ended relisted: { :relisted |
         ui alert:"Relisted {relisted} listings".

I have to admit I also don't really understand the use of callbacks in the relisting process, as we are waiting for everything to complete before moving to the next stage. So let's just implement this as plain sequential code:
listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
ended := ebay endListings:listingsToRelist.
relisted := ebay relistListings:ended.
ui alert:"Relisted: {relisted}".

(In scripting contexts, Objective-Smalltalk currently allows defining variables by assigning to them. This can be turned off.)

However, it seems odd and a bit non-OO that the listings shouldn't know how to do stuff, so how about just having relist and end be methods on the listings themselves? That way the code simplifies to the following:

listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
ended := listingsToRelist collect end.
relisted := ended collect relist.
ui alert:"Relisted: {relisted}".

If batch operations are typical, it probably makes sense to have a listings collection that understands about those operations:
listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
ended := listingsToRelist end.
relisted := ended relist.
ui alert:"Relisted: {relisted}".

Here I am assuming that ending and relisting can fail and therefore these operations need to return the listings that succeeded.

Oh, and you might want to give that predicate a name, which then makes it possible to replace the last gobbledygook with a clean, "do what I mean" Higher Order Message. Oh, and since we've had Unicode for a while now, you can also use '←' for assignment, if you want.

extension EBayListing {
  -<bool>shouldRelist {
      self daysRunning > 30 and: self watchers < 3.

listingsToRelist ← ebay:listings/running select shouldRelist.
ended ← listingsToRelist end.
relisted ← ended relist.
ui alert:"Relisted: {relisted}".

To my obviously completely unbiased eyes, this looks pretty close to a high-level, pseudocode specification of the actions to be taken, except that it is executable.

This is a nice step-by-step script, but with everything so compact now, we can get rid of the temporary variables (assuming the extension) and make it a one-liner (plus the alert):

relisted ← ebay:listings/running select shouldRelist end relist.
ui alert:"Relisted: {relisted}".

It should be noted that the one-liner came to be not as a result of sacrificing readability in order to maximally compress the code, but rather as an indirect result of improving readability by removing the cruft that's not really part of the problem being solved.

Although not needed in this case (the precedence rules of unary message sends make things unambiguous) some pipe separators may make things a bit more clear.

relisted ← ebay:listings/running select shouldRelist | end | relist.
ui alert:"Relisted: {relisted}".

Whether you prefer the one-liner or the step-by-step is probably a matter of taste.

Sunday, April 26, 2020

Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!

In the last exciting instalment of our JSON parsing on macOS/iOS series, we got rid of temporary objects in our parser → builder protocol as much as possible and saw performance soar to 195 MB/s, almost 20 times faster than Swift's JSONDecoder. At this point, creating the objects and adding them to the array take a combined total of 45%, and surely this is something we can't reasonably get rid off. Is it?

Object Streaming

Although the requirement was for objects to be created, nobody said that they all have to exist at the same time. Instead of returning the complete array of parsed objects when done, we can also tell the parser to stream objects to some target as they come in, by setting the streamingThreshold, which says at which depth into the JSON tree we start to use streaming.

    MPWMASONParser *parser=[[MPWMASONParser alloc] initWithClass:[TestClass class]];
    MPWObjectBuilder *builder=(MPWObjectBuilder*)[parser builder];
    [builder setStreamingThreshold:1];
    [builder setTarget:self];
    [parser parsedData:json];

Since we've set ourselves as the streaming target we need to provide a writeObject: method in order to conform to the Streaming protocol.

    if (!first) {
        first=[MPWRusage current];

This method counts the objects and sums up their hi instance variables. It also records the time the first object comes in. How does this do?

Very well, at 192 ms and 229 MB/s. In addition, the time to first object is around 700 ยตs, so less than a millisecond for an application to start receiving usable data and be able to provide feedback to the user.

What's immediately noticeable is that the beginDictionary method is no longer at the top of the profile, it is almost all the way to the bottom with just 2.6% and 4.3ms of the total running time.

How is this actually possible? After all, we still get the 1 million objects, so we still have to create all of them, even if we dole them out in a piecemeal fashion. Or do we?


The MPWObjectCache class (.h .m), keeps a circular buffer of objects that it can reinitialize and reuse after the application code is done with them. It is described in some detail in my book (did I mention the book?), in a part that Pearson has kindly made publicly available.

With such a cache in place, we only actually instantiate the number of objects needed to fill the cache, after that we safely recycle those same objects over and over again, at the cost of a few function calls. If objects are retained, they will not be reused.

Column stores, or structures of arrays

Another neat way of interpreting dematerialization is to store all the data in a columnar data format, a structure of arrays (SoA) instead of Array of Structures (AoS) organisation. (Thanks to Holgi for suggesting this).

For this we need a specific builder (MPWArraysBuilder, .h .m) that maintains a set of (mutable) arrays stored by key. When it receives a value, it looks up the appropriate array by key and adds the value to that array, as follows:

    if ( _arrayMap && keyStr) {
        MPWIntArray *a=OBJECTFORSTRINGLENGTH(_arrayMap, keyStr, keyLen);
        [a addInteger:(int)number];

    if ( _arrayMap && keyStr) {
        NSMutableArray *a=OBJECTFORSTRINGLENGTH(_arrayMap, keyStr, keyLen);
        [a addObject:aString];

For integer values, this would be an MPWIntArray (.h .m) for strings a regular NSMutableArray.

This does even better, at 155 ms / 284 MB/s.

Other options

These are not the only options. For example, it turns out that the protocol connecting parser and builder was not specifically created for this purpose, it actually extends the Streaming protocol to handle disassembled hierarchies. So you can take a tree, pipe it through a pipeline and then accurately reassemble it on the other end.

The protocol is used in Polymorphic Write Streams to enable Standard Object Out shown at DLS '19, with an earlier version presented at Macoun 2018 (German):


However, we are probably hitting diminishing returns at this point, certainly for a proof of concept. There is certainly some more fat to trim, some objc_msgSend()s to IMP-cache away, and going over most of the character input twice is probably something we could avoid.

Apart from further performance improvements, there are also minor details of correctness to take care of, for example handling the JSON escape characters in keys or properly handling hierarchy. These things are not particularly hard, and are handled for XML in the superclass, but do require a bit of thought and effort to complete.

There is also the question of hooking up to Swift in general (a simple attempt failed in getting the right methods), or Codable in particular. The latter would require a somewhat different approach from now: instead of instantiating the object and actively setting its properties, you need to create a temporary structure that you then pass to the object's decoder so it can decode itself. Again, the MAX superclass uses this approach, so it probably won't be too hard to do, with the main trickiness probably in reconciling that more hierarchical/recursive approach with the streaming required by the protocol.

I can (and probably will) also go into a little more analysis of the hows and whys of this approach. So maybe provide some feedback: what would interest you most? Are you interested in a production version of this? Or more extreme optimizations (the ones so far were fairly tame)?


I can help not just Apple, but also you and your company/team with performance and agile coaching, workshops and consulting. Contact me at info at

Saturday, April 25, 2020

Maybe Visual Programming is The Answer. Maybe Not

Whenever discussing problems with programming today and potential solutions, invariably someone will pop up and declare that the problem is obviously the fact that programs are linear text and if only programming were visual, all problems would immediately disappear in some unspecified way.

I understand the attraction of visual programming, particularly for visual thinkers. However, it's not as if this hasn't been tried, with so far very limited success. Brad Myers, in his 1989 paper Taxonomies of Visual Programming gave, along with the titular taxonomy, a non-exhaustive summary of the problems, starting with visual languages in general:

  • Difficulty with large programs or large data. Almost all visual representations are physically larger than the text they replace, so there is often a problem that too little will fit on the screen. This problem is alleviated to some extent by scrolling and various abstraction mechanisms.
  • Need for automatic layout. When the program or data gets to be large, it can be very tedious for the user to have to place each component, so the system should lay out the picture automatically. Unfortunately, for many graphical representations, generating an attractive layout can be difficult, and generating a perfect layout may be intractable. For example, generating an optimal layout of graphs and trees is NP-Complete [95]. More research is needed, therefore, on fast layout algorithms for graphs that have good user interface characteristics, such as avoiding large scale changes to the display after a small edit.
  • Lack of formal specification. Currently, there is no formal way to describe a Visual Language. Something equivalent to the BNFs used for textual languages is needed. This would provide the field with a ‘‘hard science’’ foundation, and may allow tools to be created that will make the construction of editors and compilers for Visual Languages easier. Chang [49] [96], Glinert [97] and Selker [98] have made attempts in this direction, but much more work is needed.
  • Tremendous difficulty in building editors and environments. Most Visual Languages require a specialized editor, compiler, and debugger to be created to allow the user to use the language. With textual languages, conventional, existing text editors can be used and only a compiler and possibly a debugger needs to be written. Currently, each graphical language requires its own editor and environment, since there are no general purpose Visual Language editors. These editors are hard to create because there are no ‘‘editor-compilers’’ or other similar tools to help. The ‘‘compiler-compiler’’ tools used to build compilers for textual languages are also rarely useful for building compilers and interpreters for Visual Languages. In addition, the language designer must create a system to display the pictures from the language, which usually requires low-level graphics programming. Other tools that traditionally exist for textual languages must also be created, including pretty-printers, hard-copy facilities, program checkers, indexers, cross- referencers, pattern matching and searching (e.g., ‘‘grep’’ in Unix), etc. These problems are made worse by the historical lack of portability of most graphics programs.
  • Lack of evidence of their worth. There are not many Visual Languages that would be generally agreed are ‘‘successful,’’ and there is little in the way of formal experiments or informal experience that shows that Visual Languages are good. It would be interesting to see experimental results that demonstrated that visual programming techniques or iconic languages were better than good textual methods for performing the same tasks. Metrics might include learning time, execution speed, retention, etc. Fortunately, preliminary results are appearing for the advantages of using graphics for teaching students how to program [36].
  • Poor representations. Many visual representations are simply not very good. Programs are hard to understand once created and difficult to debug and edit. This is especially true once the programs get to be a non-trivial size.
  • Lack of Portability of Programs. A program written in a textual language can be sent through electronic mail, and used, read and edited by anybody. Graphical languages require special software to view and edit; otherwise they can only be viewed on hard- copy.

In addition, most visual programming languages are "unstructured" in the software engineering sense. They

  • use gotos and explicit transfer of control (often through wires),
  • only have global variables,
  • have no procedural abstraction,
  • if they have procedural abstraction, they may not have parameters for the procedures,
  • have no place for comments.

Furthermore, he notes that most visual languages don't interoperate with programs created in other languages, with some exceptions.

I am not saying that visual programming languages will not and cannot work, in fact, I am quite a fan myself. As these are specific problems, they probably can be solved, and I have a few ideas for solving some of them. However, before making claims that visual programming by itself is obviously and singularly the solution to our computing woes, please mention at least in passing how you've addressed the problems identified by Myers.


Friday, April 24, 2020

Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser

A convenient setback

One thing that you may have noticed last time around was that we were getting the instance variable names from the class, but then also still manually setting the common keys manually. That's a bit of duplicated and needlessly manual effort, because the common keys are exactly those ivar names.

However, the two pieces of information are in different places, the ivar names in the builder and the common strings in the in the parse itself. One way of consolidating this information is by creating a convenience intializer for decoding to objects as follows:

    self = [self initWithBuilder:[[[MPWObjectBuilder alloc] initWithClass:classToDecode] autorelease]];
    [self setFrequentStrings:(NSArray*)[[[classToDecode ivarNames] collect] substringFromIndex:1]];
    return self;

We still compute the ivar names twice, but that's not really such a big deal, so something we can fix later, just like the issue that we should probably be using property names instead of instance variable names that in the case of properties we have to post-process to get rid of the underscores added by ivar synthesis.

With that, the code to parse to objects simplifies to the following, very similar to what you would see in Swift with JSONDecoder.

    MPWMASONParser *parser=[[MPWMASONParser alloc] initWithClass:[TestClass class]];
    NSArray* objResult = [parser parsedData:json];

So, quickly verifying that performance is still the same (always do this!) and...oops! Performance dropped significantly, from 441ms to over 700ms. How could such an innocuous change lead to a 50% performance regression?

The profile shows that we are now spending significantly more time in MPWSmallStringTable's objectForKey: method, where it gets the bytes out of the NSString/CFString, but why that should be the case is a bit mysterious, since we changed virtually nothing.

A little further sleuthing revealed that the strings in question are now instances of NSTaggedPointerString, where previously they were instances of __NSCFConstantString. The latter has a pointer to its byte-oriented character orientation, which it can simply return, while the former cleverly encodes the characters in the pointer itself, so it first has to reconstruct that byte representation. The method of constructing that representation and computing the size of such a representation also appears to be fairly generic and slow via a stream.

This isn't really easy to solve, since the creation of NSTaggedPointerStrring instances is hardwired pretty deep in CoreFoundation with no way to disable this "optimization". Although it would be possible to create a new NSString subclass with a byte buffer, make sure to convert to that class before putting instances in the lookup table, that seems like a lot of work. Or we could just revert this convenience.

Damn the torpedoes and full speed ahead!

Alternatively, we really wanted to get rid of this whole process of packing character data into NSString instances just to immediately unpack them again, so let's leave the regression as is and do that instead.

Where previously the builder had a NSString *key instance vaiable, it now has a char *keyStr and a int keyLen. The string-handling case in the JSON parser is now split betweeen the key and the non-key casse, with the non-key case still doing the conversion, but the key-case directly sending the char* and length to the builder.

			case '"':
                parsestring( curptr , endptr, &stringstart, &curptr  );
				if ( curptr[1] == ':' ) {
                    [_builder writeKeyString:stringstart length:curptr-stringstart];
				} else {
                    curstr = [self makeRetainedJSONStringStart:stringstart length:curptr-stringstart];
					[_builder writeString:curstr];

This means that at least temporarily, JSON escape handling is disabled for keys. It's straightforward to add back, makeRetainedJSONStringStart:length: does all its processing in a character buffer, only converting to a string object at the very end.

    if ( keyStr ) {
        MPWValueAccessor *accesssor=OBJECTFORSTRINGLENGTH(self.accessorTable, keyStr, keyLen);
        [accesssor setValue:aString forTarget:*tos];
    } else {
        [self pushObject:aString];

If there is a key, we are in a dictionary, otherwise an array (or top-level). In the dictionary case, we can now fetch the ValueAccessor via the OBJECTFORSTRINGLENGTH() macro.

The results are encouraging: 299ms, or 147 MB/s.

The MPWPlistBuilder also needs to be adjusted: as it builds and NSDictionary and not an object, it actually needs the NSString key, but the parser no longer delivers those. So it just creates them on the fly:

    NSString *key=nil;
    if ( keyStr) {
        if ( _commonStrings ) {
            key=OBJECTFORSTRINGLENGTH(_commonStrings, keyStr, keyLen);
        if ( !key ) {
            key=[[[NSString alloc] initWithBytes:keyStr length:keyLen encoding:NSUTF8StringEncoding] autorelease];
    return key;

Surprisingly, this makes the dictionary parsing code slightly faster, bringing up to par with NSSJSSONSerialization at 421ms.

Eliminating NSNumber

Our use of NSNumber/CFNumber values is very similar to our use of NSString for keys: the parser wraps the parsed number in the object, the builder then unwraps it again.

Changing that, initially just for integers, is straightforward: add an integer-valued message to the builder protocol and implement it.

    if ( keyStr ) {
        MPWValueAccessor *accesssor=OBJECTFORSTRINGLENGTH(_accessorTable, keyStr, keyLen);
        [accesssor setIntValue:number forTarget:*tos];
    } else {
        [self pushObject:@(number)];

The actual integer parsing code is not in MPWMASONParser but its superclasss, and as we don't want to touch that for now, let's just copy-paste that code, modifying it to return a C primitive type instead of an object.

-(long)longElementAtPtr:(const char*)start length:(long)len
    long val=0;
    int sign=1;
    const char *end=start+len;
    if ( start[0] =='-' ) {
    } else if ( start[0]=='+' ) {
    while ( start < end && isdigit(*start)) {
        val=val*10+ (*start)-'0';
    return val;

I am sure there are better ways to turn a string into an int, but it will do for now. Similarly to the key/string distinction, we now special case integers.
                if ( isReal) {
                    number = [self realElement:numstart length:curptr-numstart];

                    [_builder writeString:number];
                } else {
                    long n=[self longElementAtPtr:numstart length:curptr-numstart];
                    [_builder writeInteger:n];

Again, not pretty, but we can clean it up later.

Together with using direct instance variable access instead of properties to get to the accessorTable, this yields a very noticeable speed boost:

229 ms, or 195 MB/s.



What happened here? Just random hacking on the profile and replacing nice object-oriented programming with ugly but fast C?

Although there is obviously some truth in that, profiles were used and more C primitive types appeared, I would contend that what happened was a move away from objects, and particularly away from generic and expensive Foundation objects ("Foundation oriented programming"?) towards message oriented programming.

I'm sorry that I long ago coined the term "objects" for this topic because it gets many people to focus on the lesser idea.

The big idea is "messaging" -- that is what the kernal of Smalltalk/Squeak is all about (and it's something that was never quite completed in our Xerox PARC phase). The Japanese have a small word -- ma -- for "that which is in between" -- perhaps the nearest English equivalent is "interstitial". The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be.

It turns out that message oriented programming (or should we call it Protocol Oriented Programming?) is where Objective-C shines: coarse-grained objects, implemented in C, that exchange messages, with the messages also as primitive as you can get away with. That was the idea, and when you follow that idea, Objective-C just hums, you get not just fast, but also flexible and architecturally nicely decoupled objects: elegance.

The combination of objects + primitive messages is very similar to another architecturally elegant and productive style: Unix pipes and filters. The components are in C and can have as rich an internal structure as you want, but they have to talk to each other via byte-streams. This can also be made very fast, and also prevents or at least reduces coupling between the components.

Another aspect is the tension between an API for use and an API for reuse, particularly within the constraints of call/return. When you get tasked with "Create a component + API for parsing JSON", something like NSJSONSerialization is something you almost have to come up with: feed it JSON, out comes parsed JSON. Nothing could be more convenient to use for "parsing JSON".

MPWMASONParser on the other hand is not convenient at all when viewed in isolation, but it's much more capable of being smoothly integrated into a larger processing chain. And most of the work that NSJSONSerialization did in the name of convenience is now just wasted, it doesn't make further processing any easier but sucks up enormous amounts of time.

Anyway, let's look at the current profile:

First, times are now small enough that high-resolution (100ยตs) sampling is now necessary to get meaningful results. Second, the NSNumber/CFNumber and NSString packing and unpacking is gone, with an even bigger chunk of the remaining time now going to object creation. objc_msgSend() is now starting to actually become noticeable, as is the (inefficient) character level parsing. The accessors of our test objects start to appear, if barely.

With the work we've done so far, we've improved speed around 5x from where we started, and at 195 MB/s are almost 20x faster than Swift's JSONDecoder.

Can we do better? Stay tuned.


I can help not just Apple, but also you and your company/team with performance and agile coaching, workshops and consulting. Contact me at info at

Monday, April 20, 2020

Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop

Last time, we actually made some significant headway by taking advantage of the dematerialisation of the plist intermediate representation. So instead of first producing an array of dictionaries, we went directly from JSON to the final object representation.

This got us down from around 1.1 seconds to a little over 600 milliseconds.

It was accomplished by using the Key Value Coding method setValue:forKey: to directly set the attributes of the objects from the parsed JSON. Oh, and instantiating those objects in the first place, instead of dictionaries.

That this should be so much faster than most other methods, for example beating Swift's JSONDecoder() by a cool 7x, is a little surprising, given that KVC is, as I mentioned in the first article of the series, the slowest mechanism for getting data in and out of objcets short of deliberate Rube Goldber Mechanisms.

What is KVC and why is it slow?

Key Value Coding was a core part of NeXT's Enterprise Object Framework, introduced in 1994.
Key-value coding is a data access mechanism in which the properties of an object are accessed indirectly by key or name, rather than directly as fields or by invocation of accessor methods. It is used throughout Enterprise Objects but is perhaps most useful to you when accessing data in relationships between enterprise objects.

Key-value coding enables the use of keypaths to traverse relationships. For example, if a Person entity has a relationship called toPhoto whose destination entity (called PersonPhoto) contains an attribute called photo, you could access that data by traversing the keypath from within a Person object.

Keypaths are just one way key-value coding is an invaluable feature of Enterprise Objects. In general, though, it is most useful in providing a consistent way to access an object's data. Rather than needing to know if an object's data members have accessor methods, what the names of those accessor methods are, or if the data is accessible through fields, all you need to know are the keys that represent an object’s data. Key-value coding automatically finds the data, regardless of how the object provides its data. In this context, key-value coding satisfies the classic design pattern principle to “encapsulate the things that varies.”

It still is an extremely powerful programming technique that lets us write algorithms that work generically with any object properties, and is currently the basis for CoreData, AppleScript support, Key Value Observing and Bindings. (Though I am somewhat skeptical of some of these, not least for performance reasons, see The Siren Call of KVO and (Cocoa) Bindings). It was also part of the inspiration for Polymorphic Identifiers.

The core of KVC are the valueForKey: and setValue:forKey: messages, which have default implementations in NSObject. These default implementations take the NSString key, derive an accessor message from that key and then send the message, either setting or returning a value. If the value that the underlying message takes/returns is a non-object type, then KVC wraps/unwraps as necessary.

If this sounds expensive, then that's because it is. To derive the set accessor from the key, the first character of the key has to be capitalized, the the string "set" prepended and the string converted to an Objective-C selector (SEL). In theory, this has to be done on every call to one of the KVC methods, and it has to be done with NSString objects, which do a fantastic job of representing human-visible text, but are a bit heavy-weight for low-level work.

Doing the full computation on every invocation would be way too expensive, so Apple caches some of the intermediate results. As there is no obvious place to put those intermediate results, they are placed in global hash tables, keyed by class and property/key name. However, even those lookups are still significantly more expensive than the final set or get property accesss, and we have to do multiple lookups. Since theses tables have to be global, locking is also required.


All this expense could be avoided if we had a custom object to mediate the access, rather than a naked NSString. That object could store those computed values, and then provide fast and generic access to arbitrary properties. Enter MPWValueAccesssor (.h .m).

A word of warning: unlike MPWStringtable, MPWValueAccesssor is mostly experimental code. It does have tests and largely works, but it is incomplete in many ways and also contains a bunch of extra and probably extraneous ideas. It is sufficient for our current purpose.

The core of this class is the AccessPathComponent struct.

typedef struct {
    Class   targetClass;
    int     targetOffset;
    SEL     getSelector,putSelector;
    IMP0    getIMP;
    IMP1    putIMP;
    id      additionalArg;
    char    objcType;
} AccessPathComponent;

This struct contains a number of different ways of getting/setting the data:
  1. the integer offset into the object where the ivar is located
  2. a pair of Objective-C selectors/message names, one for getting, one for setting.
  3. a pair of function pointers to the Objective-C methods that the respective selectors resolve to
  4. the additional arg is the key, to be used for keyed access
The getIMP and putImp are initialized to objc_msgSend(), so they can always be used. If we bind the ValueAccessor to a class, those function pointers get resolved to the actual getter/setter methods. In addition the objcType gets set to the type of the instance variable, so we can do automatic conversions like KVC. (This was some code I actually had to add between the last instalment and the current one.)

The key takeaway is that all the string processing and lookup that KVC needs to do on every call is done once during initialization, after that it's just a few messages and/or pre-resolved function calls.

Hooking up the ValueAccessor

Adapting the MPWObjectBuilder (.h .m) to use MPWValueAccessor was much easier than I had expected. Thee following shows the changes made:
@property (nonatomic, strong) MPWSmallStringTable *accessorTable;


    NSArray *ivars=[theClass ivarNames];
    ivars=[[ivars collect] substringFromIndex:1];
    NSMutableArray *accessors=[NSMutableArray arrayWithCapacity:ivars.count];
    for (NSString *ivar in ivars) {
        MPWValueAccessor *accessor=[MPWValueAccessor valueForName:ivar];
        [accessor bindToClass:theClass];
        [accessors addObject:accessor];
    MPWSmallStringTable *table=[[[MPWSmallStringTable alloc] initWithKeys:ivars values:accessors] autorelease];

-(void)writeObject:anObject forKey:aKey
    MPWValueAccessor *accesssor=[self.accessorTable objectForKey:aKey];
    [accesssor setValue:anObject forTarget:*tos];

The bulk of the changes come as part of the new -setupAccessors: method. It first asks the class what its instance variables are, creates a value accessor for that instance variabl(-name), binds the accessor to the class and finally puts the accessors in a lookup table keyed by name.

The -writeObject:forKey: method is modified to look up and use a value accessor instead of using KVC.


The parsing driver code didn't have to be changed, re-running it on our non-representative 44 MB JSON file yields the following time:

441 ms.

Now we're really starting to get somewhere! This is just shy of 100 MB/s and 10x faster then Swift's JSONDecoder, and within 5% of raw NSJSONSerialization.

Analysis and next steps

Can we do better? Why yes, glad you asked. Let's have a look at the profile.

First thing to note is that object-creation (beginDictionary) is now the #1 entry under the parse, as it should be. This is another indicator that we are not just moving in the right direction, but also closing in on the endgame.

However, there is still room for improvement. For example, although actually searching the SmallStringTable for the ValueAccessor (offsetOfCStringWithLengthInTableOfLength()) takes only 2.7% of the time, about the same as getting the internal char* out of a CFString via the fast-path (CFStringGetCStringPtr()), the total time for the -objectForKey: is a multiple of that, at 13%. This means that unwrapping the NSString takes more time than doing the actual work. Wrapping the char* and length into an NSString also takes significant time, and all of this work is redundant...we would be better of just passing along the char* and length.

A similar wrap/unwrap situation occurs with integers, which we first turn into NSNumbers, only to immediately get the integer out again so we can set it.

objc_msgSend() also starts getting noticeable, so looking at a bit of IMP-caching and just eliminating unnecessary indirection also seems like a good idea.

That's another aspect of optimization work: while the occasional big win is welcome, getting to truly outstanding performance means not being satisfied with that, but slogging through all the small-ish seeming detail.


I can help not just Apple, but also you and your company with performance and agile coaching, workshops and consulting. Contact me at info at


Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Friday, April 17, 2020

Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman

After initially disappointing results trying to get to faster JSON processing (parsing, for now), we finally got parity with NSJSONSerialization, more or less, in the last instalment, with the help of MPWSmallStringTable to unique our strings before turning them into objects, string creation being surprisingly expensive even for tagged pointer strings.

Cutting out the Middleman: ObjectBuilder

In the first instalment of this series, we saw that we could fairly trivially create objects from the plist created by NSJSONSerialization.

MPWObjectBuilder (.h .m) is a subclass of MPWPlistBuilder that changes just a few things: instead of creating dictionaries, it creates objects, and instead of using -setObject:forKey: to set values in that dictionary, it uses the KVC message -setValue:forKey: (vive la petite diffรฉrence!) to set values in that object.

@implementation MPWObjectBuilder

    self=[super init];
    self.cache=[MPWObjectCache cacheWithCapacity:20 class:theClass];
    return self;

    [self pushContainer:GETOBJECT(_cache) ];

-(void)writeObject:anObject forKey:aKey
    [*tos setValue:anObject forKey:aKey];

That's it! Well, all that need concern us for now, the actual class has some additional features that don't matter here. The _tos instance variable is the top of a stack that MPWPlistBuilder maintains while constructing the result. The MPWObjectCache is just a factory for creating objects.

So let's fire it up and see what it can do!

    NSArray *keys=@[ @"hi", @"there", @"comment"];
    MPWMASONParser *parser=[MPWMASONParser parser];
    MPWObjectBuilder *builder=[[MPWObjectBuilder alloc] initWithClass:[TestClass class]];
    [parser setBuilder:builder];
    [parser setFrequentStrings:keys];
    NSArray* objResult = [parser parsedData:json];
    NSLog(@"MPWMASON %@ with %ld elements",[objResult firstObject],[objResult count]);

Not the most elegant code in the universe, and not a complete parser by an stretch of the imagination, but workable.

Result: 621 ms.

Not too shabby, only 50% slower than baseNSJSONSerialization on our non-representative 44MB JSON file, but creating the final objects, instead of just the intermediate representation, and arround 7x faster than Apple's JSONDecoder.

Although still below 100 MB/s and nowhere near 2.5 GB/s we're also starting to close in on the performance level that should be achievable given the context, with 140ms for basic object creation and 124ms for a mostly empty parse.

Analysis and next steps

Ignoring such trivialities as actually being useful for more than the most constrained situations (array of single kind of object), how can we improve this? Well, make it faster, of course, so let's have a look at the profile:

As expected, the KVC code is now the top contributor, with around 40% of total runtime. (The locking functions that show up as siblings of -setValue:forKey: are almost certainly part of that implementation, this slight misattribution of times is something you should generally expect and be aware of with Instruments. I am guessing it has to do with missing frame-pointers (-fomit-frame-pointer) but don't really feel any deep urge to investigate, as it doesn't materially impact the outcome of the analysis.

I guess that's another point: gather enough data to inform your next step, certainly no less, but also no more. I see both mistakes, the more common one definitely being making things "fast" without enough data. Or any, for that matter. If I had a €uro for every project that claims high performance without any (comparative) benchmarking, simply because they did something the authors think should be fast, well, you know, ....

The other extreme is both less common and typically less bad, as at least you don't get the complete nonsense of performance claims not backed by any performance testing, but running a huge battery of benchmarks on every step of an optimization process is probably going to get in the way of achieving results, and yes, I've seen this in practice.

So next we need to remove KVC.


Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Thursday, April 16, 2020

Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion

In our last instalment, we started implementing our JSON parser with lots of good ideas, such as dematerialization via a property list protocol, but immediately fell flat on our face with our code being 50% slower than NSJSONSerialization. And what's worse, there wasn't an obvious way out, as the bulk of the time was spent in Apple code.

Nobody said this was going to be easy.


Let's have another look at the profile:

The top 4 consumers of CPU are -setObject:forKey:, string creation, dictionary creation and message sending. I don't really know what to do about either creating those dictionaries we have to create or setting their contents, so what about string creation?

Although making string creation itself faster is unlikely, what we can do is reduce the number of strings we create: since most of our JSON payload consists of objects nรฉ dictionaries, the vast majority of our strings is actually going to be string keys. So they will come from a small set of known strings and be on the small-ish side. Particularly the former suggests that we should re-use keys, rather than creating multiple new copies.

The usual way to look up something with a known key is an NSDictionary, but alas that would require the keys we look up to already be objects, meaning we would have to create string objects to look up our sting object values, rather defeating the purpose of the exercise.

What we would need is a way of looking up objects by raw C-Sting, an unadorned char*. Fortunately, I've been here before, so the required class has been in MPWFoundation for a little over 13 years. (What's the "Trump smug face emoticon?)


The MPWSmallStringTable (.h / .m ) class is exactly what it says on the tin: a table for looking up objects by (small) string keys. And it is accessible by char* (+length, don't want to require NUL termination) in addition to string objects.

Quite a bit of work went into making this fast, both the implementation and the interface. It is not a hash table, it compares chars directly, using indexing and bucketing to expend as little work as possible while discarding non-matching strings.

In fact, since performance is its primary reason for existing, its unit tests include performance comparisons against an NSDictionary with NSString keys, which currently clock in at 5-8x faster.

The interface includes two macros: OBJECTFORSTRINGLENGTH() and OBJECTFORCONSTANTSTRING(). You need to give the former a length, the latter figures the size out compile time using the sizeof operator, which really does return the length of string constants. Don't use it with non-constant strings (so char*) as there sizeof will return the size of the pointer.

Avoiding Allocation of Frequent Strings

With MPWSmallStringTable at hand, we can now use it in MPWMASONParser to look up common strings like our keys without allocating them.

The -setFrequentStrings: method we saw declared in the interface takes an array of strings, which the parser turns into a string table mapping from the C-Sting versions of those to the NSString version.

	[self setCommonStrings:[[[MPWSmallStringTable alloc] initWithKeys:strings values:strings] autorelease]];

The method that is supposed to create string objects from char*s starts as follows:
-(NSString*)makeRetainedJSONStringStart:(const char*)start length:(long)len
	NSString *curstr;
	if ( commonStrings  ) {
		NSString *res=OBJECTFORSTRINGLENGTH( commonStrings, start, len );
		if ( res ) {
			return [res retain];

So we first check the common stings table, and only if we don't find it there do we drop down to the code to allocated the string. (Yeah, the -retain is probably questionable, though currently necessary)

Trying it out

Now all we need to do is tell the parser about those common strings before we ask it to parse JSON.
    NSArray *keys=@[ @"hi", @"there", @"comment"];
    MPWMASONParser *parser=[MPWMASONParser parser];
    [parser setFrequentStrings:keys];
    NSArray* plistResult = [parser parsedData:json];
    NSLog(@"MPWMASON %@ with %ld dicts",[plistResult firstObject],[plistResult count]);

While this seems a bit tacky, telling a JSON parser what to expect beforehand at least a little seems par for the course, so whatever.

How does that fare? Well, 440ms, which is 180ms faster than before and anywhere from as fast as NSJSONSerialization to 5% slower. Good enough for now.

This result is actually a bit surprising, because the keys that are created by both NSJSONSerialization and MPWMASONParser happen to be instances of NSTaggedPointerString. These strings do not get allocated on the heap, the entire string contents are cleverly encoded in the object pointer itself. Creating these should only be a couple of shifts and ORs, but apparently that takes (significantly) longer than doing the lookup, or more likely CF adds other overhead. This was certainly the case with the original tagged CFNumber, where just doing the shift+OR yourself was massively faster than calling CFNumberCreate().

What next?

Having MPWSmallStringTable immediately suggests ways of tackling the other expensive parts we identified in the profile, -setObject:forKey: and dictionary creation: use a string table with pre-computed key space, then set the objects via char* keys.

Another alternative is to use the MPWXmlAttributes class from MAX, which is optimized for the parsing and use-once case.

However, all this loses sight of the fact that we aren't actually interested in producing a plist. We want to create objects, ideally without creating that plist. This is a common pitfall I see in optimization work: getting so caught up in the details (because there is a lot of detail, and it tends to be important) that one loses sight of the context, the big picture so to speak.

Can this, creating objects from JSON, now be done more quickly? That will be in the next instalment. But as a taste of what's possible, we can just set the builder to nil, in order to see how the parser does when not having to create a plist.

The result: 160ms.

So yes, this can probably work, but it is work.


Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite