Showing posts with label Objective-C. Show all posts
Showing posts with label Objective-C. Show all posts

Wednesday, June 7, 2023

Mojo is a much better "Objective-C without the C" than Swift ever was

One of the primary things that people don't understand about Objective-C is that it is a solution of the two language problem, or more precisely a generalisation of the two language problem to the scripted component pattern.

The scripted component pattern itself is a (common) solution to the problem, first identified in the 70s that programming-in-the-large is not the same as programming-in-the-small, that module implementation languages are not necessarily suitable as module interconnection languages.

And so we have all sorts of flexible connection languages, often interpreted (aka glue, scripting, and orchestration languages), starting with the Unix shell, in addition to fast, compiled component languages such as C, C++ and Rust, and a system will usually incorporate at least one of each kind.

But then you run into the two language problem: you have to deal with these two distinct languages, with how they integrate, and with the boundaries of the integration often not matching up very well with the boundaries of the problem you're trying to solve.

Objective-C solved the two language problem by just jamming the two languages into one: Smalltalk for the scripting/integration and C for the component language. Interoperability is smooth and at the statement level, thougha there is some friction due to overlaps caused by integrating two existing languages that were not designed to be integrated.

Mojo essentially uses the Objective-C approach of jamming the two languages into one. Except it doesn't repeat Objective-C's mistake of using the component language as the base (which, inexplicably, Swift didn't just repeat, but actually doubled down on by largely deprecating objects). The reason this is a mistake is that it turns out that the connection language is actually the more general one, the component language is a specialisation of the connection language.

With this realisation, Mojo's approach of making the connection language the base language make sense. In addition, the fact that the component language is a specialisation also means that you don't actually need to jam a full second language into your base, a few syntactic markers to to indicate the specialisations are sufficient.

This is pretty much exactly stage 2 of the 4 stages of Objective-S, so I think they are using exactly the right approach for this. Except of course for the use of Python as the base instead of Smalltalk, which is a pragmatic choice given what they are trying to accomplish, but means your connection language is unduly limited.

Objective-S has the same basic structure, but with a much more capable connection language as the base.

Monday, June 20, 2022

Blackbird: A reference architecture for local-first connected mobile apps

Wow, what a mouthful! Although this architecture has featured in a number of my other writings, I haven't really described it in detail by itself. Which is a shame, because I think it works really well and is quite simple, a case of Sophisticated Simplicity.

Why a reference architecture?

The motivation for creating and now presenting this reference architecture is that the way we build connected mobile apps is broken, and none of the proposed solutions appear to help. How are they broken? They are overly complex, require way too much code, perform poorly and are unreliable.

Very broadly speaking, these problems can be traced to the misuse of procedural abstraction for a problem-space that is broadly state-based, and can be solved by adapting a state-based architectural style such as in-process REST and combining it with well-known styles such as MVC.

More specifically, MVC has been misapplied by combining UI updates with the model updates, a practice that becomes especially egregious with asynchronous call-backs. In addition, data is pushed to the UI, rather than having the UI pull data when and as needed. Asynchronous code is modelled using call/return and call-backs, leading to call-back hell, needless and arduous transformation of any dependent code into asynchronous code (see "what color is your function") that is also much harder to read, discouraging appropriate abstractions.

Backend communication is also an issue, with newer async/await implementations not really being much of an improvement over callback-based ones, and arguably worse in terms of actual readability. (They seem readable, but what actually happens is different  enough that the simplicity is deceptive).

Overview

The overall architecture has four fundamental components:
  1. The model
  2. The UI
  3. The backend
  4. The persistence
The main objective of the architecture is to keep these components in sync with each other, so the whole thing somewhat resembles a control loop architecture: something disturbs the system, for example the user did something in the UI, and the system responds by re-establishing equilibrium.

The model is the central component, it connects/coordinates all the pieces and is also the only one directly connected to more than one piece. In keeping with hexagonal architecture, the model is also supposed to be the only place with significant logic, the remainder of the system should be as minimal, transparent and dumb as possible.

memory-model := persistence.
persistence  |= memory-model.
ui          =|= memory-model. 
backend     =|= memory-model.

Graphically:

Elements

Blackbird depends crucially on a number of architectural elements: first are stores of the in-process REST architectural style. These can be thought of as in-process HTTP servers (without the HTTP, of course) or composable dictionaries. The core store protocol implements the GET, PUT and DELETE verbs as messages.

The role of URLs in REST is taken by Polymorphic Identifiers. These are objects that can reference identify values in the store, but are not direct pointers. For example, they need to be a able to reference objects that aren't there yet.

Polymorphic Identifiers can be application-specific, for example they might consist just of a numeric id,

MVC

For me, the key part of the MVC architectural style is the decoupling of input processing and resultant output processing. That is, under MVC, the view (or a controller) make some change to the model and then processing stops. At some undefined later time (could be synchronous, but does not have to be) the Model informs the UI that it has changed using some kind of notification mechanism.

In Smalltalk MVC, this is a dependents list maintained in the model that interested views register with. All these views are then sent a #changed message when the model has changed. In Cocoa, this can be accomplished using NSNotificationCenter, but really any kind of broadcast mechanism will do.

It is then the views' responsibility to update themselves by interrogating the model.

For views, Cocoa largely automates this: on receipt of the notification, the view just needs invalidate itself, the system then automatically schedules it for redrawing the next time through the event loop.

The reason the decoupling is important to maintain is that the update notification can come for any other reason, including a different user interaction, a backend request completing or even some sort of notification or push event coming in remotely.

With the decoupled M-V update mechanism, all these different kinds of events are handled identically, and thus the UI only ever needs to deal with the local model. The UI is therefore almost entirely decoupled from network communications, we thus have a local-first application that is also largely testable locally.

Blackbird refines the MVC view update mechanism by adding the polymorphic identifier of the modified item in question and placing those PIs in a queue. The queue decouples model and view even more than in the basic MVC model, for example it become fairly trivial to make the queue writable from any thread, but empty only onto the main thread for view updates. In addition, providing update notifications is no longer synchronous, the updater just writes an entry into the queue and can then continue, it doesn't wait for the UI to finish its update.

Decoupling via a queue in this way is almost sufficient for making sure that high-speed model updates don't overwhelm the UI or slow down the model. Both these performance problems are fairly rampant, as an example of the first, the Microsoft Office installer saturates both CPUs on a dual core machine just painting its progress bar, because it massively overdraws.

An example of the second was one of the real performance puzzlers of my career: an installer that was extremely slow, despite both CPU and disk being mostly idle. The problem turned out to be that the developers of that installer not only insisted on displaying every single file name the installer was writing (bad enough), but also flushing the window to screen to make sure the user got a chance to see it (worse). This then interacted with a behavior of Apple's CoreGraphics, which disallows screen flushes at a rate greater than the screen refresh rate, and will simply throttle such requests. You really want to decouple your UI from your model updates and let the UI process updates at its pace.

Having polymorphic identifiers in the queue makes it possible for the UI to catch up on its own terms, and also to remove updates that are no longer relevant, for example discarding duplicate updates of the same element.

The polymorphic identifier can also be used by views in order to determine whether they need to update themselves, by matching against the polymorphic identifier they are currently handling.

Backend communication

Almost every REST backend communication code I have seen in mobile applications has created "convenient" cover methods for every operation of every endpoint accessed by the application, possibly automatically generated.

This ignores the fact that REST only has a few verbs, combined with a great number of identifiers (URLs). In Blackbird, there is a single channel for backend communication: a queue that takes a polymorphic identifier and an http verb. The polymorphic identifier is translated to a URL of the target backend system, the resulting request executed and when the result returns it is placed in the central store using the provided polymorphic identifier.

After the item has been stored, an MVC notification with the polymorphic identifier in question is enqueued as per above.

The queue for backend operations is essentially the same one we described for model-view communication above, for example also with the ability to deduplicate requests correctly so only the final version of an object gets sent if there are multiple updates. The remainder of the processing is performed in pipes-and-filters architectural style using polymorphic write streams.

If the backend needs to communicate with the client, it can send URLs via a socket or other mechanism that tells the client to pull that data via its normal request channels, implementing the same pull-constraint as in the rest of the system.

One aspect of this part of the architecture is that backend requests are reified and explicit, rather than implicitly encoded on the call-stack and its potentially asynchronous continuations. This means it is straightforward for the UI to give the user appropriate feedback for communication failures on the slow or disrupted network connections that are the norm on mobile networks, as well as avoid accidental duplicate requests.

Despite this extra visibility and introspection, the code required to implement backend communications is drastically reduced. Last not least, the code is isolated: network code can operate independently of the UI just as well as the UI can operate independently of the network code.

Persistence

Persistence is handled by stacked stores (storage combinators).

The application is hooked up to the top of the storage stack, the CachingStore, which looks to the application exactly like the DictStore (an in-memory store). If a read request cannot be found in the cache, the data is instead read from disk, converted from JSON by a mapping store.

For testing the rest of the app (rather than the storage stack), it is perfectly fine to just use the in-memory store instead of the disk store, as it has the same interface and behaves the same, except being faster and non-persistent.

Writes use the same asynchronous queues as the rest of the system, with the writer getting the polymorphic identifiers of objects to write and then retrieving the relevant object(s) from the in-memory store before persisting. Since they use the same mechanism, they also benefit from the same uniquing properties, so when the I/O subsystem gets overloaded it will adapt by dropping redundant writes.

Consequences

With the Blackbird reference architecture, we not only replace complex, bulky code with much less and much simpler code, we also get to reuse that same code in all parts of the system while making the pieces of the system highly independent of each other and optimising performance.

In addition, the combination of REST-like stores that can be composed with constraint- and event-based communication patterns makes the architecture highly decoupled. In essence it allows the kind of decoupling we see in well-implemented microservices architectures, but on mobile apps without having to run multiple processes (which is often not allowed).

Sunday, July 25, 2021

Deleting Code to Double the Performance of my Trivial Objective-S Tasks Backend

About two months ago, I showed a trivial tasks backend for a hypothetical ToDoMVC app. At the time, I noted that the performance was pretty insane for something written in a (slow) scripting language: 7K requests per second when fetching a single task.

That was using an encoder method (that writes key/value pairs to the JSON encoder) written in Objective-S, and I wondered how much faster it would go if that was no longer the case. Twice as fast, it turns out.

Yesterday, I wrote about tuning the Objective-S's SQLite insert performance to around 130M rows/minute, coincidentally also for a simple tasks schema. One part of that performance story was the fact that the encoder method (writing key/value pairs to the SQLite encoder) was generated by pasting together Objective-C blocks and installing the whole thing as an Objective-C method. No interpretation, except for calling a series of blocks stored in an NSArray. I had completely forgotten about the hand-written Objective-S encoder method in the back-end's Task class! Since generation is automatic, but won't override an already existing method, all I had to do in order to get the better performance was delete the old method.


> wrk -c 1 -t 1 http://localhost:8082/tasks 
Running 10s test @ http://localhost:8082/tasks
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    66.60us    9.69us   1.08ms   96.72%
    Req/Sec    14.95k   405.55    15.18k    98.02%
  150275 requests in 10.10s, 30.67MB read
Requests/sec:  14879.22
Transfer/sec:      3.04MB
> curl  http://localhost:8082/tasks         
[{"id":1,"done":0,"title":"Clean Room"},{"id":2,"done":1,"title":"Check Twitter"}]%

More than twice the performance, and that while fetching two tasks instead of just one, so around 30K tasks/second! (And yes, I checked that I wasn't hitting a 404...).

So what's the performance if we actually fetch more than a minimal number of tasks? For 128 tasks, 64x more than before, it's still around 9K requests/s, so most of the time so far was per-request overhead. At this point we are serving a little over 1M tasks/s:


> wrk -c 1 -t 1 'http://localhost:8082/tasks/' 
Running 10s test @ http://localhost:8082/tasks/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   112.13us   76.17us   5.57ms   99.63%
    Req/Sec     9.05k   397.99     9.21k    97.03%
  90923 requests in 10.10s, 483.41MB read
Requests/sec:   9002.44
Transfer/sec:     47.86MB

If memory serves, that was around the rate we were seeing with the Wunderlist backend when we had a couple of million users, not that these are comparable in any meaningful way. For 1024 tasks there's a significant drop to slightly above 1.8K requests/s, with the task-rate almost doubling to 1.8M/s:


> wrk -c 1 -t 1 'http://localhost:8082/tasks/' 
Running 10s test @ http://localhost:8082/tasks/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   552.06us   62.77us   1.84ms   81.08%
    Req/Sec     1.82k    52.95     1.89k    90.10%
  18267 requests in 10.10s, 778.36MB read
Requests/sec:   1808.59
Transfer/sec:     77.06MB

UPDATE:
Of course, those larger request sizes also see a much larger increase in performance than 2x. With the old code, the 128 task case clocks in at 147 requests/s and the 1024 task case at 18 requests/s, at which point it's a 100x improvement. So gives you an idea just how slow my Objective-S interpreter is.

Saturday, July 24, 2021

Inserting 130M SQLite Rows per Minute...from a Scripting Language

The other week, I stumbled on the post Inserting One Billion Rows in SQLite Under A Minute, which was a funny coincidence, as I was just in the process of giving my own SQLite/Objective-S adapter a bit of tune-up. (The post's title later had "Towards" prepended, because the author wasn't close to hitting that goal).

This SQLite adapater was a spin-off of my earlier article series on optimizing JSON performance, itself triggered by the ludicrously bad performance of Swift Coding at this rather simple and relevant task. To recap: Swift's JSON coder clocked in at about 10MB/s. By using a streaming approach and a bit of tuning, we got that to around 200MB/s.

Since then, I have worked on making Objective-S much more useful for UI work, with the object-literal syntax making defining UIs as convenient as the various "declarative" functional approaches such as React or SwiftUI. Except it is still using the same AppKit or UIKit objects we know and love, and doesn't force us to embrace the silly notion that the UI is a pure function of the model. Oh, and you get live previews that actually work. But more on that later.

So I am slowly inching towards doing a ToDoMVC, a benchmark that feels rather natural to me. While I am still very partial to just dumping JSON files, and the previous article series hopefully showed that this approach is plenty fast enough, I realize that a lot of people prefer a "real" database, especially on the back-end, and I wanted to build that as well. One of the many benchmarks I have for Objective-S is that it should be possible to build a nicer Rails with it. (At this point in time I am pretty sure I will hit that benchmark).

One of the ways to figure out if you have a good design is to stress-test it. One very useful stress-test is seeing how fast it can go, because that will tell you if the thing you built is lean, or if you put in unnecessary layers and indirections.

This is particularly interesting in a Scripted Components (pdf) system that combines a relatively slow but flexible interactive scripting language with fast, optimized components. The question is whether you can actually combine the flexibility of the scripting language while reaping the benefits of the fast components, rather than having to dive into adapting and optimizing the components for each use case, or just getting slow performance despite the fast components. My hunch was that the streaming approach I have been using for a while now and that worked really well for JSON and Objective-C would also do well in this more challenging setting.

Spoiler alert: it did!

The benchmark

The benchmark was a slightly modified version of the script that serves as a tasks backend. Like said sample script it also creates a tasks database and inserts some example rows. Instead of inserting two rows, it inserts 10 million. Or a hundred million.


#!env stsh
#-taskbench:dbref
#

class Task {
	var  id.
	var  done.
	var  title.
	-description { "". }
	+sqlForCreate {
		'( [id] INTEGER PRIMARY KEY, [title] VARCHAR(220) NOT NULL, [done] INTEGER );'.
	}
}.

scheme todo : MPWAbstractStore {
	var db.
	var tasksTable.
	-initWithRef:ref {
		this:db := (MPWStreamQLite alloc initWithPath:ref path).
		this:tasksTable :=  #MPWSQLTable{ #db: this:db , #tableClass: Task, #name: 'tasks'  }.
		this:db open.
		self.
	}
	-createTable {
		this:tasksTable create.
	    this:tasksTable := this:db tables at:'tasks'.
        this:tasksTable createEncoderMethodForClass: Task.
	}
	-createTaskListToInsert:log10ofSize {
		baseList ← #( #Task{  #title: 'Clean Room', #done: false }, #Task{  #title: 'Check Twitter', #done: true } ).
		...replicate ...
		taskList.
	}
	-insertTasks {
	    taskList := self createTaskListToInsert:6.
		1 to:10 do: {
			this:tasksTable insert:taskList.
		}.
	}
}.
todo := todo alloc initWithRef:dbref.
todo createTable.
todo insertTasks.

(I have removed the body of the method that replicates the 2 tasks into the list of millions of tasks we need to insert. It was bulky and not relevant.)

In this sample we define the Task class and use that to create the SQL Table. We could also have simply created the table and generated a Tasks class from that.

Anyway, running this script yields the following result.


> time ./taskbench-sqlite.st /tmp/tasks1.db 
./taskbench-sqlite.st /tmp/tasks1.db  4.07s user 0.20s system 98% cpu 4.328 total
> ls -al  /tmp/tasks1.db* 
-rw-r--r--  1 marcel  wheel   214M Jul 24 20:11 /tmp/tasks1.db
> sqlite3 /tmp/tasks1.db 'select count(id) from tasks;' 
10000000

So we inserted 10M rows in 4.328 seconds, yielding several hundred megabytes of SQLite data. This would be 138M rows had we let it run for a minute. Nice. For comparison, the original article's numbers were 11M rows/minute for CPython, 40M rows/minute for PyPy and 181M rows/minute for Rust, though on a slower Intel MacBook Pro whereas I was running this on an M1 Air. I compiled and ran the Rust version on my M1 Air and it did 100M rows in 21 seconds, so just a smidgen over twice as fast as my Objective-S script, though with a simpler schema (CHAR(6) instead of VARCHAR(220)) and less data (1.5GB vs. 2.1GB for 100M rows).

Getting SQLite fast

The initial version of the script was far, far slower, and at first it was, er, "sub-optimal" use of SQLite that was the main culprit, mostly inserting every row by itself without batching. When SQLite sees an INSERT (or an UPDATE for that matter) that is not contained in a transaction, it will automatically wrap that INSERT inside a generated transaction and commit that transaction after the INSERT is processed. Since SQLite is very fastidious about ensuring that transactions get to disk atomically, this is slow. Very slow.

The class handling SQLite inserts is a Polymorphic Write Stream, so it knows what an array is. When it encounters one, it sends itself the beginArray message, writes the contents of the array and finishes by sending itself the endArray message. Since writing an array sort of implies that you want to write all of it, this was a good place to insert the transactions:


-(void)beginArray {
    sqlite3_step(begin_transaction);
    sqlite3_reset(begin_transaction);
}

-(void)endArray {
    sqlite3_step(end_transaction);
    sqlite3_reset(end_transaction);
}

So now, if you want to write a bunch of objects as a single transaction, just write them as an array, as the benchmark code does. There were some other minor issues, but after that less than 10% of the total time were spent in SQLite, so it was time to optimize the caller, my code.

Column keys and Cocoa Strings

At this point, my guess was that the biggest remaining slowdown would be my, er, "majestic" Objective-S interpreter. I was wrong, it was Cocoa string handling. Not only was I creating the SQLite parameter placeholder keys dynamically, so allocating new NSString objects for each column of each row, it also happens that getting character data from an NSString object nowadays involves some very complex and slow internal machinery using encoding conversion streams. -UTF8String is not your friend, and other methods appear to fairly consistently use the same slow mechanism. I guess making NSString horribly slow is one way to make other string handling look good in comparison.

After a few transformations, the code would just look up the incoming NSString key in a dictionary that mapped it to the SQLite parameter index. String-processing and character accessing averted.

Jitting the encoder method. Without a JIT

One thing you might have noticed about the class definition in the benchmark code is that there is no encoder method, it just defines its instance variables and some other utilities. So how is the class data encoded for the SQLTable? KVC? No, that would be a bit slow, though it might make a good fallback.

The magic is the createEncoderMethodForClass: method. This method, as the name suggests, creates an encoder method by pasting together a number of blocks, turns the top-level into a method using imp_implementationWithBlock(), and then finally adds that method to the class in question using class_addMethod().


-(void)createEncoderMethodForClass:(Class)theClass
{
    NSArray *ivars=[theClass allIvarNames];
    if ( [[ivars lastObject] hasPrefix:@"_"]) {
        ivars=(NSArray*)[[ivars collect] substringFromIndex:1];
    }
    
    NSMutableArray *copiers=[[NSMutableArray arrayWithCapacity:ivars.count] retain];
    for (NSString *ivar in ivars) {
        MPWPropertyBinding *accessor=[[MPWPropertyBinding valueForName:ivar] retain];
        [ivar retain];
        [accessor bindToClass:theClass];
        
        id objBlock=^(id object, MPWFlattenStream* stream){
            [stream writeObject:[accessor valueForTarget:object] forKey:ivar];
        };
        id intBlock=^(id object, MPWFlattenStream* stream){
            [stream writeInteger:[accessor integerValueForTarget:object] forKey:ivar];
        };
        int typeCode = [accessor typeCode];
        
        if ( typeCode == 'i' || typeCode == 'q' || typeCode == 'l' || typeCode == 'B' ) {
            [copiers addObject:Block_copy(intBlock)];
        } else {
            [copiers addObject:Block_copy(objBlock)];
        }
    }
    void (^encoder)( id object, MPWFlattenStream *writer) = Block_copy( ^void(id object, MPWFlattenStream *writer) {
        for  ( id block in copiers ) {
            void (^encodeIvar)(id object, MPWFlattenStream *writer)=block;
            encodeIvar(object, writer);
        }
    });
    void (^encoderMethod)( id blockself, MPWFlattenStream *writer) = ^void(id blockself, MPWFlattenStream *writer) {
        [writer writeDictionaryLikeObject:blockself withContentBlock:encoder];
    };
    IMP encoderMethodImp = imp_implementationWithBlock(encoderMethod);
    class_addMethod(theClass, [self streamWriterMessage], encoderMethodImp, "v@:@" );
}

What's kind of neat is that I didn't actually write that method for this particular use-case: I had already created it for JSON-coding. Due to the fact that the JSON-encoder and the SQLite writer are both Polymorphic Write Streams (as are the targets of the corresponding decoders/parsers), the same method worked out of the box for both.

(It should be noted that this encoder-generator currently does not handle all variety of data types; this is intentional).

Getting the data out of Objective-S objects

The encoder method uses MPWPropertyBinding objects to efficiently access the instance variables via the object's accessors, caching IMPs and converting data as necessary, so they are both efficient and flexible. However, the actual accessors that Objective-S generated for its instance variables were rather baroque, because they used the same basic mechanism used for Objective-S methods, which can only deal with objects, not with primitive data types.

In order to interoperate seamlessly with Objective-C, which expected methods that can take data types other than objects, all non-object method arguments are converted to objects on the way in, and return values are converted from objects to primitive values on the way out.

So even the accessors for primitive types such as the integer "id" or the boolean "done" would have their values converted to and from objects by the interface machinery. As I noted above, I was a bit surprised that this inefficiency was overshadowed by the NSString-based key handling.

In fact, one of the reason for pursuing the SQLite insert benchmark was to have a reason for finally tackling this Rube-Goldberg mechanism. In the end, actually addressing it turned out to be far less complex than I had feared, with the technique being very similar to that used for the encoder-generator above, just simpler.

Depending on the type, we use a different block that gets parameterised with the offset to the instance variable. I show the setter-generator below, because there the code for the object-case is actually different due to retain-count handling:


#define pointerToVarInObject( type, anObject ,offset)  ((type*)(((char*)anObject) + offset))

#ifndef __clang_analyzer__
// This leaks because we are installing into the runtime, can't remove after

-(void)installInClass:(Class)aClass
{
    SEL aSelector=NSSelectorFromString([self objcMessageName]);
    const char *typeCode=NULL;
    int ivarOffset = (int)[ivarDef offset];
    IMP getterImp=NULL;
    switch ( ivarDef.objcTypeCode ) {
        case 'd':
        case '@':
            typeCode = "v@:@";
            void (^objectSetterBlock)(id object,id arg) = ^void(id object,id arg) {
                id *p=pointerToVarInObject(id,object,ivarOffset);
                if ( *p != arg ) {
                    [*p release];
                    [arg retain];
                    *p=arg;
                }
            };
            getterImp=imp_implementationWithBlock(objectSetterBlock);
            break;
        case 'i':
        case 'l':
        case 'B':
            typeCode = "v@:l";
            void (^intSetterBlock)(id object,long arg) = ^void(id object,long arg) {
                *pointerToVarInObject(long,object,ivarOffset)=arg;
            };
            getterImp=imp_implementationWithBlock(intSetterBlock);
            break;
        default:
            [NSException raise:@"invalidtype" format:@"Don't know how to generate set accessor for type '%c'",ivarDef.objcTypeCode];
            break;
    }
    if ( getterImp && typeCode ) {
        class_addMethod(aClass, aSelector, getterImp, typeCode );
    }
    
}

At this point, profiles were starting to approach around two thirds of the time being spent in sqlite_ functions, so the optimisation efforts were starting to get into a region of diminishing returns.

Linear scan beats dictionary

One final noticeable point of obvious overhead was the (string) key to parameter index mapping, which the optimizations above had left at a NSDictionary mapping from NSString to NSNumber. As you probably know, NSDictionary isn't exactly the fastest. One idea was to replace that lookup with a MPWFastrStringTable, but that means either needing to solve the problem of fast access to NSString character data or changing the protocol.

So instead I decided to brute-force it: I store the actual pointers to the NSString objects in a C-Array indexed by the SQLite parameter index. Before I do the other lookup, which I keep to be safe, I do a linear scan in that table using the incoming string pointer. This little trick largely removed the parameter index lookup from my profiles.

Conclusion

With those final tweaks, the code is probably quite close to as fast as it is going to get. Its slower performance compared to the Rust code can be attributed to the fact that it is dealing with more data and a more complex schema, as well as having to actually obtain data from materialized objects, whereas the Rust code just generates the SQlite calls on-the-fly.

All this is achieved from a slow, interpreted scripting language, with all the variable parts (data class, steering code) defined in said slow scripting language. So while I look forward to the native compiler for Objective-S, it is good to know that it isn't absolutely necessary for excellent performance, and that the basic design of these APIs is sound.

Tuesday, June 15, 2021

if let it be

One of funkier aspects of Swift syntax is the if let statement. As far as I can tell, it exists pretty much exclusively to check that an optional variable actually does contain a value and if it does, work with a no-longer-optional version of that variable.

Swift packages this functionality in a combination if statement and let declaration:


if let value = value {
   print("value is \(value)")
}

This has a bunch of problems that are explained nicely in a Swift Evolution thread (via Michael Tsai) together with some proposals to fix it. One of the issues is the idiomatic repitition of the variable name, because typically you do want the same variable, just with less optionality. Alas, code-completion apparently doesn't handle this well, so the temptation is to pick a non-descriptive variable name.

In my previous post (Asynchronous Sequences and Polymorphic Streams) I noted how the fact that iteration in Smalltalk and Objecive-S is done via messages and blocks means that there is no separate concept of a "loop-variable", that is just an argument to the block.

Conditionals are handled the same way, with blocks and messages, but normally don't pass arguments to their argument blocks, because in normal conditionals those arguments would always be just the constants true or false. Not very interesting.

When I added ifNotNil: some time ago, I used the same logic, but it turns out the object is now actually potentially interesting. So ifNotNil: now passes the now-known-to-be-non-nil value to the block and can be used as follows:


value ifNotNil:{ :value |
    stdout println:value.
}

This doesn't eliminate the duplication, but does avoid the issue of having the newly introduced variable name precede the original variable. Well, that and the whole weird if let in the first place.

With anonymous block arguments, we actually don't have to name the parameter at all:


value ifNotNil:{ stdout println:$0. }

Alternatively, we can just take advantage of some conveniensces and use a HOM instead:


value ifNotNil printOn:stdout.

Of course, Objective-S currently doesn't care about optionality, and with the current nil-eating behavior, the ifNotNil is not strictly necessary, you could just write it as follow:


value printOn:stdout.

I haven't really done much thinking about it, but the whole idea of optionality shouldn't really be handled in the space of values, but in the space of references. Which are first class objects in Objective-S.

So you don't ask a value if it is nil or not, you ask the variable if it contains a value:


ref:value ifBound:{ :value | ... }

To me that makes a lot more sense than having every type be accompanied by an optional type.

So if we were to care about optionality so in the future, we have the tools to create a sensible solution. And we can let if let just be.

Sunday, June 13, 2021

Asynchronous Sequences and Polymorphic Streams

Browsing the WWDC '21 session videos, I came across the session on Asynchronous Sequences. The preview image showcased some code for asynchronously fetching and massaging current earthquake data from the U.S. Geological Survey:
@main
struct QuakesTool {
   static func main() async throws {
      let endpointURL = URL(string: "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv")!

      for try await event in endpointURL.lines.dropFirst() {
         let values = event.split(separator: ",")
         let time = values[0]
         let latitude = values[1]
         let longitude = values[2]
         let magnitude = values[4]
         print("Magnitude \(magnitude) on \(time) at \(latitude) \(longitude)")
      }
   }
}

This is nice, clean code, and it certainly looks like it serves as a good showcase for the benefits of asynchronous coding with async/await and asynchronous sequences built on top of async/await.

Or does it?

Here is the equivalent code in Objective-S:


#!env stsh
stream ← ref:https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv linesAfter:1.

stream do: { :theLine |
   values ← theLine componentsSeparatedByString:','.
   time ← values at:0.
   latitude ← values at:1.
   longitude ← values at:2.
   magnitude ← values at:4.
   stdout println:"Quake: magnitude {magnitude} on {time} at {latitude} {longitude}".
}. 
stream awaitResultForSeconds:20.

Objective-S does not (and will not) have async/await, but it can nevertheless provide the equivalent functionality easily and elegantly. How? Two features:

  1. Polymorphic Write Streams
  2. Messaging
Let's see how these two conspire to make adding something equivalent to for try await trivial.

Polymorphic Write Streams

In the Objective-S implementation, https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv is not a string, but an actual identifier, a Polymorphic Identifier, adding the ref: prefix turns it into a binding, a first class variable. You can ask a binding for its value, but for bindings that can also be regarded as collections of some kind, you can also ask them for a stream of their values, in this particular case a MPWURLStreamingStream. This stream is a Polymorphic Write Stream that can be easily composed with other filters to create pipelines. The linesAfter: method is a convenience method that does just that: it composes the URL fetcher with a filter that converts from bytes to lines of text and another filter that drops the first n items.

Objective-S actually has convenient syntax for creating these compositions without having to do it via convenience methods, but I wanted to keep differences in the surrounding scaffolding small for this example, which is about the for try away and do:.

When I encountered the example, Polymorphic Write Streams actually did not have a do: for iteration, but it was trivial to add:


-(void)do:aBlock
{
    [self setFinalTarget:[MPWBlockTargetStream streamWithBlock:aBlock]];
    [self run];
}

(This code lives in MPWFoundation, so it is in Objective-C, not Objective-S).

Those 5 lines were all that was needed. I did not have to make substantive changes to the language or its implementation. One reason for this is that Polymorphic Write Streams are asynchrony-agnostic: although they are mostly implemented as straightforward synchronous code, they work just as well if parts of the pipeline they are in are asynchronous. It just doesn't make a difference, because the semantics are in the data flow, not in the control flow.

Messaging

The other big reason an asynchronous do: was easy to add is messaging.
If you focus on just messaging -- and realize that a good metasystem can late bind the various 2nd level architectures used in objects -- then much of the language-, UI-, and OS based discussions on this thread are really quite moot.
One of the many really, really neat ideas in Smalltalk is how control structures, which in most other languages are special language features, are just plain old messages and implemented in the library, not in the language.

So the for ... in loop in Swift is just the do: message sent to a collection, and the keyword syntax makes this natural:


for event in lines {
...
}
...
lines do: { :event |
...
}

Note how making loops regular like this also makes the special concept of "loop variable" disappear. The "loop variable" is just the block argument. And I just realized the same would go for a not-nil result of a nil test.

Anyway, if "loops" are just messages, it's easy to add a method implementing iteration to some other entity, for example a stream, the way that I did. (Smalltalk streams also support the iteration messages).

And when you can easily make stream processing, which can handle asynchrony naturally and easily, just as convenient as imperative programming, you don't need async/await, which tries to make asynchronous programming look like imperative programming in order to make it convenient.

Friday, May 21, 2021

Why are there no return statements in Objective-S?

My previous example raised a question: why no return statements? I am assuming this was about this part of the example:

   -description { "Task: {this:title} done: {this:done}". }


The answer is that I would like to do without return statements if and as much as I can. We will see how much that is. In general, I am in favor of expression-orientation in programming languages. A simple example is if-statements vs. conditional expressions. In most languages today, like C, Objective-C and Swift, if is a statement. That means I write something as follows:
if ( condition ) {
   do something if true
} else {
   do something different if false
}

This seems obvious and is general, but very often you don't want to do arbitrary stuff, you just want to have some variable have some value in one case and a different value in another case.
int foo;
if ( condition ) {
   foo = 1;
} else {
   foo = 42;
}

In that case, it is annoying that the if is defined to be a statement and not an expression, because you can't just write the following:
int foo;
foo = if ( condition ) { 1; } else { 42; }

In addition, as hinted to in the previous examples, you can't use a statement to initialize a variable, that definitely has to be an expression. Which is why C and many derived languages have the "ternary" operator (?:), which is really just an if/else in expression form.
int foo=condition ? 1 : 42;

That solves the problem, but now you have two conditionals. Why not have just one? LISP, most of the FP languages as well as Smalltalk and Objective-S have an if that returns a value.
a := condition ifTrue:{ 1. } ifFalse:{ 42. }.

So that's why expression-orientation is useful in general. What about methods? The same general idea applies. Whereas in Java, for example, a read accessor is called getX(), indicating an action that is performed ("get the value of x") in Objective-C Smalltalk and Objective-S, it is just called x, ("the value of x").

The same idea applies to dropping return statements where possible. It's not "get me the description of this object", it is "the description of this object is...". And inside the method, it's not "this statement now returns the following string as the description", but, again, "the description is...".

Describing things that are, rather than actions to perform, is at the heart of Objective-S, as discussed in Can Programmers Escape the Gentle Tyranny of Call/Return.

As Guy Steele put it:

Another weakness of procedural and functional programming is that their viewpoint assumes a process by which "inputs" are transformed into "outputs"; there is equal concern for correctness and for termination (and proofs thereof). But as we have connected millions of computers to form the Internet and the World Wide Web, as we have caused large independent sets of state to interact–I am speaking of databases, automated sensors, mobile devices, and (most of all) people–in this highly interactive, distributed setting, the procedural and functional models have failed, another reason why objects have become the dominant model. Ongoing behavior, not completion, is now of primary interest. Indeed, object-oriented programming had its origins in efforts to simulate the ongoing behavior of interacting real-world entities–thus the programming language SIMULA was born.
So wherever possible, Objective-S tries to push towards expressing things as statically as possible, pushing away from action-orientation. For example, hooking up a timed source to a pin:
#Blinker{ #seconds: 1, #active: true} → ref:gpio:17. 

instead of executing a loop:
while True: 
    GPIO.output(17, True) 
    sleep(1) 
    GPIO.output(17, False) 
    sleep(1) 

The same goes for many other relationships: instead of writing procedural code that initiates and/or maintains the relatinship, with the actual relationship remainig implicit, describe the actual relationship, make that explicit, and instead keep the procedural code that maintains it as a hidden implementation detail.

If the return statement comes back, and it very well might, I am hoping it will be in a slightly more general form. I recall Smalltalk's "^" being described as "send back". I've already taken that and generalised it to mean "send result", using it in filter definitions, where "^" means "send a result to the next filter in the pipeline". It is needed there because filters are not limited to sending a single result, they can send zero or many.

With those more general semantics, "^" might also be used to send back results to the sender of an asynchronous message, which is obviously quite different from a "return".

And of course it would be useful for early returns, which are currently not possible.

What about void methods?

Objective-S does have void methods, after all its procedural part is essentially identical to Objective-C, which also has them. However, I agree with the FP folk that functions (procedures, methods) should be as (side-)effect free as possible, and void methods by definition are effectful (or no-ops).

So where do the effects go? Two places:

  1. The left hand side of the "←".

    In most current programming languages, assignment is severely crippled, and therefore not really useful for generalised effects. With Polymorphic Identifiers and Storage Combinators, there is enough expressive power and ability to abstract that we should need far fewer void methods.

  2. Connecting via "→"

    Much of the need for effectful methods in OO is for constructing and connecting objects. In Objective-S, you don't need to call methods that result in a connection being established as a side effect of munging on some state, you define connections between objects directly using "→".

    Well, and you define objects using object literals such as  #Blinker{ #seconds: 1, #active: true}  instead of setting instance variables procedurally.

That's the plan, anyway. Although a lot of that plan is coming true at the moment. Exciting times! (And one of the reasons I haven't been blogging all that much).

Friday, November 13, 2020

M1 Memory and Performance

The M1 Macs are out now, and not only does Apple claim they're absolutely smokin', early benchmarks seem to confirm those claims. I don't find this surprising, Apple has been highly focused on performance ever since Tiger, and as far as I can tell hasn't let up since.

One maybe somewhat surprising aspect of the M1s is the limitation to "only" 16 Gigabytes of memory. As someone who bought a 16 Kilobyte language card to run the Merlin 6502 assembler on his Apple ][+ and expanded his NeXT cube, which isn't that different from a modern Mac, to a whopping 16 Megabytes, this doesn't actually seem that much of a limitation, but it did cause a bit of consternation.

I have a bit of a theory as to how this "limitation" might tie in to how Apple's outside-the-box approach to memory and performance has contributed to the remarkable achievement that is the M1.

The M1 is apparently a multi-die package that contains both the actual processor die and the DRAM. As such, it has a very high-speed interface between the DRAM and the processors. This high-speed interface, in addition to the absolutely humongous caches, is key to keeping the various functional units fed. Memory bandwidth and latency are probably the determining factors for many of today's workloads, with a single access to main memory taking easily hundreds of clock cycles and the CPU capable of doing a good number of operations in each of these clock cycles. As Andrew Black wrote: "[..] computation is essentially free, because it happens 'in the cracks' between data fetch and data store; ..".

The tradeoff is that you can only fit so much DRAM in that package for now, but if it fits, it's going to be super fast.

So how do we make sure it all fits? Well, where Apple might have been "focused" on performance for the last 15 years or so, they have been completely anal about memory consumption. When I was there, we were fixing 32 byte memory leaks. Leaks that happened once. So not an ongoing consumption of 32 bytes again and again, but a one-time leak of 32 bytes.

That dedication verging on the obsessive is one of the reasons iPhones have been besting top-of-the-line Android phone that have twice the memory. And not by a little, either.

Another reason is the iOS team's steadfast refusal to adopt tracing garbage collection as most of the rest of the industry did, and macOS's later abandonment of that technology in favor of the reference counting (RC) they've been using since NeXTStep 4.0. With increased automation of those reference counting operations and the addition of weak references, the convenience level for developers is essentially indistinguishable from a tracing GC now.

The benefit of sticking to RC is much-reduced memory consumption. It turns out that for a tracing GC to achieve performance comparable with manual allocation, it needs several times the memory (different studies find different overheads, but at least 4x is a conservative lower bound). While I haven't seen a study comparing RC, my personal experience is that the overhead is much lower, much more predictable, and can usually be driven down with little additional effort if needed.

So Apple can afford to live with more "limited" total memory because they need much less memory for the system to be fast. And so they can do a system design that imposes this limitation, but allows them to make that memory wicked fast. Nice.

Another "well-known" limitation of RC that has made it the second choice compared to tracing GC is the fact that updating those reference counts all the time is expensive, particularly in a multi-threaded environment where those updates need to be atomic. Well...

How? Problem solved. I guess it helps if you can make your own Silicon ;-)

So Apple's focus on keeping memory consumption under control, which includes but is not limited to going all-in on reference counting where pretty much the rest of the industry has adopted tracing garbage collection, is now paying off in a majory way ("bigly"? Too soon?). They can get away with putting less memory in the system, which makes it possible to make that memory really fast. And that locks in an advantage that'll be hard to duplicate.

It also means that native development will have a bigger advantage compared to web technologies, because native apps benefit from the speed and don't have a problem with the memory limitations, whereas web-/electron apps will fill up that memory much more quickly.

Sunday, June 21, 2020

Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

When looking at the MPWPlistStreaming protocol that I've been using for my JSON parsing series, one thing that was probably noticeable is that it isn't particularly JSON-focused. In fact, it wasn't even initially designed for parsing, but for generating.

So could we use this for other de-serialization tasks? Glad you asked!

CSV parsing

One of the examples in my performance book involves parsing Comma Separated Values quickly, within the context of getting the time to convert a 139Mb GTFS file to something usable on the phone down from 20 minutes using using CoreData/SQLite to slightly less than a second using custom in-memory data structures that are also several orders of magnitude faster to query on-device.

The original project's CVS parser took around 18 seconds, which wasn't a significant part of the 20 minutes, but when the rest only took a couple of hundred milliseconds, it was time to make that part faster as well. The result, slightly generalized, is MPWDelimitedTable ( .h .m ).

The basic interface is block-based, with the block being called for every row in the table, called with a dictionary composed of the header row as keys and the contents of the row as values.


-(void)do:(void(^)(NSDictionary* theDict, int anIndex))block;

Adapting this to the MPWPlistStreaming protocol is straightforward:
-(void)writeOnBuilder:(id )builder
{
    [builder beginArray];
    [self do:^(NSDictionary* theDict, int anIndex){
        [builder beginDictionary];
        for (NSString *key in self.headerKeys) {
            [builder writeObject:theDict[key] forKey:key];
        }
        [builder endDictionary];
    }];
    [builder endArray];
}

This is a quick-and-dirty implementation based on the existing API that is clearly sub-optimal: the API we call first constructs a dictionary from the row and the header keys and then we iterate over it. However, it works with our existing set of builders and doesn't build an in-memory representation of the entire CSV.

It will also be relatively straightforward to invert this API usage, modifying the low-level API to use MPWPlistStreaming and then creating a higher-level block- and dictionay-based API on top of that, in a way that will also work with other MPWPlistStreaming clients.

SQLite

Another tabular data format is SQL data bases. On macOS/iOS, one very common database is SQLite, usually accessed via CoreData or the excellent and much more light-weight fmdb.

Having used fmdb myself before, and bing quite delighted with it, my first impulse was to write a MPWPlistStreaming adapter for it, but after looking at the code a bit more closely, it seemed that it was doing quite a bit that I would not need for MPWPlistStreaming.

I also think I saw the same trade-off between a convenient and slow convenience based on NSDictionary and a much more complex but potentially faster API based on pulling individual type values.

So Instead I decided to try and do something ultra simple that sits directly on top of the SQLite C-API, and the implementation is really quite simple and compact:


@interface MPWStreamQLite()

@property (nonatomic, strong) NSString *databasePath;

@end

@implementation MPWStreamQLite
{
    sqlite3 *db;
}

-(instancetype)initWithPath:(NSString*)newpath
{
    self=[super init];
    self.databasePath = newpath;
    return self;
}

-(int)exec:(NSString*)sql
{
    sqlite3_stmt *res;
    int rc = sqlite3_prepare_v2(db, [sql UTF8String], -1, &res, 0);
    @autoreleasepool {
        [self.builder beginArray];
        int step;
        int numCols=sqlite3_column_count(res);
        NSString* keys[numCols];
        for (int i=0;i < numCols; i++) {
            keys[i]=@(sqlite3_column_name(res, i));
        }
        while ( SQLITE_ROW == (step = sqlite3_step(res))) {
            @autoreleasepool {
                [self.builder beginDictionary];
                for (int i=0; i < numCols; i++) {
                    const char *text=(const char*)sqlite3_column_text(res, i);
                    if (text) {
                        [self.builder writeObject:@(text) forKey:keys[i]];
                    }
                }
                [self.builder endDictionary];
            }
        }
        sqlite3_finalize(res);
        [self.builder endArray];
    }
    return rc;
}

-(int)open
{
    return sqlite3_open([self.databasePath UTF8String], &db);
}

-(void)close
{
    if (db) {
        sqlite3_close(db);
        db=NULL;
    }
}


Of course, this doesn't do a lot, chiefly it only reads, no updates, inserts or deletes. However, the code is striking in its brevity and simplicity, while at the same time being both convenient and fast, though with still some room for improvement.

In my experience, you tend to not get all three of these properties at the same time: code that is simple and convenient tends to be slow, code that is convenient and fast tends to be rather tricky and code that's simple and fast tends to be inconvenient to use.

How easy to use is it? The following code turns a table into an array of dictionaries:


#import <MPWFoundation/MPWFoundation.h>

int main(int argc, char* argv[]) {
   MPWStreamQLite *db=[[MPWStreamQLite alloc] initWithPath:@"chinook.db"];
   db.builder = [MPWPListBuilder new];
   if( [db open] == 0 ) {
       [db exec:@"select * from artists;"];
       NSLog(@"results: %@",[db.builder result]);
       [db close];
   } else {
       NSLog(@"Can't open database: %s\n", [db error]);
   }
   return(0);
}

This is pretty good, but probably roughly par for the course for returning a generic data structure such as array of dictionaries, which is not going to be particularly efficient. (One of my first clues that CoreData's predecessor EOF wasn't particularly fast was when I read that fetching raw dictionaries was an optimization, much faster than fetching objects.)

What if we want to get objects instead? Easy, just replace the MPWPListBuilder with an MPWObjectBuilder, parametrized with the class to create. Well, and define the class, but presumably you already havee that if the task is to convert to objects of that class. And it cold obviously also be automated.



#import <MPWFoundation/MPWFoundation.h>

@interface Artist : NSObject { }

@property (assign) long ArtistId;
@property (nonatomic,strong) NSString *Name;

@end

@implementation Artist

-(NSString*)description
{
	return [NSString stringWithFormat:@"<%@:%p id: %ld name: %@>",[self class],self,self.ArtistId,self.Name];
}

@end

int main(int argc, char* argv[]) {
   MPWStreamQLite *db=[[MPWStreamQLite alloc] initWithPath:@"chinook.db"];
   db.builder = [[MPWObjectBuilder alloc] initWithClass:[Artist class]];
   if( [db open] == 0) {
       [db exec:@"select * from artists"];
       NSLog(@"results: %@",[db.builder result]);
       [db close];
   } else {
       NSLog(@"Can't open database: %s\n", [db error]);
   }
   return(0);
}

Note that this does not generate a plist representation as an intermediate step, it goes straight from database result sets to objects. The generic intermediate "format" is the MPWPlistStreaming protocol, which is a dematerialized representation, both plist and objects are peers.

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Sunday, June 14, 2020

The Curious Case of Swift's Adoption of Smalltalk Keyword Syntax

I was really surprised to learn that Swift recently adopted Smalltalk keyword syntax: [Accepted] SE-0279: Multiple Trailing Closures. That is: a keyword terminated by a colon, followed by an argument and without any surrounding braces.

The mind boggles.

A little.

Of course, Swift wouldn't be Swift if this weren't a special case of a special case, specifically the case of multiple trailing closures, which is a special case of trailing closures, which are weird and special-casey enough by themselves. Below is an example:


UIView.animate(withDuration: 0.3) {
  self.view.alpha = 0
} completion: { _ in
  self.view.removeFromSuperview()
}

Note how the arguments to animate() would seem to terminate at the closing parenthesis, but that's actually not the case. The curly braces after the closing paren start a closure that is actually also an argument to the method, a so-called trailing closure. I have a little bit of sympathy for this construct, because closures inside of the parentheses look really, really awkward. (Of course, all params apart from a sole x inside f(x) look awkward, but let's not quibble. For now.).

Another thing this enables is methods that reasonably resemble control structures, which I heard is a really great idea.

The problem is that sometimes you have more than one closure argument, and then just stacking them up behind what appears to be end of the function/method call gets really, really awkward, and you can't tell which block is which argument, because the trailing closure doesn't get a keyword.

Well, now it does. And we now have 4 different method syntaxes in one!

  1. Traditional C/Pascal/C++/Java function call syntax x.f()
  2. The already weird-ish addition of Smalltalk/Objective-C keywords inside the f(x) syntax: f(arg:x)
  3. Original trailing-closure syntax, which is just its own thing, for the first closure
  4. Smalltalk non-brackted keyword syntax for the 2nd and subsequent closures.
That is impressive, in a scary kind of way.
Swift is a crescendo of special cases stopping just short of the general; the result is complexity in the semantics, complexity in the behaviour (i.e. bugs), and complexity in use (i.e. workarounds).
In understand that this proposal was quite controversial, with heated discussion between opponents and proponents. I understand and sympathize with both sides. On the one hand, this is markedly better than alternatives. On the other hand it is a special case of a special case that is difficult to justify as an addition of all that is already there.

Special cases beget special cases beget special cases.

Of course the answer was always there: Smalltalk keyword syntax is not just the only reasonable solution in this case, it also solves all the other cases. It is the general solution. Here's how this could look in Objective-Smalltalk (which uses curly braces instead for closures instead of Smalltalk-80's square brackets):


UIView animate:{ self.view.alpha ← 0. } withDuration:0.3 completion:{ self view removeFromSuperview. }.

No special cases, every argument is labeled, no syntax mush of brackets inside parentheses etc. And yes, this also handles user-defined control structures, to:do: is just a method on NSNumber:


1 to:10 do:{:i | stdout println:"I will not introduce {i} special cases willy nilly.".}.

And since keywords naturally go between their arguments, there is no need for "operators", as a very different and special syntax form. You just allow some "binary" keywords to look a little different, so instead of 2 multiply:3 you can write 2 * 3. And when you have 2 raisedTo:3 instead of pow(2,3) (with the signature: func pow(_ x: Decimal, _ y: Int) -> Decimal), do you really neeed to go to the trouble of defining an "operator"?

Or Swift's a as b, another special kind of syntax. How about a as:b? (Yes I know there are details, but those are ... details.). And so on and so forth.

But of course, it's too late now. When I chose Smalltalk as the base syntax for the language that has turned into Objective-Smalltalk, it wasn't just because I just like it or have gotten used to it via Objective-C. Smalltalk's syntax is surprisingly flexible and general, Smalltalk APIs look a lot like DSLs, without any of the tooling or other overheads.

And that's the frustrating part: this stuff was and is available and well-known. At least if you bother to look and/or ask. But instead, we just choose these things willy-nilly and everybody has to suffer the consequences.

UPDATE:

I guess what I am trying to get at is that if you'd thought things through just a little bit, you could have had almost the entire syntax of your language for the cost (complexity, implementation size and brittleness, cognitive load, etc.) of this one special case of a special case. And it would have been overall better to boot.

Monday, June 1, 2020

MPWTest Only Tests Frameworks

It should be noted, if it wasn't obvious, that MPWTest is opinionated software, meaning it achieves some of its smoothness by gleefully embracing constraints that some might view as potentially crippling limitations.

Maybe the biggest of these constraints, mentioned in the previous post, is that MPWTest only tests frameworks. This means that the following workflow is not supported out of the box:

The point being that this is a workflow I not just somewhat indifferently do not want, but rather emphatically and actively want to avoid. Tests that are run (only?) when launching the app are application tests. My perspective is that unit tests are an integral part of the class. This may seem a subtle distinction, but subtle differences in something you do constantly can have huge impacts. "Steter Tropfen höhlt den Stein."

Another aspect is that launching the app for testing as a permanent and fixed part of your build process seems highly annoying at best. Linker finishes, app pops up, runs for a couple of seconds, shuts down again. I don't see that as viable. For testing to be integral and pervasive, it has to be invisible when the tests succeed.

The testing pyramid is helpful here: my contention is that you want to be at the bottom of that pyramid, ideally all of the time. Realistically, you're probably not going to get there, but you should push really, really hard, even making sacrifices that appear to be unreasonable to achieve that goal.

Framework-oriented programming

Only testing frameworks begs the question as to how to test those parts of the application not in frameworks. For me the answer is simple: there isn't any production code outside of frameworks.

None. Not the UI, not the application delegate. Only the auto-generated main().

The benefits of this approach are plentiful, the effort minimal. And if you think this is an, er, eccentric position to take, the program you almost certainly use to create apps for iOS/macOS etc. takes the same eccentric position: Xcode's main executable is 45K in size and only contains a main() function and some Swift boilerplate.

If all your code is in frameworks, only testing frameworks is not a problem. That may seem like a somewhat extreme case of sour grapes, with the arbitrary limitations of a one-off unit testing framework driving major architectural decisions, but the causality is the other way around: I embraced framework-oriented programming before and independently of MPWTest.

iOS

Another issue is iOS. Running a command-line tool that dynamically loads and tests frameworks is at least tricky and may be impossible, so that approach currently does not work. My current approach is that I view on-device and on-simulator tests as higher-up in the testing hierarchy: they are more costly, less numerous and run less frequently.

The vast majority of code lives in cross-platform frameworks (see: Ports and Adapters) and is developed and tested primarily on macOS. I have found this to be much faster than using the simulator or a device in day-to-day programming, and have used this "mac-first" technique even on projects where we were using XCTest.

Although not testing on the target platform may be seen as a problem, I have found discrepancies to be between exceedingly rare and non-existent, with "normal" code trending towards the latter. One of the few exceptions in the not-quite-so-normal code that I sometimes create was the change of calling conventions on arm64, which meant that plain method pointers (IMPs) no longer worked, but had to be cast to the "correct" pointer type, only on device. Neither macOS nor the simulator would show the problem.

For that purpose, I hacked together a small iOS app that runs the tests specified in a plist in the app bundle. There is almost certainly a better way to handle this, but I haven't had the cycles or motivation to look into it.

How to approximate

So you can't or don't want to adopt MPWTest. That doesn't mean you can't get at least some of the benefits of the approach. As a start, instead of using Cmd-B in Xcode to build, just use Cmd-U instead. That's what I did when working on Wunderlist, where we used XCTest.

Second, adopt framework-oriented programming and the Ports and Adapters style as much as possible. Put all your code in frameworks, and as much as possible in cross-platform frameworks that you can test/run on macOS, and even if you are developing exclusively for iOS, create a macOS target for that framework. This makes using Cmd-U to build much less painful.

Third, adhere to a strict 1:1 mapping between production classes and test classes, and place your test classes in the same file as the class they are testing.

My practical experience with both JUnit and XCTest on medium-sized projects does not square with the assertion that the difference is not that big: you still have to create these additional classes, they have to communicate with the class under tests (self in MPWTest), you have to track changes etc. And of course, you have to know to configure und use the framework differently from the way it was built, intended and documented. And what I've seen of OCUnit use was that the tests were not co-located with the class, but in a separate part of the project.

A final note is that the trick of interchangeably using the class as the test fixture is only really possible in a language like Objective-C where classes are first class objects. It simply wouldn't be possible in Java. This is how the class can test itself, and the tests become an integral part of the class, rather than something that's added somewhere else.

Thursday, May 14, 2020

Embedding Objective-Smalltalk

Ilja just asked for embedded scripting-language suggestions, presumably for his GarageSale E-Bay listings manager, and so of course I suggested Objective-Smalltalk.

Unironically :-)

This is a bit scary. On the one hand, Objective-Smalltalk has been in use in my own applications for well over a decade and runs the http://objective.st site, both without a hitch and the latter shrugging of a Hacker News "Hug of Death" without even the hint of glitch. On the other hand, well, it's scary.

As for usability, you include two frameworks in your application bundle, and the code to start up and interact with the interpreter or interpreters is also fairly minimal, not least of all because I've been doing so in quite a number of applications now, so inconvenience gets whittled away over time.

In terms of suitability, I of course can't answer that except for saying it is absolutely the best ever. I can also add that another macOS embeddable Smalltalk, FScript, was used successfully in a number of products. Anyway, Ilja was kind enough to at least pretend to take my suggestion seriously, and responded with the following question as to how code would look in practice:

I am only too happy to answer that question, but the answer is a bit beyond the scope of twitter, hence this blog post.

First, we can keep things very close to the original, just replacing the loop with a -select: and of course changing the syntax to Objective-Smalltalk.


runningListings := context getAllRunningListings.
listingsToRelist := runningListings select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ebay endListings:listingsToRelist ended:{ :ended | 
     ebay relistListings:ended relisted: { :relisted |
         ui alert:"Relisted: {relisted}".
     }
}

Note the use of "and:" instead of "&&" and the general reduction of sigils. Although I personally don't like the pyramid of doom, the keyword message syntax makes it significantly less odious.

So much in fact, that Swift recently adopted open keyword syntax for the special case of multiple trailing closures. Of course the mind boggles a bit, but that's a topic for a separate post.

So how else can we simplify? Well, the context seems a little unspecific, and getAllRunningListings a bit specialized, it probably has lots of friends that result from mapping a website with lots of resources onto a procedural interface.

Let's instead use URLs for this, so an ebay: scheme that encapsulates the resources that EBay lets us play with.


listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ebay endListings:listingsToRelist ended:{ :ended | 
     ebay relistListings:ended relisted: { :relisted |
         ui alert:"Relisted {relisted} listings".
     }
}

I have to admit I also don't really understand the use of callbacks in the relisting process, as we are waiting for everything to complete before moving to the next stage. So let's just implement this as plain sequential code:
listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ended := ebay endListings:listingsToRelist.
relisted := ebay relistListings:ended.
ui alert:"Relisted: {relisted}".

(In scripting contexts, Objective-Smalltalk currently allows defining variables by assigning to them. This can be turned off.)

However, it seems odd and a bit non-OO that the listings shouldn't know how to do stuff, so how about just having relist and end be methods on the listings themselves? That way the code simplifies to the following:


listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ended := listingsToRelist collect end.
relisted := ended collect relist.
ui alert:"Relisted: {relisted}".

If batch operations are typical, it probably makes sense to have a listings collection that understands about those operations:
listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ended := listingsToRelist end.
relisted := ended relist.
ui alert:"Relisted: {relisted}".

Here I am assuming that ending and relisting can fail and therefore these operations need to return the listings that succeeded.

Oh, and you might want to give that predicate a name, which then makes it possible to replace the last gobbledygook with a clean, "do what I mean" Higher Order Message. Oh, and since we've had Unicode for a while now, you can also use '←' for assignment, if you want.


extension EBayListing {
  -<bool>shouldRelist {
      self daysRunning > 30 and: self watchers < 3.
  }
}

listingsToRelist ← ebay:listings/running select shouldRelist.
ended ← listingsToRelist end.
relisted ← ended relist.
ui alert:"Relisted: {relisted}".

To my obviously completely unbiased eyes, this looks pretty close to a high-level, pseudocode specification of the actions to be taken, except that it is executable.

This is a nice step-by-step script, but with everything so compact now, we can get rid of the temporary variables (assuming the extension) and make it a one-liner (plus the alert):


relisted ← ebay:listings/running select shouldRelist end relist.
ui alert:"Relisted: {relisted}".

It should be noted that the one-liner came to be not as a result of sacrificing readability in order to maximally compress the code, but rather as an indirect result of improving readability by removing the cruft that's not really part of the problem being solved.

Although not needed in this case (the precedence rules of unary message sends make things unambiguous) some pipe separators may make things a bit more clear.


relisted ← ebay:listings/running select shouldRelist | end | relist.
ui alert:"Relisted: {relisted}".

Whether you prefer the one-liner or the step-by-step is probably a matter of taste.

Sunday, April 26, 2020

Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!

In the last exciting instalment of our JSON parsing on macOS/iOS series, we got rid of temporary objects in our parser → builder protocol as much as possible and saw performance soar to 195 MB/s, almost 20 times faster than Swift's JSONDecoder. At this point, creating the objects and adding them to the array take a combined total of 45%, and surely this is something we can't reasonably get rid off. Is it?

Object Streaming

Although the requirement was for objects to be created, nobody said that they all have to exist at the same time. Instead of returning the complete array of parsed objects when done, we can also tell the parser to stream objects to some target as they come in, by setting the streamingThreshold, which says at which depth into the JSON tree we start to use streaming.


-(void)decodeMPWDirectStream:(NSData*)json
{
    MPWMASONParser *parser=[[MPWMASONParser alloc] initWithClass:[TestClass class]];
    MPWObjectBuilder *builder=(MPWObjectBuilder*)[parser builder];
    [builder setStreamingThreshold:1];
    [builder setTarget:self];
    [parser parsedData:json];
}

Since we've set ourselves as the streaming target we need to provide a writeObject: method in order to conform to the Streaming protocol.


-(void)writeObject:(TestClass*)anObject
{
    if (!first) {
        first=[MPWRusage current];
    }
    objCount++;
    hiCount+=anObject.hi;
}

This method counts the objects and sums up their hi instance variables. It also records the time the first object comes in. How does this do?

Very well, at 192 ms and 229 MB/s. In addition, the time to first object is around 700 µs, so less than a millisecond for an application to start receiving usable data and be able to provide feedback to the user.

What's immediately noticeable is that the beginDictionary method is no longer at the top of the profile, it is almost all the way to the bottom with just 2.6% and 4.3ms of the total running time.

How is this actually possible? After all, we still get the 1 million objects, so we still have to create all of them, even if we dole them out in a piecemeal fashion. Or do we?

MPWObjectCache

The MPWObjectCache class (.h .m), keeps a circular buffer of objects that it can reinitialize and reuse after the application code is done with them. It is described in some detail in my book (did I mention the book?), in a part that Pearson has kindly made publicly available.

With such a cache in place, we only actually instantiate the number of objects needed to fill the cache, after that we safely recycle those same objects over and over again, at the cost of a few function calls. If objects are retained, they will not be reused.

Column stores, or structures of arrays

Another neat way of interpreting dematerialization is to store all the data in a columnar data format, a structure of arrays (SoA) instead of Array of Structures (AoS) organisation. (Thanks to Holgi for suggesting this).

For this we need a specific builder (MPWArraysBuilder, .h .m) that maintains a set of (mutable) arrays stored by key. When it receives a value, it looks up the appropriate array by key and adds the value to that array, as follows:


-(void)writeInteger:(long)number
{
    if ( _arrayMap && keyStr) {
        MPWIntArray *a=OBJECTFORSTRINGLENGTH(_arrayMap, keyStr, keyLen);
        [a addInteger:(int)number];
        keyStr=NULL;
    }
}

-(void)writeString:(id)aString
{
    if ( _arrayMap && keyStr) {
        NSMutableArray *a=OBJECTFORSTRINGLENGTH(_arrayMap, keyStr, keyLen);
        [a addObject:aString];
        keyStr=NULL;
    }
}

For integer values, this would be an MPWIntArray (.h .m) for strings a regular NSMutableArray.

This does even better, at 155 ms / 284 MB/s.

Other options

These are not the only options. For example, it turns out that the protocol connecting parser and builder was not specifically created for this purpose, it actually extends the Streaming protocol to handle disassembled hierarchies. So you can take a tree, pipe it through a pipeline and then accurately reassemble it on the other end.

The protocol is used in Polymorphic Write Streams to enable Standard Object Out shown at DLS '19, with an earlier version presented at Macoun 2018 (German):

Outlook

However, we are probably hitting diminishing returns at this point, certainly for a proof of concept. There is certainly some more fat to trim, some objc_msgSend()s to IMP-cache away, and going over most of the character input twice is probably something we could avoid.

Apart from further performance improvements, there are also minor details of correctness to take care of, for example handling the JSON escape characters in keys or properly handling hierarchy. These things are not particularly hard, and are handled for XML in the superclass, but do require a bit of thought and effort to complete.

There is also the question of hooking up to Swift in general (a simple attempt failed in getting the right methods), or Codable in particular. The latter would require a somewhat different approach from now: instead of instantiating the object and actively setting its properties, you need to create a temporary structure that you then pass to the object's decoder so it can decode itself. Again, the MAX superclass uses this approach, so it probably won't be too hard to do, with the main trickiness probably in reconciling that more hierarchical/recursive approach with the streaming required by the protocol.

I can (and probably will) also go into a little more analysis of the hows and whys of this approach. So maybe provide some feedback: what would interest you most? Are you interested in a production version of this? Or more extreme optimizations (the ones so far were fairly tame)?

Note

I can help not just Apple, but also you and your company/team with performance and agile coaching, workshops and consulting. Contact me at info at metaobject.com.

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Friday, April 24, 2020

Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser

A convenient setback

One thing that you may have noticed last time around was that we were getting the instance variable names from the class, but then also still manually setting the common keys manually. That's a bit of duplicated and needlessly manual effort, because the common keys are exactly those ivar names.

However, the two pieces of information are in different places, the ivar names in the builder and the common strings in the in the parse itself. One way of consolidating this information is by creating a convenience intializer for decoding to objects as follows:



-initWithClass:(Class)classToDecode
{
    self = [self initWithBuilder:[[[MPWObjectBuilder alloc] initWithClass:classToDecode] autorelease]];
    [self setFrequentStrings:(NSArray*)[[[classToDecode ivarNames] collect] substringFromIndex:1]];
    return self;
}

We still compute the ivar names twice, but that's not really such a big deal, so something we can fix later, just like the issue that we should probably be using property names instead of instance variable names that in the case of properties we have to post-process to get rid of the underscores added by ivar synthesis.

With that, the code to parse to objects simplifies to the following, very similar to what you would see in Swift with JSONDecoder.


-(void)decodeMPWDirect:(NSData*)json
{
    MPWMASONParser *parser=[[MPWMASONParser alloc] initWithClass:[TestClass class]];
    NSArray* objResult = [parser parsedData:json];
}

So, quickly verifying that performance is still the same (always do this!) and...oops! Performance dropped significantly, from 441ms to over 700ms. How could such an innocuous change lead to a 50% performance regression?

The profile shows that we are now spending significantly more time in MPWSmallStringTable's objectForKey: method, where it gets the bytes out of the NSString/CFString, but why that should be the case is a bit mysterious, since we changed virtually nothing.

A little further sleuthing revealed that the strings in question are now instances of NSTaggedPointerString, where previously they were instances of __NSCFConstantString. The latter has a pointer to its byte-oriented character orientation, which it can simply return, while the former cleverly encodes the characters in the pointer itself, so it first has to reconstruct that byte representation. The method of constructing that representation and computing the size of such a representation also appears to be fairly generic and slow via a stream.

This isn't really easy to solve, since the creation of NSTaggedPointerStrring instances is hardwired pretty deep in CoreFoundation with no way to disable this "optimization". Although it would be possible to create a new NSString subclass with a byte buffer, make sure to convert to that class before putting instances in the lookup table, that seems like a lot of work. Or we could just revert this convenience.

Damn the torpedoes and full speed ahead!

Alternatively, we really wanted to get rid of this whole process of packing character data into NSString instances just to immediately unpack them again, so let's leave the regression as is and do that instead.

Where previously the builder had a NSString *key instance vaiable, it now has a char *keyStr and a int keyLen. The string-handling case in the JSON parser is now split betweeen the key and the non-key casse, with the non-key case still doing the conversion, but the key-case directly sending the char* and length to the builder.


			case '"':
                parsestring( curptr , endptr, &stringstart, &curptr  );
				if ( curptr[1] == ':' ) {
                    [_builder writeKeyString:stringstart length:curptr-stringstart];
					curptr++;
					
				} else {
                    curstr = [self makeRetainedJSONStringStart:stringstart length:curptr-stringstart];
					[_builder writeString:curstr];
				}
                curptr++;
				break;

This means that at least temporarily, JSON escape handling is disabled for keys. It's straightforward to add back, makeRetainedJSONStringStart:length: does all its processing in a character buffer, only converting to a string object at the very end.


-(void)writeString:(NSString*)aString
{
    if ( keyStr ) {
        MPWValueAccessor *accesssor=OBJECTFORSTRINGLENGTH(self.accessorTable, keyStr, keyLen);
        [accesssor setValue:aString forTarget:*tos];
        keyStr=NULL;
    } else {
        [self pushObject:aString];
    }
}

If there is a key, we are in a dictionary, otherwise an array (or top-level). In the dictionary case, we can now fetch the ValueAccessor via the OBJECTFORSTRINGLENGTH() macro.

The results are encouraging: 299ms, or 147 MB/s.

The MPWPlistBuilder also needs to be adjusted: as it builds and NSDictionary and not an object, it actually needs the NSString key, but the parser no longer delivers those. So it just creates them on the fly:


-(NSString*)key
{
    NSString *key=nil;
    if ( keyStr) {
        if ( _commonStrings ) {
            key=OBJECTFORSTRINGLENGTH(_commonStrings, keyStr, keyLen);
        }
        if ( !key ) {
            key=[[[NSString alloc] initWithBytes:keyStr length:keyLen encoding:NSUTF8StringEncoding] autorelease];
        }
    }
    return key;
}

Surprisingly, this makes the dictionary parsing code slightly faster, bringing up to par with NSSJSSONSerialization at 421ms.

Eliminating NSNumber

Our use of NSNumber/CFNumber values is very similar to our use of NSString for keys: the parser wraps the parsed number in the object, the builder then unwraps it again.

Changing that, initially just for integers, is straightforward: add an integer-valued message to the builder protocol and implement it.


-(void)writeInteger:(long)number
{
    if ( keyStr ) {
        MPWValueAccessor *accesssor=OBJECTFORSTRINGLENGTH(_accessorTable, keyStr, keyLen);
        [accesssor setIntValue:number forTarget:*tos];
        keyStr=NULL;
    } else {
        [self pushObject:@(number)];
    }
}

The actual integer parsing code is not in MPWMASONParser but its superclasss, and as we don't want to touch that for now, let's just copy-paste that code, modifying it to return a C primitive type instead of an object.


-(long)longElementAtPtr:(const char*)start length:(long)len
{
    long val=0;
    int sign=1;
    const char *end=start+len;
    if ( start[0] =='-' ) {
        sign=-1;
        start++;
    } else if ( start[0]=='+' ) {
        start++;
    }
    while ( start < end && isdigit(*start)) {
        val=val*10+ (*start)-'0';
        start++;
    }
    val*=sign;
    return val;
}

I am sure there are better ways to turn a string into an int, but it will do for now. Similarly to the key/string distinction, we now special case integers.
                if ( isReal) {
                    number = [self realElement:numstart length:curptr-numstart];

                    [_builder writeString:number];
                } else {
                    long n=[self longElementAtPtr:numstart length:curptr-numstart];
                    [_builder writeInteger:n];
                }

Again, not pretty, but we can clean it up later.

Together with using direct instance variable access instead of properties to get to the accessorTable, this yields a very noticeable speed boost:

229 ms, or 195 MB/s.

Nice.

Discussion

What happened here? Just random hacking on the profile and replacing nice object-oriented programming with ugly but fast C?

Although there is obviously some truth in that, profiles were used and more C primitive types appeared, I would contend that what happened was a move away from objects, and particularly away from generic and expensive Foundation objects ("Foundation oriented programming"?) towards message oriented programming.

I'm sorry that I long ago coined the term "objects" for this topic because it gets many people to focus on the lesser idea.

The big idea is "messaging" -- that is what the kernal of Smalltalk/Squeak is all about (and it's something that was never quite completed in our Xerox PARC phase). The Japanese have a small word -- ma -- for "that which is in between" -- perhaps the nearest English equivalent is "interstitial". The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be.

It turns out that message oriented programming (or should we call it Protocol Oriented Programming?) is where Objective-C shines: coarse-grained objects, implemented in C, that exchange messages, with the messages also as primitive as you can get away with. That was the idea, and when you follow that idea, Objective-C just hums, you get not just fast, but also flexible and architecturally nicely decoupled objects: elegance.

The combination of objects + primitive messages is very similar to another architecturally elegant and productive style: Unix pipes and filters. The components are in C and can have as rich an internal structure as you want, but they have to talk to each other via byte-streams. This can also be made very fast, and also prevents or at least reduces coupling between the components.

Another aspect is the tension between an API for use and an API for reuse, particularly within the constraints of call/return. When you get tasked with "Create a component + API for parsing JSON", something like NSJSONSerialization is something you almost have to come up with: feed it JSON, out comes parsed JSON. Nothing could be more convenient to use for "parsing JSON".

MPWMASONParser on the other hand is not convenient at all when viewed in isolation, but it's much more capable of being smoothly integrated into a larger processing chain. And most of the work that NSJSONSerialization did in the name of convenience is now just wasted, it doesn't make further processing any easier but sucks up enormous amounts of time.

Anyway, let's look at the current profile:

First, times are now small enough that high-resolution (100µs) sampling is now necessary to get meaningful results. Second, the NSNumber/CFNumber and NSString packing and unpacking is gone, with an even bigger chunk of the remaining time now going to object creation. objc_msgSend() is now starting to actually become noticeable, as is the (inefficient) character level parsing. The accessors of our test objects start to appear, if barely.

With the work we've done so far, we've improved speed around 5x from where we started, and at 195 MB/s are almost 20x faster than Swift's JSONDecoder.

Can we do better? Stay tuned.

Note

I can help not just Apple, but also you and your company/team with performance and agile coaching, workshops and consulting. Contact me at info at metaobject.com.

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite