Sunday, June 21, 2020

Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

When looking at the MPWPlistStreaming protocol that I've been using for my JSON parsing series, one thing that was probably noticeable is that it isn't particularly JSON-focused. In fact, it wasn't even initially designed for parsing, but for generating.

So could we use this for other de-serialization tasks? Glad you asked!

CSV parsing

One of the examples in my performance book involves parsing Comma Separated Values quickly, within the context of getting the time to convert a 139Mb GTFS file to something usable on the phone down from 20 minutes using using CoreData/SQLite to slightly less than a second using custom in-memory data structures that are also several orders of magnitude faster to query on-device.

The original project's CVS parser took around 18 seconds, which wasn't a significant part of the 20 minutes, but when the rest only took a couple of hundred milliseconds, it was time to make that part faster as well. The result, slightly generalized, is MPWDelimitedTable ( .h .m ).

The basic interface is block-based, with the block being called for every row in the table, called with a dictionary composed of the header row as keys and the contents of the row as values.


-(void)do:(void(^)(NSDictionary* theDict, int anIndex))block;

Adapting this to the MPWPlistStreaming protocol is straightforward:
-(void)writeOnBuilder:(id )builder
{
    [builder beginArray];
    [self do:^(NSDictionary* theDict, int anIndex){
        [builder beginDictionary];
        for (NSString *key in self.headerKeys) {
            [builder writeObject:theDict[key] forKey:key];
        }
        [builder endDictionary];
    }];
    [builder endArray];
}

This is a quick-and-dirty implementation based on the existing API that is clearly sub-optimal: the API we call first constructs a dictionary from the row and the header keys and then we iterate over it. However, it works with our existing set of builders and doesn't build an in-memory representation of the entire CSV.

It will also be relatively straightforward to invert this API usage, modifying the low-level API to use MPWPlistStreaming and then creating a higher-level block- and dictionay-based API on top of that, in a way that will also work with other MPWPlistStreaming clients.

SQLite

Another tabular data format is SQL data bases. On macOS/iOS, one very common database is SQLite, usually accessed via CoreData or the excellent and much more light-weight fmdb.

Having used fmdb myself before, and bing quite delighted with it, my first impulse was to write a MPWPlistStreaming adapter for it, but after looking at the code a bit more closely, it seemed that it was doing quite a bit that I would not need for MPWPlistStreaming.

I also think I saw the same trade-off between a convenient and slow convenience based on NSDictionary and a much more complex but potentially faster API based on pulling individual type values.

So Instead I decided to try and do something ultra simple that sits directly on top of the SQLite C-API, and the implementation is really quite simple and compact:


@interface MPWStreamQLite()

@property (nonatomic, strong) NSString *databasePath;

@end

@implementation MPWStreamQLite
{
    sqlite3 *db;
}

-(instancetype)initWithPath:(NSString*)newpath
{
    self=[super init];
    self.databasePath = newpath;
    return self;
}

-(int)exec:(NSString*)sql
{
    sqlite3_stmt *res;
    int rc = sqlite3_prepare_v2(db, [sql UTF8String], -1, &res, 0);
    @autoreleasepool {
        [self.builder beginArray];
        int step;
        int numCols=sqlite3_column_count(res);
        NSString* keys[numCols];
        for (int i=0;i < numCols; i++) {
            keys[i]=@(sqlite3_column_name(res, i));
        }
        while ( SQLITE_ROW == (step = sqlite3_step(res))) {
            @autoreleasepool {
                [self.builder beginDictionary];
                for (int i=0; i < numCols; i++) {
                    const char *text=(const char*)sqlite3_column_text(res, i);
                    if (text) {
                        [self.builder writeObject:@(text) forKey:keys[i]];
                    }
                }
                [self.builder endDictionary];
            }
        }
        sqlite3_finalize(res);
        [self.builder endArray];
    }
    return rc;
}

-(int)open
{
    return sqlite3_open([self.databasePath UTF8String], &db);
}

-(void)close
{
    if (db) {
        sqlite3_close(db);
        db=NULL;
    }
}


Of course, this doesn't do a lot, chiefly it only reads, no updates, inserts or deletes. However, the code is striking in its brevity and simplicity, while at the same time being both convenient and fast, though with still some room for improvement.

In my experience, you tend to not get all three of these properties at the same time: code that is simple and convenient tends to be slow, code that is convenient and fast tends to be rather tricky and code that's simple and fast tends to be inconvenient to use.

How easy to use is it? The following code turns a table into an array of dictionaries:


#import <MPWFoundation/MPWFoundation.h>

int main(int argc, char* argv[]) {
   MPWStreamQLite *db=[[MPWStreamQLite alloc] initWithPath:@"chinook.db"];
   db.builder = [MPWPListBuilder new];
   if( [db open] == 0 ) {
       [db exec:@"select * from artists;"];
       NSLog(@"results: %@",[db.builder result]);
       [db close];
   } else {
       NSLog(@"Can't open database: %s\n", [db error]);
   }
   return(0);
}

This is pretty good, but probably roughly par for the course for returning a generic data structure such as array of dictionaries, which is not going to be particularly efficient. (One of my first clues that CoreData's predecessor EOF wasn't particularly fast was when I read that fetching raw dictionaries was an optimization, much faster than fetching objects.)

What if we want to get objects instead? Easy, just replace the MPWPListBuilder with an MPWObjectBuilder, parametrized with the class to create. Well, and define the class, but presumably you already havee that if the task is to convert to objects of that class. And it cold obviously also be automated.



#import <MPWFoundation/MPWFoundation.h>

@interface Artist : NSObject { }

@property (assign) long ArtistId;
@property (nonatomic,strong) NSString *Name;

@end

@implementation Artist

-(NSString*)description
{
	return [NSString stringWithFormat:@"<%@:%p id: %ld name: %@>",[self class],self,self.ArtistId,self.Name];
}

@end

int main(int argc, char* argv[]) {
   MPWStreamQLite *db=[[MPWStreamQLite alloc] initWithPath:@"chinook.db"];
   db.builder = [[MPWObjectBuilder alloc] initWithClass:[Artist class]];
   if( [db open] == 0) {
       [db exec:@"select * from artists"];
       NSLog(@"results: %@",[db.builder result]);
       [db close];
   } else {
       NSLog(@"Can't open database: %s\n", [db error]);
   }
   return(0);
}

Note that this does not generate a plist representation as an intermediate step, it goes straight from database result sets to objects. The generic intermediate "format" is the MPWPlistStreaming protocol, which is a dematerialized representation, both plist and objects are peers.

Sunday, June 14, 2020

The Curious Case of Swift's Adoption of Smalltalk Keyword Syntax

I was really surprised to learn that Swift recently adopted Smalltalk keyword syntax: [Accepted] SE-0279: Multiple Trailing Closures. That is: a keyword terminated by a colon, followed by an argument and without any surrounding braces.

The mind boggles.

A little.

Of course, Swift wouldn't be Swift if this weren't a special case of a special case, specifically the case of multiple trailing closures, which is a special case of trailing closures, which are weird and special-casey enough by themselves. Below is an example:


UIView.animate(withDuration: 0.3) {
  self.view.alpha = 0
} completion: { _ in
  self.view.removeFromSuperview()
}

Note how the arguments to animate() would seem to terminate at the closing parenthesis, but that's actually not the case. The curly braces after the closing paren start a closure that is actually also an argument to the method, a so-called trailing closure. I have a little bit of sympathy for this construct, because closures inside of the parentheses look really, really awkward. (Of course, all params apart from a sole x inside f(x) look awkward, but let's not quibble. For now.).

Another thing this enables is methods that reasonably resemble control structures, which I heard is a really great idea.

The problem is that sometimes you have more than one closure argument, and then just stacking them up behind what appears to be end of the function/method call gets really, really awkward, and you can't tell which block is which argument, because the trailing closure doesn't get a keyword.

Well, now it does. And we now have 4 different method syntaxes in one!

  1. Traditional C/Pascal/C++/Java function call syntax x.f()
  2. The already weird-ish addition of Smalltalk/Objective-C keywords inside the f(x) syntax: f(arg:x)
  3. Original trailing-closure syntax, which is just its own thing, for the first closure
  4. Smalltalk non-brackted keyword syntax for the 2nd and subsequent closures.
That is impressive, in a scary kind of way.
Swift is a crescendo of special cases stopping just short of the general; the result is complexity in the semantics, complexity in the behaviour (i.e. bugs), and complexity in use (i.e. workarounds).
In understand that this proposal was quite controversial, with heated discussion between opponents and proponents. I understand and sympathize with both sides. On the one hand, this is markedly better than alternatives. On the other hand it is a special case of a special case that is difficult to justify as an addition of all that is already there.

Special cases beget special cases beget special cases.

Of course the answer was always there: Smalltalk keyword syntax is not just the only reasonable solution in this case, it also solves all the other cases. It is the general solution. Here's how this could look in Objective-Smalltalk (which uses curly braces instead for closures instead of Smalltalk-80's square brackets):


UIView animate:{ self.view.alpha ← 0. } withDuration:0.3 completion:{ self view removeFromSuperview. }.

No special cases, every argument is labeled, no syntax mush of brackets inside parentheses etc. And yes, this also handles user-defined control structures, to:do: is just a method on NSNumber:


1 to:10 do:{:i | stdout println:"I will not introduce {i} special cases willy nilly.".}.

And since keywords naturally go between their arguments, there is no need for "operators", as a very different and special syntax form. You just allow some "binary" keywords to look a little different, so instead of 2 multiply:3 you can write 2 * 3. And when you have 2 raisedTo:3 instead of pow(2,3) (with the signature: func pow(_ x: Decimal, _ y: Int) -> Decimal), do you really neeed to go to the trouble of defining an "operator"?

Or Swift's a as b, another special kind of syntax. How about a as:b? (Yes I know there are details, but those are ... details.). And so on and so forth.

But of course, it's too late now. When I chose Smalltalk as the base syntax for the language that has turned into Objective-Smalltalk, it wasn't just because I just like it or have gotten used to it via Objective-C. Smalltalk's syntax is surprisingly flexible and general, Smalltalk APIs look a lot like DSLs, without any of the tooling or other overheads.

And that's the frustrating part: this stuff was and is available and well-known. At least if you bother to look and/or ask. But instead, we just choose these things willy-nilly and everybody has to suffer the consequences.

UPDATE:

I guess what I am trying to get at is that if you'd thought things through just a little bit, you could have had almost the entire syntax of your language for the cost (complexity, implementation size and brittleness, cognitive load, etc.) of this one special case of a special case. And it would have been overall better to boot.

Monday, June 1, 2020

MPWTest Only Tests Frameworks

It should be noted, if it wasn't obvious, that MPWTest is opinionated software, meaning it achieves some of its smoothness by gleefully embracing constraints that some might view as potentially crippling limitations.

Maybe the biggest of these constraints, mentioned in the previous post, is that MPWTest only tests frameworks. This means that the following workflow is not supported out of the box:

The point being that this is a workflow I not just somewhat indifferently do not want, but rather emphatically and actively want to avoid. Tests that are run (only?) when launching the app are application tests. My perspective is that unit tests are an integral part of the class. This may seem a subtle distinction, but subtle differences in something you do constantly can have huge impacts. "Steter Tropfen höhlt den Stein."

Another aspect is that launching the app for testing as a permanent and fixed part of your build process seems highly annoying at best. Linker finishes, app pops up, runs for a couple of seconds, shuts down again. I don't see that as viable. For testing to be integral and pervasive, it has to be invisible when the tests succeed.

The testing pyramid is helpful here: my contention is that you want to be at the bottom of that pyramid, ideally all of the time. Realistically, you're probably not going to get there, but you should push really, really hard, even making sacrifices that appear to be unreasonable to achieve that goal.

Framework-oriented programming

Only testing frameworks begs the question as to how to test those parts of the application not in frameworks. For me the answer is simple: there isn't any production code outside of frameworks.

None. Not the UI, not the application delegate. Only the auto-generated main().

The benefits of this approach are plentiful, the effort minimal. And if you think this is an, er, eccentric position to take, the program you almost certainly use to create apps for iOS/macOS etc. takes the same eccentric position: Xcode's main executable is 45K in size and only contains a main() function and some Swift boilerplate.

If all your code is in frameworks, only testing frameworks is not a problem. That may seem like a somewhat extreme case of sour grapes, with the arbitrary limitations of a one-off unit testing framework driving major architectural decisions, but the causality is the other way around: I embraced framework-oriented programming before and independently of MPWTest.

iOS

Another issue is iOS. Running a command-line tool that dynamically loads and tests frameworks is at least tricky and may be impossible, so that approach currently does not work. My current approach is that I view on-device and on-simulator tests as higher-up in the testing hierarchy: they are more costly, less numerous and run less frequently.

The vast majority of code lives in cross-platform frameworks (see: Ports and Adapters) and is developed and tested primarily on macOS. I have found this to be much faster than using the simulator or a device in day-to-day programming, and have used this "mac-first" technique even on projects where we were using XCTest.

Although not testing on the target platform may be seen as a problem, I have found discrepancies to be between exceedingly rare and non-existent, with "normal" code trending towards the latter. One of the few exceptions in the not-quite-so-normal code that I sometimes create was the change of calling conventions on arm64, which meant that plain method pointers (IMPs) no longer worked, but had to be cast to the "correct" pointer type, only on device. Neither macOS nor the simulator would show the problem.

For that purpose, I hacked together a small iOS app that runs the tests specified in a plist in the app bundle. There is almost certainly a better way to handle this, but I haven't had the cycles or motivation to look into it.

How to approximate

So you can't or don't want to adopt MPWTest. That doesn't mean you can't get at least some of the benefits of the approach. As a start, instead of using Cmd-B in Xcode to build, just use Cmd-U instead. That's what I did when working on Wunderlist, where we used XCTest.

Second, adopt framework-oriented programming and the Ports and Adapters style as much as possible. Put all your code in frameworks, and as much as possible in cross-platform frameworks that you can test/run on macOS, and even if you are developing exclusively for iOS, create a macOS target for that framework. This makes using Cmd-U to build much less painful.

Third, adhere to a strict 1:1 mapping between production classes and test classes, and place your test classes in the same file as the class they are testing.

My practical experience with both JUnit and XCTest on medium-sized projects does not square with the assertion that the difference is not that big: you still have to create these additional classes, they have to communicate with the class under tests (self in MPWTest), you have to track changes etc. And of course, you have to know to configure und use the framework differently from the way it was built, intended and documented. And what I've seen of OCUnit use was that the tests were not co-located with the class, but in a separate part of the project.

A final note is that the trick of interchangeably using the class as the test fixture is only really possible in a language like Objective-C where classes are first class objects. It simply wouldn't be possible in Java. This is how the class can test itself, and the tests become an integral part of the class, rather than something that's added somewhere else.