Showing posts with label Polymorphic Identifiers. Show all posts
Showing posts with label Polymorphic Identifiers. Show all posts

Monday, June 20, 2022

Blackbird: A reference architecture for local-first connected mobile apps

Wow, what a mouthful! Although this architecture has featured in a number of my other writings, I haven't really described it in detail by itself. Which is a shame, because I think it works really well and is quite simple, a case of Sophisticated Simplicity.

Why a reference architecture?

The motivation for creating and now presenting this reference architecture is that the way we build connected mobile apps is broken, and none of the proposed solutions appear to help. How are they broken? They are overly complex, require way too much code, perform poorly and are unreliable.

Very broadly speaking, these problems can be traced to the misuse of procedural abstraction for a problem-space that is broadly state-based, and can be solved by adapting a state-based architectural style such as in-process REST and combining it with well-known styles such as MVC.

More specifically, MVC has been misapplied by combining UI updates with the model updates, a practice that becomes especially egregious with asynchronous call-backs. In addition, data is pushed to the UI, rather than having the UI pull data when and as needed. Asynchronous code is modelled using call/return and call-backs, leading to call-back hell, needless and arduous transformation of any dependent code into asynchronous code (see "what color is your function") that is also much harder to read, discouraging appropriate abstractions.

Backend communication is also an issue, with newer async/await implementations not really being much of an improvement over callback-based ones, and arguably worse in terms of actual readability. (They seem readable, but what actually happens is different  enough that the simplicity is deceptive).

Overview

The overall architecture has four fundamental components:
  1. The model
  2. The UI
  3. The backend
  4. The persistence
The main objective of the architecture is to keep these components in sync with each other, so the whole thing somewhat resembles a control loop architecture: something disturbs the system, for example the user did something in the UI, and the system responds by re-establishing equilibrium.

The model is the central component, it connects/coordinates all the pieces and is also the only one directly connected to more than one piece. In keeping with hexagonal architecture, the model is also supposed to be the only place with significant logic, the remainder of the system should be as minimal, transparent and dumb as possible.

memory-model := persistence.
persistence  |= memory-model.
ui          =|= memory-model. 
backend     =|= memory-model.

Graphically:

Elements

Blackbird depends crucially on a number of architectural elements: first are stores of the in-process REST architectural style. These can be thought of as in-process HTTP servers (without the HTTP, of course) or composable dictionaries. The core store protocol implements the GET, PUT and DELETE verbs as messages.

The role of URLs in REST is taken by Polymorphic Identifiers. These are objects that can reference identify values in the store, but are not direct pointers. For example, they need to be a able to reference objects that aren't there yet.

Polymorphic Identifiers can be application-specific, for example they might consist just of a numeric id,

MVC

For me, the key part of the MVC architectural style is the decoupling of input processing and resultant output processing. That is, under MVC, the view (or a controller) make some change to the model and then processing stops. At some undefined later time (could be synchronous, but does not have to be) the Model informs the UI that it has changed using some kind of notification mechanism.

In Smalltalk MVC, this is a dependents list maintained in the model that interested views register with. All these views are then sent a #changed message when the model has changed. In Cocoa, this can be accomplished using NSNotificationCenter, but really any kind of broadcast mechanism will do.

It is then the views' responsibility to update themselves by interrogating the model.

For views, Cocoa largely automates this: on receipt of the notification, the view just needs invalidate itself, the system then automatically schedules it for redrawing the next time through the event loop.

The reason the decoupling is important to maintain is that the update notification can come for any other reason, including a different user interaction, a backend request completing or even some sort of notification or push event coming in remotely.

With the decoupled M-V update mechanism, all these different kinds of events are handled identically, and thus the UI only ever needs to deal with the local model. The UI is therefore almost entirely decoupled from network communications, we thus have a local-first application that is also largely testable locally.

Blackbird refines the MVC view update mechanism by adding the polymorphic identifier of the modified item in question and placing those PIs in a queue. The queue decouples model and view even more than in the basic MVC model, for example it become fairly trivial to make the queue writable from any thread, but empty only onto the main thread for view updates. In addition, providing update notifications is no longer synchronous, the updater just writes an entry into the queue and can then continue, it doesn't wait for the UI to finish its update.

Decoupling via a queue in this way is almost sufficient for making sure that high-speed model updates don't overwhelm the UI or slow down the model. Both these performance problems are fairly rampant, as an example of the first, the Microsoft Office installer saturates both CPUs on a dual core machine just painting its progress bar, because it massively overdraws.

An example of the second was one of the real performance puzzlers of my career: an installer that was extremely slow, despite both CPU and disk being mostly idle. The problem turned out to be that the developers of that installer not only insisted on displaying every single file name the installer was writing (bad enough), but also flushing the window to screen to make sure the user got a chance to see it (worse). This then interacted with a behavior of Apple's CoreGraphics, which disallows screen flushes at a rate greater than the screen refresh rate, and will simply throttle such requests. You really want to decouple your UI from your model updates and let the UI process updates at its pace.

Having polymorphic identifiers in the queue makes it possible for the UI to catch up on its own terms, and also to remove updates that are no longer relevant, for example discarding duplicate updates of the same element.

The polymorphic identifier can also be used by views in order to determine whether they need to update themselves, by matching against the polymorphic identifier they are currently handling.

Backend communication

Almost every REST backend communication code I have seen in mobile applications has created "convenient" cover methods for every operation of every endpoint accessed by the application, possibly automatically generated.

This ignores the fact that REST only has a few verbs, combined with a great number of identifiers (URLs). In Blackbird, there is a single channel for backend communication: a queue that takes a polymorphic identifier and an http verb. The polymorphic identifier is translated to a URL of the target backend system, the resulting request executed and when the result returns it is placed in the central store using the provided polymorphic identifier.

After the item has been stored, an MVC notification with the polymorphic identifier in question is enqueued as per above.

The queue for backend operations is essentially the same one we described for model-view communication above, for example also with the ability to deduplicate requests correctly so only the final version of an object gets sent if there are multiple updates. The remainder of the processing is performed in pipes-and-filters architectural style using polymorphic write streams.

If the backend needs to communicate with the client, it can send URLs via a socket or other mechanism that tells the client to pull that data via its normal request channels, implementing the same pull-constraint as in the rest of the system.

One aspect of this part of the architecture is that backend requests are reified and explicit, rather than implicitly encoded on the call-stack and its potentially asynchronous continuations. This means it is straightforward for the UI to give the user appropriate feedback for communication failures on the slow or disrupted network connections that are the norm on mobile networks, as well as avoid accidental duplicate requests.

Despite this extra visibility and introspection, the code required to implement backend communications is drastically reduced. Last not least, the code is isolated: network code can operate independently of the UI just as well as the UI can operate independently of the network code.

Persistence

Persistence is handled by stacked stores (storage combinators).

The application is hooked up to the top of the storage stack, the CachingStore, which looks to the application exactly like the DictStore (an in-memory store). If a read request cannot be found in the cache, the data is instead read from disk, converted from JSON by a mapping store.

For testing the rest of the app (rather than the storage stack), it is perfectly fine to just use the in-memory store instead of the disk store, as it has the same interface and behaves the same, except being faster and non-persistent.

Writes use the same asynchronous queues as the rest of the system, with the writer getting the polymorphic identifiers of objects to write and then retrieving the relevant object(s) from the in-memory store before persisting. Since they use the same mechanism, they also benefit from the same uniquing properties, so when the I/O subsystem gets overloaded it will adapt by dropping redundant writes.

Consequences

With the Blackbird reference architecture, we not only replace complex, bulky code with much less and much simpler code, we also get to reuse that same code in all parts of the system while making the pieces of the system highly independent of each other and optimising performance.

In addition, the combination of REST-like stores that can be composed with constraint- and event-based communication patterns makes the architecture highly decoupled. In essence it allows the kind of decoupling we see in well-implemented microservices architectures, but on mobile apps without having to run multiple processes (which is often not allowed).

Sunday, May 9, 2021

Talking to pins

The last few weeks, I spent a little time getting Objective-S working well on the Raspberry Pi, specifically my Pi400. It's a really wonderful little machine, and the form factor and price remind me very much of the early personal computers.

What's missing, IMHO, is an experience akin to the early BASICs. And I really mean "akin", not a nostalgia project, but recovering a real quality that has been lost: not really "simplicity", more "straightforwardness".

Of course, one of the really cool thing about the Pi is its GPIO interface that lets you do all sorts of electronics experiments, and I hear that the equivalent of "Hello World" for the Raspi is making an LED blink.


import RPi.GPIO as GPIO 
from time import sleep 
GPIO.setwarnings(False) 
 
GPIO.setmode(GPIO.BCM) 
GPIO.setup(17, GPIO.OUT) 
 
while True: 
    GPIO.output(17, True) 
    sleep(1) 
    GPIO.output(17, False) 
    sleep(1) 

Hmm. That's a a lot of semantic noise for something so conceptually simple. All we want to is set the value of a pin. As soon as I saw this, I knew it would be ideal for Polymorphic Identifiers, because a pin is the ultimate state, and PIs and their stores are made for abstracting over state.

Of course, I first had to to get Objective-S running on the Pi, which meant getting GNUstep to run. While there is a wonderful set of install scripts, the one for the Raspi only worked with an ancient clang version and libobjc 1.9. Alas, that version has some bugs on the Raspi, for example with the imp_implentationWithBlock() runtime function that Objective-S uses to define methods.

Long story short, after learning about GNUstep installs and waiting for the wonderful David Chisnall to remove some obsolete 32 bit exception-version detection code from libobjc, we now have a script that installs current GNUstep with a reasonably current clang: https://github.com/plaurent/gnustep-build/tree/master/raspbian-10-clang-9.0-runtime-2.1-ARM. With that in hand, a few bug fixes in MPWFoundation and Objective-S, I could add a really rudimentary Store that manages talking to the pins. And this allows me to write the following in an interactive shell to drive the customary GPIO pin 17 that I connected to the LED via resistor:


gpio:17 ← 1.

Now that's what I am talking about!

Of course, we're supposed to make it blink, not just turn it on. We could use the same looping approach as the Python script, or convenience methods like the ones provided, but the breadboard and pins make me think of wanting to connect components to do the job instead.

So let's connect some components, software architecture style! The following script creates an instance of a Blinker object (using an object literal), which emits alternating ones and zeros and connects it to the pin.


blinker ← #Blinker{ #seconds: 1 }. 
blinker → ref:gpio:17. 
blinker run.
gpio:17 ← 0.

Once connected it tells the blinker to start running, which creates an NSTimer adds it to the current runloop and then runs the run loop. That run is interruptible, so Ctrl-C breaks and runs the cleanup code.

What about setting up the pin for output? Happens automatically when you first output to it, but I will add code so you can do it manually.

Where does the Blinker come from? That's actually an object-template based on an MPWFixedValueSource.


object Blinker : #MPWFixedValueSource{ #values: #(0,1) }

You can, of course, hook up a fixed-value source to any kind of stream.

While getting here took a lot of work, and resulted in me (re-)learning a lot about GNUstep, the result, even this intermediate one, is completely worth it and makes me very happy. This stuff really works even better than I thought it would.

Thursday, May 14, 2020

Embedding Objective-Smalltalk

Ilja just asked for embedded scripting-language suggestions, presumably for his GarageSale E-Bay listings manager, and so of course I suggested Objective-Smalltalk.

Unironically :-)

This is a bit scary. On the one hand, Objective-Smalltalk has been in use in my own applications for well over a decade and runs the http://objective.st site, both without a hitch and the latter shrugging of a Hacker News "Hug of Death" without even the hint of glitch. On the other hand, well, it's scary.

As for usability, you include two frameworks in your application bundle, and the code to start up and interact with the interpreter or interpreters is also fairly minimal, not least of all because I've been doing so in quite a number of applications now, so inconvenience gets whittled away over time.

In terms of suitability, I of course can't answer that except for saying it is absolutely the best ever. I can also add that another macOS embeddable Smalltalk, FScript, was used successfully in a number of products. Anyway, Ilja was kind enough to at least pretend to take my suggestion seriously, and responded with the following question as to how code would look in practice:

I am only too happy to answer that question, but the answer is a bit beyond the scope of twitter, hence this blog post.

First, we can keep things very close to the original, just replacing the loop with a -select: and of course changing the syntax to Objective-Smalltalk.


runningListings := context getAllRunningListings.
listingsToRelist := runningListings select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ebay endListings:listingsToRelist ended:{ :ended | 
     ebay relistListings:ended relisted: { :relisted |
         ui alert:"Relisted: {relisted}".
     }
}

Note the use of "and:" instead of "&&" and the general reduction of sigils. Although I personally don't like the pyramid of doom, the keyword message syntax makes it significantly less odious.

So much in fact, that Swift recently adopted open keyword syntax for the special case of multiple trailing closures. Of course the mind boggles a bit, but that's a topic for a separate post.

So how else can we simplify? Well, the context seems a little unspecific, and getAllRunningListings a bit specialized, it probably has lots of friends that result from mapping a website with lots of resources onto a procedural interface.

Let's instead use URLs for this, so an ebay: scheme that encapsulates the resources that EBay lets us play with.


listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ebay endListings:listingsToRelist ended:{ :ended | 
     ebay relistListings:ended relisted: { :relisted |
         ui alert:"Relisted {relisted} listings".
     }
}

I have to admit I also don't really understand the use of callbacks in the relisting process, as we are waiting for everything to complete before moving to the next stage. So let's just implement this as plain sequential code:
listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ended := ebay endListings:listingsToRelist.
relisted := ebay relistListings:ended.
ui alert:"Relisted: {relisted}".

(In scripting contexts, Objective-Smalltalk currently allows defining variables by assigning to them. This can be turned off.)

However, it seems odd and a bit non-OO that the listings shouldn't know how to do stuff, so how about just having relist and end be methods on the listings themselves? That way the code simplifies to the following:


listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ended := listingsToRelist collect end.
relisted := ended collect relist.
ui alert:"Relisted: {relisted}".

If batch operations are typical, it probably makes sense to have a listings collection that understands about those operations:
listingsToRelist := ebay:listings/running select:{ :listing |
    listing daysRunning > 30 and: listing watchers < 3 .
}
ended := listingsToRelist end.
relisted := ended relist.
ui alert:"Relisted: {relisted}".

Here I am assuming that ending and relisting can fail and therefore these operations need to return the listings that succeeded.

Oh, and you might want to give that predicate a name, which then makes it possible to replace the last gobbledygook with a clean, "do what I mean" Higher Order Message. Oh, and since we've had Unicode for a while now, you can also use '←' for assignment, if you want.


extension EBayListing {
  -<bool>shouldRelist {
      self daysRunning > 30 and: self watchers < 3.
  }
}

listingsToRelist ← ebay:listings/running select shouldRelist.
ended ← listingsToRelist end.
relisted ← ended relist.
ui alert:"Relisted: {relisted}".

To my obviously completely unbiased eyes, this looks pretty close to a high-level, pseudocode specification of the actions to be taken, except that it is executable.

This is a nice step-by-step script, but with everything so compact now, we can get rid of the temporary variables (assuming the extension) and make it a one-liner (plus the alert):


relisted ← ebay:listings/running select shouldRelist end relist.
ui alert:"Relisted: {relisted}".

It should be noted that the one-liner came to be not as a result of sacrificing readability in order to maximally compress the code, but rather as an indirect result of improving readability by removing the cruft that's not really part of the problem being solved.

Although not needed in this case (the precedence rules of unary message sends make things unambiguous) some pipe separators may make things a bit more clear.


relisted ← ebay:listings/running select shouldRelist | end | relist.
ui alert:"Relisted: {relisted}".

Whether you prefer the one-liner or the step-by-step is probably a matter of taste.

Monday, April 20, 2020

Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop

Last time, we actually made some significant headway by taking advantage of the dematerialisation of the plist intermediate representation. So instead of first producing an array of dictionaries, we went directly from JSON to the final object representation.

This got us down from around 1.1 seconds to a little over 600 milliseconds.

It was accomplished by using the Key Value Coding method setValue:forKey: to directly set the attributes of the objects from the parsed JSON. Oh, and instantiating those objects in the first place, instead of dictionaries.

That this should be so much faster than most other methods, for example beating Swift's JSONDecoder() by a cool 7x, is a little surprising, given that KVC is, as I mentioned in the first article of the series, the slowest mechanism for getting data in and out of objcets short of deliberate Rube Goldber Mechanisms.

What is KVC and why is it slow?

Key Value Coding was a core part of NeXT's Enterprise Object Framework, introduced in 1994.
Key-value coding is a data access mechanism in which the properties of an object are accessed indirectly by key or name, rather than directly as fields or by invocation of accessor methods. It is used throughout Enterprise Objects but is perhaps most useful to you when accessing data in relationships between enterprise objects.

Key-value coding enables the use of keypaths to traverse relationships. For example, if a Person entity has a relationship called toPhoto whose destination entity (called PersonPhoto) contains an attribute called photo, you could access that data by traversing the keypath toPhoto.photo from within a Person object.

Keypaths are just one way key-value coding is an invaluable feature of Enterprise Objects. In general, though, it is most useful in providing a consistent way to access an object's data. Rather than needing to know if an object's data members have accessor methods, what the names of those accessor methods are, or if the data is accessible through fields, all you need to know are the keys that represent an object’s data. Key-value coding automatically finds the data, regardless of how the object provides its data. In this context, key-value coding satisfies the classic design pattern principle to “encapsulate the things that varies.”

It still is an extremely powerful programming technique that lets us write algorithms that work generically with any object properties, and is currently the basis for CoreData, AppleScript support, Key Value Observing and Bindings. (Though I am somewhat skeptical of some of these, not least for performance reasons, see The Siren Call of KVO and (Cocoa) Bindings). It was also part of the inspiration for Polymorphic Identifiers.

The core of KVC are the valueForKey: and setValue:forKey: messages, which have default implementations in NSObject. These default implementations take the NSString key, derive an accessor message from that key and then send the message, either setting or returning a value. If the value that the underlying message takes/returns is a non-object type, then KVC wraps/unwraps as necessary.

If this sounds expensive, then that's because it is. To derive the set accessor from the key, the first character of the key has to be capitalized, the the string "set" prepended and the string converted to an Objective-C selector (SEL). In theory, this has to be done on every call to one of the KVC methods, and it has to be done with NSString objects, which do a fantastic job of representing human-visible text, but are a bit heavy-weight for low-level work.

Doing the full computation on every invocation would be way too expensive, so Apple caches some of the intermediate results. As there is no obvious place to put those intermediate results, they are placed in global hash tables, keyed by class and property/key name. However, even those lookups are still significantly more expensive than the final set or get property accesss, and we have to do multiple lookups. Since theses tables have to be global, locking is also required.

ValueAccessor

All this expense could be avoided if we had a custom object to mediate the access, rather than a naked NSString. That object could store those computed values, and then provide fast and generic access to arbitrary properties. Enter MPWValueAccesssor (.h .m).

A word of warning: unlike MPWStringtable, MPWValueAccesssor is mostly experimental code. It does have tests and largely works, but it is incomplete in many ways and also contains a bunch of extra and probably extraneous ideas. It is sufficient for our current purpose.

The core of this class is the AccessPathComponent struct.


typedef struct {
    Class   targetClass;
    int     targetOffset;
    SEL     getSelector,putSelector;
    IMP0    getIMP;
    IMP1    putIMP;
    id      additionalArg;
    char    objcType;
} AccessPathComponent;

This struct contains a number of different ways of getting/setting the data:
  1. the integer offset into the object where the ivar is located
  2. a pair of Objective-C selectors/message names, one for getting, one for setting.
  3. a pair of function pointers to the Objective-C methods that the respective selectors resolve to
  4. the additional arg is the key, to be used for keyed access
The getIMP and putImp are initialized to objc_msgSend(), so they can always be used. If we bind the ValueAccessor to a class, those function pointers get resolved to the actual getter/setter methods. In addition the objcType gets set to the type of the instance variable, so we can do automatic conversions like KVC. (This was some code I actually had to add between the last instalment and the current one.)

The key takeaway is that all the string processing and lookup that KVC needs to do on every call is done once during initialization, after that it's just a few messages and/or pre-resolved function calls.

Hooking up the ValueAccessor

Adapting the MPWObjectBuilder (.h .m) to use MPWValueAccessor was much easier than I had expected. Thee following shows the changes made:
@property (nonatomic, strong) MPWSmallStringTable *accessorTable;

...

-(void)setupAcceessors:(Class)theClass
{
    NSArray *ivars=[theClass ivarNames];
    ivars=[[ivars collect] substringFromIndex:1];
    NSMutableArray *accessors=[NSMutableArray arrayWithCapacity:ivars.count];
    for (NSString *ivar in ivars) {
        MPWValueAccessor *accessor=[MPWValueAccessor valueForName:ivar];
        [accessor bindToClass:theClass];
        [accessors addObject:accessor];
    }
    MPWSmallStringTable *table=[[[MPWSmallStringTable alloc] initWithKeys:ivars values:accessors] autorelease];
    self.accessorTable=table;
}

-(void)writeObject:anObject forKey:aKey
{
    MPWValueAccessor *accesssor=[self.accessorTable objectForKey:aKey];
    [accesssor setValue:anObject forTarget:*tos];
}



The bulk of the changes come as part of the new -setupAccessors: method. It first asks the class what its instance variables are, creates a value accessor for that instance variabl(-name), binds the accessor to the class and finally puts the accessors in a lookup table keyed by name.

The -writeObject:forKey: method is modified to look up and use a value accessor instead of using KVC.

Results

The parsing driver code didn't have to be changed, re-running it on our non-representative 44 MB JSON file yields the following time:

441 ms.

Now we're really starting to get somewhere! This is just shy of 100 MB/s and 10x faster then Swift's JSONDecoder, and within 5% of raw NSJSONSerialization.

Analysis and next steps

Can we do better? Why yes, glad you asked. Let's have a look at the profile.

First thing to note is that object-creation (beginDictionary) is now the #1 entry under the parse, as it should be. This is another indicator that we are not just moving in the right direction, but also closing in on the endgame.

However, there is still room for improvement. For example, although actually searching the SmallStringTable for the ValueAccessor (offsetOfCStringWithLengthInTableOfLength()) takes only 2.7% of the time, about the same as getting the internal char* out of a CFString via the fast-path (CFStringGetCStringPtr()), the total time for the -objectForKey: is a multiple of that, at 13%. This means that unwrapping the NSString takes more time than doing the actual work. Wrapping the char* and length into an NSString also takes significant time, and all of this work is redundant...we would be better of just passing along the char* and length.

A similar wrap/unwrap situation occurs with integers, which we first turn into NSNumbers, only to immediately get the integer out again so we can set it.

objc_msgSend() also starts getting noticeable, so looking at a bit of IMP-caching and just eliminating unnecessary indirection also seems like a good idea.

That's another aspect of optimization work: while the occasional big win is welcome, getting to truly outstanding performance means not being satisfied with that, but slogging through all the small-ish seeming detail.

Note

I can help not just Apple, but also you and your company with performance and agile coaching, workshops and consulting. Contact me at info at metaobject.com.

TOC

Somewhat Less Lethargic JSON Support for iOS/macOS, Part 1: The Status Quo
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 2: Analysis
Somewhat Less Lethargic JSON Support for iOS/macOS, Part 3: Dematerialization
Equally Lethargic JSON Support for iOS/macOS, Part 4: Our Keys are Small but Legion
Less Lethargic JSON Support for iOS/macOS, Part 5: Cutting out the Middleman
Somewhat Faster JSON Support for iOS/macOS, Part 6: Cutting KVC out of the Loop
Faster JSON Support for iOS/macOS, Part 7: Polishing the Parser
Faster JSON Support for iOS/macOS, Part 8: Dematerialize All the Things!
Beyond Faster JSON Support for iOS/macOS, Part 9: CSV and SQLite

Saturday, December 14, 2019

The Four Stages of Objective-Smalltalk

One of the features that can be confusing about Objective-Smalltalk is that it actually has several parts that are each significant on their own, so frequently will focus on just one of these (which is fine!), but without realising that the other parts also exist, which is unfortunate as they are all valuable and complement each other. In fact, they can be described as stages that are (logically) built on top of each other.

1. WebScript 2 / "Shasta"

Objective-C has always had great integration with other languages, particularly with a plethora of scripting languages, from Tcl to Python and Ruby to Lisp and Scheme and their variants etc. This is due not just to the fact that the runtime is dynamic, but also that it is simple and C-based not just in terms of being implemented in C, but being a peer to C.

However, all of these suffer from having two somewhat disparate languages, with competing object models, runtimes, storage strategies etc. One language that did not have these issues was WebScript, part of WebObjects and essentially Objective-C-Script. The language was interpreted, a peer in which you could even implement categories on existing Objective-C objects, and so syntactically compatible that often you could just copy-paste code between the two. So close to the ideal scripting language for that environment.

However, the fact that Objective-C is already a hybrid with some ugly compromises means that these compromises often no longer make sense at all in the WebScript environment. For example, Objective-C strings need an added "@" character because plain double quotes are already taken by C strings, but there are no C strings in WebScripts. Primitive types like int can be declared, but are really objects, the declaration is a dummy, a NOP. Square brackets for message sends are needed in Objective-C to distinguish messages from the rest of the C syntax, but the that's also irrelevant in WebScript. And so on.

So the first stage of Objective-Smalltalk was/is to have all the good aspects of WebScript, but without the syntactic weirdness needed to match the syntactic weirdness of Objective-C that was needed because Objective-C was jammed into C. I am not the only one who figured out the obvious fact that such a language is, essentially, a variant of Smalltalk, and I do believe this pretty much matches what Brent Simmons called Shasta.

Implementation-wise, this works very similarly to WebScript in that everything in the language is an object and gets converted to/from primitives when sending or receiving messages as needed.

This is great for a much more interactive programming model than what we have/had (and the one we have seems to be deteriorating as we speak):

And not just for isolated fragments, but for interacting with and tweaking full applications as they are running:

2. Objective-C without the C

Of course, getting rid of the (syntactic) weirdnesses of Objective-C in our scripting language means that it is no longer (syntactically) compatible with Objective-C. Which is a shame.

It is a shame because this syntactic equivalence between Objective-C and WebScript meant that you could easily move code between them. Have a script that has become stable and you want to reuse it? Copy and paste that code into an Objective-C file and you're good to go. Need it faster? Same. Have some Objective-C code that you want to explore, create variants of etc? Paste it into WebScript. Such a smooth integration between scripting and "programming" is rare and valuable.

The "obvious" solution is to have a native AOT-compiled version of this scripting language and use it to replace Objective-C. Many if not all other scripting languages have struggled mightily with becoming a compiled language, either not getting there at all or requiring JIT compilers of enormous size, complexity, engineering effort and attack surface.

Since the semantic model of our scripting language ist just Objective-C, we know that we can AOT-compile this language with a fairly straightforward compiler, probably a lot simpler than even the C/Objective-C compilers currently used, and plugging into the existing toolchain. Which is nice.

The idea seems so obvious, but apparently it wasn't.

Everything so far would, taken together, make for a really nice replacement for Objective-C with a much more productive and, let's face it, fun developer experience. However, even given the advantages of a simpler language, smoothly integrated scripting/programming and instant builds, it's not really clear that yet another OO language is really sufficient, for example the Etoilé project or the eero language never went anywhere, despite both being very nice.

3. Beyond just Objects: Architecture Oriented Programming

Ever since my Diplomarbeit, Approaches to Composition and Refinement in Object-Oriented Design back in 1997, I've been interested in Software Architecture and Architecture Description Languages (ADLs) as a way of overcoming the problems we have when constructing larger pieces of software.

One thing I noticed very early is that the elements of an ADL closely match up with and generalise the elements of a programming language, for example an object-oriented language: object generalises to component, message to connector. So it seemed that any specific pogramming language is just a specialisation or instantiation of a more general "architecture language".

To explore this idea, I needed a language that was amenable to experimentation, by being both malleable enough as to allow a metasystem that can abstract away from objects and messages and simple/small enough to make experimentation feasible. A simple variant of Smalltalk would do the trick. More mature variants tend to push you towards building with what is there, rather than abstracting from it, they "...eat their young" (Alan Kay).

So Objective-Smalltalk fits the bill perfectly as a substrate for architecture-oriented programming. In fact, its being built on/with Objective-C, which came into being largely to connect the C/Unix world with the Smalltalk world, means it is already off to a good start.

What to build? How about not reinventing the wheel and simply picking the (arguably) 3 most successful/popular architectural styles:

  • OO (subsuming the other call/return styles)
  • Unix Pipes and Filters
  • REST
Again, surprisingly, at least to me, even these specific styles appear to align reasonably well with the elements we have in a programming language. OO is already well-developed in (Objective-)Smalltalk, dataflow maps to Smalltalk's assignment operator, which needed to be made polymorphic anyway, and REST at least partially maps to non-message identifiers, which also are not polymorphic in Smalltalk.

Having now built all of these abstractions into Objective-Smalltalk, I have to admit again to my surprise how well they work and work together. Yes, it was my thesis, and yes, I can now see confirmation bias everywhere, but it was also a bit of a long-shot.

4. Architecture Oriented Metaprogramming

The architectural styles described above are implemented in frameworks and their interfaces hard-coded into the language implementation. However, with three examples , it should now be feasible to create linguistic support for defining the architectural styles in the language itself, allowing users to define and refine their own architectural styles. This is ongoing work.

What now?

One of the key takeaways from this is that each stage is already quite useful, and probably a worthy project all by itself, it just gets Even Better™ with the addition of later stages. Another is that I need to get back to getting stage ready, as it wasn't actually needed for stage 3, at least not initially.

Monday, November 11, 2019

What Alan Kay Got Wrong About Objects

One of the anonymous reviewers of my recently published Storage Combinators paper (pdf) complained that hiding disk-based, remote, and local abstractions behind a common interface was a bad idea, citing Jim Waldo's A Note on Distributed Computing.

Having read both this and the related 8 Fallacies of Distributed Computing a while back, I didn't see how this would apply, and re-reading confirmed my vague recollections: these are about the problems of scaling things up from the local case to the distributed case, whereas Storage Combinators and In-Process REST are about scaling things down from the distributed case to the local case. Particularly the Waldo paper is also very specifically about objects and messages, REST is a different beast.

And of course scaling things down happens to be time-honored tradtition with a pretty good track record:

In computer terms, Smalltalk is a recursion on the notion of computer itself. Instead of dividing "computer stuff" into things each less strong than the whole—like data structures, procedures, and functions which are the usual paraphernalia of programming languages—each Smalltalk object is a recursion on the entire possibilities of the computer. Thus its semantics are a bit like having thousands and thousands of computers all hooked together by a very fast network.
Mind you, I think this is absolutely brilliant: in order to get something that will scale up, you simply start with something large and then scale it down!.

But of course, this actually did not happen. As we all experienced scaling local objects and messaging up to the distributed case did not (CORBA, SOAP,...), and as Waldo explains, cannot, in fact, work. What gives?

My guess is that the method described wasn't actually used: when Alan came up with his version of objects, there were no networks with thousands of computers. And so Alan could not actually look at how they communicated, he had to imagine it, it was a Gedankenexperiment. And thus objects and messages were not a scaled-down version of an actual larger thing, they were a scaled down version of an imagined larger thing.

Today, we do have a large network of computers, with not just thousands but billions of nodes. And they communicate via HTTP using the REST architectural style, not via distributed objects and messages.

So maybe if we took that communication model and scaled it down, we might be able to do even better than objects and messages, which already did pretty brilliantly. Hence In-Process REST, Polymorphic Identifiers and Storage Combinators, and yes, the results look pretty good so far!

The big idea is "messaging" -- that is what the kernal of Smalltalk/Squeak is all about (and it's something that was never quite completed in our Xerox PARC phase). The Japanese have a small word -- ma -- for "that which is in between" -- perhaps the nearest English equivalent is "interstitial". The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be. Think of the internet -- to live, it (a) has to allow many different kinds of ideas and realizations that are beyond any single standard and (b) to allow varying degrees of safe interoperability between these ideas.

So of course Alan is right after all, just not about objects and messages, which are too specific: "ma", or "interstitialness" or "connector" is the big idea, messaging is just one incarnation of that idea.