Sunday, June 13, 2021

Asynchronous Sequences and Polymorphic Streams

Browsing the WWDC '21 session videos, I came across the session on Asynchronous Sequences. The preview image showcased some code for asynchronously fetching and massaging current earthquake data from the U.S. Geological Survey:
@main
struct QuakesTool {
   static func main() async throws {
      let endpointURL = URL(string: "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv")!

      for try await event in endpointURL.lines.dropFirst() {
         let values = event.split(separator: ",")
         let time = values[0]
         let latitude = values[1]
         let longitude = values[2]
         let magnitude = values[4]
         print("Magnitude \(magnitude) on \(time) at \(latitude) \(longitude)")
      }
   }
}

This is nice, clean code, and it certainly looks like it serves as a good showcase for the benefits of asynchronous coding with async/await and asynchronous sequences built on top of async/await.

Or does it?

Here is the equivalent code in Objective-S:


#!env stsh
stream ← ref:https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv linesAfter:1.

stream do: { :theLine |
   values ← theLine componentsSeparatedByString:','.
   time ← values at:0.
   latitude ← values at:1.
   longitude ← values at:2.
   magnitude ← values at:4.
   stdout println:"Quake: magnitude {magnitude} on {time} at {latitude} {longitude}".
}. 
stream awaitResultForSeconds:20.

Objective-S does not (and will not) have async/await, but it can nevertheless provide the equivalent functionality easily and elegantly. How? Two features:

  1. Polymorphic Write Streams
  2. Messaging
Let's see how these two conspire to make adding something equivalent to for try await trivial.

Polymorphic Write Streams

In the Objective-S implementation, https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv is not a string, but an actual identifier, a Polymorphic Identifier, adding the ref: prefix turns it into a binding, a first class variable. You can ask a binding for its value, but for bindings that can also be regarded as collections of some kind, you can also ask them for a stream of their values, in this particular case a MPWURLStreamingStream. This stream is a Polymorphic Write Stream that can be easily composed with other filters to create pipelines. The linesAfter: method is a convenience method that does just that: it composes the URL fetcher with a filter that converts from bytes to lines of text and another filter that drops the first n items.

Objective-S actually has convenient syntax for creating these compositions without having to do it via convenience methods, but I wanted to keep differences in the surrounding scaffolding small for this example, which is about the for try away and do:.

When I encountered the example, Polymorphic Write Streams actually did not have a do: for iteration, but it was trivial to add:


-(void)do:aBlock
{
    [self setFinalTarget:[MPWBlockTargetStream streamWithBlock:aBlock]];
    [self run];
}

(This code lives in MPWFoundation, so it is in Objective-C, not Objective-S).

Those 5 lines were all that was needed. I did not have to make substantive changes to the language or its implementation. One reason for this is that Polymorphic Write Streams are asynchrony-agnostic: although they are mostly implemented as straightforward synchronous code, they work just as well if parts of the pipeline they are in are asynchronous. It just doesn't make a difference, because the semantics are in the data flow, not in the control flow.

Messaging

The other big reason an asynchronous do: was easy to add is messaging.
If you focus on just messaging -- and realize that a good metasystem can late bind the various 2nd level architectures used in objects -- then much of the language-, UI-, and OS based discussions on this thread are really quite moot.
One of the many really, really neat ideas in Smalltalk is how control structures, which in most other languages are special language features, are just plain old messages and implemented in the library, not in the language.

So the for ... in loop in Swift is just the do: message sent to a collection, and the keyword syntax makes this natural:


for event in lines {
...
}
...
lines do: { :event |
...
}

Note how making loops regular like this also makes the special concept of "loop variable" disappear. The "loop variable" is just the block argument. And I just realized the same would go for a not-nil result of a nil test.

Anyway, if "loops" are just messages, it's easy to add a method implementing iteration to some other entity, for example a stream, the way that I did. (Smalltalk streams also support the iteration messages).

And when you can easily make stream processing, which can handle asynchrony naturally and easily, just as convenient as imperative programming, you don't need async/await, which tries to make asynchronous programming look like imperative programming in order to make it convenient.

Wednesday, June 9, 2021

Glue: the Dark Matter of Software

"Software seems 'large' and 'complicated' for what it does". I keep coming back to this quote by Alan Kay.

The same feeling has been nagging me pretty me much ever since I started writing software. On the one hand, there is the magic, almost literally: we write some text (spells) and the machine does things in the real world. On the other hand, it seems just way too much work to make the machine do anything more complex than:

10 PRINT "Hello"
20 GOTO 10
Almost like threading a needle with boxing gloves. And that's even if we are careful, if we avoid unnecessary complexity.

And the numbers appear to back that up, Alan Kay mentions Microsoft office at several hundred million lines of code. From my personal experience, the Wunderlist iOS client was not quite 200 KLOC. For the latter, I can attest to the attention given by the team to not introduce unnecessary bloat, and even to actively reduce it. (For example, we cut our core code by around 30KLOC thanks to some of the architectural mechanisms such as Storage Combinators). I am fairly sure I am not the only one with this experience.

So why so much code? After all Wunderlist was just a To Do List, albeit a really nice one. I can't really say much about Office, I don't think anyone can, because 400 MLOC is just way too much code to comprehend. I think the answer is:

Glue Code.

It's the unglamorous, invisible code that connects two pieces of software, makes sure that data that's in location A reaches location B unscathed (from the datbase to the UI, from the UI to the model, from the model to the backend and so on...). And like Dark Matter, it is invisible and massive.

Why do I say it is "invisible"? After all, the code is right there, isn't it? As far as I can tell, there are several related reasons:

  1. Glue code is deemed not important. It's just a couple of lines here, and another couple of lines over there ... and soon enough you're talking real MLOCs!
  2. We cannot directly express glue code. Most of our languages are what I call "DSLs for Algorithms" (See ALGOL, the ALGOrithmic Language), so glue can not be expressed intentionally, but only by describing algorithms for implementing the glue.
That's why it is invisible, and also partly why it is massive: not being able to express it directly means we cannot abstract and encapsulate it, we keep repeating slight variations of that glue. There is another reason why it's massive:
  1. Glue is quadratic. If you have N features that interact with each other, you have O(N²) pieces of glue to get them to talk to each other.

This last point was illustrated quite nicely by Kevin Greer in a video comparing Multics and Unix development, with the crucial insight being that you need to "program the perimeter, not the area":

For him, the key difference is that Unix had the pipe, and I would agree. The pipe is one-character glue: "|". This is absolutely crucial.

If you have to write even a little custom code every time you connect two modules, you will be in quadratic complexity, meaning that as your features grow your glue code will overwhelm the core functionality. And you will only notice this when it's far too late to do anything about it, because the initial growth rate will be low.

So what can we do about it? I think we need to make glue first class so we can actually write down the glue itself, and not the algorithms that implement the glue. Once we have that, we can and hopefully will create better kinds of glue, ones like the Unix pipe in that they can connect components generically, without requiring custom glue per component pair.

UPDATE

There were some questions as to what to do about this. Well, I am working on it, with Objective-S, and I write fairly frequently on this blog (and occasionally submit my writing to scientific conferences), one post that would be immediately relevant is: Why Architecture Oriented Programming Matters.

I also don't see Unix Pipes and Filters as The Answer™, they just demonstrate the concept of minimized and constant glue. Expanding on this, and as I wrote in Why Architecture Oriented Programming Matters, I also don't see any one single connector as "the" solution. We need different kinds of connectors, and we need to write them down, to abstract over them and use them natively. Not simulate everything by calling procedures, methods or functions. See also Foxes vs. Hedgehogs.

Tuesday, June 1, 2021

Towards a ToDoMVC Backend in Objective-S

A couple of weeks ago, I showed a little http backend. Well, tiny is probably a more apt description, and also aptly describes its functionality, which is almost non-existent. All it does is define a simplistic Task class, create an array with two sample instances and then serves that array of tasks over http. And it serves the -description of those tasks rather than anything usefuk like a JSON encoding.

For reference, this is the original code, hacked up in maybe 15 minutes:


#!env stsh
framework:ObjectiveHTTPD load.

class Task {
   var <bool> done.
   var title.
   -description { "Task: {this:title} done: {this:done}". }
}

taskList ← #( #Task{ #title: 'Clean my room', #done: false }, #Task{ #title: 'Check twitter feed', #done: true } ).

scheme todo {
   var taskList.
   /tasks { 
      |= { 
         this:taskList.
      }
   }
}.

todo := #todo{ #taskList: taskList }.
server := #MPWSchemeHttpServer{ #scheme: todo, #port: 8082 }.
server start.
shell runInteractiveLoop.

What would it take to make this borderline useful? First, we would probably need to encode the result as JSON, rather than serving a description. This is where Storage Combinators come in. We (now) have a MPWJSONConverterStore that's a mapping store, it passes its "REST" requests through while performing certain transformations on the data and/or the references. In this case the transformation is serializing or deserialzing objects from/to JSON, depending on which way the request is going and which way the converter is pointing.

In this case, the converter is pointing "up", that is it serializes objects read from its source to JSON and deserializes data written to its source from JSON to objects. We also tell it that it is dealing with Task objects. When we have the converter we connect it to our todo scheme and tell the HTTP server to talk to the json converter (which talks to our todo scheme):


todo := #todo{ #taskList: taskList, #store: persistence }.
json := #MPWJSONConverterStore{  #up: true, #class: class:Task }.
json → todo.
server := #MPWSchemeHttpServer{ #scheme: json, #port: 8082 }.

Second, we also want to be to interact with individual tasks. No problem, just add a /task/:id proprerty path to our store/scheme handler, along with GET ("|=") and PUT ("=|") handlers. I am not fully sold yet on the "|=" syntax for this, but I would like to avoid names for this sort of structural component. Maybe arrows?
	/task/:id {
		|= {
			this:taskDict at:id .
		}
		=| {
			this:taskDict at:id put:newValue.
		}

In order to facilitate this, the taskList was changed to a dictionary. Once we make changes to our data, we probably also want to persist it. One easy way to do this is to store the tasks as JSON on disk. This allows us to reuse the JSON converter from above, but this time pointing "down". We connect this converter to the filesystem at the directory /tmp/tasks and to the store:
json → todo → #MPWJSONConverterStore{  #class: class:Task } → ref:file:/tmp/tasks/ asScheme.

In addition, we need to trigger saving in the PUT handler:
		=| {
			this:taskDict at:id put:newValue.
			self persist.
		}
	-persist {
		source:tasks := this:taskDict allValues.
	}
}

This will (synchronously) write the entire task list on every PUT. The full code is here:
#!env stsh
framework:ObjectiveHTTPD load.

class Task {
	var id.
	var  done.
	var title.
	-description { "Task: {this:title} done: {this:done} id: {this:id}". }
	-writeOnJSONStream:aStream {
		aStream writeDictionaryLikeObject:self withContentBlock:{ :writer |
			writer writeInteger: this:id forKey:'id'.
			writer writeString: this:title forKey:'title'.
			writer writeInteger: this:done forKey:'done'.
		}.
	}
}

taskList ← #( #Task{ #id: '1', #title: 'Clean Room', #done: false }, #Task{ #id: '2', #title: 'Check Twitter', #done: true } ).

scheme todo : MPWMappingStore {
	var taskDict.
	-setTaskList:aList {
		this:taskDict := NSMutableDictionary dictionaryWithObjects: aList forKeys: aList collect id.
	}
	/tasks { 
		|= { 
			this:taskDict allValues.
		}
	}
	/task/:id {
		|= {
			this:taskDict at:id .
		}
		=| {
			this:taskDict at:id put:newValue.
			self persist.
		}
	}
	-persist {
		source:tasks := this:taskDict allValues.
	}
}.

todo := #todo{ #taskList: taskList }.
json := #MPWJSONConverterStore{  #up: true, #class: class:Task }.
json → todo → #MPWJSONConverterStore{  #class: class:Task } → ref:file:/tmp/tasks/ asScheme.
server := #MPWSchemeHttpServer{ #scheme: json, #port: 8082 }.
server start.
shell runInteractiveLoop.

The writeOnJSONStream: method is currently still needed by the serializer to encode the task object as JSON. The parser doesn't need any support, it can figure things out by itself for simple mappings. Yes, this makes no sense, as serializing is easier than parsing, but I haven't gotten around to the automation for serializing yet.

Analysis

So there you have it, an almost functional Todo backend, in refreshingly little code, and with refreshingly little magic. What I find particularly pleasing is that this conciseness can be achieved while keeping the architecture fully visible and maintaining a hexagonal/ports-and-adapters style.

What is the architecture of this app? It says so right at the end: the server is parametrized by its scheme, and that scheme is a JSON serializer hooked up to my todo scheme handler, hooked up to another JSON serializer hooked up to the directory /tmp/tasks.

Although a Rails app contains comparably little code, this code is scattered over different classes and is only comprehensible as a plugin to Rails. All the architecture is hidden inside Rails, it is not at all visible in the code and simply cannot be divined from looking at the code. Although there are many reasons for this, one fundamental one is that Ruby is a call/return language, and Rails does its best to translate from the REST architectural style to something that is more natural in the call/return style. And it does an admirable job at it.

I do think that this examples gives us a little glimpse into what I believe to be the power of Architecture Oriented Programming: the power and succinctness of frameworks, but with the simplicity, straightforwardness and reusability of more library-oriented styles.

Performance

I obviously couldn't resist benchmarking this, and to my great joy found that wrk now works on the M1. Since the interpreter isn't thread safe, I had to restrict it to a single connection and thread. My expectations were that it requests/s would be in the double to low triple digits, my fear was that it would be single digits. (The reason for that fear is the writeOnJSONStream: method that is called for every object serialized and is in interpreted Objective-S, probably one of the slowest language implementations currently in existence). To say I was surprised is an understatement. Stunned is more like it:
wrk -c 1 -t 1 http://localhost:8082/task/1 
Running 10s test @ http://localhost:8082/task/1
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   133.62us   14.45us   0.97ms   98.52%
    Req/Sec     7.50k   311.09     7.62k    99.01%
  75326 requests in 10.10s, 12.28MB read
Requests/sec:   7458.60
Transfer/sec:      1.22MBTransfer/sec:      1.97MB

More than 7K requests per second! Those M1 Macs really are fast. I wonder what it will be once I remove the need for the manually written writeOnJSONStream: method.

(NOTE: previous version said >12K requests/s, which is even more insane, but was with an incorrect URL that had the server returning 404s)

Thursday, May 20, 2021

Why are there no return statements in Objective-S?

My previous example raised a question: why no return statements? I am assuming this was about this part of the example:

   -description { "Task: {this:title} done: {this:done}". }


The answer is that I would like to do without return statements if and as much as I can. We will see how much that is. In general, I am in favor of expression-orientation in programming languages. A simple example is if-statements vs. conditional expressions. In most languages today, like C, Objective-C and Swift, if is a statement. That means I write something as follows:
if ( condition ) {
   do something if true
} else {
   do something different if false
}

This seems obvious and is general, but very often you don't want to do arbitrary stuff, you just want to have some variable have some value in one case and a different value in another case.
int foo;
if ( condition ) {
   foo = 1;
} else {
   foo = 42;
}

In that case, it is annoying that the if is defined to be a statement and not an expression, because you can't just write the following:
int foo;
foo = if ( condition ) { 1; } else { 42; }

In addition, as hinted to in the previous examples, you can't use a statement to initialize a variable, that definitely has to be an expression. Which is why C and many derived languages have the "ternary" operator (?:), which is really just an if/else in expression form.
int foo=condition ? 1 : 42;

That solves the problem, but now you have two conditionals. Why not have just one? LISP, most of the FP languages as well as Smalltalk and Objective-S have an if that returns a value.
a := condition ifTrue:{ 1. } ifFalse:{ 42. }.

So that's why expression-orientation is useful in general. What about methods? The same general idea applies. Whereas in Java, for example, a read accessor is called getX(), indicating an action that is performed ("get the value of x") in Objective-C Smalltalk and Objective-S, it is just called x, ("the value of x").

The same idea applies to dropping return statements where possible. It's not "get me the description of this object", it is "the description of this object is...". And inside the method, it's not "this statement now returns the following string as the description", but, again, "the description is...".

Describing things that are, rather than actions to perform, is at the heart of Objective-S, as discussed in Can Programmers Escape the Gentle Tyranny of Call/Return.

As Guy Steele put it:

Another weakness of procedural and functional programming is that their viewpoint assumes a process by which "inputs" are transformed into "outputs"; there is equal concern for correctness and for termination (and proofs thereof). But as we have connected millions of computers to form the Internet and the World Wide Web, as we have caused large independent sets of state to interact–I am speaking of databases, automated sensors, mobile devices, and (most of all) people–in this highly interactive, distributed setting, the procedural and functional models have failed, another reason why objects have become the dominant model. Ongoing behavior, not completion, is now of primary interest. Indeed, object-oriented programming had its origins in efforts to simulate the ongoing behavior of interacting real-world entities–thus the programming language SIMULA was born.
So wherever possible, Objective-S tries to push towards expressing things as statically as possible, pushing away from action-orientation. For example, hooking up a timed source to a pin:
#Blinker{ #seconds: 1, #active: true} → ref:gpio:17. 

instead of executing a loop:
while True: 
    GPIO.output(17, True) 
    sleep(1) 
    GPIO.output(17, False) 
    sleep(1) 

The same goes for many other relationships: instead of writing procedural code that initiates and/or maintains the relatinship, with the actual relationship remainig implicit, describe the actual relationship, make that explicit, and instead keep the procedural code that maintains it as a hidden implementation detail.

If the return statement comes back, and it very well might, I am hoping it will be in a slightly more general form. I recall Smalltalk's "^" being described as "send back". I've already taken that and generalised it to mean "send result", using it in filter definitions, where "^" means "send a result to the next filter in the pipeline". It is needed there because filters are not limited to sending a single result, they can send zero or many.

With those more general semantics, "^" might also be used to send back results to the sender of an asynchronous message, which is obviously quite different from a "return".

And of course it would be useful for early returns, which are currently not possible.

What about void methods?

Objective-S does have void methods, after all its procedural part is essentially identical to Objective-C, which also has them. However, I agree with the FP folk that functions (procedures, methods) should be as (side-)effect free as possible, and void methods by definition are effectful (or no-ops).

So where do the effects go? Two places:

  1. The left hand side of the "←".

    In most current programming languages, assignment is severely crippled, and therefore not really useful for generalised effects. With Polymorphic Identifiers and Storage Combinators, there is enough expressive power and ability to abstract that we should need far fewer void methods.

  2. Connecting via "→"

    Much of the need for effectful methods in OO is for constructing and connecting objects. In Objective-S, you don't need to call methods that result in a connection being established as a side effect of munging on some state, you define connections between objects directly using "→".

    Well, and you define objects using object literals such as  #Blinker{ #seconds: 1, #active: true}  instead of setting instance variables procedurally.

That's the plan, anyway. Although a lot of that plan is coming true at the moment. Exciting times! (And one of the reasons I haven't been blogging all that much).

A far too simple (hardcoded) tasks backend in Objective-S

Recently there was a question as to what one should use to create a backend for an iOS/macOS app these days. I couldn't resist mentioning Objective-S, and just to check for myself whether that's feasible, I quickly jotted down the following tiny backend that returns a hardcoded list of tasks via HTTP:
#!env stsh
framework:ObjectiveHTTPD load.

class Task {
   var <bool> done.
   var title.
   -description { "Task: {this:title} done: {this:done}". }
}

taskList ← #( #Task{ #title: 'Clean my room', #done: false }, #Task{ #title: 'Check twitter feed', #done: true } ).

scheme todo {
   var taskList.
   /tasks { 
      |= { 
         this:taskList.
      }
   }
}.

todo := #todo{ #taskList: taskList }.
server := #MPWSchemeHttpServer{ #scheme: todo, #port: 8082 }.
server start.
shell runInteractiveLoop.

After loading the HTTP framework, we define a Task and a list of two example tasks. Then we define a scheme with a single path, just /tasks, which returns said tasks list. We then instantiate the scheme and serve it via HTTP on port 8082. Since this is a shell script and starting the server does not block, we finally start up the REPL.

Details such as coding the tasks as JSON, accessing a single task and modifying tasks are left as exercises for the reader.

Sunday, May 9, 2021

Talking to pins

The last few weeks, I spent a little time getting Objective-S working well on the Raspberry Pi, specifically my Pi400. It's a really wonderful little machine, and the form factor and price remind me very much of the early personal computers.

What's missing, IMHO, is an experience akin to the early BASICs. And I really mean "akin", not a nostalgia project, but recovering a real quality that has been lost: not really "simplicity", more "straightforwardness".

Of course, one of the really cool thing about the Pi is its GPIO interface that lets you do all sorts of electronics experiments, and I hear that the equivalent of "Hello World" for the Raspi is making an LED blink.


import RPi.GPIO as GPIO 
from time import sleep 
GPIO.setwarnings(False) 
 
GPIO.setmode(GPIO.BCM) 
GPIO.setup(17, GPIO.OUT) 
 
while True: 
    GPIO.output(17, True) 
    sleep(1) 
    GPIO.output(17, False) 
    sleep(1) 

Hmm. That's a a lot of semantic noise for something so conceptually simple. All we want to is set the value of a pin. As soon as I saw this, I knew it would be ideal for Polymorphic Identifiers, because a pin is the ultimate state, and PIs and their stores are made for abstracting over state.

Of course, I first had to to get Objective-S running on the Pi, which meant getting GNUstep to run. While there is a wonderful set of install scripts, the one for the Raspi only worked with an ancient clang version and libobjc 1.9. Alas, that version has some bugs on the Raspi, for example with the imp_implentationWithBlock() runtime function that Objective-S uses to define methods.

Long story short, after learning about GNUstep installs and waiting for the wonderful David Chisnall to remove some obsolete 32 bit exception-version detection code from libobjc, we now have a script that installs current GNUstep with a reasonably current clang: https://github.com/plaurent/gnustep-build/tree/master/raspbian-10-clang-9.0-runtime-2.1-ARM. With that in hand, a few bug fixes in MPWFoundation and Objective-S, I could add a really rudimentary Store that manages talking to the pins. And this allows me to write the following in an interactive shell to drive the customary GPIO pin 17 that I connected to the LED via resistor:


gpio:17 ← 1.

Now that's what I am talking about!

Of course, we're supposed to make it blink, not just turn it on. We could use the same looping approach as the Python script, or convenience methods like the ones provided, but the breadboard and pins make me think of wanting to connect components to do the job instead.

So let's connect some components, software architecture style! The following script creates an instance of a Blinker object (using an object literal), which emits alternating ones and zeros and connects it to the pin.


blinker ← #Blinker{ #seconds: 1 }. 
blinker → ref:gpio:17. 
blinker run.
gpio:17 ← 0.

Once connected it tells the blinker to start running, which creates an NSTimer adds it to the current runloop and then runs the run loop. That run is interruptible, so Ctrl-C breaks and runs the cleanup code.

What about setting up the pin for output? Happens automatically when you first output to it, but I will add code so you can do it manually.

Where does the Blinker come from? That's actually an object-template based on an MPWFixedValueSource.


object Blinker : #MPWFixedValueSource{ #values: #(0,1) }

You can, of course, hook up a fixed-value source to any kind of stream.

While getting here took a lot of work, and resulted in me (re-)learning a lot about GNUstep, the result, even this intermediate one, is completely worth it and makes me very happy. This stuff really works even better than I thought it would.