Wednesday, June 30, 2021

Don't Generate Glue...Exterminate!!

Today I saw the news of github's release of the "AI Autopilot". As far as I can tell, it's an impressive piece of engineering that shouldn't exist. I mean, "Paste Code from Stack Overflow as a Service" was supposed to be a joke, not a product spec.

As of this writing, the first example given on the product page is some code to call a REST service that does sentiment analysis, which the AI helpfully completes. For reference, this is the code:


#!/usr/bin/env ts-node

import { fetch } from "fetch-h2";

// Determine whether the sentiment of text is positive
// Use a web service
async function isPositive(text: string): Promise<boolean> {
  const response = await fetch(`http://text-processing.com/api/sentiment/`, {
    method: "POST",
    body: `text=${text}`,
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
    },
  });
  const json = await response.json();
  return json.label === "pos";
}

Here is the same script in Objective-S:
#!env stsh
#-sentiment:text
((ref:http://text-processing.com/api/sentiment/ postForm:#{ #text: text }) at:'label') = 'pos'

And once you have it, reuse it. And keep those Daleks at bay.

Monday, June 28, 2021

Generating ARM Assembly: First Steps

Finally took the plunge to start generating ARM64 assembly. As expected, the actual coding was much easier than overcoming the barrier to just start doing it.

The following snippet generates a program that prints a message to stdout, so a classic "Hello World":


#!env stsh
#-gen:msg

messageLabel ← 'message'.
main ← '_main'.

framework:ObjSTNative load
arm := MPWARMAssemblyGenerator stream 
arm global: main;
    align:2;
    label:main;
    mov:0 value:1;
    adr:1 address:messageLabel;
    mov:2 value: msg length;
    mov:16 value:4;
    svc:128;
    mov:0 value:0;
    ret;
    label:messageLabel;
    asciiz:msg.

file:hello-main.s := arm target 

One little twist is that the message to print gets passed to the generator. I like how Smalltalk's keyword syntax keeps the code uncluttered, and often pretty close to the actual assembly that will be generated.

Of particular help here is message cascading using the semicolon. This means I don't have to repeat the receiver of the message, but can just keep sending it messages. Cascading works well together with streams, because there are no return values to contend with, we just keep appending to the stream.

When invoked using ./genhello-main.st 'Hi Marcel, finish that blog post and get on your bike!', the generated code is as follows:


.global	_main
.align	2
_main:
	mov	X0, #1
	adr	X1, message
	mov	X2, #54
	mov	X16, #4
	svc	#128
	mov	X0, #0
	ret
message:
	.asciz	"Hi Marcel, finish that blog post and get on your bike!"

And now I am going to do what it says :-)

Tuesday, June 15, 2021

if let it be

One of funkier aspects of Swift syntax is the if let statement. As far as I can tell, it exists pretty much exclusively to check that an optional variable actually does contain a value and if it does, work with a no-longer-optional version of that variable.

Swift packages this functionality in a combination if statement and let declaration:


if let value = value {
   print("value is \(value)")
}

This has a bunch of problems that are explained nicely in a Swift Evolution thread (via Michael Tsai) together with some proposals to fix it. One of the issues is the idiomatic repitition of the variable name, because typically you do want the same variable, just with less optionality. Alas, code-completion apparently doesn't handle this well, so the temptation is to pick a non-descriptive variable name.

In my previous post (Asynchronous Sequences and Polymorphic Streams) I noted how the fact that iteration in Smalltalk and Objecive-S is done via messages and blocks means that there is no separate concept of a "loop-variable", that is just an argument to the block.

Conditionals are handled the same way, with blocks and messages, but normally don't pass arguments to their argument blocks, because in normal conditionals those arguments would always be just the constants true or false. Not very interesting.

When I added ifNotNil: some time ago, I used the same logic, but it turns out the object is now actually potentially interesting. So ifNotNil: now passes the now-known-to-be-non-nil value to the block and can be used as follows:


value ifNotNil:{ :value |
    stdout println:value.
}

This doesn't eliminate the duplication, but does avoid the issue of having the newly introduced variable name precede the original variable. Well, that and the whole weird if let in the first place.

With anonymous block arguments, we actually don't have to name the parameter at all:


value ifNotNil:{ stdout println:$0. }

Alternatively, we can just take advantage of some conveniensces and use a HOM instead:


value ifNotNil printOn:stdout.

Of course, Objective-S currently doesn't care about optionality, and with the current nil-eating behavior, the ifNotNil is not strictly necessary, you could just write it as follow:


value printOn:stdout.

I haven't really done much thinking about it, but the whole idea of optionality shouldn't really be handled in the space of values, but in the space of references. Which are first class objects in Objective-S.

So you don't ask a value if it is nil or not, you ask the variable if it contains a value:


ref:value ifBound:{ :value | ... }

To me that makes a lot more sense than having every type be accompanied by an optional type.

So if we were to care about optionality so in the future, we have the tools to create a sensible solution. And we can let if let just be.

Sunday, June 13, 2021

Asynchronous Sequences and Polymorphic Streams

Browsing the WWDC '21 session videos, I came across the session on Asynchronous Sequences. The preview image showcased some code for asynchronously fetching and massaging current earthquake data from the U.S. Geological Survey:
@main
struct QuakesTool {
   static func main() async throws {
      let endpointURL = URL(string: "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv")!

      for try await event in endpointURL.lines.dropFirst() {
         let values = event.split(separator: ",")
         let time = values[0]
         let latitude = values[1]
         let longitude = values[2]
         let magnitude = values[4]
         print("Magnitude \(magnitude) on \(time) at \(latitude) \(longitude)")
      }
   }
}

This is nice, clean code, and it certainly looks like it serves as a good showcase for the benefits of asynchronous coding with async/await and asynchronous sequences built on top of async/await.

Or does it?

Here is the equivalent code in Objective-S:


#!env stsh
stream ← ref:https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv linesAfter:1.

stream do: { :theLine |
   values ← theLine componentsSeparatedByString:','.
   time ← values at:0.
   latitude ← values at:1.
   longitude ← values at:2.
   magnitude ← values at:4.
   stdout println:"Quake: magnitude {magnitude} on {time} at {latitude} {longitude}".
}. 
stream awaitResultForSeconds:20.

Objective-S does not (and will not) have async/await, but it can nevertheless provide the equivalent functionality easily and elegantly. How? Two features:

  1. Polymorphic Write Streams
  2. Messaging
Let's see how these two conspire to make adding something equivalent to for try await trivial.

Polymorphic Write Streams

In the Objective-S implementation, https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv is not a string, but an actual identifier, a Polymorphic Identifier, adding the ref: prefix turns it into a binding, a first class variable. You can ask a binding for its value, but for bindings that can also be regarded as collections of some kind, you can also ask them for a stream of their values, in this particular case a MPWURLStreamingStream. This stream is a Polymorphic Write Stream that can be easily composed with other filters to create pipelines. The linesAfter: method is a convenience method that does just that: it composes the URL fetcher with a filter that converts from bytes to lines of text and another filter that drops the first n items.

Objective-S actually has convenient syntax for creating these compositions without having to do it via convenience methods, but I wanted to keep differences in the surrounding scaffolding small for this example, which is about the for try away and do:.

When I encountered the example, Polymorphic Write Streams actually did not have a do: for iteration, but it was trivial to add:


-(void)do:aBlock
{
    [self setFinalTarget:[MPWBlockTargetStream streamWithBlock:aBlock]];
    [self run];
}

(This code lives in MPWFoundation, so it is in Objective-C, not Objective-S).

Those 5 lines were all that was needed. I did not have to make substantive changes to the language or its implementation. One reason for this is that Polymorphic Write Streams are asynchrony-agnostic: although they are mostly implemented as straightforward synchronous code, they work just as well if parts of the pipeline they are in are asynchronous. It just doesn't make a difference, because the semantics are in the data flow, not in the control flow.

Messaging

The other big reason an asynchronous do: was easy to add is messaging.
If you focus on just messaging -- and realize that a good metasystem can late bind the various 2nd level architectures used in objects -- then much of the language-, UI-, and OS based discussions on this thread are really quite moot.
One of the many really, really neat ideas in Smalltalk is how control structures, which in most other languages are special language features, are just plain old messages and implemented in the library, not in the language.

So the for ... in loop in Swift is just the do: message sent to a collection, and the keyword syntax makes this natural:


for event in lines {
...
}
...
lines do: { :event |
...
}

Note how making loops regular like this also makes the special concept of "loop variable" disappear. The "loop variable" is just the block argument. And I just realized the same would go for a not-nil result of a nil test.

Anyway, if "loops" are just messages, it's easy to add a method implementing iteration to some other entity, for example a stream, the way that I did. (Smalltalk streams also support the iteration messages).

And when you can easily make stream processing, which can handle asynchrony naturally and easily, just as convenient as imperative programming, you don't need async/await, which tries to make asynchronous programming look like imperative programming in order to make it convenient.

Wednesday, June 9, 2021

Glue: the Dark Matter of Software

"Software seems 'large' and 'complicated' for what it does". I keep coming back to this quote by Alan Kay.

The same feeling has been nagging me pretty me much ever since I started writing software. On the one hand, there is the magic, almost literally: we write some text (spells) and the machine does things in the real world. On the other hand, it seems just way too much work to make the machine do anything more complex than:

10 PRINT "Hello"
20 GOTO 10
Almost like threading a needle with boxing gloves. And that's even if we are careful, if we avoid unnecessary complexity.

And the numbers appear to back that up, Alan Kay mentions Microsoft office at several hundred million lines of code. From my personal experience, the Wunderlist iOS client was not quite 200 KLOC. For the latter, I can attest to the attention given by the team to not introduce unnecessary bloat, and even to actively reduce it. (For example, we cut our core code by around 30KLOC thanks to some of the architectural mechanisms such as Storage Combinators). I am fairly sure I am not the only one with this experience.

So why so much code? After all Wunderlist was just a To Do List, albeit a really nice one. I can't really say much about Office, I don't think anyone can, because 400 MLOC is just way too much code to comprehend. I think the answer is:

Glue Code.

It's the unglamorous, invisible code that connects two pieces of software, makes sure that data that's in location A reaches location B unscathed (from the datbase to the UI, from the UI to the model, from the model to the backend and so on...). And like Dark Matter, it is invisible and massive.

Why do I say it is "invisible"? After all, the code is right there, isn't it? As far as I can tell, there are several related reasons:

  1. Glue code is deemed not important. It's just a couple of lines here, and another couple of lines over there ... and soon enough you're talking real MLOCs!
  2. We cannot directly express glue code. Most of our languages are what I call "DSLs for Algorithms" (See ALGOL, the ALGOrithmic Language), so glue can not be expressed intentionally, but only by describing algorithms for implementing the glue.
That's why it is invisible, and also partly why it is massive: not being able to express it directly means we cannot abstract and encapsulate it, we keep repeating slight variations of that glue. There is another reason why it's massive:
  1. Glue is quadratic. If you have N features that interact with each other, you have O(N²) pieces of glue to get them to talk to each other.

This last point was illustrated quite nicely by Kevin Greer in a video comparing Multics and Unix development, with the crucial insight being that you need to "program the perimeter, not the area":

For him, the key difference is that Unix had the pipe, and I would agree. The pipe is one-character glue: "|". This is absolutely crucial.

If you have to write even a little custom code every time you connect two modules, you will be in quadratic complexity, meaning that as your features grow your glue code will overwhelm the core functionality. And you will only notice this when it's far too late to do anything about it, because the initial growth rate will be low.

So what can we do about it? I think we need to make glue first class so we can actually write down the glue itself, and not the algorithms that implement the glue. Once we have that, we can and hopefully will create better kinds of glue, ones like the Unix pipe in that they can connect components generically, without requiring custom glue per component pair.

UPDATE

There were some questions as to what to do about this. Well, I am working on it, with Objective-S, and I write fairly frequently on this blog (and occasionally submit my writing to scientific conferences), one post that would be immediately relevant is: Why Architecture Oriented Programming Matters.

I also don't see Unix Pipes and Filters as The Answer™, they just demonstrate the concept of minimized and constant glue. Expanding on this, and as I wrote in Why Architecture Oriented Programming Matters, I also don't see any one single connector as "the" solution. We need different kinds of connectors, and we need to write them down, to abstract over them and use them natively. Not simulate everything by calling procedures, methods or functions. See also Foxes vs. Hedgehogs.

Tuesday, June 1, 2021

Towards a ToDoMVC Backend in Objective-S

A couple of weeks ago, I showed a little http backend. Well, tiny is probably a more apt description, and also aptly describes its functionality, which is almost non-existent. All it does is define a simplistic Task class, create an array with two sample instances and then serves that array of tasks over http. And it serves the -description of those tasks rather than anything usefuk like a JSON encoding.

For reference, this is the original code, hacked up in maybe 15 minutes:


#!env stsh
framework:ObjectiveHTTPD load.

class Task {
   var <bool> done.
   var title.
   -description { "Task: {this:title} done: {this:done}". }
}

taskList ← #( #Task{ #title: 'Clean my room', #done: false }, #Task{ #title: 'Check twitter feed', #done: true } ).

scheme todo {
   var taskList.
   /tasks { 
      |= { 
         this:taskList.
      }
   }
}.

todo := #todo{ #taskList: taskList }.
server := #MPWSchemeHttpServer{ #scheme: todo, #port: 8082 }.
server start.
shell runInteractiveLoop.

What would it take to make this borderline useful? First, we would probably need to encode the result as JSON, rather than serving a description. This is where Storage Combinators come in. We (now) have a MPWJSONConverterStore that's a mapping store, it passes its "REST" requests through while performing certain transformations on the data and/or the references. In this case the transformation is serializing or deserialzing objects from/to JSON, depending on which way the request is going and which way the converter is pointing.

In this case, the converter is pointing "up", that is it serializes objects read from its source to JSON and deserializes data written to its source from JSON to objects. We also tell it that it is dealing with Task objects. When we have the converter we connect it to our todo scheme and tell the HTTP server to talk to the json converter (which talks to our todo scheme):


todo := #todo{ #taskList: taskList, #store: persistence }.
json := #MPWJSONConverterStore{  #up: true, #class: class:Task }.
json → todo.
server := #MPWSchemeHttpServer{ #scheme: json, #port: 8082 }.

Second, we also want to be to interact with individual tasks. No problem, just add a /task/:id proprerty path to our store/scheme handler, along with GET ("|=") and PUT ("=|") handlers. I am not fully sold yet on the "|=" syntax for this, but I would like to avoid names for this sort of structural component. Maybe arrows?
	/task/:id {
		|= {
			this:taskDict at:id .
		}
		=| {
			this:taskDict at:id put:newValue.
		}

In order to facilitate this, the taskList was changed to a dictionary. Once we make changes to our data, we probably also want to persist it. One easy way to do this is to store the tasks as JSON on disk. This allows us to reuse the JSON converter from above, but this time pointing "down". We connect this converter to the filesystem at the directory /tmp/tasks and to the store:
json → todo → #MPWJSONConverterStore{  #class: class:Task } → ref:file:/tmp/tasks/ asScheme.

In addition, we need to trigger saving in the PUT handler:
		=| {
			this:taskDict at:id put:newValue.
			self persist.
		}
	-persist {
		source:tasks := this:taskDict allValues.
	}
}

This will (synchronously) write the entire task list on every PUT. The full code is here:
#!env stsh
framework:ObjectiveHTTPD load.

class Task {
	var id.
	var  done.
	var title.
	-description { "Task: {this:title} done: {this:done} id: {this:id}". }
	-writeOnJSONStream:aStream {
		aStream writeDictionaryLikeObject:self withContentBlock:{ :writer |
			writer writeInteger: this:id forKey:'id'.
			writer writeString: this:title forKey:'title'.
			writer writeInteger: this:done forKey:'done'.
		}.
	}
}

taskList ← #( #Task{ #id: '1', #title: 'Clean Room', #done: false }, #Task{ #id: '2', #title: 'Check Twitter', #done: true } ).

scheme todo : MPWMappingStore {
	var taskDict.
	-setTaskList:aList {
		this:taskDict := NSMutableDictionary dictionaryWithObjects: aList forKeys: aList collect id.
	}
	/tasks { 
		|= { 
			this:taskDict allValues.
		}
	}
	/task/:id {
		|= {
			this:taskDict at:id .
		}
		=| {
			this:taskDict at:id put:newValue.
			self persist.
		}
	}
	-persist {
		source:tasks := this:taskDict allValues.
	}
}.

todo := #todo{ #taskList: taskList }.
json := #MPWJSONConverterStore{  #up: true, #class: class:Task }.
json → todo → #MPWJSONConverterStore{  #class: class:Task } → ref:file:/tmp/tasks/ asScheme.
server := #MPWSchemeHttpServer{ #scheme: json, #port: 8082 }.
server start.
shell runInteractiveLoop.

The writeOnJSONStream: method is currently still needed by the serializer to encode the task object as JSON. The parser doesn't need any support, it can figure things out by itself for simple mappings. Yes, this makes no sense, as serializing is easier than parsing, but I haven't gotten around to the automation for serializing yet.

Analysis

So there you have it, an almost functional Todo backend, in refreshingly little code, and with refreshingly little magic. What I find particularly pleasing is that this conciseness can be achieved while keeping the architecture fully visible and maintaining a hexagonal/ports-and-adapters style.

What is the architecture of this app? It says so right at the end: the server is parametrized by its scheme, and that scheme is a JSON serializer hooked up to my todo scheme handler, hooked up to another JSON serializer hooked up to the directory /tmp/tasks.

Although a Rails app contains comparably little code, this code is scattered over different classes and is only comprehensible as a plugin to Rails. All the architecture is hidden inside Rails, it is not at all visible in the code and simply cannot be divined from looking at the code. Although there are many reasons for this, one fundamental one is that Ruby is a call/return language, and Rails does its best to translate from the REST architectural style to something that is more natural in the call/return style. And it does an admirable job at it.

I do think that this example gives us a little glimpse into what I believe to be the power of Architecture Oriented Programming: the power and succinctness of frameworks, but with the simplicity, straightforwardness and reusability of more library-oriented styles.

Performance

I obviously couldn't resist benchmarking this, and to my great joy found that wrk now works on the M1. Since the interpreter isn't thread safe, I had to restrict it to a single connection and thread. My expectations were that it requests/s would be in the double to low triple digits, my fear was that it would be single digits. (The reason for that fear is the writeOnJSONStream: method that is called for every object serialized and is in interpreted Objective-S, probably one of the slowest language implementations currently in existence). To say I was surprised is an understatement. Stunned is more like it:
wrk -c 1 -t 1 http://localhost:8082/task/1 
Running 10s test @ http://localhost:8082/task/1
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   133.62us   14.45us   0.97ms   98.52%
    Req/Sec     7.50k   311.09     7.62k    99.01%
  75326 requests in 10.10s, 12.28MB read
Requests/sec:   7458.60
Transfer/sec:      1.22MBTransfer/sec:      1.97MB

More than 7K requests per second! Those M1 Macs really are fast. I wonder what it will be once I remove the need for the manually written writeOnJSONStream: method.

(NOTE: previous version said >12K requests/s, which is even more insane, but was with an incorrect URL that had the server returning 404s)