Saturday, December 14, 2019

The Four Stages of Objective-Smalltalk

One of the features that can be confusing about Objective-Smalltalk is that it actually has several parts that are each significant on their own. People frequently focus on just one of these (which is fine!) without realising that the other parts also exist, which is unfortunate, as they are all valuable and complement each other. In fact, they can be described as stages that are (logically) built on top of each other.

1. WebScript 2 / "Shasta"

Objective-C has always had great integration with other languages, particularly with a plethora of scripting languages, from Tcl to Python and Ruby to Lisp and Scheme and their variants etc. This is due not just to the fact that the runtime is dynamic, but also that it is simple and C-based not just in terms of being implemented in C, but being a peer to C.

However, all of these suffer from having two somewhat disparate languages, with competing object models, runtimes, storage strategies etc. One language that did not have these issues was WebScript, part of WebObjects and essentially Objective-C-Script. The language was interpreted, a peer in which you could even implement categories on existing Objective-C objects, and so syntactically compatible that often you could just copy-paste code between the two. So close to the ideal scripting language for that environment.

However, the fact that Objective-C is already a hybrid with some ugly compromises means that these compromises often no longer make sense at all in the WebScript environment. For example, Objective-C strings need an added "@" character because plain double quotes are already taken by C strings, but there are no C strings in WebScript. Primitive types like int can be declared, but are really objects; the declaration is a dummy, a NOP. Square brackets for message sends are needed in Objective-C to distinguish messages from the rest of the C syntax, but that's also irrelevant in WebScript. And so on.

So the first stage of Objective-Smalltalk was/is to have all the good aspects of WebScript, but without the syntactic weirdness needed to match the syntactic weirdness of Objective-C that was needed because Objective-C was jammed into C. I am not the only one who figured out the obvious fact that such a language is, essentially, a variant of Smalltalk, and I do believe this pretty much matches what Brent Simmons called Shasta.

Implementation-wise, this works very similarly to WebScript in that everything in the language is an object and gets converted to/from primitives when sending or receiving messages as needed.
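As a rough illustration of that conversion, here is a minimal C sketch of boxing and unboxing at a message boundary. All names in it (`Value`, `box_int`, `box_double`, `unbox_int`, `send_add`) are hypothetical and chosen for this example only; it shows the general idea, not how the Objective-Smalltalk interpreter is actually implemented.

```c
#include <assert.h>

/* Hypothetical boxed value: inside the scripting language everything is an object. */
typedef enum { KIND_INT, KIND_DOUBLE } Kind;

typedef struct {
    Kind kind;
    union { long i; double d; } as;
} Value;

/* Box primitives into object-like values. */
Value box_int(long i)
{
    Value v;
    v.kind = KIND_INT;
    v.as.i = i;
    return v;
}

Value box_double(double d)
{
    Value v;
    v.kind = KIND_DOUBLE;
    v.as.d = d;
    return v;
}

/* Unbox to the primitive the receiver expects, truncating doubles if needed. */
long unbox_int(Value v)
{
    assert(v.kind == KIND_INT || v.kind == KIND_DOUBLE);
    return v.kind == KIND_INT ? v.as.i : (long)v.as.d;
}

/* A "native" function that only knows about primitives. */
long native_add(long a, long b)
{
    return a + b;
}

/* The scripting side of a send: unbox the arguments, call native code,
   then box the result again. */
Value send_add(Value a, Value b)
{
    return box_int(native_add(unbox_int(a), unbox_int(b)));
}
```

Calling `send_add(box_int(2), box_int(3))` yields a boxed integer, so script code only ever sees objects while the native side only ever sees primitives.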

This is great for a much more interactive programming model than what we have/had (and the one we have seems to be deteriorating as we speak):

And not just for isolated fragments, but for interacting with and tweaking full applications as they are running:

2. Objective-C without the C

Of course, getting rid of the (syntactic) weirdnesses of Objective-C in our scripting language means that it is no longer (syntactically) compatible with Objective-C. Which is a shame.

It is a shame because this syntactic equivalence between Objective-C and WebScript meant that you could easily move code between them. Have a script that has become stable and you want to reuse it? Copy and paste that code into an Objective-C file and you're good to go. Need it faster? Same. Have some Objective-C code that you want to explore, create variants of etc? Paste it into WebScript. Such a smooth integration between scripting and "programming" is rare and valuable.

The "obvious" solution is to have a native AOT-compiled version of this scripting language and use it to replace Objective-C. Many if not all other scripting languages have struggled mightily with becoming a compiled language, either not getting there at all or requiring JIT compilers of enormous size, complexity, engineering effort and attack surface.

Since the semantic model of our scripting language is just Objective-C, we know that we can AOT-compile this language with a fairly straightforward compiler, probably a lot simpler than even the C/Objective-C compilers currently used, and plugging into the existing toolchain. Which is nice.

The idea seems so obvious, but apparently it wasn't.

Everything so far would, taken together, make for a really nice replacement for Objective-C with a much more productive and, let's face it, fun developer experience. However, even given the advantages of a simpler language, smoothly integrated scripting/programming and instant builds, it's not really clear that yet another OO language is really sufficient. For example, the Etoilé project and the eero language never went anywhere, despite both being very nice.

3. Beyond just Objects: Architecture Oriented Programming

Ever since my Diplomarbeit, Approaches to Composition and Refinement in Object-Oriented Design back in 1997, I've been interested in Software Architecture and Architecture Description Languages (ADLs) as a way of overcoming the problems we have when constructing larger pieces of software.

One thing I noticed very early is that the elements of an ADL closely match up with and generalise the elements of a programming language, for example an object-oriented language: object generalises to component, message to connector. So it seemed that any specific programming language is just a specialisation or instantiation of a more general "architecture language".

To explore this idea, I needed a language that was amenable to experimentation, by being both malleable enough as to allow a metasystem that can abstract away from objects and messages and simple/small enough to make experimentation feasible. A simple variant of Smalltalk would do the trick. More mature variants tend to push you towards building with what is there, rather than abstracting from it, they "...eat their young" (Alan Kay).

So Objective-Smalltalk fits the bill perfectly as a substrate for architecture-oriented programming. In fact, its being built on/with Objective-C, which came into being largely to connect the C/Unix world with the Smalltalk world, means it is already off to a good start.

What to build? How about not reinventing the wheel and simply picking the (arguably) 3 most successful/popular architectural styles:

  • OO (subsuming the other call/return styles)
  • Unix Pipes and Filters
  • REST
Again, surprisingly, at least to me, even these specific styles appear to align reasonably well with the elements we have in a programming language. OO is already well-developed in (Objective-)Smalltalk, dataflow maps to Smalltalk's assignment operator, which needed to be made polymorphic anyway, and REST at least partially maps to non-message identifiers, which also are not polymorphic in Smalltalk.

Having now built all of these abstractions into Objective-Smalltalk, I have to admit again to my surprise how well they work and work together. Yes, it was my thesis, and yes, I can now see confirmation bias everywhere, but it was also a bit of a long-shot.

4. Architecture Oriented Metaprogramming

The architectural styles described above are implemented in frameworks and their interfaces hard-coded into the language implementation. However, with three examples, it should now be feasible to create linguistic support for defining the architectural styles in the language itself, allowing users to define and refine their own architectural styles. This is ongoing work.

What now?

One of the key takeaways from this is that each stage is already quite useful, and probably a worthy project all by itself; it just gets Even Better™ with the addition of later stages. Another is that I need to get back to getting stage 2 ready, as it wasn't actually needed for stage 3, at least not initially.

Thursday, November 14, 2019

Presenting (in) Objective-Smalltalk

2019 has been the year that I have started really talking about Objective-Smalltalk in earnest, because enough of the original vision is now in place.

My first talk was at the European Smalltalk User Group's (ESUG) annual conference in my old hometown of Cologne: (pdf)

This year's ESUG was my first since Essen in 2001, and it almost seemed like a bit of a timewarp. Although more than half the talks were about Pharo, the subjects seemed mostly the same as back when: a bit of TDD, a bit of trying to deal with native threads (exactly the same issues I struggled with when I was doing the CocoaSqueak VM), a bit of 3D graphics that weren't any better than 3D graphics in other environments, but in Smalltalk.

One big topic was getting large (and very profitable) Smalltalk code-bases running on mobile devices such as iPhones. The top method was transpiling to JavaScript, another translating the VM code to JavaScript and then having that run off-the-shelf images. Objective-Smalltalk can also be put in this class, with a mix of interpretation and native compilation.

My second talk was at Germany's oldest Mac conference, Macoun in Frankfurt. The videos from there usually take a while, but here was a reaction:

"Anyone who wants a glimpse at the future should have watched @mpweiher's talk"

Aww, shucks, thanks, but I'll take it. :-)

I also had two papers accepted at SPLASH '19, one was Standard Object Out: Streaming Objects with Polymorphic Write Streams at the Dynamic Languages Symposium, the other was Storage Combinators at Onward!.

Anyway, one aspect of those talks that I didn't dwell on is that the presentations themselves were implemented in Objective-Smalltalk, in fact the definitions were Objective-Smalltalk expressions, complex object literals to be precise.

What follows is an abridged version of the ESUG presentation:


controller := #ASCPresentationViewController{
    #Name : 'ESUG Demo'.
    #Slides : #(

      #ASCChapterSlide { 
               #text : 'Objective-SmallTalk'.
               #subtitle : 'Marcel Weiher (@mpweiher)'
         }  ,

        #ASCBulletSlide{ 
             #title : 'Objective-SmallTalk'.
             #bullets : #( 
                'Embeddable SmallTalk language (Mac, iOS, Linux, Windows)',
                'Objective-C framework (peer/interop)',
                'Generalizes Objects+Messages to Components+Connectors',
                'Enable composition by solving Architectural Mismatch',
             )
        } ,
      #ASCBulletSlide{ 
             #title : 'The Gentle Tyranny of Call/Return'.
             #bullets : #( 
                'Feynman: we name everything just a little wrong',
                'Multiparadigm: Procedural, OO and FP!',
                "Guy Steele: it's no longer about completion",
                "Oscar Nierstrasz: we were told we could just model the domain",
                "Andrew Black: good OO students anthropomorphise the objects",
             )
        } ,

         #ProgramVsSystem { 
              #lightIntensities : #( 0.2 , 0.7 )
              
         }  ,


       #ASCSlideWithFigure{ 
             #delayInSeconds : 5.0.
             #title : 'Objects and Messages'.
             #bullets : #( 
                'Objective-C compatible semantics',
                'Interpreted and native-compiled',
                '"C" using type annotations',
                'Higher Order Messaging',
                'Framework-oriented development',
                'Full platform integration',
             )
        } ,
  

       #ASCBulletSlide{ 
             #title : 'Pipes and Filters'.
             #bullets : #( 
                'Polymorphic Write Streams (DLS ''19)',
                '#writeObject:anObject',
                'Triple Dispatch + Message chaining',
                'Asynchrony-agnostic',
                'Streaming / de-materialized objects',
                'Serialisation, PDF/PS (Squeak), Wunderlist, MS , To Do',
                'Outlook: filters generalise methods?',
            )
        } ,
 
       #ASCBulletSlide{ 
             #title : 'In-Process REST'.
             #bullets : #( 
                'What real large-scale networks use',
                'Polymorphic Identifiers',
                'Stores',
                'Storage Combinators',
                'Used in a number of applications',
             )
        } ,


       #ASCBulletSlide{ 
             #title : 'Polymorphic Identifiers'.
             #bullets : #( 
                'All identifiers are URIs',
                "var:hello := 'World!'",
                'file:{env:HOME}/Downloads/site := http://objective.st',
                'slider setValueHolder: ref:var:celsius',
             )
        } ,

       #ASCBulletSlide{ 
             #title : 'Storage Combinators'.
             #bullets : #( 
                'Onward! ''19',
                'Combinator exposes + consumes REST interfaces',
                'Uniform interface (REST) enables pluggability',
                'Narrow, semantically tight interface enables intermediaries',
                '10x productivity/code improvements',
             )
        } ,


      #ImageSlide{ 
               #text : 'Simple Composed Store'.
               #imageURL : '/Users/marcel/Documents/Writing/Dissertation/Papers/StorageCombinators/disk-cache-json-aligned.png'.
               #xOffset : 2.0 .
               #imageScale : 0.8
         }  , 
      #ASCBulletSlide{ 
             #title : 'Outlook'.
             #bullets : #( 
                'Port Stores and Polymorphic Write Streams',
                'Documentation / Sample Code',
                'Improve native compiler',
                'Tooling (Debugger)',
                'You! (http://objective.st)',
             )
        }  ,


      #ASCChapterSlide { 
               #text : 'Q&A   http://objective.st'.
               #subtitle : 'Marcel Weiher (@mpweiher)'
         }  ,
      )
}. 


There are a number of things going on here:
  • Complex object literals
  • A 3D presentation framework
  • Custom behavior via custom classes
  • Framework-oriented programming
Let's look at these in turn.

Complex object literals

Objective-Smalltalk has literals for arrays (really: ordered collections) and dictionaries, like many other languages now. Array literals are taken from Smalltalk, with a hash and round braces: #(). Unlike other Smalltalks, entries are separated via commas, so #( 1,2,3) rather than #( 1 2 3 ). For dictionaries, I borrowed the curly braces from Objective-C, so #{}.

This gives us the ability to specify complex property lists directly in code. A common idiom in Mac/iOS development circles is to initialize objects from property lists, so something like the following:


presentation = [[MyPresentation alloc] initWithDictionary:aDictionary];

All complex object literals really do is add a little bit of syntactic support for this idiom, by noticing that the two characters at the start of array and dictionary literals give us a space to put a name, a class name, between those two characters:


presentation := #MyPresentation{ ... };

This will parse the text between the curly brackets as a dictionary and then initialize a MyPresentation object with that dictionary using the exact -initWithDictionary: message given above. This may seem like a very minor convenience, and it is, but it actually makes it possible to simply write down objects, rather than having to write code that constructs objects. The difference is subtle but significant.

The benefit becomes more obvious once you have nested structures. A normal plist contains no specific class information, just arrays, dictionaries, numbers and strings, and in the Objective-C example, that class information is provided externally, by passing the generic plist to a specific class instance.

(JSON has a similar problem, which is why I still prefer XML for object encoding.)

So either that knowledge must also be provided externally, for example by the implicit knowledge that all substructure is uniform, or custom mechanisms must be devised to encode that information inside the dictionaries or arrays. Ad hoc. Every single time.

Complex object literals create a common mechanism for this: each subdictionary or sub-array can be tagged with the class of the object to create, and there is a convenient and distinct syntax to do it.
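The tagging idea can be sketched in plain C, assuming a much simplified model in which a literal is a class name plus a list of string key/value pairs. The types and functions here (`Entry`, `TaggedLiteral`, `ChapterSlide`, `dict_get`, `chapter_slide_from_literal`) are hypothetical stand-ins for illustration; the real mechanism uses Objective-C classes and the -initWithDictionary: message described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One key/value entry of a "dictionary" literal (string values only here). */
typedef struct {
    const char *key;
    const char *value;
} Entry;

/* A dictionary tagged with the class it should become,
   in the spirit of #ChapterSlide{ ... }. */
typedef struct {
    const char *class_name;
    const Entry *entries;
    size_t count;
} TaggedLiteral;

/* Hypothetical target object. */
typedef struct {
    const char *text;
    const char *subtitle;
} ChapterSlide;

/* Look up a key in the dictionary part; returns NULL if it is absent. */
const char *dict_get(const TaggedLiteral *literal, const char *key)
{
    for (size_t i = 0; i < literal->count; i++) {
        if (strcmp(literal->entries[i].key, key) == 0)
            return literal->entries[i].value;
    }
    return NULL;
}

/* The analogue of -initWithDictionary: the class picks out the keys it knows. */
ChapterSlide chapter_slide_from_literal(const TaggedLiteral *literal)
{
    assert(strcmp(literal->class_name, "ChapterSlide") == 0);
    ChapterSlide slide;
    slide.text = dict_get(literal, "text");
    slide.subtitle = dict_get(literal, "subtitle");
    return slide;
}
```

Because the class name travels with the data, a nested structure can carry a different tag at every level instead of relying on outside knowledge about what each sub-dictionary is supposed to be.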

A 3D presentation framework

One of the really cool wow! effects of Alan Kay's Squeak demos is always when he breaks through the expected boundaries of a presentation with slides and starts live programming and interactive sketching on the slide. The effect is very similar to when characters break the "fourth wall", and tends to be strongest on the very jaded, who were previously dismissive of the whole presentation.

Alas, a drawback is that those presentations in Squeak tend to look a bit amateurish and cartoonish, not at all polished.

Along came the Apple SceneKit Team's presentations, which were done as Cocoa/SceneKit applications. Which is totally amazing, as it allows arbitrary programmability and integration with custom code, just like Alan's demos, but with a lot more polish.

Of course, an application like that isn't reusable, the effort is pretty high and interactivity low.

I wonder what we could do about that?

First: turn the presentation application into a framework (Slides3D). Second, drive that framework interactively with Objective-Smalltalk from my Workspace-like "Smalltalk" application: presentation.txt. After a bit of setup such as loading the framework (framework:Slides3D load.) and defining a few custom slide classes, it goes on to define the presentation using the literal shown above and then starts the presentation by telling the presentation controller to display itself in a window.


framework:Slides3D load.     
class ProgramVsSystem : ASCSlide {
   var code.
   var system.
   ...
}.
class ImageSlide : ASCSlide { 
     var text.
     var image.
     ...
}. 

controller := #ASCPresentationViewController{
    #Name : 'ESUG Demo'.
    #Slides : #(

      #ASCChapterSlide { 
               #text : 'Objective-SmallTalk'.
               #subtitle : 'Marcel Weiher (@mpweiher)'
         }  ,

       ...
      )
}. 
     
controller view openInWindow:'Objective-SmallTalk (ESUG 2019)'. 

Voilà: highly polished, programmatically driven presentations that I can edit interactively and with a somewhat convenient format. Of course, this is not a one-off for presentations: the same mechanism can be used to define other object hierarchies, including but not limited to interactive GUIs.

Framework-oriented programming

Which brings us to the method behind all this madness: the concept I call framework-oriented programming.

The concept is worth at least another article or two, but at its most basic boils down to: for goodness sake, put the bulk of your code in frameworks, not in an application. Even if all you are building is an application. One app that does this right is Xcode. On my machine, the entire app bundle is close to 10GB. But the actual Xcode binary in /Applications/Xcode.app/Contents/MacOS? 41KB. Yes, Kilobytes. And most of that is bookkeeping and boilerplate, it really just contains a C main() function, which I presume largely matches the one that Xcode generates.

Why?

Simple: an Apple framework (i.e.: a .framework bundle) is at least superficially composable, but a .app bundle is not. You can compose frameworks into bigger frameworks, and you can take a framework and use it in a different app. This is difficult to impossible with apps (and no, kludged-together AppleScript concoctions don't count).

And doing it is completely trivial: after you create an app project, just create a framework target alongside the app target, add that framework to the app and then add all code and resources to the framework target instead of to the app target. Except for the main() function. If you already have an app, just move the code to the framework target, making adjustments to bundle loading code (the relevant bundle is now the framework and no longer the app/main bundle). This is what I did to derive Slides3D from the WWDC 2013 SceneKit App.

What I've described so far is just code packaging. If you also organize the actual code as an object-oriented framework, you will notice that with time it will evolve into a black-box framework, with objects that are created, configured and composed. This is somewhat tedious to do in the base language (see: creating Views programmatically), so the final evolutionary step is considered a DSL (Hello, SwiftUI!). However, most of this DSL tends to be just creating, configuring and connecting objects. In other words: complex object literals.

Monday, November 11, 2019

What Alan Kay Got Wrong About Objects

One of the anonymous reviewers of my recently published Storage Combinators paper (pdf) complained that hiding disk-based, remote, and local abstractions behind a common interface was a bad idea, citing Jim Waldo's A Note on Distributed Computing.

Having read both this and the related 8 Fallacies of Distributed Computing a while back, I didn't see how this would apply, and re-reading confirmed my vague recollections: these are about the problems of scaling things up from the local case to the distributed case, whereas Storage Combinators and In-Process REST are about scaling things down from the distributed case to the local case. The Waldo paper in particular is also very specifically about objects and messages; REST is a different beast.

And of course scaling things down happens to be a time-honored tradition with a pretty good track record:

In computer terms, Smalltalk is a recursion on the notion of computer itself. Instead of dividing "computer stuff" into things each less strong than the whole—like data structures, procedures, and functions which are the usual paraphernalia of programming languages—each Smalltalk object is a recursion on the entire possibilities of the computer. Thus its semantics are a bit like having thousands and thousands of computers all hooked together by a very fast network.
Mind you, I think this is absolutely brilliant: in order to get something that will scale up, you simply start with something large and then scale it down!

But of course, this actually did not happen. As we all experienced, scaling local objects and messaging up to the distributed case did not work (CORBA, SOAP, ...) and, as Waldo explains, cannot in fact work. What gives?

My guess is that the method described wasn't actually used: when Alan came up with his version of objects, there were no networks with thousands of computers. And so Alan could not actually look at how they communicated, he had to imagine it, it was a Gedankenexperiment. And thus objects and messages were not a scaled-down version of an actual larger thing, they were a scaled down version of an imagined larger thing.

Today, we do have a large network of computers, with not just thousands but billions of nodes. And they communicate via HTTP using the REST architectural style, not via distributed objects and messages.

So maybe if we took that communication model and scaled it down, we might be able to do even better than objects and messages, which already did pretty brilliantly. Hence In-Process REST, Polymorphic Identifiers and Storage Combinators, and yes, the results look pretty good so far!
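To make "hiding disk-based, remote, and local abstractions behind a common interface" a little more tangible, here is a small C sketch of a uniform GET/PUT store interface and one combinator built on it. It is only an illustration under simplifying assumptions (string paths and values, fixed-size storage); the names `Store`, `MemoryStore` and `CachingStore` are invented for this example and are not taken from the Storage Combinators paper or its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical uniform interface: every store answers GET and PUT for a path,
   whether it is backed by memory, disk or the network. */
typedef struct Store Store;
struct Store {
    const char *(*get)(Store *self, const char *path);
    void (*put)(Store *self, const char *path, const char *value);
};

/* A tiny in-memory store. It keeps pointers only, so callers must keep
   the strings alive. */
enum { MAX_ENTRIES = 8 };
typedef struct {
    Store base;                      /* first member, so Store* can be cast back */
    const char *paths[MAX_ENTRIES];
    const char *values[MAX_ENTRIES];
    size_t count;
    int reads;                       /* number of GETs, to make caching visible */
} MemoryStore;

static const char *memory_get(Store *self, const char *path)
{
    MemoryStore *m = (MemoryStore *)self;
    m->reads++;
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->paths[i], path) == 0)
            return m->values[i];
    return NULL;
}

static void memory_put(Store *self, const char *path, const char *value)
{
    MemoryStore *m = (MemoryStore *)self;
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->paths[i], path) == 0) { m->values[i] = value; return; }
    assert(m->count < MAX_ENTRIES);
    m->paths[m->count] = path;
    m->values[m->count] = value;
    m->count++;
}

void memory_store_init(MemoryStore *m)
{
    memset(m, 0, sizeof *m);
    m->base.get = memory_get;
    m->base.put = memory_put;
}

/* A combinator: it consumes two stores and exposes the same interface itself. */
typedef struct {
    Store base;
    Store *cache;    /* fast store, consulted first */
    Store *source;   /* slower store, consulted on a cache miss */
} CachingStore;

static const char *caching_get(Store *self, const char *path)
{
    CachingStore *c = (CachingStore *)self;
    const char *value = c->cache->get(c->cache, path);
    if (value == NULL) {
        value = c->source->get(c->source, path);
        if (value != NULL)
            c->cache->put(c->cache, path, value);
    }
    return value;
}

static void caching_put(Store *self, const char *path, const char *value)
{
    CachingStore *c = (CachingStore *)self;
    c->cache->put(c->cache, path, value);
    c->source->put(c->source, path, value);
}

void caching_store_init(CachingStore *c, Store *cache, Store *source)
{
    assert(cache != NULL && source != NULL);
    c->base.get = caching_get;
    c->base.put = caching_put;
    c->cache = cache;
    c->source = source;
}
```

Since the caching store is itself just a `Store`, it can be handed to any client, or to another combinator, without the client knowing how many layers sit behind it.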

The big idea is "messaging" -- that is what the kernal of Smalltalk/Squeak is all about (and it's something that was never quite completed in our Xerox PARC phase). The Japanese have a small word -- ma -- for "that which is in between" -- perhaps the nearest English equivalent is "interstitial". The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be. Think of the internet -- to live, it (a) has to allow many different kinds of ideas and realizations that are beyond any single standard and (b) to allow varying degrees of safe interoperability between these ideas.

So of course Alan is right after all, just not about objects and messages, which are too specific: "ma", or "interstitialness" or "connector" is the big idea, messaging is just one incarnation of that idea.

Thursday, November 7, 2019

Instant Builds

One of the goals I am aiming for in Objective-Smalltalk is instant builds and effective live programming.

A month ago, I got a package from an old school friend: my old Apple ][+, which I thought I had given as a gift, but he insisted had been a long-term loan. That machine featured 48KB of DRAM and a 1 MHz, 8 bit 6502 processor that took multiple cycles for even the simplest instructions, had no multiply instructions and almost no registers. Yet, when I turn it on it becomes interactive faster than the CRT warms up, and the programming experience remains fully interactive after that. I type something in, it executes. I change the program, type "RUN" and off it goes.

Of course, you can also get that experience with more complex systems, Smalltalk comes to mind, but the point is that it doesn't take the most advanced technology or heroic effort to make systems interactive, what it takes is making it a priority.


But here we are indeed.

Now Swift is only one example of this, it's a current trend, and of course these systems do claim that they provide benefits that are worth the wait. From optimizations to static type-checking with type-inference, so that "once it compiles, it works". This is deemed to be (a) 100% worthwhile despite the fact that there is no scientific evidence backing up these claims (a paper which claimed that it had the evidence was just shredded at this year's OOPSLA) and (b) essentially cost-free. But of course it isn't cost free:

So when everyone zigs, I zag, it's my contrarian nature. Where Swift's message was, essentially "there is too much Smalltalk in Objective-C", my contention is that there is too little Smalltalk in Objective-C (and also that there is too little "Objective" in Smalltalk, but that's a different topic).

Smalltalk was perfectly interactive in its own environment on high end late 70s and early 80s hardware. With today's monsters of computation, there is no good reason, or excuse for that matter, to not be interactive even when taken into the slightly more demanding Unix/macOS/iOS development world. That doesn't mean there aren't loads of reasons, they're just not any good.

So Objective-Smalltalk will be fast, it will be live or near-live at all times, and it will have instant builds. This isn't going to be rocket science, mostly, the ingredients are as follows:

  1. An interpreter
  2. Late binding
  3. Separate compilation
  4. A fast and simple native compiler
Let's look at these in detail.

An interpreter

The basic implementation of Objective-Smalltalk is an AST-walking interpreter. No JIT, not even a simple bytecode interpreter. Which is about as pessimal as possible, but our machines are so incredibly fast, and a lot of our tasks simple enough or computational steering enough that it actually does a decent enough job for many of those tasks. (For more on this dynamic, see The Death of Optimizing Compilers by Daniel J. Bernstein)
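For readers who have not seen one, the following C sketch shows what "AST-walking" means for a toy arithmetic language. The node types and the `eval` function are made up for this illustration and are far simpler than anything a real Smalltalk-like interpreter needs.

```c
#include <assert.h>
#include <stddef.h>

/* Node kinds for a tiny arithmetic language (hypothetical, for illustration). */
typedef enum { NODE_NUMBER, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                 /* used by NODE_NUMBER */
    const struct Node *left;    /* used by NODE_ADD and NODE_MUL */
    const struct Node *right;
} Node;

/* Evaluate by walking the tree directly: no bytecode, no JIT. */
long eval(const Node *node)
{
    assert(node != NULL);
    switch (node->kind) {
    case NODE_NUMBER:
        return node->value;
    case NODE_ADD:
        return eval(node->left) + eval(node->right);
    case NODE_MUL:
        return eval(node->left) * eval(node->right);
    }
    assert(!"unknown node kind");
    return 0;
}
```

Every evaluation re-traverses the tree and re-dispatches on the node kind, which is where the overhead comes from, but there is no separate compilation step between parsing and running.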

And because it is just an interpreter, it has no problems doing its thing on iOS:

(Yes, this is in the simulator, but it works the same on an actual device)

Late Binding

Late binding nicely decouples the parts of our software. This means that the compiler has very little information about what happens and can't help a lot in terms of optimization or checking, something that always drove the compiler folks a little nuts ("but we want to help and there's so much we could do"). It enables strong modularity and separate compilation. Objective-Smalltalk is as late-bound in its messaging as Objective-C or Smalltalk are, but goes beyond them by also late-binding identifiers, storage and dataflow with Polymorphic Identifiers (ACM, pdf), Storage Combinators (ACM, pdf) and Polymorphic Write Streams (ACM, pdf).

Allowing this level of flexibility while still not requiring a Graal-level Helden-JIT to burn away all the abstractions at runtime will require careful design of the meta-level boundaries, but I think the technically desirable boundaries align very well with the conceptually desirable boundaries: use meta-level facilities to define the language you want to program in, then write your program.

It is failing to make these boundaries clear, and freely mixing meta-level and base-level programming, that gets us not just into conceptual trouble, but also into the kinds of technical trouble that the Heldencompilers and Helden-JITs have to bail us out of.
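The late-bound message send at the heart of this section can be sketched in a few lines of C. This is a deliberately naive version that looks methods up by selector name with a linear search; the names (`Class`, `Object`, `send_message`, `CounterClass`, `DoublerClass`) are hypothetical, and the real Objective-C runtime uses caches and a far more elaborate lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Object Object;
typedef long (*Method)(Object *self, long arg);

/* A selector name paired with its implementation. */
typedef struct {
    const char *selector;
    Method imp;
} MethodEntry;

typedef struct {
    const char *name;
    const MethodEntry *methods;
    size_t method_count;
} Class;

struct Object {
    const Class *isa;   /* the class is only consulted at send time */
    long state;
};

/* Late-bound send: the method is found by name when the call happens.
   Returns 1 on success and 0 if the receiver does not understand the selector. */
int send_message(Object *receiver, const char *selector, long arg, long *result)
{
    assert(receiver != NULL && receiver->isa != NULL);
    const Class *cls = receiver->isa;
    for (size_t i = 0; i < cls->method_count; i++) {
        if (strcmp(cls->methods[i].selector, selector) == 0) {
            *result = cls->methods[i].imp(receiver, arg);
            return 1;
        }
    }
    return 0;
}

/* Two unrelated classes that happen to respond to the same selector. */
static long counter_add(Object *self, long arg) { self->state += arg; return self->state; }
static long doubler_add(Object *self, long arg) { self->state += 2 * arg; return self->state; }

static const MethodEntry counter_methods[] = { { "add:", counter_add } };
static const MethodEntry doubler_methods[] = { { "add:", doubler_add } };

const Class CounterClass = { "Counter", counter_methods, 1 };
const Class DoublerClass = { "Doubler", doubler_methods, 1 };
```

The call site only names a selector, so the caller can be compiled without knowing anything about the receiver's class. That is the decoupling that makes separate compilation cheap, and also the reason the compiler has little to check or optimize.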

Separate Compilation

When you have good module boundaries, you can get separate compilation, meaning a change in one file (or other code-containing entity if you don't like files) does not require changes to other files. Smalltalk had this. Unix-style C programming had this, along with the concept of binary libraries (with the generalization to frameworks on macOS etc.). For some reason, this has taken more and more of a back seat in macOS and iOS development, with full source inclusion and full builds becoming the norm in the community (see CocoaPods) and for a long time being enforced by Apple by not allowing user-defined dynamic libraries on iOS.

While Swift allows separate compilation, this can have such severe negative effects on both performance and compile times that compiling everything on any change has become a "best practice". In fact, we now have a build option "whole module optimization with optimizations turned off" for debugging. I kid you not.

Objective-Smalltalk is designed to enable "Framework-oriented-programming", so separate compilation is and will remain a top priority.

A fast and simple native compiler

However, even with an interpreter for interactive adjustments, separate compilation due to good modularity and late binding, you sometimes want to do a full build, or need to rebuild a large part of the codebase.

Even that shouldn't take forever, and in fact it doesn't need to. I am totally with Jonathan Blow on this subject when he says that compiling a medium size project shouldn't really take more than a second or so.

My current approach for getting there is using TinyCC's backend as the starting point of the backend for Objective-Smalltalk. After all, the semantics are (mostly) Objective-C and Objective-C's semantics are just C. What I really like about tcc is that it goes so brutally directly to outputting CPU opcodes as binary bytes.


static void gcall_or_jmp(int is_jmp)
{
    int r;
    if ((vtop->r & (VT_VALMASK | VT_LVAL)) == VT_CONST &&
	((vtop->r & VT_SYM) && (vtop->c.i-4) == (int)(vtop->c.i-4))) {
        /* constant symbolic case -> simple relocation */
        greloca(cur_text_section, vtop->sym, ind + 1, R_X86_64_PLT32, (int)(vtop->c.i-4));
        oad(0xe8 + is_jmp, 0); /* call/jmp im */
    } else {
        /* otherwise, indirect call */
        r = TREG_R11;
        load(r, vtop);
        o(0x41); /* REX */
        o(0xff); /* call/jmp *r */
        o(0xd0 + REG_VALUE(r) + (is_jmp << 4));
    }
}

No layers of malloc()ed intermediate representations here! This aligns very nicely with the streaming/messaging approach to high-performance I've taken elsewhere with Polymorphic Write Streams (see above), so I am pretty confident I can make this (a) work and (b) simple/elegant while keeping it (c) fast.
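The "straight to bytes" style is easy to imitate in isolation. The sketch below appends x86-64 machine code for a function body that returns a constant to a fixed buffer. The helper names (`CodeBuffer`, `emit_byte`, `emit_le32`, `emit_return_constant`) are invented for this example and are not tcc's API; tcc writes into its current text section and handles relocations, which this sketch ignores entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer standing in for a compiler's text section. */
typedef struct {
    uint8_t bytes[64];
    size_t length;
} CodeBuffer;

/* Append one raw byte of machine code. */
void emit_byte(CodeBuffer *code, uint8_t byte)
{
    assert(code->length < sizeof code->bytes);
    code->bytes[code->length++] = byte;
}

/* Append a 32-bit immediate in little-endian order, as x86 expects. */
void emit_le32(CodeBuffer *code, uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        emit_byte(code, (uint8_t)(value >> shift));
}

/* Emit x86-64 code for a function body that returns a constant:
   mov eax, imm32 (opcode 0xB8) followed by ret (opcode 0xC3). */
void emit_return_constant(CodeBuffer *code, uint32_t value)
{
    emit_byte(code, 0xB8);
    emit_le32(code, value);
    emit_byte(code, 0xC3);
}
```

Nothing is allocated and nothing is revisited: each call appends a few bytes and moves on, which is the property that makes this kind of backend fast.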

How fast? I obviously don't know yet, but tcc is a fantastic starting point. The following is the current (=wrong) ObjectiveTcc code to drive tcc to build a function that sends a single message:


-(void)generateMessageSendTestFunctionWithName:(char*)name
{
    SEL flagMsg=@selector(setMsgFlag);
    [self functionOnlyWithName:name returnType:VT_INT argTypes:"" body:^{
        [self pushFunctionPointer:objc_msgSend];
        [self pushObject:self];
        [self pushPointer:flagMsg];
        [self call:2];
    }];
}

How often can I do this in one second? On my 2018 high spec but 13" MBP: 300,000 times. That includes in-memory linking (though there is not much of that happening in this example), but not Mach-O generation, as that's not implemented yet, nor writing the whole shebang to disk. I don't anticipate either of these taking appreciable additional time.

If we consider this 2 "lines" of code, one for the function/method header and one for the message, then we can generate binary for 600KLOC/s. So having a medium size program compile and link in about a second or so seems eminently doable, even if I manage to slow the raw Tcc performance down by about an order of magnitude.

(For comparison: the Swift code base that motivated the Rome caching system for Carthage was clocking in at around 60 lines per second with the then Swift compiler. So even with an anticipated order of magnitude slowdown we'd still be 1000x faster. 1000x is good enough, it's the difference between 3 seconds and an hour.)
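The back-of-the-envelope numbers above can be checked with a few lines of Python. The figures are the ones quoted in the text; the 180 KLOC project size is a hypothetical value chosen only to show what the ratio means in wall-clock time.

```python
# Figures quoted in the text; variable names are illustrative.
functions_per_second = 300_000   # message-send test functions generated per second
lines_per_function = 2           # one "line" for the header, one for the message send

raw_loc_per_second = functions_per_second * lines_per_function
slowed_loc_per_second = raw_loc_per_second // 10   # anticipated order-of-magnitude slowdown
swift_loc_per_second = 60        # the Swift code base mentioned in the text

speedup = slowed_loc_per_second / swift_loc_per_second

# Hypothetical project size, not a number from the text.
project_loc = 180_000
fast_seconds = project_loc / slowed_loc_per_second
slow_minutes = project_loc / swift_loc_per_second / 60

print(raw_loc_per_second)   # 600000
print(speedup)              # 1000.0
print(fast_seconds)         # 3.0
print(slow_minutes)         # 50.0
```

So for a code base of that size, the slowed-down estimate still lands at about 3 seconds, against the better part of an hour at 60 lines per second.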

What's the downside? Tcc doesn't do a lot of optimization. But that's OK as (a) the sorts of optimizations C compilers and backends like LLVM do aren't much use for highly polymorphic and late-bound code, (b) the basics get you around 80% of the way, (c) most code doesn't need that much optimization (see above) and (d) machines have become really fast.

And it helps that we aren't doing crazy things like initially allocating function-local variables on the heap or doing function argument copying via vtables that require leaning on the optimizer to get adequate performance (as in: not 100x slower..).

Defense in Depth

While any of these techniques might be adequate some of the time, it's the combination that I think will make the Objective-Smalltalk tooling a refreshing, pleasant and highly productive alternative to existing toolchains, because it will be reliably fast under all circumstances.

And it doesn't really take (much) rocket science, just a willingness to make this aspect a priority.

Saturday, April 27, 2019

What's Going Down at the TIOBE Index? Swift, Surprisingly

Last month I expressed my surprise at the fact that Objective-C was recovering its rankings in the TIOBE index, not quite to the lofty #3 spot it enjoyed a while ago, but to a solid 10, once again surpassing Swift, which had dropped to #17.

This month, Swift has dropped to #19, almost looking like it's going to fall out of the top 20 altogether.

Strange times.

Wednesday, April 3, 2019

Accessors Have Message Obsession

Just came across an older post by Nat Pryce on Message Obsession, which he describes as the opposite end of a spectrum from Primitive Obsession.

The example is a set of commands for moving a robot:


-moveNorth.
-moveSouth.
-moveWest.
-moveEast.

Although the duplication is annoying, the bigger problem is that there are two things, the verb "move" and a direction argument, mushed together into the message name. And that can cause further problems down the road:
"It’s awkward to work with this kind of interface because you can’t pass around, store or perform calculations on the direction at all."

He argues, convincingly IMHO, that the combined messages should be replaced by a single move: message with a separate direction argument. The current fashion would be to make direction an enum, but he (wisely, IMHO) turns it into a class that can encode different directions:
-move:direction.

class Direction {
  ...
}

So far so good. However...

...we have this message obsession at a massively larger scale with accessors.


-attribute.
-setAttribute:newValue.

Every single attribute of every single class gets its own accessor or accessor pair, again with the action (get/set) mushed together with the name of the attribute to work on. The solution is the same as for the directions in Nat's example: there are only two actual messages, with reified identifiers.
-get:identifier.
-set:identifier to:value.

These, of course, correspond to the GET and PUT HTTP verbs. Properties, now available in a number of mainstream languages, are supposed to address this issue, but they only really address the 2:1 problem (getter and setter for an attribute). The much bigger N:2 problem (method pair for every attribute) remains unaddressed, and in particular you still cannot pass around, store or perform calculations on the identifier.

And it turns out that passing those identifiers around and performing calculations on them is tremendously powerful, even if you don't have language support. Without language support, the interface between the world of reified identifiers and objects can be a bit awkward.
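As a rough illustration of the two-message interface, here is a small Python sketch. The Robot class, its attributes and the copy_attributes helper are all hypothetical; the point is only that once identifiers are ordinary values, generic code over "any attribute" becomes easy to write.

```python
class Robot:
    """Hypothetical object with two generic messages instead of one accessor pair per attribute."""

    def __init__(self, **attributes):
        self._attributes = dict(attributes)

    def get(self, identifier):
        return self._attributes[identifier]

    def set(self, identifier, to):
        self._attributes[identifier] = to


def copy_attributes(source, target, identifiers):
    """Works for any attribute, because identifiers are plain values."""
    for identifier in identifiers:
        target.set(identifier, to=source.get(identifier))


a = Robot(x=3, y=4, heading="north")
b = Robot(x=0, y=0, heading="south")

position = ["x", "y"]                  # identifiers can be stored...
copy_attributes(a, b, position)        # ...passed around...
everything = position + ["heading"]    # ...and computed on
print([b.get(identifier) for identifier in everything])   # [3, 4, 'south']
```

With per-attribute accessors, copy_attributes would have to be written once for every combination of attributes, or fall back on reflection over method names.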

Saturday, March 23, 2019

What's up at the TIOBE Index? Surprisingly, Objective-C

When Apple introduced Swift, Objective-C quickly dropped down from its number 3 spot in the TIOBE index. Way down. And it certainly seemed obvious that from that day on, this was the only direction it would ever go.

Imagine my surprise when I looked earlier this March and found it back up, no, not in the lofty heights it used to occupy, but at least in tenth place (up from 14th a year earlier), and actually surpassing Swift again, which dropped by almost half in its percent rating and from 12th to 17th place in the rankings.

What's going on here?

Tuesday, March 19, 2019

LISP Macros, Delayed Evaluation and the Evolution of Smalltalk

At a recent Clojure Berlin Meetup, Veit Heller gave an interesting talk on Macros in Clojure. The meetup was very enjoyable, and the talk also brought me a little closer to understanding the relationship between functions and macros and a bit of Smalltalk evolution that had so far eluded me.

The question, which has been bugging me for some time, is when do we actually need metaprogramming facilities like macros, and why? After all, we already have functions and methods for capturing and extracting common functionality. A facile answer is that "Macros extend the language", but so do functions, in their way. Another answer is that you have to use Macros when you can't make progress any other way, but that doesn't really answer the question either.

The reason the question is relevant is, of course, that although it is fun to play around with powerful mechanisms, we should always use the least powerful mechanism that will accomplish our goal, as it will be easier to program with, easier to understand, easier to analyse and build tools for, and easier to maintain.

Anyway, the answer in this case seemed to be that macros were needed in order to "delay evaluation", to send unevaluated parameters to the macros. A quick question to the presenter confirmed that this was the case for most of the examples. Which begs the question: if we had a generic mechanism for delaying evaluation, could we have used plain functions (or methods) instead? Indeed, the answer was that we could.

One of the examples was a way to build your own if, which most languages have built in, but Smalltalk famously implements in the class library: there is an ifTrue:ifFalse: message that takes two blocks (closures) as parameters. The True class evaluates the first block parameter and ignores the second, the False class evaluates the second block parameter and ignores the first.
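The mechanism is easy to sketch outside Smalltalk as well. In the following Python approximation, lambdas stand in for blocks; the class and method names are made up for the example, and the evaluated list exists only to show which branch actually ran.

```python
class TrueObject:
    def if_true_if_false(self, true_block, false_block):
        return true_block()       # evaluate the first block, ignore the second


class FalseObject:
    def if_true_if_false(self, true_block, false_block):
        return false_block()      # evaluate the second block, ignore the first


evaluated = []


def branch(name):
    """Records that it ran, so we can see which arguments were actually evaluated."""
    evaluated.append(name)
    return name


condition = TrueObject()
result = condition.if_true_if_false(lambda: branch("then"), lambda: branch("else"))
print(result, evaluated)          # then ['then']
```

Passing branch("then") and branch("else") directly, without the lambdas, would evaluate both before the method is even entered, which is exactly the problem that delayed evaluation solves.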

The Clojure macro example worked almost exactly the same way, but where Smalltalk uses blocks to delay evaluation, the example used macros. So where LISP might use macros, Smalltalk uses blocks. That macros and blocks might be related was new to me, and took me a while to process. Once I had processed it, a bit of Smalltalk history that I had always struggled with, this bit about Smalltalk-76, suddenly made sense:



Why did it "have to" provide such a mechanism? It doesn't say. It says this mechanism was replaced by the equivalent blocks, but blocks/anonymous functions seem quite different from alternate argument-passing mechanisms. Huh?

With this new insight, it suddenly makes sense. Smalltalk-72 just had a token-stream, there were no "arguments" as such, the new method just took over parsing the token stream and picked up the parameters from there. In a sense, the ultimate macro system and ultimately powerful, but also quite unusable, incomprehensible, unmaintainable and not compilable. In that system, "arguments" are by definition unevaluated and so you can do all the macro-like magic you want.

Dan's Smalltalk-76 effort was largely about compiling for better performance and having a stable, comprehensible and composable syntax. But there are times you still need unevaluated arguments, for example if you want to implement an if that only evaluates one of its branches, not both of them, without baking it into the language. Smalltalk did not have a macro mechanism, and it no longer had the Smalltalk-72 token-stream where un-evaluated "arguments" came for free, so yes, there "had" to be some sort of mechanism for unevaluated arguments.

Hence the open-colon syntax.

And we have a progression of: Smalltalk-72 token stream → Smalltalk-76 open colon parameters → Smalltalk-80 blocks.
All serving the purpose of enabling macro-like capabilities without actually having macros by providing a general language facility for passing un-evaluated parameters.

Aha!

Saturday, March 9, 2019

Software-ICs, Binary Compatibility, and Objective-Swift

Swift recently achieved ABI stability, meaning that we can now ship Swift binaries without having to ship the corresponding Swift libraries. While it's been a long time coming, it's also great to have finally reached this point. However, it turns out that this does not mean you can reasonably ship binary Swift frameworks, for reasons described very well by Peter Steinberger of PSPDFKit and the good folks at instabug.

To reach this not-quite-there-yet state took almost 5 years, which is pretty much the total time NeXT shipped their hardware, and it mirrors the state with C++, which is still not generally suitable for binary distribution of libraries. Objective-C didn't have these problems, and as it turns out this is not a coincidence.

Software ICs

Objective-C was created specifically to implement the concept of Software-ICs. I briefly referenced the concept in a previous article, and also mentioned its relationship to the scripted components pattern, but the comments indicated that this is no longer a concept people are familiar with.

As the name suggests, the intention was to bring the benefits the hardware world had reaped from the introduction of integrated circuits to the software world.

It is probably hard to overstate the importance of ICs to the development of the computer industry. Instead of assembling computers from discrete components, you could now put entire subsystems onto one component, and then compose these subsystems to form systems. The interfaces are standardised pins, and the relationship between the outside interface and the complexity hidden inside can be staggering. Although the socket of the CPU I am writing this on is a beast, with 1151 pins, the chip inside has a staggering 2.1 billion transistors. With a ratio of well over one million to one, that's a very deep interface, even if you disregard the fact that the bulk of those pins are actually voltage supply and ground pins.

The important point is that you do not have to, and in fact cannot, look inside the IC. You get the pins, very much a binary interface, and the documentation, a specification sheet. With Software-ICs, the idea was the same: you get a binary, the interface and a specification sheet. Here are two BYTE articles that describe the concepts:


A lot of what they write seems quaint now, for example a MailFolder that inherits from Array(!), but the concepts are very relevant, particularly with a couple of decades worth of perspective and the new circumstances we find ourselves in.

Although the authors pretty much equate Software-ICs with objects and object-oriented programming, it is a slightly different form of object-oriented programming than the one we mostly use today. They do write about object/message programming, similar to Alan Kay's note that 'The big idea is "messaging"'.

With messaging as the interconnect, similar to Unix pipes, our interfaces are sufficiently well-defined and dynamic that we really can deliver our Software-ICs in binary form and be compatible, something our more static languages like C++ and Swift struggle with.
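A small Python sketch may help show why late-bound messaging is forgiving about change. Everything here is hypothetical (the send helper, the Component base class and the two MailFolder versions); it only illustrates that a client which sends messages by name does not depend on the layout or the exact method set of the component it talks to.

```python
def send(receiver, selector, *arguments):
    """Late-bound message send: the method is looked up by name at call time."""
    method = getattr(receiver, selector, None)
    if method is None:
        return receiver.does_not_understand(selector, arguments)
    return method(*arguments)


class Component:
    def does_not_understand(self, selector, arguments):
        return None               # unknown messages are answered, not fatal


class MailFolderV1(Component):
    def __init__(self):
        self.messages = []

    def add(self, message):
        self.messages.append(message)

    def count(self):
        return len(self.messages)


class MailFolderV2(Component):
    """A later release: different internals and one extra message."""

    def __init__(self):
        self.by_subject = {}      # storage layout changed

    def add(self, message):
        self.by_subject[message] = message

    def count(self):
        return len(self.by_subject)

    def unread(self):
        return 0


def client(folder):
    send(folder, "add", "hello")
    return send(folder, "count"), send(folder, "unread")


print(client(MailFolderV1()), client(MailFolderV2()))   # (1, None) (1, 0)
```

The client function is never "recompiled" between the two versions; the only contract is the set of message names.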

Objective-C is middleware with language features.

Message Oriented Middleware

Virtually all systems based on static languages eventually grow an additional, separate and more dynamic component mechanism. Windows has COM, IBM has SOM, Qt has signals and slots and the meta-object system, Be had BMessages etc.

In fact, the problem of binary compatibility of C++ objects was one of the reasons for creating COM:

Unlike C++, COM provides a stable application binary interface (ABI) that does not change between compiler releases.
COM has been incredibly successful, it enables(-ed?) much of the Windows and Office ecosystems. In fact, there is even a COM implementation on macOS: CFPlugIn, part of CoreFoundation.

CFPlugIn provides a standard architecture for application extensions. With CFPlugIn, you can design your application as a host framework that uses a set of executable code modules called plug-ins to provide certain well-defined areas of functionality. This approach allows third-party developers to add features to your application without requiring access to your source code. You can also bundle together plug-ins for multiple platforms and let CFPlugIn transparently load the appropriate plug-in at runtime. You can use CFPlugIn to add plug-in capability to, or write a plug-in for, your application.
That COM implementation is still in use, for example for writing Spotlight importers. However, there are, er, issues:
Creating a new Spotlight importer is tricky because they are based on CFPlugIn, and CFPlugIn is… well, how to say this diplomatically?… super ugly )-: One option here is to use Xcode 9 to create your plug-in based on the old template. Honestly though, I don’t recommend that because the old template… again, diplomatically… well, let’s just say that the old template lets the true nature of CFPlugIn shine through! (-:
Having written both Spotlight importers and even some COM component on Windows (I think it was just for testing), I can confirm that COM's success is not due to the elegance or ease-of-use of the implementation, but due to the fact that having an interoperable, stable binary interface is incredibly enabling for a platform.

That said, all this talk of COM is a bit confusing, because we already have NSBundle.

Apple uses bundles to represent apps, frameworks, plug-ins, and many other specific types of content.
So NSBundle already does everything a CFPlugIn does and a lot more, but is really just a tiny wrapper around a directory that may contain a dynamic shared library. All the interfacing, introspection and binary compatibility features come automagically with Objective-C. In fact, NeXT had a Windows product called d'OLE that pretty automagically turned Objective-C libraries into COM-compatible OLE servers (.NET has similar capabilities). Again, this is not a coincidence, the Software-IC concept that Objective-C is based on is predicated on exactly this sort of interoperation scenario.

Objective-C is middleware with language features.

Frameworks and Microservices

To me, the idea of a Software-IC is actually somewhat higher level than a single object, I tend to see it at the level of a framework, which just so happens to provide all the trappings of a self-contained Software-IC: a binary, headers to define the interface and hopefully some documentation, which could even be provided in the Resources directory of the bundle. In addition, frameworks are instances of NSBundle, so they aren't limited to being linked into an application, they can also be loaded dynamically.

I use that capability in Objective-Smalltalk, particularly together with stsh, the Smalltalk Scripting Shell. By loading frameworks, this shell can easily be transformed into an application-specific scripting language. An example of this is pdfsh, a shell for examining and manipulating PDF files using EGOS, the Extensible Graphical Object System.


#!/usr/local/bin/stsh
#-<void>pdfsh:<ref>file
framework:EGOS_Cocoa load.
pdf := MPWPDFDocument alloc initWithData: file value.
shell runInteractiveLoop

The same binary framework is also used in PdfCompress, PostView and BookLightning. With this framework, my record for creating a drag-and-drop application to do something useful with a PDF file was 5 minutes, and the only reason I was so slow was that I thought I had remembered the PDF dictionary entry...and had not.

Framework-oriented programming is awesome, alas it was very much deprecated by Apple for quite some time, in fact even impossible on iOS until dynamic libraries were allowed. Even now, though, the idea is that you create an app, which consists of all the source-code needed to create it (exception: Apple code!), even if some of that code may be organised into framework units that otherwise don't have much meaning to the build.

Apps, however, are not Software-ICs, they aren't the right packaging technology for reuse (AppleScript notwithstanding). And so iOS and macOS development shops routinely get themselves into big messes, also known as the Big Ball of Mud architectural pattern.

Of course, there are reasons that things didn't quite work out the way we would have liked. Certainly Apple's initial Mac OS X System Architecture book showed a much more flexible arrangement, with groups of applications able to share a set of frameworks, for example. However, DLL hell is a thing, and so we got a much more restricted approach where every app is a little fortress and frameworks in general and binary frameworks in particular are really something for Apple to provide and for the rest to use. However, the fact that we didn't manage to get this right doesn't mean that the need went away.

Swift has been making this worse, by strongly "suggesting" that everything be compiled together and leading to such wonderful oxymorons as "whole module optimisation in debug mode", meaning without optimisation. That and not having a binary modularity story for going on half a decade. The reason for compiling whole modules together is that the modularity mechanism is, practically speaking, very much source-code based, with generics and specialisation etc. (Ironically, Swift also does some pretty crazy things to enable separate compilation, but that hasn't really panned out so far).

On the other hand, Swift's compiler is so slow that teams are rediscovering forms of framework-oriented programming as a self-defense mechanism. In order to get feedback cycles down from ludicrously bad to just plain awful, they split up their projects into independent frameworks that they then compile and run independently during development. So in a somewhat roundabout way, Swift is encouraging good development practices.

I find it somewhat interesting that the industry is rediscovering variants of the Software-IC, in this case on the backend in the form of Microservices. Why do I say that Microservices are a form of Software-IC? Well, they are a binary unit of deployability, fairly loosely coupled and dynamically typed. In fact, Fred George, one of the people who came up with the idea, refers to them as Smalltalk objects:

Of course, there are issues with this approach, one being that reliable method calls are replaced with unreliable network calls. Stepping back for a second should make it clear that the purported benefits of Microservices also largely apply to Software-ICs. At least real Software-ICs. Objective-C made the mistake of equating Software-ICs with objects, and while the concepts are similar with quite a bit of overlap, they are not quite the same. You certainly can use Objective-C to build and connect Software-ICs if you want to do that. It will also help you in this endeavour, but of course you have to know that this is something you want. It doesn't do this automatically and over time the usage of Objective-C has shifted to just a regular old object-oriented language, something it is OK but not that brilliant at.

Interoperability

One of the interesting aspects of Microservices is that they are language-agnostic, it doesn't matter what language a particular service is written in, as long as they can somehow communicate via HTTP(S). This is another similarity to Software-ICs (and other middleware such as COM, SOM, etc.): there is a fairly narrowly defined, simple interface, and as long as you can somehow service that interface, you can play.

Microservices are pretty good at this, Unix filters are probably the most extreme example, and just about every language and every kind of application on Windows can talk to and via COM. NeXT only ever sold 50000 computers, but in a short number of years the NeXT community had bridges to just about every language imaginable. There were a number of Objective- languages, including Objective-Fortran. Apple alone has around 140K employees (though probably a large number of those in retail), and there are over 2.8 million iOS developers, yet the only language integration for Swift I know of is the Python support, and that took significant effort, compiler changes and the original Swift creator, Chris Lattner.

This is not a coincidence. Swift is designed as a programming language, not as middleware with language features. Therefore its modularity features are an add-on to the language, and try to transport the full richness of that programming model. And Swift's programming model is very rich.

Objective-Swift

The middlewares I've talked about use the opposite approach, from the outside in. For SOM, it is described as such:
SOM allows classes of objects to be defined in one programming language and used in another, and it allows libraries of such classes to be updated without requiring client code to be recompiled.
So you define interfaces separately from their implementations. I am guessing this is part of the reason we have @interface in Objective-C. Having to write things down twice can be a pain (and I've worked on projects that auto-generated Objective-C headers from implementation files), but having a concrete manifestation of the interface that precedes the implementation is also very valuable. (One of the reasons TDD is so useful is that it also forces you to think about the interface to your object before you implement it).

In Swift, a class is a single implementation-focused entity, with its interface at best a second-class and second-order effect. This makes writing components more convenient (no need to auto-generate headers...), but connecting components is more complicated.

Which brings us back to that other complication, the lack of stable binary compatibility for libraries and frameworks. One consequence of this is to write frameworks exclusively in Objective-C, which was particularly necessary before ABI stability had been reached. The other workaround, if you have Swift code, is to have an Objective-C wrapper as an interface to your framework. The fact that Swift interoperates transparently with the Objective-C runtime makes this fairly straightforward.

Did I mention that Objective-C is middleware with language features?

So maybe this supposed "workaround" is actually the solution? Have Objective-C or Objective- as our message-oriented middleware, the way it was always intended? Maybe with a bit of a tweak so that it loses most of the C legacy and gains support for pipes and filters, REST/Microservices and other architectural patterns?

Just sayin'.

Wednesday, February 20, 2019

An Unexpected Benefit of Uniform Interfaces

In the previous post, My Objective-Smalltalk Tipping Point: Scheme Handler for a Method Browser, I described how I had reached a point where Objective-Smalltalk code is just so much better than the equivalent Objective-C that the idea of porting back some Objective-Smalltalk to Objective-C in order to better integrate with the surrounding Objective-C code seems a bit abhorrent.

So instead I have chosen to just integrate the code into that Objective-C code base. So load the code:


NSData *classdefSchemeCode=[self frameworkResource:@"classdef-method-browser-scheme" category:@"stsh"];
[interpreter evaluateScriptString:[classdefSchemeCode stringValue]];

Then instantiate the class after finding it reflectively via NSClassFromString():
-(void)awakeFromNib
{
	...
    self.methodStore = [NSClassFromString(@"ClassBrowser") store];
}

So far, so easy. Of course, the big problem with bridged code usually comes now: the compiler doesn't know about code loaded at runtime, so you need to somehow duplicate the class's interface and somehow convince the compiler that the newly loaded class conforms to that interface.

But wait! This is a scheme-handler, which is a store, meaning it doesn't really have any unique interface of its own, but rather just implements the MPWStorage protocol. So all I have to do is the following:


@property (nonatomic, strong)   id  methodStore;

And I'm good to go!

This is unexpected in two ways: first, the integration pain I was expecting just didn't appear. Happy. Second, and maybe more importantly, the benefit of uniform interfaces, which I thought should appear, actually did appear!

So very happy.
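The shape of this integration can be approximated in Python. This is only an analogy: the Storage protocol with get and put is a hypothetical stand-in for a uniform store interface (it is not the actual MPWStorage API), exec stands in for evaluating the script, and the namespace lookup stands in for finding the class by name.

```python
from typing import Any, Protocol


class Storage(Protocol):
    """Hypothetical uniform store interface: get and put by reference."""

    def get(self, ref: str) -> Any: ...
    def put(self, ref: str, value: Any) -> None: ...


# Source text loaded at run time, standing in for the script resource.
SCRIPT = '''
class ClassBrowser:
    def __init__(self):
        self._methods = {}
    def get(self, ref):
        return self._methods.get(ref)
    def put(self, ref, value):
        self._methods[ref] = value
'''

namespace: dict = {}
exec(SCRIPT, namespace)                               # evaluate the script
method_store: Storage = namespace["ClassBrowser"]()   # find the class by name

method_store.put("Robot/instanceMethods/move:", "position := position + direction.")
print(method_store.get("Robot/instanceMethods/move:"))
```

The client never sees a ClassBrowser declaration. It only knows the generic protocol, so nothing about the runtime-loaded class has to be duplicated on the host side.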

Friday, February 15, 2019

My Objective-Smalltalk Tipping Point: Scheme Handler for a Method Browser

The psychological tipping point for a programming language comes when you really, really don't want to go back to the previous language (usually the implementation language). I remember this feeling from many years ago when I was cobbling together an Objective-C implementation (this was pre-NeXT and pre gcc-Objective-C). At some point, enough of Objective-C was working, and the feel was so much more pleasant, that I wanted to only implement new stuff with the new language, no matter how many warts the implementation still had (many).

It seems I have now reached that point with Objective-Smalltalk. Now it was pretty much always the case that I preferred the Objective-Smalltalk code to equivalent Objective-C code, but since it was mostly scripts and other "final" code, the comparison never really came up. With a somewhat workable if incomplete class and scheme (and filter) definition syntax in place, and therefore Objective-Smalltalk now at least theoretically capable of delivering reusable code, that is no longer the case.

Specifically, I have a little Smalltalk-inspired method browser that I use for ObjST-based live coding environments (AppLive for Apps and SiteBuilder for websites).

Yes: "Programmer UI". I'll clean that up later. What I am currently cleaning up is the interface between the browser and the "method store", which is horrendous. It is also tied to a specific, property-list based implementation of the method store.

The idea there is to allow external editing of the classes/methods. A browser for hierarchically nested structures...could this be a job for scheme handlers? Why yes, glad you asked!


#!/usr/local/bin/stsh
#-methodbrowser:classdef


scheme ClassBrowser  {
  var dictionary.
  -initWithDictionary:aDict {
     self setDictionary:aDict.
     self.
  }
  -classDefs {
     self dictionary at:'methodDict'.
  }
  /. { 
     |= {
       self classDefs allKeys.
     }
  }

  /:className/:which/:methodName  { 
     |= {
       self classDefs at:className | at:which | at:methodName.
     }
     =| {
       self classDefs at:className | at:which | at:methodName put:newValue.
     }
  }

  /:className/:which { 
     |= {
       self classDefs at:className | at:which | allKeys.
     }
  }

}

scheme:browser := ClassBrowser alloc initWithDictionary: classdef value propertyList.
stdout do println: browser:. each. 
shell runInteractiveLoop.

Despite the fact that there are still some issues to resolve, for example this code makes it clear that something needs to be done about navigation vs. final access, it still strikes me as a clear and succinct expression of what is going on. Now I sort of need to rewrite it in Objective-C, because both AppLive and SiteBuilder are Objective-C projects.

And I really don't want to.

I really, really don't want to. The idea of recasting this logic just strikes me as abhorrent, so much that the somewhat daunting prospect of significantly improving my Objective-Smalltalk native compilation facilities looks much more attractive.

That's the tipping point.
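For readers who don't speak Objective-Smalltalk, the three path patterns in the scheme handler above can be approximated in Python roughly as follows. This is a loose sketch, not a translation: the get and put method names, the path parsing and the sample data are assumptions made for the example.

```python
class ClassBrowser:
    """Rough Python approximation of the scheme handler's three path patterns."""

    def __init__(self, dictionary):
        self.dictionary = dictionary

    def class_defs(self):
        return self.dictionary["methodDict"]

    @staticmethod
    def _parts(path):
        return [part for part in path.split("/") if part and part != "."]

    def get(self, path):
        parts = self._parts(path)
        if len(parts) == 0:                        # "/."
            return list(self.class_defs().keys())
        if len(parts) == 2:                        # "/:className/:which"
            class_name, which = parts
            return list(self.class_defs()[class_name][which].keys())
        if len(parts) == 3:                        # "/:className/:which/:methodName"
            class_name, which, method_name = parts
            return self.class_defs()[class_name][which][method_name]
        raise KeyError(path)

    def put(self, path, new_value):
        class_name, which, method_name = self._parts(path)
        self.class_defs()[class_name][which][method_name] = new_value


browser = ClassBrowser({"methodDict": {
    "Robot": {"instanceMethods": {"move:": "position := position + direction."},
              "classMethods": {}},
}})

print(browser.get("/."))                           # ['Robot']
browser.put("/Robot/instanceMethods/stop", "speed := 0.")
print(browser.get("/Robot/instanceMethods"))       # ['move:', 'stop']
```

Even in this form, the explicit length checks hint at the navigation vs. final access issue mentioned above: the handler has to decide per path whether it is listing children or returning a leaf.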







For reference, here's the original Objective-C code. What, you didn't believe me that it's horrendous? It also does a few things that the Objective-Smalltalk code doesn't do yet, but those aren't that significant.


//
//  MethodDict.h
//  MPWTalk
//
//  Created by Marcel Weiher on 10/16/11.
//  Copyright (c) 2012 Marcel Weiher. All rights reserved.
//

#import <Foundation/Foundation.h>

@protocol MethodDict

-(NSArray*)instanceMethodsForClass:(NSString*)className;
-(NSArray*)classMethodsForClass:(NSString*)className;

-(NSString*)fullNameForMethodName:(NSString*)shortName ofClass:(NSString*)className;
-(NSString*)methodForClass:(NSString*)className methodName:(NSString*)methodName;
-(void)setClassMethod:(NSString*)methodBody name:(NSString*)methodName  forClass:(NSString*)className;
-(void)setInstanceMethod:(NSString*)methodBody name:(NSString*)methodName  forClass:(NSString*)className;

-(void)deleteInstanceMethodName:(NSString*)methodName forClass:(NSString*)className;
-(void)deleteClassMethodName:(NSString*)methodName forClass:(NSString*)className;

-(NSMutableDictionary*)addClassWithName:(NSString*)newClassName;
-(void)deleteClass:(NSString*)className;

-(NSString*)instanceMethodForClass:(NSString*)className methodName:(NSString*)methodName;
-(NSString*)classMethodForClass:(NSString*)className methodName:(NSString*)methodName;

@end



@interface MethodDict : NSObject
{
    NSMutableDictionary *dict;
}


- (NSDictionary *)dict;
-initWithDict:(NSDictionary*)newDict;


-(NSArray*)classes;


@end



//
//  MethodDict.m
//  MPWTalk
//
//  Created by Marcel Weiher on 10/16/11.
//  Copyright (c) 2012 Marcel Weiher. All rights reserved.
//

#import "MethodDict.h"
#import <MPWFoundation/MPWFoundation.h>
#import <ObjectiveSmalltalk/MethodHeader.h>

@implementation NSString(methodName)

-methodName
{
    MPWMethodHeader *header=[MPWMethodHeader methodHeaderWithString:self];
    return [header methodName];
}

@end

@implementation MethodDict

objectAccessor(NSMutableDictionary, dict, setDict)


-initWithDict:(NSDictionary*)newDict
{
    self = [super init];
    [self setDict:[[newDict mutableCopy] autorelease]];
    return self;
}

-(NSData*)asXml
{
    NSData *data=[NSPropertyListSerialization dataFromPropertyList:[self dict] format:NSPropertyListXMLFormat_v1_0 errorDescription:nil];
    return data;
}

-(NSArray*)classes
{
    return [[[self dict] allKeys] sortedArrayUsingSelector:@selector(compare:)];
}

-(NSMutableDictionary*)classDictForName:(NSString*)className
{
    return [[self dict] objectForKey:className];
}

-(void)deleteClass:(NSString*)className
{
    [[self dict] removeObjectForKey:className];
}

-(NSMutableDictionary*)methodDictForClass:(NSString*)className classMethods:(BOOL)isClassMethod
{
    NSString *key=isClassMethod ? @"classMethods" : @"instanceMethods";
    
    return [[[self dict] objectForKey:className] objectForKey:key];
}



-(NSArray*)methdodsForClass:(NSString*)className getClassMethods:(BOOL)classMethods
{
    NSDictionary *methodDict=[self methodDictForClass:className classMethods:classMethods];
    NSArray* methodKeys = [methodDict allKeys];
    if ( [methodKeys count]) {
        return [(NSArray*)[[methodKeys collect] methodName] sortedArrayUsingSelector:@selector(compare:)];
    }
    return [NSArray array];
}

-(NSArray*)instanceMethodsForClass:(NSString*)className
{
    return  [self methdodsForClass:className getClassMethods:NO];
}

-(NSArray*)classMethodsForClass:(NSString*)className
{
    return  [self methdodsForClass:className getClassMethods:YES];
}


-(NSString*)fullNameForMethodName:(NSString*)shortName ofClass:(NSString*)className
{
    NSArray *fullNames = [[self methodDictForClass:className classMethods:NO] allKeys];
    fullNames=[fullNames arrayByAddingObjectsFromArray:[[self methodDictForClass:className classMethods:YES] allKeys]];
    for ( NSString *fullName in fullNames ) {
        if ( [[fullName methodName] isEqual:shortName] ) {
            return fullName;
        }
    }
    return nil;
}

-(NSString*)instanceMethodForClass:(NSString*)className methodName:(NSString*)methodName
{
    
    return [[self methodDictForClass:className classMethods:NO] objectForKey:[self fullNameForMethodName:methodName ofClass:className]];
}

-(NSString*)classMethodForClass:(NSString*)className methodName:(NSString*)methodName
{
    
    return [[self methodDictForClass:className classMethods:YES] objectForKey:[self fullNameForMethodName:methodName ofClass:className]];
}

-(NSString*)methodForClass:(NSString*)className methodName:(NSString*)methodName
{
    
    return [self instanceMethodForClass:className methodName:methodName];
}


-(void)setMethod:(NSString*)methodBody name:(NSString*)methodName  forClass:(NSString*)className isClassMethod:(BOOL)isClassMethod
{
    NSMutableDictionary *methodDict = [self methodDictForClass:className classMethods:isClassMethod];
    if ( !methodDict ) {
        [self addClassWithName:className];
        methodDict = [self methodDictForClass:className classMethods:isClassMethod];
    }
    [methodDict setObject:methodBody forKey:methodName];
}

-(void)setClassMethod:(NSString*)methodBody name:(NSString*)methodName  forClass:(NSString*)className
{
    [self setMethod:methodBody name:methodName forClass:className isClassMethod:YES];
}

-(void)setInstanceMethod:(NSString*)methodBody name:(NSString*)methodName  forClass:(NSString*)className
{
    [self setMethod:methodBody name:methodName forClass:className isClassMethod:NO];
}

-(NSMutableDictionary*)addClassWithName:(NSString*)newClassName
{
    NSMutableDictionary *classDict=[self classDictForName:newClassName];
    if ( !classDict) {
        classDict=[NSMutableDictionary dictionary];
        classDict[@"instanceMethods"]=[NSMutableDictionary dictionary];
        classDict[@"classMethods"]=[NSMutableDictionary dictionary];
        dict[newClassName]=classDict;
    }
    return classDict;
}


-(void)deleteMethodName:(NSString*)methodName forClass:(NSString*)className isClassMethod:(BOOL)isClassMethod
{
    NSMutableDictionary *methodDict = [self methodDictForClass:className classMethods:isClassMethod];

    [methodDict removeObjectForKey:[self fullNameForMethodName:methodName ofClass:className]];
}

-(void)deleteInstanceMethodName:(NSString*)methodName forClass:(NSString*)className
{
    [[self methodDictForClass:className classMethods:NO] removeObjectForKey:[self fullNameForMethodName:methodName ofClass:className]];
}

-(void)deleteClassMethodName:(NSString*)methodName forClass:(NSString*)className
{
    [[self methodDictForClass:className classMethods:YES] removeObjectForKey:[self fullNameForMethodName:methodName ofClass:className]];
}

-(NSString*)description
{
    return [NSString stringWithFormat:@"<%@:%p: %@>",[self class],self,dict];
}

@end

Sunday, February 10, 2019

Why Architecture Oriented Programming Matters

On re-reading John Hughes's influential Why Functional Programming Matters, two things stood out for me.

The first was the claim that, "...a functional programmer is an order of magnitude more productive than his or her conventional counterpart, because functional programs are an order of magnitude shorter." That's a bold claim, though he cleverly attributes this claim not to himself but to unspecified others: "Functional programmers argue that...".

If there is evidence for this 10x claim, I'd like to see it. The only somewhat systematic assessment of "language productivity" is the Capers Jones language-level evaluation, which is computed from the lines of code needed to implement one function point's worth of software. In this evaluation, Haskell achieved a level of 7.5, whereas Smalltalk was rated 15, so twice as productive. While I don't see this as conclusive, it certainly doesn't support a claim of vastly superior productivity.

That little niggle out of the way, I think he does make some very insightful and important points in the second section. Talking about structured programming, he concludes that:

"With the benefit of hindsight, it’s clear that these properties of structured programs, although helpful, do not go to the heart of the matter. The most important difference between structured and unstructured programs is that structured programs are designed in a modular way. Modular design brings with it great productivity improvements. First of all, small modules can be coded quickly and easily. Second, general-purpose modules can be reused, leading to faster development of subsequent programs. Third, the modules of a program can be tested independently, helping to reduce the time spent debugging."

He goes on:

"However, there is a very important point that is often missed. When writing a modular program to solve a problem, one first divides the problem into subproblems, then solves the subproblems, and finally combines the solutions."

And then comes the zinger:

"The ways in which one can divide up the original problem depend directly on the ways in which one can glue solutions together. Therefore, to increase one’s ability to modularize a problem conceptually, one must provide new kinds of glue in the programming language."

Yes, I put that in bold. I'd also put a <blink>-tag on it, but fortunately for everyone involved that tag is obsolete. Anyway, he then makes some partly good, partly debatable points about the benefits of the two kinds of glue he says FP provides: function composition and lazy evaluation.

For me, the key point here is not those specific benefits, but the number 2. He just made the important point that "one must provide new kinds of glue" and then the number of "new kinds of glue" is the smallest number that actually justifies using the plural. That seems a bit on the low side, particularly because the number also seems to be fixed.

I'd venture to say that we need a lot more different kinds of glue, and we need our kinds of glue to be extensible, to be user-defined. But first, what is this "glue"? Do we have some other term for it? Maybe Alan Kay can help?

The Japanese have a small word -- ma -- for "that which is in between" -- perhaps the nearest English equivalent is "interstitial". The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be. Think of the internet -- to live, it (a) has to allow many different kinds of ideas and realizations that are beyond any single standard and (b) to allow varying degrees of safe interoperability between these ideas.
Alan Kay: prototypes vs classes was: Re: Sun's HotSpot

OK, that's an additional insight along the same lines, but doesn't really help us. Fortunately the Software Architecture community has something for us, the idea of a Connector.

Connectors are the locus of relations among components. They mediate interactions but are not “things” to be hooked up (they are, rather, the hookers-up).
Mary Shaw: Procedure Calls Are the Assembly Language of Software Interconnection: Connectors Deserve First-Class Status

The subtitle of that paper by Mary Shaw is the solution: connectors deserve first-class status. Connectors are the "ma" that goes in-between, the glue that we need "lots more" of ("lots" > 2); and when we give them first-class status, we can create more, put them in libraries, and adapt them to our needs.

That's why I think Architecture Oriented Programming matters, and why I am creating Objective-Smalltalk.

UPDATE (2023)

Something I didn't mention in the original article was that the second kind of glue FP supposedly provides, lazy evaluation, was at best a bit iffy, because the evidence for that was a bit hand-wavy. Well, it turns out that an analysis of R code shows that lazy evaluation is essentially unused, at least in package code.

So that means the number of kinds of glue that FP provides is...one.

Friday, February 8, 2019

A small (and objective) taste of Objective-Smalltalk

I've been making good progress on Objective-Smalltalk recently. Apart from the port to GNUstep that allowed me to run the official site on it (shrugging off the HN hug of death in the process), I've also been concentrating on not just pushing the concepts further, but also on adding some of the more mundane bits that are just needed for a programming language.

And so my programs have been getting longer and more useful, and I am starting to actually see the effect of "I'd rather write this in Objective-Smalltalk than Objective-C". And with that, I thought I'd share one of these slightly larger examples, show how it works, what's cool and possibly a bit weird, and where work is still needed (lots!).

The code I am showing is a script that implements a generic scheme handler for sqlite databases and then uses that scheme handler to access the database given on the command line. When you run it, in this case with the sample database Chinook.db, it allows you to interact with the database through URIs in the db scheme. For example, db:. lists the available tables:


> db:. 
( "albums","sqlite_sequence","artists","customers","employees","genres","invoices",
 "invoice_items","media_types","playlists","playlist_track","tracks","sqlite_stat1") 

You can then access entire tables, for example db:albums would show all the albums, or you can access a specific album:
> db:albums/3
{ "AlbumId" = 4;
"Title" = "Let There Be Rock";
"ArtistId" = 1;
} 

With that short intro and without much further ado, here's the code:


#!/usr/local/bin/stsh
#-sqlite:<ref>dbref

framework:FMDB load.


class ColumnInfo {
  var name.
  var type.
  -description {
      "Column: {var:self/name} type: {var:self/type}".
  }
}

class TableInfo  {
  var name.
  var columns.
  -description {
    cd := self columns description.
    "Table {var:self/name} columns: {cd}".
  }
}

class SQLiteScheme : MPWScheme {
  var db.

  -initWithPath: dbPath {
     self setDb:(FMDatabase databaseWithPath:dbPath).
     self db open.
     self.
  }

  -dictionariesForResultSet:resultSet
  {
    results := NSMutableArray array.
    { resultSet next } whileTrue: { results addObject:resultSet resultDictionary. }.
    results.
  }

  -dictionariesForQuery:query {
     self dictionariesForResultSet:(self db executeQuery:query).
  }

  /. { 
     |= {
       resultSet := self dictionariesForQuery: 'select name from sqlite_master where [type] = "table" '.
       resultSet collect at:'name'.
     }
  }

  /:table/count { 
     |= { self dictionariesForQuery: "select count(*) from {table}" | firstObject | at:'count(*)'. }
  }

  /:table/:index { 
     |= { self dictionariesForQuery: "select * from {table}" | at: index. }
  }

  /:table { 
     |= { self dictionariesForQuery: "select * from {table}". }
  }

  /:table/:column/:index { 
     |= { self dictionariesForQuery: "select * from {table}" | at: index.  }
  }

  /:table/where/:column/:value { 
     |= { self dictionariesForQuery: "select * from {table} where {column} = {value}".  }
  }

  /:table/column/:column { 
     |= { self dictionariesForQuery: "select {column} from {table}"| collect | at:column. } 
  }

  /schema/:table {
     |= {
        resultSet := self dictionariesForQuery: "PRAGMA table_info({table})".
	    columns := resultSet collect: { :colDict | 
            #ColumnInfo{
				#'name': (colDict at:'name') ,
				#'type': (colDict at:'type')
			}.
        }.
        #TableInfo{ #'name': table, #'columns': columns }.
     }
  } 

  -tables {
	 db:. collect: { :table| db:schema/{table}. }.
  }
  -<void>logTables {
     stdout do println: scheme:db tables each.	
  }
}

extension NSObject {
  -initWithDictionary:aDict {
    aDict allKeys do:{ :key |
      self setValue: (aDict at:key) forKey:key.
    }.
    self.
  }
}


scheme:db := SQLiteScheme alloc initWithPath: dbref path.
stdout println: db:schema/artists
shell runInteractiveLoop.


Let's walk through the code in detail, starting with the header:
#!/usr/local/bin/stsh
#-sqlite:<ref>dbref

This is a normal Unix shell script invoking stsh, the Smalltalk Shell. The Smalltalk Shell is a bigger topic for another day, but for now let's focus on the second line, which looks like a method declaration, and that's exactly what it is! In order to ease the transition between small scripts and larger systems (successful scripts tend to get larger, and successful large systems evolve from successful small systems), scripts have a dual nature, being at the same time callable from the Unix command line and also usable as a method (or filter) from a program.

Since this script is interactive, that part is not actually that important, but a nice side effect is that the declaration of a parameter gets us automatic command-line parameter parsing, conversion, and error checking. Specifically, stsh knows that the script takes a single parameter of type <ref> (a reference, so a filename or URL) and will put that in the dbref variable as a reference. If the script is invoked without that parameter, it will exit with an error message, all without any further work by the script author. These declarations are optional; without them, parameters will go into an args array without further interpretation.
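To make the idea of "a declaration that doubles as argument parsing" concrete, here is a minimal sketch in Python (not Objective-Smalltalk, and not how stsh is implemented). It only understands the single-parameter form shown above, and the names parse_declaration and bind_arguments are made up for illustration. It also skips the conversion step, so the argument stays a plain string rather than becoming a reference.

```python
import re

# Assumption: only the one-parameter form "#-name:<type>param" shown in the
# script header is handled; the real stsh grammar may well be richer.
HEADER = re.compile(r"^#-(?P<name>\w+):<(?P<type>\w+)>(?P<param>\w+)$")


def parse_declaration(line):
    """Split a header line into (script name, parameter type, parameter name)."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognised declaration: {line!r}")
    return match.group("name"), match.group("type"), match.group("param")


def bind_arguments(declaration, argv):
    """Bind command-line arguments to the single declared parameter.

    Returns {parameter name: argument}. Exits with a usage message when the
    number of arguments does not match the declaration.
    """
    name, type_name, param = parse_declaration(declaration)
    if len(argv) != 1:
        raise SystemExit(f"usage: {name} <{type_name}:{param}>")
    return {param: argv[0]}
```

Calling bind_arguments("#-sqlite:<ref>dbref", ["Chinook.db"]) binds dbref to "Chinook.db", while an empty argument list ends the program with a usage message, which is the behaviour described above without any work in the script body.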

Next up, we load a dependency, Gus Mueller's wonderful FMDB wrapper for SQLite.


framework:FMDB load.

The framework scheme looks for frameworks on the default framework path, and the load message is sent to the NSBundle that is returned.

The next bit is fairly straightforward, defining the ColumnInfo class with two instance variables, name and type, and a -description method.


class ColumnInfo {
  var name.
  var type.
  -description {
      "Column: {var:self/name} type: {var:self/type}".
  }
}

Again, this is very straightforward, with maybe the missing superclass specification being slightly unusual. Different constructs may have different implicit superclasses; for class it is assumed to be NSObject. The description method, introduced by "-" just like in Objective-C, uses string interpolation with curly braces. (I currently need to use fully qualified names like var:self/name to access instance variables; that should be fixed in the near future.) It also doesn't have a return statement or the like: a method return can be specified by just writing out the return value.

To me, this has the great effect of putting the focus on the essential "this is the description" rather than on the incidental, procedural "this is how you build the description". It is obviously only a very small instance of this shift, but I think even this small example speaks to what that shift can look like in the large.

The way instance variables are defined is far from being done, but for now the var syntax does the job. The TableInfo class follows the same pattern as ColumnInfo, and of course these two classes are used to represent the metadata of the database.

So on to the main attraction, the scheme-handler itself, which is just a plain old class inheriting from MPWScheme, with an instance variable and an initialisation method:


class SQLiteScheme : MPWScheme {
  var db.

  -initWithPath: dbPath {
     self setDb:(FMDatabase databaseWithPath:dbPath).
     self db open.
     self.
  }

Having advanced language features largely defined as/by plain old classes goes back to the need for a stable starting point. However, it has turned out to be a little bit more than that, because the mapping to classes is not just the trivial one of "since this is written in an OO language, obviously the implementation of features is somehow done with classes". Instead, the classes map onto the language features very much in an Open Implementation kind of way, except that in this case it is Open Language Implementation.

That means that unlike a typical MOP, the classes actually make sense outside the specific language implementation, making their features usable from other languages, albeit less conveniently. Being easily accessible from other languages is important for an architectural language.

With this mapping, a very narrow set of syntactic language mechanisms can be used to map a large and extensible (thus infinite) set of semantic features into the language. This is of course similar to language features like procedures, methods and classes, but is expanded to things that usually haven't been as extensible.

The next two methods handle the interface to FMDB, they are mostly straightforward and, I think, understandable to both Smalltalk and Objective-C programmers without much explanation.


  -dictionariesForResultSet:resultSet
  {
    results := NSMutableArray array.
    { resultSet next } whileTrue: { results addObject:resultSet resultDictionary. }.
    results.
  }

  -dictionariesForQuery:query {
     self dictionariesForResultSet:(self db executeQuery:query).
  }

Smalltalk programmers may balk a little at the use of curly braces rather than square brackets to denote blocks. To me, this is a very easy concession to "the mainstream"; I have bigger fish to fry. To Objective-C programmers, the fact that the condition of the while-loop is implemented as a message sent to a block rather than as syntax might take a little getting used to, but I don't think it presents any fundamental difficulties.

Next up we have some property path definitions, the meat of the scheme handler. Each property path lets you define code that will run for a specific subset of the scheme's namespace, with the subset defined by the property path's URI pattern. As the name implies, property paths can be regarded as a generalisation of Objective-C properties, extended to handle both entire sets of properties, sub-paths and the combination of both.


  /. { 
     |= {
       resultSet := self dictionariesForQuery: 'select name from sqlite_master where [type] = "table" '.
       resultSet collect at:'name'.
     }
  }

The first property path definition is fairly straightforward as it only applies to a single path, the period (so the db:. example from above). Property path definitions start with the forward slash ("/"), similar to the way that instance methods start with "-" and class methods with "+" in Objective-C (and Objective-Smalltalk). The slash seemed natural to indicate the definition of paths/URIs.

Like C# or Swift property definitions, you need to be able to deal with (at least) "get" and/or "set" access to a property. I really dislike having random keywords like "get" or "set" for program structure, I prefer to see names reserved for things that have domain meaning. So instead of keywords, I am using constraint connector syntax: "|=" means the left hand side is constrained to be the same as the right hand side (aka "get"). "=|" means the right hand side is constrained to be the same as the left hand side (aka "set"). The idea is that the "left hand side" in this case is the interface, the outside of the object/scheme handler, whereas the "right hand side" is the inside of the object, with properties mediating between the outside and the inside of the object.

Like most everything here, this is currently experimental, but so far I like it more than I expected to, and again, it shifts us away from being action oriented to describing relationships. For example, delegating both get and set to some other object could then be described by using the bidirectional constraint connector: /myProperty =|= var:delegate/otherProperty.
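For readers more used to get/set accessors, the bidirectional case can be approximated in Python with a descriptor that forwards both reads and writes to a property of another object. This is only an analogy for the relationship described by =|=, not a model of how Objective-Smalltalk implements constraint connectors; the class names Delegated, Backing and Front are invented for the example.

```python
class Delegated:
    """Descriptor that forwards get and set to a property of another object."""

    def __init__(self, target_attr, property_name):
        self.target_attr = target_attr      # attribute holding the delegate
        self.property_name = property_name  # property to read/write on it

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        delegate = getattr(instance, self.target_attr)
        return getattr(delegate, self.property_name)

    def __set__(self, instance, value):
        delegate = getattr(instance, self.target_attr)
        setattr(delegate, self.property_name, value)


class Backing:
    """Hypothetical object that actually stores the value."""

    def __init__(self):
        self.otherProperty = None


class Front:
    """Hypothetical object whose myProperty mirrors delegate.otherProperty."""

    myProperty = Delegated("delegate", "otherProperty")

    def __init__(self, delegate):
        self.delegate = delegate
```

Writing through Front's myProperty changes the Backing object, and changes made directly to the Backing object are visible through myProperty. The single class-level line declares the relationship once instead of spelling out a getter and a setter.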

Getting the result set is a straightforward message-send with the SQL query as a constant, non-interpolated string (single quotes; double quotes are for interpolation). We then need to extract the name of the table from the return dictionaries, which we do via the collect HOM and the Smalltalk-y -at: message, which in this case maps to Foundation's -objectForKey:. The next property paths map URIs to queries on the tables. Unlike the previous example, which had a constant, single element path and so was largely equivalent to a classic property, these all have variable path elements, multiple path segments or both.


  /:table/count { 
     |= { self dictionariesForQuery: "select count(*) from {table}" | firstObject | at:'count(*)'. }
  }

  /:table/:index { 
     |= { self dictionariesForQuery: "select * from {table}" | at: index. }
  }

  /:table { 
     |= { self dictionariesForQuery: "select * from {table}". }
  }

Starting at the back, /:table returns the data from the entire table specified in the URI using the parameter :table. The leading colon means that this path segment is a parameter that will match any single string and deliver it to the method under the parameter name used, in this case "table". Wildcards are also possible.
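The matching rule for parameter segments can be sketched in a few lines of Python. This is an illustration of the idea only: it assumes that patterns are tried in the order given and that the first match wins, which is an assumption about precedence rather than a description of what MPWScheme actually does, and it leaves out wildcards. The function names are made up.

```python
def match_path(pattern, path):
    """Match a path against a pattern such as '/:table/:index'.

    Segments starting with ':' are parameters that match any single
    segment; all other segments must match literally. Returns a dict of
    bound parameters, or None if the path does not match.
    """
    pattern_parts = [part for part in pattern.split("/") if part]
    path_parts = [part for part in path.split("/") if part]
    if len(pattern_parts) != len(path_parts):
        return None
    bindings = {}
    for expected, actual in zip(pattern_parts, path_parts):
        if expected.startswith(":"):
            bindings[expected[1:]] = actual
        elif expected != actual:
            return None
    return bindings


def dispatch(patterns, path):
    """Return (pattern, bindings) for the first matching pattern, else None."""
    for pattern in patterns:
        bindings = match_path(pattern, path)
        if bindings is not None:
            return pattern, bindings
    return None


# Same order as in the scheme handler above: the literal 'count' segment
# is tried before the fully parametrised '/:table/:index'.
PATTERNS = ["/:table/count", "/:table/:index", "/:table"]
```

With these patterns, "albums/count" is routed to the count pattern with table bound to "albums", while "albums/3" falls through to /:table/:index with index bound to the string "3".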

Yes, the SQL query is performed using simple string interpolation without any sanitisation. DON'T DO THAT. At least not in production code. For experimenting in an isolated setting it's OK.
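For comparison, here is a minimal sketch of what a safer version of the "where" query could look like, written in Python with the standard sqlite3 module rather than Objective-Smalltalk and FMDB. Values are bound with a placeholder; table and column names cannot be bound that way, so they are checked against the database's own schema first. The helper names and the demo data are invented for the example.

```python
import sqlite3


def quote_identifier(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def rows_where(conn, table, column, value):
    """Return rows of `table` where `column` equals `value`."""
    tables = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table!r}")
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute(
        f"PRAGMA table_info({quote_identifier(table)})")}
    if column not in columns:
        raise ValueError(f"unknown column: {column!r}")
    sql = (f"SELECT * FROM {quote_identifier(table)} "
           f"WHERE {quote_identifier(column)} = ?")
    return conn.execute(sql, (value,)).fetchall()


def make_demo_db():
    """Build a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)")
    conn.executemany(
        "INSERT INTO albums (Title, ArtistId) VALUES (?, ?)",
        [("First", 1), ("Second", 1), ("Third", 2)])
    return conn
```

A value such as "1 OR 1=1" is then compared as a plain string and matches nothing, and a table name that is not in the schema is rejected before any SQL containing it is executed.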

The second query retrieves a specific row of the table specified. The pipe "operator" is for method chaining with keyword syntax without having to bracket like crazy:


self dictionariesForQuery: "select count(*) from {table}" | firstObject | at:'count(*)'
((self dictionariesForQuery: "select count(*) from {table}") firstObject) at:'count(*)'

I find the "pipe" version to be much easier to both just visually scan and to understand, because it replaces nested (recursive) evaluation with linear piping. And yes, it is at least partly a by-product of integrating pipes/filters, which is a part of the larger goal of smoothly integrating multiple architectural styles. That this integration would lead to an improvement in the procedural part was an unexpected but welcome side effect.
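The same contrast between nested and linear evaluation can be shown outside Objective-Smalltalk. The Python sketch below uses a small helper, here called pipe, that feeds a value through a sequence of functions from left to right. The query function is a stand-in that returns made-up data, so nothing here touches a real database.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed `value` through `steps` from left to right."""
    return reduce(lambda result, step: step(result), steps, value)


def run_query(sql):
    # Stand-in for a database call; returns made-up rows for illustration.
    return [{"count(*)": 3}]


def first_object(rows):
    return rows[0]


def count_column(row):
    return row["count(*)"]


sql = "select count(*) from albums"

# Nested form: has to be read from the inside out.
nested = count_column(first_object(run_query(sql)))

# Piped form: reads in the order in which the steps are evaluated.
piped = pipe(sql, run_query, first_object, count_column)
```

Both forms compute the same value; the difference is only in how the reader has to traverse the expression.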

The first property path, /:table/count, returns the size of the given table, using the optimised SQL code select count(*). This shows an interesting effect of property paths. In a standard ORM, getting the count of a table might look something like this: db.artists.count. Implemented naively, this code retrieves the entire "artists" table, converts that to an array and then counts the array, which is incredibly inefficient. And of course, this was/is a real problem of many ORMs, not just a made up example.

The reason it is a real problem is that it isn't trivial to solve, due to the fact that OOPLs are not structurally polymorphic. If I have something like db.artists.count, there has to be some intermediate object returned by artists so I can send it the count message. The natural thing for that is the artists table, but that is inefficient. I can of course solve this by returning some clever proxy that doesn't actually materialise the table unless it has to, or I can have count handled completely separately, but neither of these solutions is straightforward, which is why this has traditionally been a problem.

With property paths, the problem just goes away, because any scheme handler (or object) has control over its sub-structure to an arbitrary depth.
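A small Python sketch may help show why seeing the whole path makes the difference. The resolve function below is a hypothetical stand-in for a scheme handler: because it receives "artists/count" as one request, it can choose the count(*) query directly and never has to produce an intermediate table object. The demo database and its rows are made up.

```python
import sqlite3


def quote_identifier(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def resolve(conn, path):
    """Resolve 'table' to all rows and 'table/count' to a row count."""
    parts = [part for part in path.split("/") if part]
    if len(parts) == 2 and parts[1] == "count":
        # The handler sees the full path, so the count is computed by the
        # database and no rows are transferred.
        sql = f"SELECT count(*) FROM {quote_identifier(parts[0])}"
        return conn.execute(sql).fetchone()[0]
    if len(parts) == 1:
        sql = f"SELECT * FROM {quote_identifier(parts[0])}"
        return conn.execute(sql).fetchall()
    raise KeyError(path)


def make_demo_db():
    """Build a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany("INSERT INTO artists (Name) VALUES (?)",
                     [("A",), ("B",), ("C",)])
    return conn
```

The naive equivalent would be len(resolve(conn, "artists")), which gives the same number but fetches every row first. With message chaining like db.artists.count, avoiding that requires a proxy object; here it is just another branch on the path.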

Queries are handled in a similar manner, so db:albums/where/ArtistId/6 retrieves the two albums by band Apocalyptica. This is obviously very manual; for any specific database you'd probably want to specialise this generic scheme handler to give you named relationships and also to return actual objects, rather than just dictionaries. A step in that direction is the /schema/:table property path:


  /schema/:table {
     |= {
        resultSet := self dictionariesForQuery: "PRAGMA table_info({table})".
	    columns := resultSet collect: { :colDict | 
            #ColumnInfo{
				#'name': (colDict at:'name') ,
				#'type': (colDict at:'type')
			}.
        }.
        #TableInfo{ #'name': table, #'columns': columns }.
     }
  } 

This property path returns the SQL schema in terms of the objects we defined at the top. First is a simple query of the SQLite table that holds the schema information, returning an array of dictionaries. These individual dictionaries are then converted to ColumnInfo objects using object literals.

Similar to defining the -description method above simply as a parametrized string literal instead of as instructions to build the result string, object literals allow us to simply write down general objects instead of constructing them. The example inside the collect defines a ColumnInfo object literal with the name and type attributes set from the column dictionary retrieved from the database.

Similarly, the final TableInfo is defined by its name and the column info objects. Object literals are a fairly trivial extension of Objective-Smalltalk dictionary literals, #{ #'key': value }, with a class-name specified between the "#" and the opening curly brace. Being able to just write down objects is, I think, one of the better and under-appreciated features of WebObjects .wod files (though it's not 100% the same thing), as well as QML and I think also part of what makes React "declarative".
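The "dictionary literal plus a class name" reading of object literals can be mimicked in Python with a tiny helper that creates an instance and sets each key as an attribute, in the spirit of the -initWithDictionary: extension in the script. This is a sketch of the idea only; object_literal is an invented name, and the column type string is just an example value.

```python
class ColumnInfo:
    """Python stand-in for the ColumnInfo class defined in the script."""

    name = None
    type = None

    def __repr__(self):
        return f"Column: {self.name} type: {self.type}"


def object_literal(cls, attributes):
    """Create an instance of `cls` and set each key as an attribute."""
    obj = cls()
    for key, value in attributes.items():
        setattr(obj, key, value)
    return obj


# Roughly corresponds to: #ColumnInfo{ #'name': ..., #'type': ... }
column = object_literal(ColumnInfo, {"name": "Title", "type": "NVARCHAR(160)"})
```

The call site states what the object is (its class and attribute values) and leaves the construction steps to the helper, which is the shift from constructing objects to writing them down that the text describes.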

Not entirely coincidentally, the "configurations" of architectural description languages can also be viewed as literal object definitions.

With that information in hand, and with the Objective-Smalltalk runtime providing class definition objects that can be used to install objects with their methods in the runtime, we now have enough information to create some classes straight from the SQL table definitions, without going through the intermediate steps of generating source code, adding that to a project and compiling it.

That isn't implemented, and it's also something that you don't really want, but it's a stepping stone towards creating a general mechanism for orthogonal modular persistence. The final two utility methods are not really all that noteworthy, except that they do show how expressive and yet straightforward even ordinary Objective-Smalltalk code is.


  -tables {
	 db:. collect: { :table| db:schema/{table}. }.
  }
  -<void>logTables {
     stdout do println: scheme:db tables each.	
  }

The -tables method just gets all the schema information for all the tables. The -logTables method prints all the tables to stdout, but individually, not as an array. Finally, there is a class extension to NSObject that supports the literal syntax on all objects and the script code that actually initialises the scheme with a database and starts an interactive session. This last feature has also been useful in Smalltalk scripting: creating specialized shells that configure themselves and then run the normal interactive prompt.

So that's it!

It's not a huge revelation, yet, but I do hope this example gives at least a tiny glimpse of what Objective-Smalltalk already is and of what it is poised to become. There is a lot that I couldn't cover here, for example that scheme-handlers aren't primarily for interactive exploration, but for composition. I also only mentioned pipes-and-filters in passing, and of course there "is" a lot more that just "isn't" there, quite yet.

As always, but even more than usual, I would love to get feedback! And of course the code is on GitHub.