Showing posts with label Testing. Show all posts

Monday, June 20, 2022

Blackbird: A reference architecture for local-first connected mobile apps

Wow, what a mouthful! Although this architecture has featured in a number of my other writings, I haven't really described it in detail by itself. Which is a shame, because I think it works really well and is quite simple, a case of Sophisticated Simplicity.

Why a reference architecture?

The motivation for creating and now presenting this reference architecture is that the way we build connected mobile apps is broken, and none of the proposed solutions appear to help. How are they broken? They are overly complex, require way too much code, perform poorly and are unreliable.

Very broadly speaking, these problems can be traced to the misuse of procedural abstraction for a problem-space that is broadly state-based, and can be solved by adapting a state-based architectural style such as in-process REST and combining it with well-known styles such as MVC.

More specifically, MVC has been misapplied by combining UI updates with the model updates, a practice that becomes especially egregious with asynchronous call-backs. In addition, data is pushed to the UI, rather than having the UI pull data when and as needed. Asynchronous code is modelled using call/return and call-backs, leading to call-back hell, needless and arduous transformation of any dependent code into asynchronous code (see "what color is your function") that is also much harder to read, discouraging appropriate abstractions.

Backend communication is also an issue, with newer async/await implementations not really being much of an improvement over callback-based ones, and arguably worse in terms of actual readability. (They seem readable, but what actually happens is different enough that the simplicity is deceptive.)

Overview

The overall architecture has four fundamental components:
  1. The model
  2. The UI
  3. The backend
  4. The persistence
The main objective of the architecture is to keep these components in sync with each other, so the whole thing somewhat resembles a control loop architecture: something disturbs the system, for example the user did something in the UI, and the system responds by re-establishing equilibrium.

The model is the central component: it connects and coordinates all the pieces and is also the only one directly connected to more than one piece. In keeping with hexagonal architecture, the model is also supposed to be the only place with significant logic; the remainder of the system should be as minimal, transparent and dumb as possible.

memory-model := persistence.
persistence  |= memory-model.
ui          =|= memory-model. 
backend     =|= memory-model.

Graphically:

Elements

Blackbird depends crucially on a number of architectural elements: first are stores of the in-process REST architectural style. These can be thought of as in-process HTTP servers (without the HTTP, of course) or composable dictionaries. The core store protocol implements the GET, PUT and DELETE verbs as messages.

The role of URLs in REST is taken by Polymorphic Identifiers. These are objects that can reference or identify values in the store, but are not direct pointers. For example, they need to be able to reference objects that aren't there yet.
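To make the two elements concrete, here is a minimal sketch in Python rather than the Objective-C of the actual implementation. The names `Ref` and `DictStore` and the path format are assumptions made for illustration; the only things taken from the text are the three verbs and the idea that an identifier is a value that can name something before it exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a polymorphic identifier: a name, not a pointer."""
    path: str


class DictStore:
    """In-memory store that answers GET, PUT and DELETE as plain method calls."""

    def __init__(self):
        self._items = {}

    def get(self, ref):
        # A reference may name something that is not there (yet).
        return self._items.get(ref)

    def put(self, ref, value):
        self._items[ref] = value

    def delete(self, ref):
        self._items.pop(ref, None)


store = DictStore()
task = Ref("tasks/42")
missing_before = store.get(task)          # the reference exists before the value
store.put(task, {"title": "Write post"})
found = store.get(Ref("tasks/42"))        # an equal reference finds the same value
store.delete(task)
missing_after = store.get(task)
```

Because every store answers the same three messages, one store can wrap another, which is what makes them composable.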

Polymorphic Identifiers can be application-specific, for example they might consist just of a numeric id,

MVC

For me, the key part of the MVC architectural style is the decoupling of input processing and resultant output processing. That is, under MVC, the view (or a controller) makes some change to the model and then processing stops. At some undefined later time (could be synchronous, but does not have to be) the Model informs the UI that it has changed using some kind of notification mechanism.

In Smalltalk MVC, this is a dependents list maintained in the model that interested views register with. All these views are then sent a #changed message when the model has changed. In Cocoa, this can be accomplished using NSNotificationCenter, but really any kind of broadcast mechanism will do.

It is then the views' responsibility to update themselves by interrogating the model.

For views, Cocoa largely automates this: on receipt of the notification, the view just needs to invalidate itself; the system then automatically schedules it for redrawing the next time through the event loop.
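The notify-then-pull cycle described above can be sketched as follows. This is an illustrative Python model, not Smalltalk's or Cocoa's actual API: `Model`, `CounterView`, `model_changed` and `display_if_needed` are assumed names, and the explicit `display_if_needed` call stands in for the event loop.

```python
class Model:
    """Holds state and a dependents list, as in Smalltalk MVC."""

    def __init__(self):
        self._dependents = []
        self.count = 0

    def add_dependent(self, view):
        self._dependents.append(view)

    def increment(self):
        self.count += 1
        self._changed()

    def _changed(self):
        # Broadcast "something changed"; no data is pushed to the views.
        for view in self._dependents:
            view.model_changed()


class CounterView:
    def __init__(self, model):
        self.model = model
        self.needs_display = False
        self.rendered = ""
        model.add_dependent(self)

    def model_changed(self):
        # Only invalidate; drawing happens later.
        self.needs_display = True

    def display_if_needed(self):
        # Stand-in for the event loop: the view pulls current state from the model.
        if self.needs_display:
            self.rendered = f"count: {self.model.count}"
            self.needs_display = False


model = Model()
view = CounterView(model)
model.increment()
model.increment()            # two changes, but only one redraw below
view.display_if_needed()
```

Two model changes result in a single redraw that shows the latest state, because the view reads the model when it draws rather than when it is notified.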

The reason the decoupling is important to maintain is that the update notification can come for any other reason, including a different user interaction, a backend request completing or even some sort of notification or push event coming in remotely.

With the decoupled M-V update mechanism, all these different kinds of events are handled identically, and thus the UI only ever needs to deal with the local model. The UI is therefore almost entirely decoupled from network communications: we thus have a local-first application that is also largely testable locally.

Blackbird refines the MVC view update mechanism by adding the polymorphic identifier of the modified item in question and placing those PIs in a queue. The queue decouples model and view even more than in the basic MVC model. For example, it becomes fairly trivial to make the queue writable from any thread, but emptied only onto the main thread for view updates. In addition, providing update notifications is no longer synchronous: the updater just writes an entry into the queue and can then continue; it doesn't wait for the UI to finish its update.

Decoupling via a queue in this way is almost sufficient for making sure that high-speed model updates don't overwhelm the UI or slow down the model. Both these performance problems are fairly rampant. As an example of the first, the Microsoft Office installer saturates both CPUs on a dual-core machine just painting its progress bar, because it massively overdraws.

An example of the second was one of the real performance puzzlers of my career: an installer that was extremely slow, despite both CPU and disk being mostly idle. The problem turned out to be that the developers of that installer not only insisted on displaying every single file name the installer was writing (bad enough), but also flushing the window to screen to make sure the user got a chance to see it (worse). This then interacted with a behavior of Apple's CoreGraphics, which disallows screen flushes at a rate greater than the screen refresh rate, and will simply throttle such requests. You really want to decouple your UI from your model updates and let the UI process updates at its pace.

Having polymorphic identifiers in the queue makes it possible for the UI to catch up on its own terms, and also to remove updates that are no longer relevant, for example discarding duplicate updates of the same element.

The polymorphic identifier can also be used by views in order to determine whether they need to update themselves, by matching against the polymorphic identifier they are currently handling.
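A rough Python sketch of such an update queue follows. It is an illustration under stated assumptions, not Blackbird's implementation: `UpdateQueue`, `RowView` and the string identifiers are hypothetical, and "the main thread" is simply whichever thread calls `drain`.

```python
import threading


class UpdateQueue:
    """Queue of identifiers: writable from any thread, drained by one consumer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}   # dict keys keep first-seen order and drop duplicates

    def enqueue(self, ref):
        with self._lock:
            self._pending[ref] = True

    def drain(self):
        with self._lock:
            refs = list(self._pending)
            self._pending.clear()
        return refs


class RowView:
    def __init__(self, ref):
        self.ref = ref
        self.redraws = 0

    def handle(self, changed_ref):
        # Redraw only if the change concerns the item this view displays.
        if changed_ref == self.ref:
            self.redraws += 1


queue = UpdateQueue()


def hammer():
    # Simulates a fast model update loop on a background thread.
    for _ in range(100):
        queue.enqueue("tasks/1")


writers = [threading.Thread(target=hammer) for _ in range(4)]
for writer in writers:
    writer.start()
for writer in writers:
    writer.join()
queue.enqueue("tasks/2")

views = [RowView("tasks/1"), RowView("tasks/3")]
drained = queue.drain()      # 401 enqueues collapse into two entries
for ref in drained:
    for view in views:
        view.handle(ref)
```

The writers never wait for the views, and the views see each changed identifier once no matter how often it was enqueued.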

Backend communication

Almost every REST backend communication code I have seen in mobile applications has created "convenient" cover methods for every operation of every endpoint accessed by the application, possibly automatically generated.

This ignores the fact that REST only has a few verbs, combined with a great number of identifiers (URLs). In Blackbird, there is a single channel for backend communication: a queue that takes a polymorphic identifier and an HTTP verb. The polymorphic identifier is translated to a URL of the target backend system, the resulting request is executed, and when the result returns it is placed in the central store using the provided polymorphic identifier.

After the item has been stored, an MVC notification with the polymorphic identifier in question is enqueued as per above.

The queue for backend operations is essentially the same one we described for model-view communication above, for example also with the ability to deduplicate requests correctly so only the final version of an object gets sent if there are multiple updates. The remainder of the processing is performed in pipes-and-filters architectural style using polymorphic write streams.
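The single backend channel might be sketched like this in Python. Everything named here is an assumption for illustration: `BackendChannel`, the `https://api.example.com` base URL, and the injected `transport` callable that stands in for real networking. Storing only GET results is a simplification of this sketch, and processing is synchronous rather than stream-based.

```python
class BackendChannel:
    """Single channel to the backend: a queue of (verb, identifier) pairs."""

    def __init__(self, base_url, store, notify, transport):
        self.base_url = base_url
        self.store = store          # dict standing in for the central store
        self.notify = notify        # callable that enqueues a view update
        self.transport = transport  # callable(verb, url, body) -> result
        self._pending = {}          # dict keys give ordered de-duplication

    def enqueue(self, verb, ref):
        self._pending[(verb, ref)] = True

    def url_for(self, ref):
        # Translate the identifier into a URL of the target backend.
        return f"{self.base_url}/{ref}"

    def process(self):
        requests = list(self._pending)
        self._pending.clear()
        for verb, ref in requests:
            # The body is read from the store only now, so the latest version is sent.
            body = self.store.get(ref) if verb == "PUT" else None
            result = self.transport(verb, self.url_for(ref), body)
            if verb == "GET":
                self.store[ref] = result
                self.notify(ref)


sent = []


def fake_transport(verb, url, body):
    # Records requests instead of touching the network.
    sent.append((verb, url, body))
    return {"title": "from server"} if verb == "GET" else body


store = {"tasks/7": {"title": "draft"}}
notified = []
channel = BackendChannel("https://api.example.com", store, notified.append, fake_transport)

channel.enqueue("PUT", "tasks/7")
store["tasks/7"] = {"title": "final"}
channel.enqueue("PUT", "tasks/7")   # duplicate request collapses into the first
channel.enqueue("GET", "tasks/9")
channel.process()
```

Because the queue holds identifiers rather than payloads, the two PUT requests become one and it carries the final version of the object.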

If the backend needs to communicate with the client, it can send URLs via a socket or other mechanism that tells the client to pull that data via its normal request channels, implementing the same pull-constraint as in the rest of the system.

One aspect of this part of the architecture is that backend requests are reified and explicit, rather than implicitly encoded on the call-stack and its potentially asynchronous continuations. This means it is straightforward for the UI to give the user appropriate feedback for communication failures on the slow or disrupted network connections that are the norm on mobile networks, as well as avoid accidental duplicate requests.

Despite this extra visibility and introspection, the code required to implement backend communications is drastically reduced. Last but not least, the code is isolated: network code can operate independently of the UI just as well as the UI can operate independently of the network code.

Persistence

Persistence is handled by stacked stores (storage combinators).

The application is hooked up to the top of the storage stack, the CachingStore, which looks to the application exactly like the DictStore (an in-memory store). If a read request cannot be found in the cache, the data is instead read from disk, converted from JSON by a mapping store.

For testing the rest of the app (rather than the storage stack), it is perfectly fine to just use the in-memory store instead of the disk store, as it has the same interface and behaves the same, except being faster and non-persistent.
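As a hedged illustration of stacking, the following Python sketch puts a caching store on top of either a JSON-on-disk store or a second in-memory store. The class names echo the ones mentioned above, but the code is an assumption-laden approximation: `JSONDiskStore`, the one-file-per-identifier layout and the synchronous write-through are choices of this sketch, not of the real storage combinators.

```python
import json
import os
import tempfile


class DictStore:
    """In-memory store."""

    def __init__(self):
        self._items = {}

    def get(self, ref):
        return self._items.get(ref)

    def put(self, ref, value):
        self._items[ref] = value


class JSONDiskStore:
    """Maps each identifier to one JSON file in a directory (assumed layout)."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, ref):
        return os.path.join(self.directory, ref.replace("/", "_") + ".json")

    def get(self, ref):
        try:
            with open(self._path(ref)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def put(self, ref, value):
        with open(self._path(ref), "w") as f:
            json.dump(value, f)


class CachingStore:
    """Same interface as the stores it combines: reads fall through on a miss."""

    def __init__(self, cache, source):
        self.cache = cache
        self.source = source

    def get(self, ref):
        value = self.cache.get(ref)
        if value is None:
            value = self.source.get(ref)
            if value is not None:
                self.cache.put(ref, value)
        return value

    def put(self, ref, value):
        # Written through synchronously here to keep the sketch short.
        self.cache.put(ref, value)
        self.source.put(ref, value)


with tempfile.TemporaryDirectory() as directory:
    JSONDiskStore(directory).put("tasks/1", {"title": "persisted"})
    cache = DictStore()
    app_store = CachingStore(cache, JSONDiskStore(directory))
    cold = cache.get("tasks/1")          # not cached yet
    first = app_store.get("tasks/1")     # read from disk, then cached
    warm = cache.get("tasks/1")

# For tests, swap the disk store for a second in-memory store.
test_store = CachingStore(DictStore(), DictStore())
test_store.put("tasks/2", {"title": "in memory"})
in_memory = test_store.get("tasks/2")
```

The application code only ever sees `get` and `put`, so swapping the bottom of the stack does not change it.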

Writes use the same asynchronous queues as the rest of the system, with the writer getting the polymorphic identifiers of objects to write and then retrieving the relevant object(s) from the in-memory store before persisting. Since they use the same mechanism, they also benefit from the same uniquing properties, so when the I/O subsystem gets overloaded it will adapt by dropping redundant writes.

Consequences

With the Blackbird reference architecture, we not only replace complex, bulky code with much less and much simpler code, we also get to reuse that same code in all parts of the system while making the pieces of the system highly independent of each other and optimising performance.

In addition, the combination of REST-like stores that can be composed with constraint- and event-based communication patterns makes the architecture highly decoupled. In essence it allows the kind of decoupling we see in well-implemented microservices architectures, but on mobile apps without having to run multiple processes (which is often not allowed).

Monday, June 1, 2020

MPWTest Only Tests Frameworks

It should be noted, if it wasn't obvious, that MPWTest is opinionated software, meaning it achieves some of its smoothness by gleefully embracing constraints that some might view as potentially crippling limitations.

Maybe the biggest of these constraints, mentioned in the previous post, is that MPWTest only tests frameworks. This means that the following workflow is not supported out of the box:

The point being that this is a workflow I not just somewhat indifferently do not want, but rather emphatically and actively want to avoid. Tests that are run (only?) when launching the app are application tests. My perspective is that unit tests are an integral part of the class. This may seem a subtle distinction, but subtle differences in something you do constantly can have huge impacts. "Steter Tropfen höhlt den Stein." (Constant dripping wears away the stone.)

Another aspect is that launching the app for testing as a permanent and fixed part of your build process seems highly annoying at best. Linker finishes, app pops up, runs for a couple of seconds, shuts down again. I don't see that as viable. For testing to be integral and pervasive, it has to be invisible when the tests succeed.

The testing pyramid is helpful here: my contention is that you want to be at the bottom of that pyramid, ideally all of the time. Realistically, you're probably not going to get there, but you should push really, really hard, even making sacrifices that appear to be unreasonable to achieve that goal.

Framework-oriented programming

Only testing frameworks begs the question as to how to test those parts of the application not in frameworks. For me the answer is simple: there isn't any production code outside of frameworks.

None. Not the UI, not the application delegate. Only the auto-generated main().

The benefits of this approach are plentiful, the effort minimal. And if you think this is an, er, eccentric position to take, the program you almost certainly use to create apps for iOS/macOS etc. takes the same eccentric position: Xcode's main executable is 45K in size and only contains a main() function and some Swift boilerplate.

If all your code is in frameworks, only testing frameworks is not a problem. That may seem like a somewhat extreme case of sour grapes, with the arbitrary limitations of a one-off unit testing framework driving major architectural decisions, but the causality is the other way around: I embraced framework-oriented programming before and independently of MPWTest.

iOS

Another issue is iOS. Running a command-line tool that dynamically loads and tests frameworks is at least tricky and may be impossible, so that approach currently does not work. My current approach is that I view on-device and on-simulator tests as higher-up in the testing hierarchy: they are more costly, less numerous and run less frequently.

The vast majority of code lives in cross-platform frameworks (see: Ports and Adapters) and is developed and tested primarily on macOS. I have found this to be much faster than using the simulator or a device in day-to-day programming, and have used this "mac-first" technique even on projects where we were using XCTest.

Although not testing on the target platform may be seen as a problem, I have found discrepancies to be between exceedingly rare and non-existent, with "normal" code trending towards the latter. One of the few exceptions in the not-quite-so-normal code that I sometimes create was the change of calling conventions on arm64, which meant that plain method pointers (IMPs) no longer worked, but had to be cast to the "correct" pointer type, only on device. Neither macOS nor the simulator would show the problem.

For that purpose, I hacked together a small iOS app that runs the tests specified in a plist in the app bundle. There is almost certainly a better way to handle this, but I haven't had the cycles or motivation to look into it.

How to approximate

So you can't or don't want to adopt MPWTest. That doesn't mean you can't get at least some of the benefits of the approach. As a start, instead of using Cmd-B in Xcode to build, just use Cmd-U instead. That's what I did when working on Wunderlist, where we used XCTest.

Second, adopt framework-oriented programming and the Ports and Adapters style as much as possible. Put all your code in frameworks, and as much as possible in cross-platform frameworks that you can test/run on macOS, and even if you are developing exclusively for iOS, create a macOS target for that framework. This makes using Cmd-U to build much less painful.

Third, adhere to a strict 1:1 mapping between production classes and test classes, and place your test classes in the same file as the class they are testing.

My practical experience with both JUnit and XCTest on medium-sized projects does not square with the assertion that the difference is not that big: you still have to create these additional classes, they have to communicate with the class under test (self in MPWTest), you have to track changes etc. And of course, you have to know to configure and use the framework differently from the way it was built, intended and documented. And what I've seen of OCUnit use was that the tests were not co-located with the class, but in a separate part of the project.

A final note is that the trick of interchangeably using the class as the test fixture is only really possible in a language like Objective-C where classes are first class objects. It simply wouldn't be possible in Java. This is how the class can test itself, and the tests become an integral part of the class, rather than something that's added somewhere else.

Sunday, December 23, 2018

A Minimal Test Runner

A long time ago when I was working on MPWTest, "The simplest Objective-C Unit Test Framework that could possibly work...", I had a brief chat with Kent Beck about it, and one of the things he said was that everyone should build their own unit test "framework".

Why the scare quotes?

If your testing framework is actually a framework, it's probably too big. I recently started porting MPWFoundation and Objective-Smalltalk to GNUstep again, in order to get it running In the Cloud™. In order to see how it's going, it's probably helpful to run the tests.

Initially, I needed to test some compiler issues with such modern amenities as keyed subscripting of dictionaries:


#import <Foundation/Foundation.h>

int main( int argc, char *argv[] ) {
  MPWDictStore *a=[MPWDictStore store];
  a[@"hi"]=@"there";
  NSLog(@"hi: %@",a[@"hi"]);
  return 0;

}

Once that was resolved with the help of Alex Denisov, I wanted to minimally run some tests, but the idea of first getting all of MPWTest to run wasn't very appealing. So instead I just did the simplest thing that could possibly work:
static void runTests()
{
  int tests=0;
  int success=0;
  int failure=0;
  // Hard-coded list of classes that follow the testing conventions.
  NSArray *classes=@[
    @"MPWDictStore",
    @"MPWReferenceTests",
  ];

  for (NSString *className in classes ) {
    id testClass=NSClassFromString( className );
    // Each class reports the names of its own test methods.
    NSArray *testNames=[testClass testSelectors];
    for ( NSString *testName in testNames ) {
      SEL testSel=NSSelectorFromString( testName );
      @try {
        tests++;
        // The test message is sent to the class object; an exception counts as a failure.
        [testClass performSelector:testSel];
        NSLog(@"%@:%@ -- success",className,testName);
        success++;
      } @catch (id error)  {
        NSLog(@"%@:%@ == failure: %@",className,testName,error);
        failure++;
      }
    }

  }
  // Summary line in red if anything failed, green otherwise.
  printf("\033[91;3%dmtests: %d total, %d successes %d failures\033[0m\n",
         failure>0 ? 1:2,tests,success,failure);
}

That's it, my minimal test runner, with a hard-coded list of classes to test. In a sense, that is the entire "test framework", the rest just being conventions followed by classes that wish to be tested.
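For readers who don't have an Objective-C toolchain at hand, the same idea can be approximated in Python. This is a loose translation, not a port: `test_selectors` mirrors the `testSelectors` convention above, while the `Stack` class and its deliberately failing second test are made up for the example.

```python
class Stack:
    """Production class that carries its own tests as class methods."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    @classmethod
    def test_selectors(cls):
        return ["test_push_pop", "test_pop_empty_is_none"]

    @classmethod
    def test_push_pop(cls):
        stack = cls()
        stack.push(1)
        stack.push(2)
        assert stack.pop() == 2

    @classmethod
    def test_pop_empty_is_none(cls):
        # Deliberately failing: pop() on an empty stack raises IndexError.
        assert cls().pop() is None


def run_tests(classes):
    """Send each test message to the class itself and count the outcomes."""
    tests = successes = failures = 0
    for test_class in classes:
        for name in test_class.test_selectors():
            tests += 1
            try:
                getattr(test_class, name)()
                successes += 1
                print(f"{test_class.__name__}:{name} -- success")
            except Exception as error:
                failures += 1
                print(f"{test_class.__name__}:{name} == failure: {error!r}")
    return tests, successes, failures


result = run_tests([Stack])
```

As in the Objective-C version, the runner knows nothing about the classes beyond the one convention, and the classes know nothing about the runner.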

And of course MPWTest's slogan was a bit...optimistic.

Friday, July 11, 2014

Overspeccing

I just took my car to its biennial TÜV inspection and apart from the tires that had simply worn out everything was A-OK, nothing wrong at all. Kind of surprising for a 7-year-old mechanical device that has been used: daily commute from Mountain View to Oakland, tight cornering in the foothills, shipped across the Atlantic twice and now that it is back in its native country, occasional and sometimes prolonged sprints at 200 km/h. All that with not all that much maintenance, because the owner is not exactly a car nut.

Cars used to not be nearly this reliable, and getting there wasn't easy, it took the industry both plenty of time and a lot of effort. It's not that the engineers didn't know how to build reliable cars, but making them reliable and keeping them affordable and still allowing car companies to turn a profit, that was hard.

One particular component is the alternator belt, which had to be changed so frequently that engine compartments were specially designed to make the belt easily accessible. That's no longer the case, and the characteristic screeching sound of a worn belt is one that I haven't heard in a long time.

My late dad, who was in the business, told me how it went down, at least at Volkswagen. As other problems had been whittled away over the decades, alternator belts were becoming a real issue on the reliability reports compiled by motoring magazines, and the engineers were tasked with the job of fixing the problem. And fix it they did: they came up with a design that would "never" break or wear out, and no I don't know the details of how that was supposed to work.

Problem was: it was a tad expensive. Much more expensive than the existing solution and simply too expensive for the price bracket they were aiming for (this may seem odd to outsiders considering the total cost of a car, but pennies matter). Which of course was one reason why they had put up with unreliable belts for so long. Then word came in that the Japanese had solved the problem as well, and were offering it on their cheap(er) models. Next auto-show, they went to the booth of one of those Japanese companies and popped the hood.

The engineers scoffed: the Japanese design was cheaper because it was much, much more primitive than the one they had come up with, and it would, in fact, also wear out much more quickly. But exactly how much more quickly would it wear out? In other words, what was the expected lifetime of this cheaper, inferior alternator belt design?

About the expected lifetime of the car.

Ahh. As far as I can tell, the Japanese design or variants thereof conquered the world. I can't recall the last time I heard the screech of a worn out belt, engine compartments these days are not designed with accessibility in mind and cars are still affordable, although changing the belt if it does break will cost more in labor because of the less accessible placement.

What do alternator belts have to do with software development? Probably nothing, but to me at least, the situation reminds me of the one I write about in The Safyness of Static Typing. I am actually with those commenters who scoffed at the idea that the safety benefit of static typing is only around 2%, because theoretically having a tight specification of possible values checked at compile-time absolutely should bring a greater benefit.

For example, when static typing and protocols were introduced to Objective-C, I absolutely expected them to catch my errors, so I was quite surprised when it turned out that in practice they didn't: because I could actually compile/run/test my code without having to specify static types, by the time I added static types the code simply no longer had type errors, because the vast majority of those were caught by running it. The dynamic safety also helped, because instead of a random crash, I got a nice clean error message "object abc doesn't understand message xyz".

My suspicion is that although dynamic typing and the practices that go with it may only be, let's say, 50% as good at catching type errors as a good static type system, they are actually 98% effective at catching real world type errors. So if static type systems are twice as good, they would be 196% effective at catching real world type errors, which, just like the perfect, German-engineered alternator belts, is simply more than is actually needed (96% more with my hypothetical numbers).

There are obviously other factors at play, but I think this may account for a good part of the perceived discrepancy.

What do you think? Comments welcome here or on Hacker News.

Monday, January 7, 2013

Dependency Injection is a Virtue

DHH recently made a claim of Dependency Injection not being entirely virtuous, and got support from Tim Bray. While both make good points, they are both wrong.

Having hard-coded class-names like in the example Time.now is effectively the same as communicating via global variables. DHH's suggestion of stubbing out the Time class's now is selling us mutable global variables as the solution to global variables. Or more precisely: passing an argument to a method by modifying a global variable that the method reads out and restoring the state of the global variable afterward.

If that's "better", I don't really want to see "worse", and not wanting that sort of thing has nothing to do with being a Java drone limited by the language. And my experience with time-dependent systems tells me that you really want to pass the time into such a system generally, not just for unit testing.

Of course, having n-levels of AbstractFactoryFactory indirection could actually be argued as being worse, as Tim convincingly does, but that's one implementation of DI that's hobbled by Java's limitations. For a DI solution that's actually simple and elegant, check out Newspeak's module system (PDF): there is no global namespace, modules are parametrized and all names dynamically resolved.

If you want synthetic time for a module, just instantiate that module with your own Time class.
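Passing the time in, rather than stubbing a global, could look like the following. The original discussion is about Ruby's Time.now; this Python sketch, with its hypothetical `Subscription` class and `now` parameter, only illustrates the general shape.

```python
from datetime import datetime, timedelta


class Subscription:
    """Time-dependent logic that receives its clock instead of reaching for a global."""

    def __init__(self, expires_at, now=datetime.now):
        self.expires_at = expires_at
        self._now = now            # any zero-argument callable returning a datetime

    def is_active(self):
        return self._now() < self.expires_at


expiry = datetime(2013, 1, 7, 12, 0)

# Synthetic time: no global is modified and nothing has to be restored afterwards.
before = Subscription(expiry, now=lambda: expiry - timedelta(seconds=1))
after = Subscription(expiry, now=lambda: expiry + timedelta(seconds=1))
```

The default argument keeps production call sites unchanged, while tests, simulations and replays can each supply their own clock.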

Wednesday, March 28, 2012

Three Kinds of Agile

William Edwards writes that Agile Is a Sham. He got quite a few intelligent comments, including those pointing out that the values he espouses are exactly those from the Agile Manifesto:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

In my working life, I have encountered "Agile" three times. The last time was at a startup based in Oakland. It was Scrum, it was imposed from the top, a tool was introduced first in order to do agile tracking and planning, none of the actual technical practices were ever seriously considered. In short, exactly the kind of sham that William complains about. It had zero benefit.

Before that, we had two groups at BBC News Interactive starting to do their respective interpretations of XP. One did the simplest thing that could possibly work, for example never getting around to putting in the database, and did that in a test-first style. It didn't do standup meetings, paired at times, and every once in a while did a planning game. This team replaced a system that failed almost every single day and had abysmal performance using a 12 computer setup with a system that had two failures in 3 years and performed 100-1000 times better using only a single machine. The second team did standup meetings and planning games, but never managed to implement the technical practices such as TDD or DTSTTCPW/YAGNI. This team failed to deliver and the project was canceled/reset.

Finally, I first encountered what I would now recognize as XP before I knew that XP existed. Hard core YAGNI/DTSTTCPW, mixing pairing and working alone both organically and usefully. We introduced unit testing later and I became an immediate convert, having been responsible for shipping this product for some time and feeling the uncertainty over the question "when will it be ready". I later discovered that a bundle of these practices that we as coders had discovered for ourselves were known as XP, and I eagerly learned what I could from their experience.

So is Agile a sham? Maybe the dadaist manifesto put it best: if you are against this manifesto, you are a dadaist!

Tuesday, March 27, 2012

CocoaHeads Berlin Performance Talk

Last Wednesday, March 21st 2012, I held a talk on Cocoa Performance at the monthly Berlin CocoaHeads meeting. Thanks to everyone for the kind reception and benignly overlooking the fact that the end of the talk was a bit hurried.

Although the slides were primarily supportive and do not necessarily stand on their own, they are now available online by popular request (2.3MB, Keynote).

And a PDF version (6.3MB).

30k requests/s, aka wrk is fast

I just discovered wrk, a small and very fast http load testing tool. My previous experiments with µhttp-based MPWSideWeb first maxed out at around 5K requests per second, and after switching to httperf, I got up to around 10K per second. Using wrk on the same machine as previously, I now get this:
marcel@nomad[~]wrk  -c 100 -r 300000 http://localhost:8082/hi
Making 300000 requests to http://localhost:8082/hi
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.60ms    2.51ms  11.47ms   62.40%
    Req/Sec    14.98k     0.99k   16.00k    52.80%
  300002 requests in 9.71s, 21.46MB read
Requests/sec:  30881.96
Transfer/sec:      2.21MB
marcel@nomad[~]curl http://localhost:8082/hi
Hello World!


So a nice 30K requests per second on a MacBook Air, and wrk was only running at around 60% CPU, whereas httperf tended to be pegged at 100%. The web-server in question is a minimal server like sinatra.rb set up to return a simple "Hello world!".
marcel@localhost[scripts]cat memhttpserver.stsh 
#!/usr/local/bin/stsh
context loadFramework:'MPWSideWeb'
server := MPWHTTPServer new.
server setPort: 8082.
stdout println: 'memhttpserver listening on port ',server port stringValue.
server start:nil.

scheme:base := MPWSiteMap scheme.
base:/hi := 'Hello World!'.
server setDelegate: scheme:base .

shell runInteractiveLoop

marcel@localhost[scripts]./memhttpserver.stsh 
memhttpserver listening on port 8082
>

So I definitely have a new favorite http performance tester!

Thursday, February 17, 2011

Mac App Store won't let me buy apps: solution

Just tried to buy an app via the Mac App Store and it was absolutely refusing to take my money. Various suggestions I've seen on the web such as clearing caches, resetting them via iTunes advanced preferences, rebooting, retrying, using slight variations of my account name all made no difference whatsoever.

The solution turned out to be manually signing in using the Store menu (manually sign out if you are already signed in). At that point I was allowed to update/verify my billing information and subsequent purchase attempts worked.

In previous attempts, I had not signed in manually, but rather had the App Store do the sign-in after I attempted to purchase.

So needs a little more work...

Wednesday, December 9, 2009

Some test-driven-development notes

A couple of random points that might be of interest:

Code coverage tools

  if ( rare-condition ) {
      -is this code tested?-
  }
If you actually followed test-first, then the code in the rare if is definitely tested, because if there isn't a failing test case for the rare condition, then there is no reason for the code or the test to exist.
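A small, made-up Python example of what that looks like in practice: the function `parse_port`, its default of 8080 and both tests are hypothetical. The point is only that the rare branch exists because the second test failed without it.

```python
def parse_port(text):
    port = int(text)
    if port == 0:          # the "rare condition"
        return 8080        # added only once test_zero_means_default was failing
    return port


def test_parses_ordinary_port():
    assert parse_port("8082") == 8082


def test_zero_means_default():
    # Written before the branch above existed; it failed until the branch was added.
    assert parse_port("0") == 8080


test_parses_ordinary_port()
test_zero_means_default()
```

Delete the test and, under test-first rules, the branch has no reason to exist either.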

Another objection could be that people won't follow the techniques. I haven't found this to be a big or recurring practical problem so far, and agile techniques tend to be empirically driven. If you suspect that this is a problem you are seeing in your environment, running a code-coverage tool to put some data behind your suspicion may be a good idea.

Test before or test after?

Note that the solution to the code-coverage question above does not work if tests are written after the fact: in this case, the rare-case is likely not to be covered because it was written without being forced by a failing unit test.

Many if not most of the benefits of TDD are related to the way it shapes the design of the code; all of these benefits obviously don't accrue if you've already designed or even written the code. In fact, if you ask the XP folks about it, they will tell you that TDD is not for ensuring quality, it is exclusively for helping with coding and design.

For example, figuring out how to test something will force you to come to a clarity about what the code is supposed to do that just writing the code usually does not.

Knowing that your tests cover your code (see above) allows you to do extremely radical refactorings at any point in the development process. The ability to refactor at any time in turn allows you to keep your initial designs simple without coding for anticipated changes. Not coding for anticipated changes that may not occur or may occur differently than you expect in turn allows you to move more quickly, which more than pays for the expense of the tests.

Furthermore, the tests force you to think about how you can call the functionality you are about to implement, which means they shape the architecture towards simplicity, high cohesion and low coupling.

Generating tests

Auto-generating tests for existing methods is a means of subverting the test-driven approach: there will be the appearance of testing, but with virtually none of the benefits. It is probably worse than not having tests, because in the latter case you at least know that you're not covered.

Is it a good way of starting with unit test coverage for legacy code? No. See the C2 wiki entry for a good explanation of how to approach this case. In short, start refactoring and adding unit tests when you actually need to touch the code, be it for new features or to fix defects that are scheduled to be fixed.

Sunday, December 21, 2008

Unit test the class

Travis Griggs comes to the conclusion that unit test objects should map 1:1 to classes under test.

I agree.

In fact, I would go a bit further: tests should be an integral part of a class. While this helps avoid negative outcomes such as parallel class hierarchies or having code and tests diverge, it more importantly simplifies the test/code relationship and drives home the point that code is incomplete without its tests.

While I was working with JUnit on a reasonably large Java system, both finding a good place for a particular test and finding the tests for a specific class became quite burdensome after a while.

For this reason MPWTest simply asks classes to test themselves. Furthermore, only frameworks are tested, so the test tool simply loads each framework to test, enumerates the classes within that particular framework and then runs the tests it finds. TestCases and TestSuites are implicitly created from this structure, removing most of the administrative burdens of unit testing, and also any explicit dependence of the tests on the testing framework.
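The load, enumerate and run cycle can be imitated in Python, with a module playing the role of the framework. This is only an analogy to what MPWTest does with Objective-C frameworks: the `geometry` module, the `Point` and `Helper` classes and the `test_selectors` convention are assumptions of this sketch.

```python
import inspect
import types


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def plus(self, other):
        return Point(self.x + other.x, self.y + other.y)

    @classmethod
    def test_selectors(cls):
        return ["test_plus"]

    @classmethod
    def test_plus(cls):
        p = cls(1, 2).plus(cls(3, 4))
        assert (p.x, p.y) == (4, 6)


class Helper:
    """Has no test_selectors, so the runner skips it."""


def run_framework_tests(module):
    """Enumerate the classes in a module and run the tests each one declares."""
    results = {}
    for _, cls in inspect.getmembers(module, inspect.isclass):
        if not hasattr(cls, "test_selectors"):
            continue
        for name in cls.test_selectors():
            try:
                getattr(cls, name)()
                results[f"{cls.__name__}.{name}"] = "success"
            except Exception as error:
                results[f"{cls.__name__}.{name}"] = f"failure: {error!r}"
    return results


# Stand-in for "loading a framework": a module object holding the classes.
geometry = types.ModuleType("geometry")
geometry.Point, geometry.Helper = Point, Helper
results = run_framework_tests(geometry)
```

Test cases and suites fall out of the module and class structure, so there is nothing to register and the classes import nothing from the runner.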

Having no dependencies on the testing framework makes it easier to ship tests in production code without having to also ship the testing framework. While this may sound odd at first, it avoids potential issues with code compiled for testing being different than code destined to be shipped, and further reinforces the idea that tests are an integral part of each class, rather than an optional add-on.