So performance and testing are mutually dependent.
That level of performance makes testing qualitatively different from having tests that run in 7 minutes, down from 15 after the serious performance tuning described in the article, or in 5 minutes, down from 10. It means running the unit tests can be a normal part of the edit-compile-run cycle, rather than a separate activity, supporting the idea that the tests are simply an intrinsic part of the code.
These are not lightweight tests: they set up and tear down Postscript interpreters, for example, and run PDF and Postscript documents through text extraction engines. However, they run almost exclusively in memory, and mostly in a compiled language.
However, even 3 seconds is still too long a delay: feedback should be instantaneous to give a true interactive programming experience, or at least not get in the way of that experience. I saw a nice approach to this at the Hasso Plattner Institute, which used code coverage analysis to interactively run tests sorted by relevance to the code being edited (in Smalltalk). A simpler approach might be to just run the unit tests in the background while editing.
In my working life, I have encountered "Agile" three times. The last time was at a startup based in Oakland. It was Scrum, it was imposed from the top, a tool was introduced first in order to do agile tracking and planning, and none of the actual technical practices were ever seriously considered. In short, it was exactly the kind of sham that William complains about. It had zero benefit.
Before that, we had two groups at BBC News Interactive starting to do their respective interpretations of XP. One did the simplest thing that could possibly work, for example never getting around to putting in the database, and did it in a test-first style. It didn't do standup meetings, paired at times, and every once in a while did a planning game. This team replaced a system that failed almost every single day and performed abysmally despite running on 12 computers with one that had two failures in 3 years and performed 100-1000 times better on a single machine. The second team did standup meetings and planning games, but never managed to adopt the technical practices such as TDD or DTSTTCPW/YAGNI. This team failed to deliver and the project was canceled/reset.
Finally, I first encountered what I would now recognize as XP before I knew that XP existed: hard-core YAGNI/DTSTTCPW, mixing pairing and working alone both organically and usefully. We introduced unit testing later, and I became an immediate convert, having been responsible for shipping this product for some time and feeling the uncertainty behind the question "when will it be ready?". I later discovered that this bundle of practices we as coders had discovered for ourselves was known as XP, and I eagerly learned what I could from that community's experience.
So is Agile a sham? Maybe the dadaist manifesto put it best: if you are against this manifesto, you are a dadaist!
Although the slides were primarily supportive and do not necessarily stand on their own, they are now available online by popular request (2.3MB, Keynote).
There is also a PDF version (6.3MB).
marcel@nomad[~]wrk -c 100 -r 300000 http://localhost:8082/hi
Making 300000 requests to http://localhost:8082/hi
  2 threads and 100 connections
  Thread Stats   Avg     Stdev     Max    +/- Stdev
    Latency     4.60ms   2.51ms   11.47ms   62.40%
    Req/Sec    14.98k    0.99k    16.00k    52.80%
  300002 requests in 9.71s, 21.46MB read
Requests/sec:  30881.96
Transfer/sec:  2.21MB
marcel@nomad[~]curl http://localhost:8082/hi
Hello World!
marcel@localhost[scripts]cat memhttpserver.stsh
#!/usr/local/bin/stsh
context loadFramework:'MPWSideWeb'
server := MPWHTTPServer new.
server setPort: 8082.
stdout println: 'memhttpserver listening on port ',server port stringValue.
server start:nil.
scheme:base := MPWSiteMap scheme.
base:/hi := 'Hello World!'.
server setDelegate: scheme:base .
shell runInteractiveLoop
marcel@localhost[scripts]./memhttpserver.stsh
memhttpserver listening on port 8082
>
30K documents x 10 tags/document x 4 bytes/tag =
300K tags x 4 bytes/tag =
1200K bytes = 1.2 MB
Even assuming 2:1 bloat due to overhead gives us 2.4 MB, which not only fits comfortably into the RAM of a modern computer or a cellphone, it actually fits comfortably into the 8-10MB L3 cache of an Intel Core i7, with room to spare.

What about getting that data into RAM? The slowest hard drives (non-SSD) I could find in a quick web search had a transfer rate of better than 48MB/s and a seek time of around 10ms, so the 2.4MB in question should be in memory in around:
10ms + 2.4MB / (48MB/s) =
10ms + 0.05 s =
10ms + 50 ms = 60 ms
So less than 1/10th of a second to read it in, and a moderately fast SSD reduces that to around 10ms.

EDIT: fixed embarrassing typo (L1 -> L3 cache).
Measuring quickly on my laptop reveals that strcmp() takes around 8ns for a long matching string and 2ns for a non-match in the first character (with the first-character optimization). Splitting the difference, and thus not taking into account that non-matches tend to be more common than matches, let's assume 5ns to compare each tag.
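For the curious, a minimal sketch of that kind of micro-benchmark follows. The constants are mine, the string contents are arbitrary, and the results will vary with machine, compiler, and optimization flags; the volatile sink and volatile pointers are just a crude way to keep the compiler from folding the comparison away at compile time.

#include <stdio.h>
#include <string.h>
#include <time.h>

/* Crude strcmp() micro-benchmark: times many comparisons and
   reports the average cost per call. Treat the output as a
   rough, machine-dependent estimate. */

volatile int sink;            /* keeps each result "used"          */
const char * volatile s1;     /* volatile pointers defeat          */
const char * volatile s2;     /* compile-time constant folding     */

static double ns_per_cmp(const char *a, const char *b, long n) {
    struct timespec t0, t1;
    s1 = a; s2 = b;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++)
        sink = strcmp(s1, s2);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / n;
}

int main(void) {
    long n = 100000000;       /* 1e8 iterations per case */
    printf("match:     %.2f ns/cmp\n",
           ns_per_cmp("a-reasonably-long-tag", "a-reasonably-long-tag", n));
    printf("non-match: %.2f ns/cmp\n",
           ns_per_cmp("a-reasonably-long-tag", "b-reasonably-long-tag", n));
    return 0;
}

With 5ns per comparison, the search cost works out to: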
5ns/tag x 10 tags/document x 30K documents =
50ns/document x 30K documents =
1500K ns =
1500 µs = 1.5 ms
So an approach that takes longer than, say, 2ms to do such a search can probably be improved.

Of course, we could do something slightly less thoroughly braindead and represent tags using integer, er, tags. A simple integer comparison should take less than one nanosecond, dropping the time to below 300 µs. With that, we could do 3000 queries per second, or 300 queries every tenth of a second (the generally accepted threshold for interactive performance).
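To make the brute-force version concrete, here is a sketch of what such a linear scan might look like. The Document layout, the tag encoding, and the function names are illustrative assumptions, not a description of any particular system.

#include <stddef.h>

/* Illustrative brute-force tag search: every document stores its
   tags as small integers, and a query is a linear scan over all
   documents. At ~300K integer comparisons per query, this stays
   well under a millisecond on current hardware. */

#define MAX_TAGS 10

typedef struct {
    int id;
    int ntags;
    int tags[MAX_TAGS];
} Document;

/* Appends the ids of all documents carrying `tag` to `out`
   (assumed large enough); returns how many were found. */
size_t find_by_tag(const Document *docs, size_t ndocs, int tag, int *out) {
    size_t found = 0;
    for (size_t d = 0; d < ndocs; d++) {
        for (int t = 0; t < docs[d].ntags; t++) {
            if (docs[d].tags[t] == tag) {
                out[found++] = docs[d].id;
                break;          /* one hit per document is enough */
            }
        }
    }
    return found;
}

Note that a query touches 300K contiguous integers in order, which is exactly the kind of access pattern that caches and prefetchers reward.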
In theory, we could optimize ever so slightly by storing a list of document ids with each tag and then simply doing set operations on those lists. But we don't really have to.
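If we did want to, it might look something like the following sketch (again, the names and layout are illustrative assumptions): each tag maps to a sorted list of document ids, and an AND-query over two tags becomes a merge-style intersection that is linear in the combined list lengths.

#include <stddef.h>

/* Illustrative inverted-index variant: each tag owns a sorted
   list of the ids of documents carrying it, so a two-tag AND
   query intersects two sorted id lists in a single pass. */

typedef struct {
    int    tag;       /* the integer tag                       */
    size_t count;     /* number of documents with this tag     */
    int   *doc_ids;   /* document ids, sorted ascending        */
} Posting;

/* Intersects a->doc_ids and b->doc_ids into `out` (assumed
   large enough); returns the number of matching documents. */
size_t query_and(const Posting *a, const Posting *b, int *out) {
    size_t i = 0, j = 0, found = 0;
    while (i < a->count && j < b->count) {
        if (a->doc_ids[i] < b->doc_ids[j]) {
            i++;
        } else if (a->doc_ids[i] > b->doc_ids[j]) {
            j++;
        } else {
            out[found++] = a->doc_ids[i];
            i++; j++;
        }
    }
    return found;
}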