Friday, July 11, 2014

Overspeccing

I just took my car to its biennial TüV inspection and apart from the tires that had simply worn out everything was A-OK, nothing wrong at all. Kind of surprising for a 7 year old mechanical device that has been used: daily commute from Mountain View to Oakland, tight cornering in the foothills, shipped across the Atlantic twice and now that it is back in its native country, occasional and sometimes prolonged sprints at 200 km/h. All that with not all that much maintenance, because the owner is not exactly a car nut.

Cars used to not be nearly this reliable, and getting there wasn't easy, it took the industry both plenty of time and a lot of effort. It's not that the engineers didn't know how to build reliable cars, but making them reliable and keeping them affordable and still allowing car companies to turn a profit, that was hard.

One particular component is the alternator belt, which had to be changed so frequently that engine compartments were specially designed to make the belt easily accessible. That's no longer the case, and the characteristic screeching sound of a worn belt is one that I haven't heard in a long time.

My late dad, who was in the business, told me how it went down, at least at Volkswagen. As other problems had been whittled away over the decades, alternator belts were becoming a real issue on the reliability reports compiled by motoring magazines, and the engineers were tasked with the job of fixing the problem. And fix it they did: they came up with a design that would "never" break or wear out, and no I don't know the details of how that was supposed to work.

Problem was: it was a tad expensive. Much more expensive than the existing solution and simply too expensive for the price bracket they were aiming for (this may seem odd to outsiders considering the total cost of a car, but pennies matter). Which of course was one reason why they had put up with unreliable belts for so long. Then word came in that the Japanese had solved the problem as well, and were offering it on their cheap(er) models. Next auto-show, they went to the both of one of those Japanese companies and popped the hood.

The engineers scoffed: the design the Japanese was cheaper because it was much, much more primitive than the one they had come up with, and it would, in fact, also wear out much more quickly. But exactly how much more quickly would it wear out? In other words, what was the expected lifetime of this cheaper, inferior alternator belt design?

About the expected lifetime of the car.

Ahh. As far as I can tell, the Japanese design or variants thereof conquered the world. I can't recall the last time I heard the screech of a worn out belt, engine compartments these days are not designed with accessibility in mind and cars are still affordable, although changing the belt if it does break will cost more in labor because of the less accessible placement.

What do alternator belts have to do with software development? Probably nothing, but to me at least, the situation reminds me of the one I write about in The Safyness of Static Typing. I am actually with those commenters who scoffed at the idea that the safety benefit of static typing is only around 2%, because theoretically having a tight specification of possible values checked at compile-time absolutely should bring a greater benefit.

For example, when static typing and protocols were introduced to Objective-C, I absolutely expected them to catch my errors, so I was quite surprised when it turned out that in practice they didn't: because I could actually compile/run/test my code without having to specify static types, by the time I added static types the code simply no longer had type errors, because the vast majority of those were caught by running it. The dynamic safety also helped, because instead of a random crash, I got a nice clean error message "object abc doesn't understand message xyz".

My suspicion is that although dynamic typing and the practices that go with it may only be, let's say, 50% as good at catching type errors as a good static type system, they are actually 98% effective at catching real world type errors. So if static type systems are twice as good, they would be 196% effective at catching real world type errors, which just like the perfect, german-engineered alternator belts, is simply more than is actually needed (96% more with my hypothetical numbers).

There are obviously other factors at play, but I think this may account for a good part of the perceived discrepancy.

What do you think? Comments welcome here or on Hacker News.

2 comments:

Jakob said...

The main problem with the alternator belt wasn't only that the belt itself was unreliable, but that failure was catastrophic: when the belt tears, the piston would crush the valves and destroy the engine. To avoid this, you had to change the alternator belt at a certain interval as a precaution.

My old trusty Nissan Micra had one of these more reliable alternator belts, but the more important thing was a different engine layout. The engine didn't break when the alternator belt broke. Therefore you didn't need to change the belt as a precaution; you could just wait until it actually broke and change it then.

Maybe the analogy here is that static typing helps most when failure is catastrophic (eg. invalid memory access), but doesn't help as much when there is a good dynamic safety net (like exceptions for invalid messages).

Marcel Weiher said...

Excellent analogy, thanks!

I've actually encountered the claim that there is no difference between a DNU and randomly overwriting memory.