Tuesday, October 15, 2013

Should you use CoreData?

The answer, of course, is "it depends".

Now that we have that out of the way, I want to have a look at Drew Crawford's You should use Core Data, which manages to come up with a less nuanced answer in its 2943 words. It's an older article (2012), but recently came to my attention via Drew McCormack (@drewmccormack): "Great post", he wrote, and after reading the article I not just disagreed, but found that Twitter wasn't really adequate for writing up all the things wrong with that article.

Where to start? Maybe at the beginning, right in the first paragraph the following is called out as a "myth":

Among them, Core Data is designed for something–not really sure what–but whatever it is, it’s a lot more complicated than what I need to do in my project.
First here is a categorical mistake, because unless Drew knows the exact requirements and engineering trade-offs in every iOS application, he can't know whether this is true or false, fact or myth.

The second mistake is that the basic statement "CoreData was designed for something else" is actually true. CoreData's design dates back to NeXT's Enterprise Object Framework or EOF for short, and EOF was designed as an ORM for talking to corporate relational database servers, with a variety of alternate back-ends for non-relational DBs (including the way cool 3270 adapter!).

Obviously the implementation is different and the design has diverged by now, but that is the basic design, and yes, that does do something that is more complicated than what some (many?) developers need.


Next sentence, next problem:

I just want to save some entities to disk.
I may well be reading too much into this, but using the word entities already bakes so many assumptions into the problem statement. When I actually do want to save entities, databases in general and CoreData in particular are somewhat higher on my list of technologies, but quite often I don't start with ERM, and therefore just have objects, XML or other data. While these could be ER-modeled with varying degrees of difficulty/success, they certainly don't have to be, and it's often not the best choice.

In the enterprise system described in REST: Advanced Research Topics and Practical Applications, we removed the database from the rewrite, because the variety of the data meant converting the DB into a key-value store one way or another, or having a schema that's about an A0 page in small print (a standards body had developed such a schema and I think we even got a poster). We instead ended up converging on an EventPoster architecture that kept the original XML feed files around and parsed them into objects as necessary. No ERM here.

The next couple of paragraphs go off on an ad-hominem straw man tangent making WAG assumptions about the provenance of iOS developers and more WAGs about why that (assumed) provenance causes said developers to have these misconceptions. Those "misconceptions" that actually turn out to be true. Although largely irrelevant, it does contain some actual misinformation. For example the fact that categories don't get linked with static libraries is not an LLMV bug, it's a consequence of the combined semantics of static libraries and Objective-C categories irrespective of compiler/linker versions.



Then there's another tangent to Joel Spolsky's article on Things You Should Never Do (the link in Drew's article is dead), such as rewriting legacy code from scratch, which Joel describes categorically as "single worst strategic mistake" that any software company can make. In the words of the great Hans Bethe: "Ach, that isn't in wrong!":

  1. Just because Joel wrote something makes it right or applicable why?
  2. While Joel makes some good points and is right to counter a tendency to not want to deal with existing code, his claim is most certainly wrong in its absoluteness, both empirically (the system referenced above was a rewrite with tremendous benefits, then there's OS X vs. trying to keep fixing Classic Mac OS indefinitely etc.) and logically: it only holds true if all your accumulated complexity is of the essential kind, and none if of the accidental kind. That idea seems ludicrous to me, virtually all software is rushed, has shifting requirements that are only understood after the fact, has historical limitations (such as having to run in 128KB of RAM on a 7MHz CPU with no MMU) etc.

    Sometimes a rewrite is warranted, though you should obviously be wary and not undertake such a project lightly.

  3. Of course the biggest non-sequitur in the whole tangent is "How on earth does the problem of reading source code and rewriting legacy systems apply to this situation?". Apart from "not at all"? We don't have access to CoreData source code and there is no question about us rewriting it. Well, actually, I used to have such access, but even though my remit would have allowed some rewrites, I don't think I could have done a better job for the technology constraints chosen. The question is whether to use it or not, including whether the technology constraints are appropriate.
  4. And no, not every persistence solution ends up as effectively a rewrite of CoreData.


After that ill-conceived tangent, the article goes right back to the next unsubstantiated ad-hominem:
Here are some things you probably haven’t thought about when architecting your so-called data stack:
Apart from the attitude somewhat unbefitting to someone who gets so much wrong, well, Nein, but lets's look at some of these in detail:
  • Handling future changes to the data schema: this just isn't hard if you're not using a relational database. In fact high variability of the schema was one of the reasons for ditching the DB in you know...
  • Bi-directional synchronization with servers...hah hah...!
  • Bi-directional synchronization with peers...see above
  • Undo support: gosh, I wish there were some sort of facility that would manage undo support on my model objects, if I had my way, Apple would add this and call it NSUndoManager. Without such a useful class, I'll just have to do complicated things like renaming old versions of my data file or storing deltas.
  • Multithreading. Really? Multithreading?? If you think CoreData makes multithreading easier rather than harder, I have both a nice bridge and some oceanfront property in Nevada to sell you.
  • Device/time constraints: the performance ca(na)rd. CoreData is slow and memory intensive. It makes up for this by adding the ability and the requirement for the developer to continuously tune the working set, if that is an option. If continuously minimizing/tuning the working set is not an option (i.e. you have some large datasets), you're hosed.



Then we're back to the familiar ad-hominems/straw-men and non-sequitur analogies for a couple of paragraphs, nothing to see there. The whole analogy to Cocoa is useless: yes, Cocoa is good. How does this make CoreData good (it may or may not be, there simply is no connection) or appropriate for my tasks? Also, Cocoa in particular and Apple code in general is not "magic". I've been there, I've seen it and I fixed some of it. A lot of it is good, some excellent, but some not so much (sleep(5) anyone?).

The section entitled But, there are times not to use CoreData, right? could have been the saving grace, but alas the author blows it again. Having a "mark all as read" option in a NewsReader application is a "strange" "corner case"? Right. Let me turn this section around: there is a very narrow range of application areas where CoreData is or might be appropriate. Mostly read-only data, highly regular (table-like) structure, no need for bulk operations ever, preferably no trees either and no binary data. Fairly loose performance requirements.

What Apple says

The section on "what Apple says" is also gold, though I have to give credit to the Apple doc writers for managing to strongly suggest things without actually explicitly claiming them, strongly enough to fool this particular writer:
Apple’s high-level APIs can be much faster than ‘optimized’ code at lower levels.
This is obviously true, but completely meaningless. My tricycle can be faster than a Ferrari, for example if the Ferrari is out of gas or has a flat tire, or just parked. When we actually measured the performance of applications adopting CoreData at Apple we invariably got a significant performance regression. Then a lot of effort would be expended by all the teams involved in order to fix the regression, optimization effort that hadn't been expended on the original application, usually making up some of the shortfall.

I find the code reduction claim also specious, at least when stated as an unquestionable fact like this. In my experience, code size was reduced when removing CoreData and its predecessor EOF. Had we had exactly the requirements that CoreData/EOF were designed for (talking to existing relational enterprise databases), the result would almost certainly been different, but those were not our requirements, and I doubt that most iOS apps have those requirements (and in fact, CoreData didn't even support taking to external SQL databases, at all, a puzzling development for all the EOF veterans).

For managing small amounts of data inside in iOS application, CoreData is almost always overkill, because it effectively simulates an entire client/server enterprise application stack within your phone or desktop. The performance and complexity costs to pay for that overkill are substantial.

So Should You Use Core Data?

As I wrote: it depends.

The article in question at least adds no useful information to answering that question, only inappropriate analogies, repeated claims without any supporting evidence and lots of ad-hominems that are probably meant to be witty. If you believe its last section, the only reason developers don't use CoreData is because they are naive, lazy or ignorant.

This is evidently not true and ignores even the most basic concepts that one would apply to answering the question, such as, say, fitness for purpose. CoreData may well be the best implementation of an in-process-ORM simulating a client-server enterprise app there is, and there is good evidence that this is in fact true (certainly the people on it are top notch and some of the code is amazing). However, all that doesn't help if that's not what you need.

Considering the gleefully paraded ignorance of not just alternatives but various other programming aspects, I'd take the unconditional advice this article gives with a pinch of salt. Mountain-sized pinch, that is.