Measuring quickly on my laptop reveals that strcmp() takes around 8ns for a long matching string and 2ns for a non-match in the first character (with first character optimization). Splitting the difference and thus not taking into account that non-matches tend to be more common than matches, let's assume 5ns to compare each tag.
5ns /tag x 10 tags / document x 30k documents = 50ns / document x 30K documents = 1500K ns = 1500 µs = 1.5 msSo an approach that takes longer than, say, 2ms to do such a search can probably be improved.
Of course, we could do something slightly less thoroughly braindead and represent tags using integer, er, tags. A simple integer comparison should be less then one nanosecond, so that would drop the time to below 300 µs. With that, we could do 3000 queries per second, or 300 queries every tenth of second (the generally accepted threshold for interactive performance).
In theory, we could actually start optimizing ever so slightly by storing lists of document ids with each tag and then simply doing set operations on the document lists stored with each tag. But we don't really have to.
No comments:
Post a Comment