Tuesday, March 27, 2012

30k entries (II), aka computers have RAM, and they can do I/O, too...

Let's assume a document storage system with a maximum working set of 30K documents. Let's also assume we want to store some tags, maybe 10 per document, encoded as 32-bit integers (8 bits for the tag type, 24 bits for the tag value, which is used as an index). That would be:
    30K documents x 10 tags/document x 4 bytes/tag = 
                           300K tags x 4 bytes/tag =
                                      1200 K bytes = 1.2 MB
Even assuming 2:1 bloat due to overhead gives us 2.4 MB, which should not just fit comfortably into the RAM of a modern computer or a cellphone; it actually fits comfortably into the L3 cache of an Intel Core i7 with 8-10MB to spare.
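
For the curious, here is a rough Python sketch of the tag encoding and the back-of-the-envelope size estimate above. The names (`pack_tag`, `unpack_tag`, `footprint_bytes`) are made up for illustration, and putting the tag type in the high 8 bits is an assumption; the only thing fixed above is the 8/24 split.

```python
from array import array

TYPE_BITS = 8    # tag type
VALUE_BITS = 24  # tag value, used as an index


def pack_tag(tag_type, tag_value):
    """Pack a tag into one 32-bit integer (type in the high 8 bits: an assumption)."""
    if not 0 <= tag_type < (1 << TYPE_BITS):
        raise ValueError("tag type does not fit into 8 bits")
    if not 0 <= tag_value < (1 << VALUE_BITS):
        raise ValueError("tag value does not fit into 24 bits")
    return (tag_type << VALUE_BITS) | tag_value


def unpack_tag(packed):
    """Split a packed tag back into (tag_type, tag_value)."""
    return packed >> VALUE_BITS, packed & ((1 << VALUE_BITS) - 1)


def footprint_bytes(documents=30_000, tags_per_document=10,
                    bytes_per_tag=4, bloat=2.0):
    """Estimated memory use, with 'K' read as 1000 as in the figures above."""
    return int(documents * tags_per_document * bytes_per_tag * bloat)


# Typecode "I" is an unsigned int, typically 4 bytes on mainstream platforms.
tags = array("I", [pack_tag(1, 42), pack_tag(2, 7)])
print([unpack_tag(t) for t in tags])
print(footprint_bytes(bloat=1.0) / 1e6, "MB raw")
print(footprint_bytes() / 1e6, "MB with 2:1 bloat")
```

A flat array of packed integers keeps the per-tag overhead close to the 4 bytes assumed in the estimate, which a list of Python objects would not.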

What about getting that data into RAM? The slowest hard drives (non-SSD) I could find using a quick web search had a transfer rate of better than 48MB/s and a seek time of around 10ms, so the 2.4MB in question should be in memory in around:

 10ms + 2.4MB / (48MB/s) = 
           10ms + 0.05 s =
           10ms +  50 ms =  60 ms
So it takes less than 1/10th of a second to read it in, and a moderately fast SSD reduces that to around 10ms.
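
The same estimate as a small Python sketch, in case you want to plug in your own numbers. The function name `load_time_ms` is made up, the model is just one seek plus a sequential read, and the SSD figures (250MB/s, 0.1ms access time) are illustrative assumptions, not measurements.

```python
def load_time_ms(size_mb, transfer_mb_per_s, seek_ms):
    """One seek plus a sequential read of size_mb, in milliseconds."""
    return seek_ms + size_mb / transfer_mb_per_s * 1000.0


# Slow hard drive, using the figures from the text: 48MB/s, 10ms seek.
hdd_ms = load_time_ms(2.4, 48.0, 10.0)
print(f"HDD: {hdd_ms:.0f} ms")  # prints "HDD: 60 ms"

# Hypothetical SSD: 250MB/s and 0.1ms access time are assumed values.
ssd_ms = load_time_ms(2.4, 250.0, 0.1)
print(f"SSD: {ssd_ms:.1f} ms")
```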

EDIT: fixed embarrassing typo (L1 -> L3 cache).
