Reverbrain wiki

Site Tools


eblob:eblob:l2hash

Introduction

Sometime ago we started a project which aim was to reduce memory consumption of eblob. It was decided that we should try to somehow decrease footprint of cache structures.

eblob_ram_control

First we replaced some low-cardinality fields like index_fd, data_fd, index and type in each cache enrty (struct eblob_ram_control) with pointer to control structure of blob file (struct eblob_base_ctl). This change along with improved alignment of structure decreased memory usage by 20% at cost of one memory dereference.

Disk cache and eblob_hash_entry

Second we removed LRU-cache from eblob code which was inefficient anyway. This decreased memory usage by itself almost halved eblob memory consumption and also allowed us to decrease size of another memory-hog structure eblob_hash_entry by removing LRU list pointers (~15% savings). This slightly increased CPU usage of our storage servers, but who cares - CPU there is almost idle anyway.

L2hash

And last, but definitely not the least - we've introduced concept of so called level-two-hashing. It based on assumption that our internal key representation is 512bit long which is not very memory efficient assuming that we are storing all keys for last open blob in memory and there can be many millions of those there. So we assumed that for memory constrained environments we can just store 32 or 64 bit hash of key instead of key itself in memory. We called this thing `l2hash` (sorry guys - there is no l2hash blueprint for this feature). L2hash also has some drawbacks which we will discuss further, so we made it optional with EBLOB_L2HASH(64) config flag. It allows us to cut memory consumption of eblob by more than 40%. But this comes at some cost.

First of all current hash function (we use murmur2 hash) does not preserve original key ordering so range requests currently are not supported when l2hash is enabled. We will fix this in future versions by using order preserving level two hashing function.

Also now that we only have hash of key at hand we need to go to disk to verify that it's really requested key, not a collision. But assuming that collisions are rare and we need to go to disk anyway in case of non-collision this isn't a big problem. Still this disk access is issued while holding global hash lock which is really bad for concurrency. We will mitigate this one by probably using rwlock for cache instead of global mutex which should improve general performance of eblob on concurrent workloads. Currently rwlock version of eblob is already handed to our QA team for testing.

Availability

L2hash first appeared in eblob 0.17.4, eblob_ram_control shrinkage was introduced in 0.17.8 and eblob_hash_entry shrinkage in 0.18.0.

eblob/eblob/l2hash.txt ยท Last modified: 2013/03/11 12:04 by savetherbtz