Sometime ago we started a project which aim was to reduce memory consumption of eblob. It was decided that we should try to somehow decrease footprint of cache structures.
First we replaced some low-cardinality fields like
type in each cache enrty (
struct eblob_ram_control) with pointer to control
structure of blob file (
struct eblob_base_ctl). This change along with
improved alignment of structure decreased memory usage by 20% at cost of one
Second we removed LRU-cache from eblob code which was inefficient anyway. This
decreased memory usage by itself almost halved eblob memory consumption and
also allowed us to decrease size of another memory-hog structure
eblob_hash_entry by removing LRU list pointers (~15% savings). This slightly
increased CPU usage of our storage servers, but who cares - CPU there is almost
And last, but definitely not the least - we've introduced concept of so called
level-two-hashing. It based on assumption that our internal key representation
is 512bit long which is not very memory efficient assuming that we are
storing all keys for last open blob in memory and there can be many millions of
those there. So we assumed that for memory constrained environments we can just
store 32 or 64 bit hash of key instead of key itself in memory. We called this
thing `l2hash` (sorry guys - there is no l2hash blueprint for this feature).
L2hash also has some drawbacks which we will discuss further, so we made it
EBLOB_L2HASH(64) config flag.
It allows us to cut memory consumption of eblob by more than 40%. But this
comes at some cost.
First of all current hash function (we use
murmur2 hash) does not preserve
original key ordering so range requests currently are not supported when l2hash
is enabled. We will fix this in future versions by using order preserving level
two hashing function.
Also now that we only have hash of key at hand we need to go to disk to verify that it's really requested key, not a collision. But assuming that collisions are rare and we need to go to disk anyway in case of non-collision this isn't a big problem. Still this disk access is issued while holding global hash lock which is really bad for concurrency. We will mitigate this one by probably using rwlock for cache instead of global mutex which should improve general performance of eblob on concurrent workloads. Currently rwlock version of eblob is already handed to our QA team for testing.
L2hash first appeared in eblob
0.17.4, eblob_ram_control shrinkage was introduced in
0.17.8 and eblob_hash_entry shrinkage in