Reverbrain wiki


Eblob Roadmap

  • Remove type support

This removes the mostly unused and hacky columns (aka types) from eblob. It reduces the amount of code and its complexity while increasing speed and robustness. Instead of using columns, one can dedicate a subset of the key bits to the column name. If the subset is taken from the start of the key, eblob behaves like a column-ordered database; otherwise it behaves like a row-ordered one. Either ordering holds only after data-sort has finished.
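
A minimal sketch of the key-prefix idea, not eblob code: the 2-byte column id, the 4-byte row part and the make_key helper are all assumptions made for illustration. It shows how the position of the column bits decides the order of records once keys are sorted.

```python
import struct


def make_key(column: int, row_part: bytes, column_first: bool = True) -> bytes:
    """Build a key that embeds a (hypothetical) 2-byte column id next to the row part."""
    # Big-endian packing keeps byte-wise ordering equal to numeric ordering.
    col = struct.pack(">H", column)
    return col + row_part if column_first else row_part + col


rows = [b"\x01" * 4, b"\x02" * 4]

# Column bits at the start of the key: sorting groups records by column.
column_ordered = sorted(make_key(c, r) for r in rows for c in (2, 1))
columns_seen = [struct.unpack(">H", k[:2])[0] for k in column_ordered]

# Column bits at the end of the key: sorting groups records by row.
row_ordered = sorted(make_key(c, r, column_first=False) for r in rows for c in (2, 1))
rows_seen = [k[:4] for k in row_ordered]
```

With the column id first, all records of one column end up next to each other after sorting; with the column id last, all columns of one row do.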

  • Add writev()-like interface.

Right now there is an enormous amount of code inside the elliptics eblob_backend to implement the new extended data format. It can be simplified by providing a writev()-like interface, which can also greatly speed up common chunked write patterns.
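
To illustrate what a writev()-like call buys, here is a small sketch using Python's os.writev (Unix only). The record layout (an 8-byte length header and a 4-byte footer) and the write_record helper are hypothetical and are not eblob's on-disk format; the point is only that separate buffers reach the file in one call, without first being copied into one contiguous buffer.

```python
import os
import struct
import tempfile


def write_record(fd: int, header: bytes, payload: bytes, footer: bytes) -> int:
    """Write one record assembled from separate buffers with a single writev()."""
    # os.writev may write fewer bytes than requested; a real caller would loop.
    return os.writev(fd, [header, payload, footer])


with tempfile.TemporaryFile() as f:
    payload = b"chunk-data"
    header = struct.pack("<Q", len(payload))  # hypothetical 8-byte length header
    footer = b"\x00" * 4                      # hypothetical 4-byte footer
    written = write_record(f.fileno(), header, payload, footer)
    on_disk = os.pread(f.fileno(), written, 0)
```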

  • Remove compression.

Compression is rather useless because it is applied per record, not per block, so it is inefficient in terms of both compression ratio and resource usage. It also adds a dependency on libsnappy. Finally, decompressing data locally means that compression saves no network bandwidth.

  • Data-sort that merges adjacent bases.

For now the number of bases grows almost constantly, even with defragmentation. The only way the number of bases can drop is when a base becomes completely empty. So it would be a good idea to merge two adjacent bases if they are under some size threshold or if the total number of bases exceeds some configurable limit. As a side effect, this will improve read access times because a lookup will potentially have to check fewer bases.
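
One possible merge policy, sketched under assumptions: bases are represented only by their sizes, listed in on-disk order, and plan_merges and size_threshold are illustrative names rather than eblob options. The sketch covers only the size-threshold trigger and greedily groups neighbours while their combined size stays under the threshold.

```python
from typing import List, Tuple


def plan_merges(sizes: List[int], size_threshold: int) -> List[Tuple[int, int]]:
    """Return half-open index ranges [start, end) of adjacent bases to merge."""
    plans: List[Tuple[int, int]] = []
    start, total = 0, 0
    for i, size in enumerate(sizes):
        # Close the current run when adding this base would exceed the threshold.
        if i > start and total + size > size_threshold:
            if i - start > 1:  # a run of a single base needs no merge
                plans.append((start, i))
            start, total = i, 0
        total += size
    if len(sizes) - start > 1:
        plans.append((start, len(sizes)))
    return plans


merge_plan = plan_merges([10, 20, 500, 5, 5, 5], size_threshold=100)
```

In this example the two small bases at the front form one group and the three small bases at the back form another, while the large base in the middle is left alone.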

  • Per base statistics.

For now the only source of statistics is the .stat file that we read on each start, but this logic is broken after the first defragmentation run. On start we should initialize the disk/removed/hashed stats and then maintain them for as long as eblob runs. This will let us compute defragmentation thresholds without an additional iterator pass. It will also keep the data.stat file up to date even after a defragmentation run.
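
A rough sketch of counters kept per base and updated on every operation, so that a defragmentation decision becomes a cheap lookup. The BaseStats class, its field names and the 0.25 threshold are assumptions for illustration, not eblob structures or defaults.

```python
from dataclasses import dataclass


@dataclass
class BaseStats:
    records_on_disk: int = 0   # all records in the base, including removed ones
    records_removed: int = 0   # records marked as removed

    def on_write(self) -> None:
        self.records_on_disk += 1

    def on_remove(self) -> None:
        self.records_removed += 1

    def removed_ratio(self) -> float:
        if self.records_on_disk == 0:
            return 0.0
        return self.records_removed / self.records_on_disk

    def needs_defrag(self, threshold: float) -> bool:
        """Decide from the maintained counters, without iterating over the base."""
        return self.removed_ratio() >= threshold


stats = BaseStats()
for _ in range(4):
    stats.on_write()
stats.on_remove()
should_defrag = stats.needs_defrag(threshold=0.25)  # hypothetical threshold
```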

  • Improved run-time statistics.

Add useful statistics such as the number of writes/reads/removes, histograms of response times, and some basic OS-level monitoring like disk utilization.

  • Metadata for each base.

It would be nice to have a small amount of data associated with each base: a fixed-size header, like a superblock is for a filesystem. Adding headers to existing bases is not a trivial task, so it is better to just add a file to each base that stores various information, for example the blob format version, various flags (e.g. that the blob is sorted, instead of using the .data_is_sorted mark), statistics and other stuff. We can also store some precomputed data there, like a bloom filter.

  • Warmup before start.

We can also periodically save mincore maps of blobs/indexes to disk, so that on a cold start we can warm up with the same data that was in memory the last time we checked.

  • Immutable “closed” bases.

Right now eblob is append-mostly storage, so old data and metadata can be overwritten. If the only operation that we could perform on a “closed” blob were removal of entries, we could greatly simplify eblob code, especially data-sort.

  • Remove binlog.

Once we have immutable bases we can safely remove the binlog: we only need to store removals, which should fit in memory.

  • Zero-syscall read path.

We can replace all reads with mmap-based access for performance reasons. NB! Needs extensive study.
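
A minimal sketch of mmap-based record access, using Python's mmap module on a temporary file. The read_record helper and the 4-byte records are hypothetical. After the mapping is created, fetching a record is a memory copy rather than a read() call, although page faults can still hit the disk.

```python
import mmap
import tempfile


def read_record(mapping: mmap.mmap, offset: int, size: int) -> bytes:
    """Return a copy of `size` bytes at `offset`, served from the mapping."""
    return mapping[offset:offset + size]


with tempfile.TemporaryFile() as f:
    f.write(b"AAAABBBBCCCC")  # three hypothetical 4-byte records
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        record = read_record(m, 4, 4)
```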

  • Optimized bloom filter

Right now we only use 2 hash functions and waste 128 bits of memory per key. We can use 14 bits with only 20 hash functions and gain much better performance.
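
Candidate configurations can be compared with the standard bloom filter approximation p ≈ (1 - e^(-k/b))^k, where b is the number of bits per key and k the number of hash functions. The sketch below is generic math, not eblob code, and the function names are illustrative.

```python
import math


def bloom_fpr(bits_per_key: float, hashes: int) -> float:
    """Approximate false-positive rate: (1 - e^(-k/b))^k."""
    return (1.0 - math.exp(-hashes / bits_per_key)) ** hashes


def optimal_hashes(bits_per_key: float) -> int:
    """Hash count minimizing the approximation above: k = b * ln 2, rounded."""
    return max(1, round(bits_per_key * math.log(2)))


# Compare a few (bits per key, hash functions) pairs.
for bits, k in [(128, 2), (14, 20), (14, optimal_hashes(14))]:
    print(f"bits/key={bits:4d} hashes={k:2d} fpr~{bloom_fpr(bits, k):.5f}")
```

Under this approximation, optimal_hashes(14) evaluates to 10, so it is worth measuring several hash counts before fixing one.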

  • Preallocate base

We can try to preallocate each base in small chunks (e.g. 1G) and see if it makes a difference on ext4.

  • Separate eblob_fsck utility

Currently we use eblob_merge for this purpose, as it has gained the ability to handle various types of index and data file corruption. We can do better and provide a standalone tool that specializes in checking the consistency of, and repairing, whole eblob databases.

  • Smarter auto-repair.

Currently on init eblob iterates over non-sorted indexes, and if it finds corruption it just truncates the index to the last non-corrupted entry. We may do better, for example by marking entries as corrupted or by shifting the index left, thus replacing the corrupted records.

  • Thread for data punching

We can periodically use FALLOC_FL_PUNCH_HOLE on removed data to free space until the data-sort thresholds are met, thus freeing space even without a “heavy” defragmentation run.
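
Hole punching only deallocates whole filesystem blocks; partial blocks inside the punched range are merely zeroed. A punching thread would therefore first shrink each removed extent to the blocks it fully covers. The sketch below shows only that range computation. The 4096-byte block size and the punchable_ranges helper are assumptions, and the fallocate() call itself is left out.

```python
from typing import Iterable, List, Tuple

BLOCK_SIZE = 4096  # assumed filesystem block size


def punchable_ranges(
    removed: Iterable[Tuple[int, int]], block_size: int = BLOCK_SIZE
) -> List[Tuple[int, int]]:
    """Shrink each removed (offset, length) extent to the whole blocks it covers."""
    out: List[Tuple[int, int]] = []
    for offset, length in removed:
        start = -(-offset // block_size) * block_size        # round start up
        end = (offset + length) // block_size * block_size   # round end down
        if end > start:
            out.append((start, end - start))
    return out


# A large removed record frees one full block; a small one frees nothing.
ranges = punchable_ranges([(100, 10000), (5000, 100)])
```

Each extent is treated on its own here; merging adjacent removed records before rounding would free more blocks.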

roadmap/eblob.txt · Last modified: 2013/09/02 02:40 by savetherbtz