This will remove the mostly unused and hacky columns (aka types) from eblob. It will decrease the number of code lines and their complexity while increasing speed and robustness. Instead of using columns, one can dedicate a subset of key bits as the column name. If the subset is taken from the start of the key, the database behaves as column-ordered; otherwise it behaves as row-ordered (both only once data-sort has finished).
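A minimal sketch of the key-bits idea, assuming a hypothetical 64-bit key for illustration (real eblob keys are much wider): the top bits act as the column id, so sorted keys group by column first.

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: key width and layout are assumptions,
 * not eblob's actual eblob_key structure. */
#define KEY_BITS    64
#define COLUMN_BITS 8

/* Column id lives in the most significant COLUMN_BITS, so a key-sorted
 * base is effectively column-ordered. */
static uint64_t key_column(uint64_t key)
{
	return key >> (KEY_BITS - COLUMN_BITS);
}

static uint64_t make_key(uint64_t column, uint64_t row)
{
	return (column << (KEY_BITS - COLUMN_BITS)) |
	       (row & ((UINT64_C(1) << (KEY_BITS - COLUMN_BITS)) - 1));
}
```

Because all keys of one column are contiguous in the sorted key space, range scans over a column become sequential reads after data-sort.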
Right now we have an enormous amount of code inside the elliptics `eblob_backend` to implement the new extended data format. It can be simplified by providing a `writev`-like interface, which can also greatly speed up common chunked write patterns.
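A sketch of what such an interface buys us, using a hypothetical `record_header` (eblob's real on-disk control structure differs): header and payload go to disk in one `writev(2)` call instead of two separate appends.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical on-disk header, for illustration only. */
struct record_header {
	uint64_t size;
	uint64_t flags;
};

/* Write header + payload with a single syscall; with more iovec slots the
 * caller could batch several payload chunks the same way. */
static ssize_t write_record(int fd, const struct record_header *hdr,
			    const void *data, size_t size)
{
	struct iovec iov[2] = {
		{ .iov_base = (void *)hdr,  .iov_len = sizeof(*hdr) },
		{ .iov_base = (void *)data, .iov_len = size },
	};

	return writev(fd, iov, 2);
}
```

The backend would then pass its chunks straight through instead of assembling an intermediate buffer per write.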
Compression is rather useless because it is applied per record, not per block, so it is inefficient in terms of both compression ratio and resource usage. It also pulls in an extra library dependency. And since decompression happens locally, it has the disadvantage of not saving any network bandwidth.
For now the number of bases almost constantly increases even with defragmentation. The only way the number of bases can drop is when one base becomes completely empty. So it would be a really good idea to merge two adjacent bases when they are under some size threshold, or when the total number of bases goes over some configurable limit. As a side effect this will improve read access times, because a lookup will potentially touch fewer bases.
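The decision rule above can be sketched as a small predicate; the threshold names are hypothetical and would come from eblob's config in practice:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy knobs, illustration only. */
struct merge_policy {
	uint64_t max_base_size;	/* merge neighbours whose sum fits under this */
	uint64_t max_bases;	/* or when there are simply too many bases   */
};

static int should_merge(const struct merge_policy *p,
			uint64_t left_size, uint64_t right_size,
			uint64_t base_count)
{
	if (left_size + right_size <= p->max_base_size)
		return 1;
	return base_count > p->max_bases;
}
```

Checking the *sum* of the two neighbours keeps the merged base itself under the threshold, so the merge never produces an oversized base.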
For now the only source of statistics is the `data.stat` file that we read during each start, but this logic has been broken since the first defragmentation run. On start we should initialize the disk/removed/hashed stats and then maintain them during the whole period of work. This will enable us to compute defragmentation thresholds without having to run an additional iterator pass. It will also keep the `data.stat` file accurate even after a defragmentation run.
Add useful statistics: number of writes/reads/removes, histograms of response times, some basic OS-level monitoring such as disk utilization, etc.
It would be nice to have a small amount of data associated with each base: a fixed-size header, much like a superblock is for a filesystem. Adding headers to existing bases is not a trivial task, so it is better to add a file to each base that stores various information: the blob format version, various flags (e.g. that the blob is sorted, instead of using the `.data_is_sorted` mark), statistics and other stuff. We can also store some precomputed data there, such as a bloom filter.
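One possible layout for such a per-base metadata file; every field name, the magic value and the flag bit are illustrative assumptions, not an actual eblob on-disk format:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-base "superblock" file, illustration only. */
#define BASE_META_MAGIC  UINT64_C(0x626c6f6265)	/* arbitrary marker */
#define BASE_FLAG_SORTED (1u << 0)	/* would replace .data_is_sorted */

struct base_meta {
	uint64_t magic;
	uint32_t format_version;
	uint32_t flags;
	uint64_t records_total;
	uint64_t records_removed;
	uint64_t bloom_offset;	/* where a precomputed bloom filter starts */
	uint64_t bloom_size;
};
```

Keeping the header fixed-size (and well under one block) means it can be rewritten atomically and read with a single I/O on open.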
We can also periodically save `mincore` maps of blobs/indexes to disk, so that on a cold start we can warm up with the same data that was in memory the last time we checked.
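A sketch of the snapshot side, assuming Linux: `mincore(2)` fills a per-page residency vector for a mapping; that vector is what we would persist and later replay (e.g. via `readahead`) on start. The helper below just counts resident pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages of a mapping; the vec[] contents are what a real
 * implementation would write to the per-base warmup file. */
static long resident_pages(void *addr, size_t length)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t pages = (length + page - 1) / page;
	unsigned char *vec = malloc(pages);
	long n = 0;

	if (!vec || mincore(addr, length, vec) != 0) {
		free(vec);
		return -1;
	}
	for (size_t i = 0; i < pages; i++)
		n += vec[i] & 1;	/* low bit = page resident */
	free(vec);
	return n;
}
```

Only the bitmap needs to be stored, so even for large blobs the warmup file stays tiny (one bit per page).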
Right now eblob is an append-mostly storage, so old data and metadata can still be overwritten. If the only operation we could perform on a "closed" blob were removal of entries, we could greatly simplify the eblob code. With immutable bases we could safely remove the binlog: we would only need to store removals, which should fit in memory.
We can replace all reads with `mmap`-based access for performance reasons. NB: this needs extensive study.
Right now we use only 2 hash functions and spend 128 bits of memory per key. We could use 14 bits per key with 20 hash functions and gain much better performance.
We can try to preallocate bases in small chunks (e.g. 1 GiB) and see whether it makes a difference on ext4.
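A sketch of chunked preallocation with `posix_fallocate(3)`; the chunk size is an assumption (1 GiB in production, tiny in the demo below), and the idea is to let ext4 hand out large contiguous extents instead of growing the file write by write.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Extend the base file by one preallocated chunk past its current size.
 * posix_fallocate() returns 0 on success, an error number otherwise. */
static int prealloc_chunk(int fd, off_t chunk_size)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	return posix_fallocate(fd, st.st_size, chunk_size);
}
```

Each call both reserves the blocks and extends the file size, so the writer can keep appending into already-allocated space until the next chunk boundary.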
Currently we use `eblob_merge` for this purpose, and it has gained the ability to handle various types of index and data file corruption. We can do better and provide a standalone tool that specializes in checking consistency and repairing whole eblob databases.
Currently, on init, eblob iterates over non-sorted indexes, and if it finds corruption it simply truncates the index to the last non-corrupted entry. We could do better by, for example, marking entries as corrupted, or by shifting the index left, thus dropping only the corrupted records.
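"Shifting the index left" amounts to in-place compaction of the entry array; the entry layout below is hypothetical (eblob's real index entries are its disk control structures):

```c
#include <assert.h>
#include <stddef.h>

/* Illustration-only index entry; a real one would be eblob's
 * on-disk control structure plus a corruption verdict. */
struct index_entry {
	unsigned long long key;
	int corrupted;
};

/* Compact the array in place, keeping only good entries.
 * Returns the new count; the index file would then be
 * truncated to new_count * sizeof(entry). */
static size_t index_compact(struct index_entry *entries, size_t count)
{
	size_t out = 0;

	for (size_t i = 0; i < count; i++) {
		if (!entries[i].corrupted)
			entries[out++] = entries[i];
	}
	return out;
}
```

Unlike truncation to the last good entry, this keeps every valid record that happens to sit *after* a corrupted one.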
We can periodically use `FALLOC_FL_PUNCH_HOLE` on removed data to free space until the data-sort thresholds are met, thus reclaiming space even without a "heavy" defragmentation run.
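A sketch of the punch itself, assuming Linux `fallocate(2)`: combined with `FALLOC_FL_KEEP_SIZE`, the hole returns the removed record's blocks to the filesystem without shrinking the file, so all existing offsets in the base stay valid and the hole simply reads back as zeroes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Free the extent of a removed record in place; file size and the
 * offsets of all other records are untouched. */
static int punch_record(int fd, off_t offset, off_t size)
{
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, size);
}
```

Partial blocks at the edges are zeroed rather than freed, so punching is most effective when record extents are block-aligned.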