A couple of months ago we rewrote eblob's defragmentation routine to include some additional “perks”. Previously, defragmentation was responsible only for physically deleting entries marked as 'removed' in the index (thus shrinking the effective blob size and releasing the space held by removed entries), but now it also sorts data by key and merges adjacent bases together if their size is below a given threshold.
An in-depth technical explanation of data-sort can be found in the data-sort blueprint.
Eblob is an append-mostly database, which means that most of the time new data is added to the end of the “opened” base; in some cases overwrites and appends will also cause data to be read-copy-appended. Even operations like removal only mark the key as “deleted” in the index and do not free the space used by the data itself.
For ease of future discussion I should first introduce some of eblob's basic concepts. Let me start with the index. The index is a set of tiny fixed-length records (struct eblob_disk_control) that describe where each record's data is located in the data blob and how big it is. Each blob is split into a number of data files, each of which has an index. Only one blob file at a time receives writes; this blob is called 'opened'. Once this blob file reaches certain thresholds it is 'closed' and its index is scheduled for sorting by key.
During normal operation eblob writes new records to the data file and adds a record to the corresponding index (called the “unsorted index”), so the order of records in the data file is the same as the order of records in the index. After a restart eblob sorts all those “unsorted indexes”, which is rather quick by itself and allows fast lookups in the future. But then the order of records in the index and in the data file no longer match, which hurts the performance of range queries and iterators.
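To make the index/lookup mechanics concrete, here is a toy sketch (not eblob's actual C structures or byte layout - the field set is deliberately simplified):

```python
import bisect
from dataclasses import dataclass

@dataclass
class IndexRecord:
    # Simplified stand-in for struct eblob_disk_control; the real record
    # also carries flags and the on-disk (possibly padded) size.
    key: bytes
    position: int  # offset of the record's data inside the data file
    size: int      # length of the data

def lookup(sorted_index, key):
    """Binary search over a key-sorted index: O(log n) per lookup."""
    keys = [r.key for r in sorted_index]
    i = bisect.bisect_left(keys, key)
    if i < len(sorted_index) and keys[i] == key:
        return sorted_index[i]
    return None

# Records arrive in write order (the "unsorted index")...
unsorted = [IndexRecord(b"cherry", 0, 10),
            IndexRecord(b"apple", 10, 5),
            IndexRecord(b"banana", 15, 7)]
# ...and get sorted by key once the base is closed (or after restart).
index = sorted(unsorted, key=lambda r: r.key)
print(lookup(index, b"banana"))  # record found at offset 15
```

Note that after this sort the index order (apple, banana, cherry) no longer matches the write order in the data file - exactly the mismatch data-sort was designed to remove.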
Also, on append/overwrite/remove-heavy workloads the data file fragments very quickly without releasing space back to the operating system.
The idea behind 'data-sort' is very simple: sort the data blob along with the index, so that keys that are close lexicographically are close on disk, which is almost always the preferred behavior.
There were numerous problems we needed to overcome to bring data-sort to life. First of all, we needed a fast algorithm to sort many gigabytes of data and possibly millions of keys while still serving requests. We came up with a modified merge sort: we replaced the recursive-descent sorting step with splitting the data file into almost fixed-length pieces, quick-sorting each piece in memory, and then n-way merging them into the final 'sorted' blob.
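The scheme above can be sketched in a few lines (a toy model: records are (key, value) tuples held in memory and the chunk size is tiny, whereas eblob streams almost fixed-length pieces from disk):

```python
import heapq

def data_sort(records, chunk_size=2):
    """Modified merge sort: split into almost fixed-length pieces,
    quick-sort each piece in memory, then n-way merge the pieces."""
    # 1. Split the unsorted "data file" into pieces and sort each one.
    chunks = [sorted(records[i:i + chunk_size])
              for i in range(0, len(records), chunk_size)]
    # 2. n-way merge the sorted pieces into the final "sorted" blob.
    #    heapq.merge keeps only one record per chunk in memory at a time,
    #    which is what makes this workable for multi-gigabyte files.
    return list(heapq.merge(*chunks))

blob = [(b"cherry", 1), (b"apple", 2), (b"fig", 3),
        (b"banana", 4), (b"elderberry", 5)]
print(data_sort(blob))  # records ordered by key
```

The key property is that the merge step is streaming: only one record per chunk needs to be resident, so memory usage is bounded by the number of chunks, not the size of the data.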
Here lies another problem: even a 'closed' base can be modified - data can be overwritten in place, and entries can be marked removed. So when data-sort starts, we convert all modifications to the base in question into read-copy-updates to the 'opened' base. The only thing left to track is removed entries, which we record in a trivial in-memory 'log'. Before starting data-sort we turn this 'log' on, and after sorting has finished we 'apply' the log on top of the resulting 'sorted' base by removing all logged keys from it.
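The removal-log trick might look like this in miniature (records are again simplified to (key, value) pairs; the split into a 'during sort' phase and an 'apply' phase mirrors the description above):

```python
def sort_with_delete_log(base, deletions_during_sort):
    # 1. Turn the removal log on before sorting starts.
    removed_log = set()
    # 2. Sort the base. While this (long) step runs, concurrent removals
    #    are only recorded in the in-memory log, never applied in place.
    sorted_base = sorted(base)
    removed_log.update(deletions_during_sort)
    # 3. Apply the log on top of the sorted result: drop every logged key.
    return [(k, v) for k, v in sorted_base if k not in removed_log]

base = [(b"cherry", 1), (b"apple", 2), (b"banana", 3)]
# b"apple" is removed by a client while the sort is in progress.
print(sort_with_delete_log(base, [b"apple"]))
# -> [(b'banana', 3), (b'cherry', 1)]
```

Because the log only ever holds keys (not data), it stays tiny even when the base being sorted is huge.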
In eblob prior to version 0.18.0 there was a process called defragmentation, similar to SQL's VACUUM or some NoSQL compaction procedures. During it we iterated over the data file, copying its contents to a new file while skipping removed records, thus defragmenting it. Then we simply swapped the old file with the new one and deleted the old one.
We replaced this defragmentation with a more sophisticated process called data-sort, which not only physically removes marked-as-deleted entries but also sorts data by key in both the index and the data file.
It's worth mentioning again that starting with 0.18.0, data-sort also implies defragmentation; there is no way to only defragment a blob without sorting its data. For simple defragmentation the offline utility eblob_merge can be used.
NB! For already-sorted bases we only start defragmentation if the base either meets the defrag_percentage threshold or is so small that there is a good chance it gets “merged” with another base.
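That decision could be modeled roughly as follows (defrag_percentage is a real eblob option, but the size cutoff, its default value, and the function shape here are assumptions of this sketch):

```python
def should_defrag(total_records, removed_records, base_size,
                  defrag_percentage=25, min_base_size=64 << 20):
    """Defragment a sorted base only if enough of it is garbage, or if it
    is small enough to likely be merged with a neighboring base."""
    if total_records and 100 * removed_records / total_records >= defrag_percentage:
        return True  # removed-records threshold reached
    # Tiny base: defragmenting it is cheap and it is a merge candidate.
    return base_size < min_base_size

print(should_defrag(1000, 300, 1 << 30))  # 30% removed -> True
print(should_defrag(1000, 10, 1 << 30))   # big, mostly live -> False
print(should_defrag(1000, 10, 1 << 20))   # tiny -> True (merge it)
```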
Data-sort has the following pros and cons:
- data-sort provides defragmentation, thus freeing space on the file system;
- data-sort also flushes part of the in-memory cache to disk, freeing some amount of RAM. To see how much memory will be freed by data-sort one can check the statistics for the space occupied by the in-memory cache;
- data-sort tries to merge small bases into one, reducing the number of files in the blob database and improving read speeds;
- data-sort is a rather heavy process which may negatively impact eblob performance for its whole duration. Moreover, the final step of data-sort requires almost all locks inside eblob to be held, which can stall all operations for about a second. Operations that were in flight during that swap step will fail with an error;
- on some workloads data-sort will do more harm than good.
Also it's worth mentioning that after the first data-sort, eblob switches to an optimized mode that defragments the blob faster, because the “sort” step can be skipped.
There are different ways of using data-sort:
- Manually, via the eblob_start_defrag() API call. With elliptics one can just run dnet_ioclient -d start -r $(hostname -f):1024:2 to invoke defragmentation, which is now backed by data-sort.
- Via the EBLOB_AUTO_DATASORT (128) config flag, which triggers automatic data-sort on eblob database open and after each blob close. This flag is the easiest way to use data-sort in production - data will be sorted as soon as a blob is closed.
- Via the EBLOB_TIMED_DATASORT (256) config flag, which invokes data-sort every defrag_timeout seconds.
- Via the EBLOB_SCHEDULED_DATASORT (512) config flag, which invokes data-sort every day at defrag_splay hours. This is useful for distributed clusters that you don't want to manage externally through eblob_start_defrag calls, but where you still want data-sort to be performed in a timely manner without affecting the cluster during its “rush hour”.
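A scheduler honoring the last flag could be sketched as follows (an illustration only: treating defrag_splay as an hour of day and adding a random per-node offset are assumptions of this sketch, not eblob's documented semantics):

```python
import datetime
import random

def next_datasort_run(now, defrag_splay_hour=3, spread_minutes=60):
    """Next daily data-sort slot: today (or tomorrow) at the configured
    hour, plus a random per-node offset so that the nodes of a distributed
    cluster do not all start sorting at the same instant."""
    run = now.replace(hour=defrag_splay_hour, minute=0,
                      second=0, microsecond=0)
    if run <= now:
        run += datetime.timedelta(days=1)  # today's slot already passed
    return run + datetime.timedelta(minutes=random.randint(0, spread_minutes))

now = datetime.datetime(2013, 7, 1, 12, 0)
print(next_datasort_run(now))  # some time shortly after 03:00 on July 2nd
```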
Data-sort superseded defragmentation starting from eblob 0.18.0. Data-sort gained the ability to merge adjacent bases in a subsequent version.