Reverbrain wiki

Site Tools


eblob:eblob:data-sort

Introduction

Couple of months ago we've rewritten eblob's defragmentation routine to include some additional “perks”. Previously defragmentation was responsible for only physically deleting entries marked as 'removed' in index (thus shrinking effective blob size and releasing any space held by removed entries) but now it's also sorts data by key and merges adjacent bases together if their size is below given threshold.

In-depth technical explanation of data-sort can be found in data-sort blueprint.

Background

Eblob is append mostly database which means most of the time new data is added to the end of “opened” base, also in some cases overwrites and appends will also cause data to be read-copy-appended. Even operations like removal only mark key as “deleted” in index, but do not free space used by data itself.

For ease of future discussion I firstly should introduce some of eblob's basic concepts. Let me start with index. Index is a set of tiny fixed length records (struct eblob_disk_control) that describe where data is located in data blob and how big it is. Each blob is split into number of data files each of those has an index. Only one blob file at a time receives writes - this blob called 'opened'. Once this blob file reaches some thresholds it 'closes' and it's index scheduled for sorting (sorting by key).

During normal operation eblob will write new records to data file and add record to corresponding index (called “unsorted index”), so order of records in data file is the same as order of records in index. After restart eblob will sort all those “unsorted indexes”, which is rather quick by itself and will allow fast lookups in future. But now order of records in index and data file won't match. This will impact performance of range queries and iterators.

Also on append/overwrite/remove heavy workloads data file fragments very quickly not releasing space back to operating system.

Implementation

Idea behind 'data-sort' is very simple - lets sort data blob along with index so keys that are close lexicographically will be close on disk which is almost always preferred behavior.

There were numerous problems that we needed to overcome to bring data-sort to life. First of all we needed fast algorithm to sort many gigabytes of data and possibly millions of keys while still serving data. We came up with modified merge sort - we replaced recursive decent sorting step with splitting of data file on almost fixed length pieces, quick sorting each piece in memory and then n-way merging them into final 'sorted' blob.

Log of removed entries

Here lies another problem - even 'closed' base can be modified - data can be overwritten in-place or some entries can be marked removed. So when data-sort is started we convert all modifications to base in question to read-copy-updates to the 'opened' base. So that the only thing we need to do is to keep track of removed entries in trivial in-memory 'log'. Before staring data-sort we turning mentioned 'log' on and after we've finished sorting we are 'applying' this log on top of resulting 'sorted' base by removing all logged keys from it.

Data-sort vs defragmentation

In eblob prior version 0.18.0 there was process called defragmentation. It was similar to some SQL's VACUUM or some NoSQL's compaction procedure. During which we iterate over data file copying it's content to new file skipping removed records thus defragmenting it. Then we simply swap old file with new one and delete old one.

We replaced this defragmentation with more sophisticated process called data-sort which not only physically removes marked-as-deleted entries but also sorts data by key in both index and data-file.

It's worth mentioning again that starting with 0.18.0 eblob's data-sort also implies defragmentation, and there is no way to only defragment blob without sorting it's data. For purpose of simple defragmentation offline utility eblob_merge can be used.

NB! For sorted bases we only starting defragmentation only if base meets either defrag_percentage threshold or it's way too small so there is good chance it gets “merged” with another base.

Data-sort pros and cons

Data-sort have following pros and cons:

Pros:

  • data-sort provides defragmentation, thus freeing space on file system.
  • data-sort will also flush part of in-memory cache to disk, freeing some amount of RAM. To see how much memory will be freed by data sort one can use statistics watching for space occupied by memory_index_tree;
  • data-sort will try to merge small bases into one so reducing number of files in blob database and improving read speeds;
  • Index key order will match data file key order which greatly speeds up iteration and range requests (on real world tests we see difference around an order of magnitude).

Cons:

  • data-sort is rather heavy process which may negatively impact eblob performance for it's whole duration. Moreover the final step of data-sort requires almost all locks inside eblob code to be held, which can basically stall all operations for about a second. Operations that were in flight during that swap step will fail with -EAGAIN.
  • If data was meant to be read with the same patterns that was used to write it and those patterns are far from lexicographical then data-sort will do more harm then good.

Also it's worth mentioning that after first data-sort eblob will switch to optimized mode that defragments blob faster because “sort” step can be skipped.

Usage

There are different ways of using data-sort

  • Use via long time provided API knob called eblob_start_defrag(). With elliptics one can just call dnet_ioclient -d start -r $(hostname -f):1024:2 to invoke defragmentation which is now backed by data-sort.
  • Use config flag EBLOB_AUTO_DATASORT (128) which will trigger automatic data-sort on eblob database open and after each blob close. This flag is the easiest way to use data-sort in production - data will be sorted as long as blob is closed.
  • Use config flag EBLOB_TIMED_DATASORT (256) which will invoke data-sort every defrag_timeout seconds starting from eblob_init call.
  • Use config flag EBLOB_SCHEDULED_DATASORT (512) which will invoke data-sort every day at defrag_time hours +- defrag_splay hours. This is useful for distributed clusters that you don't want to manage externally trough eblob_start_defrag call but still want data-sort to be performed in timely manner and not affect cluster during it's “rush hour”.

Availability

Data-sort superseded defragmentation starting from eblob 0.18.0 Data-sort gained the ability to merge adj. bases in version 0.21.6

eblob/eblob/data-sort.txt · Last modified: 2013/09/14 18:49 by savetherbtz