This is a short article on how to tune the Linux VM (virtual memory subsystem) for maximum write performance. I am not sure how generic the advice is, but it is very useful when you have a write-heavy workload in append-like mode, or when you write into multiple files one after another, switching to the next file when some boundary has been crossed.
This is the case for Eblob, our most widely used low-level local storage in Elliptics. Eblob can be turned into an append-only system that still allows overwrites: if the new size is larger than the old one, the object being written is copied into a new location and more space is reserved for future writes. The old copy is marked as removed and will be deleted when defragmentation takes place.
Every such 'copy' within eblob reserves twice the current size of the object.
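The overwrite rule above can be illustrated with a toy model. This is only a sketch, not Eblob's real code or on-disk format: the names `ToyBlob` and `Record` are hypothetical, and it assumes that a first write reserves exactly the object size while a copy reserves twice the new size.

```python
from dataclasses import dataclass


@dataclass
class Record:
    offset: int            # where the record starts inside the blob
    size: int              # bytes currently used by the object
    reserved: int          # bytes set aside for the object, including headroom
    removed: bool = False  # True once a newer copy has replaced this one


class ToyBlob:
    """Toy append-only blob; a hypothetical model, not Eblob itself."""

    def __init__(self):
        self.end = 0       # current end of the blob, in bytes
        self.live = {}     # key -> live Record
        self.garbage = []  # removed records waiting for defragmentation

    def write(self, key, size):
        rec = self.live.get(key)
        if rec is not None and size <= rec.reserved:
            # The new data fits into the reserved area: overwrite in place.
            rec.size = size
            return rec
        if rec is not None:
            # The object outgrew its slot: mark the old copy as removed.
            rec.removed = True
            self.garbage.append(rec)
            reserved = 2 * size  # assumption: a copy reserves twice the size
        else:
            reserved = size      # assumption: a first write has no headroom
        new = Record(offset=self.end, size=size, reserved=reserved)
        self.end += reserved     # the new copy is appended at the blob's end
        self.live[key] = new
        return new
```

In this model, writing 100 bytes and then 150 bytes under the same key leaves a removed 100-byte record behind and appends a new record with 300 bytes reserved, so a later 250-byte write lands in place.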
If you write into Elliptics using a different key each time, the objects are appended to the end of the blob. When the blob reaches the maximum size set in the config (or the maximum number of elements, which is also a configurable parameter), a new blob is created.
It's time for a few VM details. There are flush kernel processes whose main goal is to write your data from the page cache to disk. The VM can be tuned to kick off these processes according to our needs.
The main issue is that the inode is locked until the data is written to disk. This means no further writes into the given file are possible. In some cases reads are blocked too: if the page you want to read is not in the page cache, it has to be read from disk, and this may lock the inode as well.
Until the flush process completes, you cannot update data in the corresponding file. Thus the main design goal is to separate the file being updated from the files being flushed to disk.
If we can estimate the amount of data written, we can tune eblob to create new files frequently enough, and tune the VM to write out the old files while leaving the one currently being written alone.
Let's suppose we write 1 KB objects at 20 krps (20,000 requests per second). This is about 20 MB/s of write throughput. Let's limit the blob size to 1 GB, which is configured in the elliptics config like this:
backend = blob
blob_size = 1G
With a 20 MB/s write speed, a new blob will be created roughly every 50 seconds.
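The arithmetic behind the 50-second figure can be checked with a small sketch. The helper names are hypothetical, and decimal units (1 KB = 1000 bytes) are assumed to match the round numbers in the text; with binary units the interval comes out slightly above 52 seconds.

```python
KB = 1000
MB = 1000 ** 2
GB = 1000 ** 3


def write_rate_bytes(object_size, requests_per_second):
    """Sustained write throughput in bytes per second."""
    return object_size * requests_per_second


def rotation_interval_seconds(blob_size, rate):
    """How long it takes to fill one blob at the given write rate."""
    return blob_size / rate


rate = write_rate_bytes(1 * KB, 20_000)            # 1 KB objects at 20 krps
interval = rotation_interval_seconds(1 * GB, rate)  # blob_size = 1G

print(f"write rate: {rate / MB:.0f} MB/s")
print(f"new blob every {interval:.0f} seconds")
```

Rerunning this with your own object size and request rate gives the blob rotation interval that the VM settings have to be matched against.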
It's time to tune the VM now. We want the flush process to kick in frequently, but we do not want it to work with new data. Thus we want it to write (and lock) the old blob files.
Let's say data that is 100 seconds old is OK to be written to disk. Here is a set of sysctls for good write behaviour.
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 75
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 10000
vm.dirty_ratio = 80
vm.dirty_writeback_centisecs = 50
vm.dirty_expire_centisecs says how old dirty (i.e. written but not yet flushed) data must be before the flush process pushes it to disk. It is measured in centiseconds, i.e. 1/100 of a second. In the example above it is 100 seconds.
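One way to arrive at this value is to derive it from the blob rotation interval. The sketch below is an illustration, not a kernel or Elliptics API; the function name and the `blobs_behind` parameter are hypothetical, with 2 chosen to match the 100-second age used in this article.

```python
def dirty_expire_centisecs(rotation_interval_s, blobs_behind=2):
    """Expiry age in centiseconds, so that only data written at least
    `blobs_behind` rotation intervals ago is eligible for flushing."""
    return int(round(rotation_interval_s * blobs_behind * 100))


# A 50-second rotation with a lag of two blobs gives a 100-second expiry.
print("vm.dirty_expire_centisecs =", dirty_expire_centisecs(50))
```

A larger `blobs_behind` keeps the flusher further away from the file being written, at the cost of holding more dirty data in memory.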
vm.dirty_ratio sets the limit (as a percentage of available memory) on how much space dirty pages can take before a flush is forced. If dirty pages exceed 80%, a writing process blocks and flushes data to disk itself. vm.dirty_background_ratio is the softer limit: once dirty pages exceed 75%, background flush starts.
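To see what these percentages mean in bytes, here is a simplified sketch. It assumes you already know the amount of memory the kernel counts as available (free plus reclaimable pages, which is not the same as total RAM), and it ignores the gradual throttling a real kernel applies. The function names are hypothetical.

```python
def dirty_thresholds(available_bytes, background_ratio=75, ratio=80):
    """Return (background, hard) dirty-page limits in bytes."""
    background = available_bytes * background_ratio // 100
    hard = available_bytes * ratio // 100
    return background, hard


def writeback_state(dirty_bytes, available_bytes,
                    background_ratio=75, ratio=80):
    """Rough classification of what happens at a given dirty-page level."""
    background, hard = dirty_thresholds(
        available_bytes, background_ratio, ratio)
    if dirty_bytes > hard:
        return "writer blocked"    # above vm.dirty_ratio
    if dirty_bytes > background:
        return "background flush"  # above vm.dirty_background_ratio
    return "idle"                  # only expired pages are written out


GIB = 1024 ** 3
background, hard = dirty_thresholds(16 * GIB)
print(f"background flush above {background / GIB:.1f} GiB of dirty pages")
print(f"writers block above {hard / GIB:.1f} GiB of dirty pages")
```

With limits this high, the ratio-based triggers should rarely fire at 20 MB/s, which leaves the age-based expiry as the mechanism that actually decides what gets written.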
vm.dirty_writeback_centisecs says that the flush process should check for dirty pages twice a second (every 50 centiseconds).
The eblob and VM configs above, together with the specified write load, behave the way we wanted: eblob creates a new blob file every 50 seconds, and twice per second the kernel checks for pages that are 100 seconds old (written 100 seconds ago) and flushes them to disk. In our example these belong to the '2 blobs ago' files.
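The '2 blobs ago' claim can be checked with a short sketch. It assumes an idealized timeline where a new blob starts exactly every 50 seconds and writes are evenly spread; the function names are hypothetical.

```python
def blob_index(t, rotation_interval_s=50.0):
    """Index of the blob that receives data written at time t (seconds)."""
    return int(t // rotation_interval_s)


def blobs_behind(now, expire_s=100.0, rotation_interval_s=50.0):
    """How many blobs back the data that expires right now was written."""
    current = blob_index(now, rotation_interval_s)
    expiring = blob_index(now - expire_s, rotation_interval_s)
    return current - expiring


# Sample a few moments after the first 100 seconds of writing.
for now in (100, 130, 275, 999):
    print(now, "s: flushing data from", blobs_behind(now), "blobs ago")
```

Because the expiry age is exactly two rotation intervals, the difference stays at 2 regardless of where in the current blob the writer is. If the expiry age were not a multiple of the rotation interval, the result would alternate between two neighbouring values.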