Reverbrain wiki

Site Tools


elliptics:configuration

Configuration


Elliptics has 2 types of nodes: client and server. The main difference between them is that the client node is not a member of the storage network. Server gets it's configuration from specified file and client gets it through C++ API and Python API as a parameters and they have different properties.

Server node configuration


Server gets it'c configuration from JSON file that was specified at start. Configuration file contains several sections responsible for different aspects of elliptics operation:

All directories specified in the configuration file must be created manually before use.

Elliptics logger section

Property Description
frontends = [ … ] Describes frontends where logs have to be pushed, this section is described at blackhole's documentation.
level = “debug” Sets verbosity level of elliptics logs.
It doesn't corresponds to syslog levels. When logging to syslog almost all messages will be written with INFO level. Elliptics supports follow levels:
“error”, “warning”, “info”, “notice”, “debug”.

Elliptics options section

Property Description
join = true Specifies whether to join storage network.
Server nodes should use true. Client nodes don't use this parameter or set it to false.
flags = 4 Bitset flags influences to server behaviour.
Bits can be set in any variations, but in case of bits 2 and 5 set both, 2 will be used. It should include follow bits:
bit 1 (flags=2) - do not request remote route table
bit 2 (flags=4) - mix states before read operations according to state's weights
bit 3 (flags=8) - do not checksum data on upload and check it during data read
bit 4 (flags=16) - do not update metadata at all
bit 5 (flags=32) - randomize states for read requests
bit 6 (flags=64) - keeps ids in elliptics cluster
remote = [ “1.2.3.4:1025:2”, “2.3.4.5:2345:2” ] List of remote nodes to connect.
Address should have follow format: `address:port:family` where family is either 2 (AF_INET) or 10 (AF_INET6) and address can be host name or IP of remote node.
address = [ “localhost:1025:2-0”, “10.10.0.1:1025:2-1” ] List of node address which enables multiple interface support.
Elliptics server listens on port specified in this config option (format: addr:port:family-route_group)
If ports differ, the last one will be used.
Elliptics server will accept connections from any address, but it has to understand which addresses of the other connected joined servers it has to send to newly accepted client. Thus we 'join' multiple addresses on every node into 'logical route tables' which are indexed by the last number in addr config option. Thus, format becomes: local_address:port:family-route_group.
Addresses in the same route group on different servers will be put into the same route tables, thus when client in example below connects to localhost, it will receive (and connect to) addresses from the logical route group 0 (whatever addresses will be put into that route group on other servers).
Let's suppose we have 3 connected servers with the following addresses:
srv0: [ “10.0.0.17:1025:2-0”, “192.168.0.17:1025:2-1”, “20.20.20.17:1025:2-2” ]
srv1: [ “10.0.0.34:1025:2-0”, “192.168.0.34:1025:2-1”, “20.20.20.34:1025:2-2”]
srv2: [ “15.14.13.12:1025:2-0”, “99.99.99.99:1025:2-1”, “111.111.111.111:1025:2-2” ]
When client connects to srv1 to IP address 192.168.0.34:1025, it will receive (and connect to) following route table:
[ “192.168.0.17:1025”, “192.168.0.34:1025”, “99.99.99.99:1025” ]
Because above addresses are in the same logical route group and client connected to one of the addresses in that logical route group.
`addr` is a list of interfaces that server binds
wait_timeout = 60 Specifies number of seconds to wait for command completion
check_timeout = 60 Specifies number of seconds to wait before killing unacked transaction. This parameter is used also for some service operations such as update the routing server, check network connections an so on.
io_thread_num = 16 Specifies number of threads in processing IO pool.
Typically, value of this parameter should be comparable with the number of hardware processing cores.
Usually all input-output operations (like read, write, remove, indexes etc.) are executed by these threads. IO operation could be executed not by IO threads if they were requested with nolock IO flag therefore it will be executed by nonblocking processing pool.
nonblocking_io_thread_num = 16 Sets number of IO threads in processing pool dedicated to nonblocking operations.
They are invoked from recursive commands like DNET_CMD_EXEC, when script tries to read/write some data using the same id/key as in original exec command. Typically, value of this parameter should be comparable with the number of hardware processing cores. Nonblocking operations include follow operations:
- Lookup operations
- Gathering statistics (stat_log, stat_log_count)
- Gathering monitor statistics (monitor_stat)
- Different execs and replies
- Route list exchange
- Iterators
- Updating status
net_thread_num = 4 Specifies number of threads in network processing pool that would accepts new connections, checks and reads data from them.
daemon = false Specifies whether to go into background
auth_cookie = “qwerty” Sets authentication cookie.
The cookie is not meant for real authentification, it is transferred in plain text without encryption instead it is used to check server configurations - servers with different cookies can not connect to each other, and thus node can not join cluster with different cookie client can connect to any cluster no matter what cookie is used auth_cookies is a byte array with 32-byte length.
bg_ionice_class = 3 Background jobs (replica checks and recovery) IO priorities ionice for background operations (disk scheduler should support it).
class - number from 0 to 3:
0 - default class
1 - realtime class
2 - best-effort class
3 - idle class
bg_ionice_prio = 0 Specifies prio - number from 0 to 7, sets priority inside class
server_net_prio = 1
client_net_prio = 6
Specifies IP priorities. Man 7 socket for IP_PRIORITY.
server_net_prio is set for all joined (server) connections.
client_net_prio is set for other connection.
It is only turned on when non zero.
indexes_shard_count = 2 Specifies index shard count.
Every index is being split to this number of 'shards'. Shards are likely to be spread over your cluster evenly, but if number of servers is less than number of shards, this will slow the whole index operations noticebly.
For example, if you have 1 node and 10 shards, index 'find' will be 10 times slower than 1-shard-setup. Otherwise, if number of servers is high enough, the more shards you have more parallel is index processing.
Operations like 'update' (put key into index) or 'set' (put key into index if key is not present already) does not depend on number of shards, this is single read+read+write operation anyway (at worst).
monitor = {} Specifies monitoring section according to Elliptics monitoring subsection
srw_config = “/path/to/cocaine/config/file.json” Specifies path to configuration file for srw worker.
Elliptics uses cocaine engine https://github.com/organizations/cocaine for its server-side workers.
srw_config should point to its configuration file, example config lives in tree in example/library_config.json file.
If you use this parameter, cocaine should be installed and configured with a confidence, or elliptics won't start.
parallel = true Specifies if backends should be inited parallely at start, default value is true.
cache = {} Specifies cache settings according to Elliptics cache subsection cache subsection description]]

Elliptics monitoring subsection

port = 20000 Specifies port that should be listen by monitor subsystem.
The port will be used by monitor built-in HTTP mini server. If monitor_port isn't specified or is equal to 0 then monitor subsystem won't be created and started.
call_tree_timeout = 0 Only aggregate (and show in monitoring) call trees which were created less than call_tree_timeout seconds ago. Default is 0 - aggregate all trees.

Elliptics cache subsection

Property Description
size = 102400 Sets cache maximum size.
sync_timeout = 30 Sets cache synchronization timeout.
Elliptics flushes data to disk from cache after the sync_timeout in case if write command was performed with DNET_IO_CACHE flag.
shards = 16 Sets number of independent caches.
Elliptics splits all keys to shards exclusive independent caches. It makes possible to perform several operation with cache on multi-core systems.
pages_proportions = [ 4, 2 ] Sets proportions of slru pages's sizes.

Elliptics backends section

Elliptics can operate with several backends simultaneously. For that purpose 'backends' section of configuration file is an array of backends' configurations.

Each backend has parameters different from parameters of another backend, but some of them are equal for all of backends. At the moment Elliptics supports 'eblob' backend, there were number of others including filesystem, smack, tokyo cabinet and dynamic, but they were dropped due to performance reasonds. Config section for each are described below.

Generic backend options

Property Description
backend_id = 5 Specifies the id of backend, this id must be unique within the current node
enable = true Specifies if backend should be initialized at start, default value is true
type = “blob” Specifies the type of backend
group = 1 Backend will store keys for this group.
Group is a synonim for replica or copy. If multiple servers with the same group ID join elliptics network, group will contain multiple nodes and load will be spread to those servers. If multiple servers with different group IDs join elliptics network, there will be multiple groups and thus client will be able to write multiple copies.
history = “/path/to/history” Specifies history environment directory.
It will host file with generated IDs for this backend. The history directory should be created manually before use.

Eblob backend section

Backend's type is “blob”.

Property Description
sync = 0 Sets number of seconds for the outcome of which metadata will be synced.
Zero here means 'sync on every write'.
Positive number means file writes are never synced and metadata is synced every `sync` seconds. It is highly recommended not to specify this timeout at all (it is -1 by default), since it may heavily affect performance. Using default value relies on OS and kernel to sync data to disk.
data = “/path/to/blob/data” Specifies eblob objects prefix.
System will append .NNN and .NNN.index to new blobs. Path to blobs should be created manually before use.
For example, if prefix is `/tmp/blob/data`, path `/tmp/blob` should be created.
blob_flags = 1 Specifies blob processing flags bitset (bits start from 0).
bit 0 - if set, eblob reserves 10% of total space or size of the blob (which is bigger). By default it is turned off and eblob only reserves size of the blob. This is useful (required) to be able to run defragmentation.
bit 1 - deprecated overwrite commits mode. Starting from eblob 0.20.0 it's default behaviour.
bit 2 - deprecated overwrite mode. Starting from eblob 0.20.0 it's default behaviour.
bit 3 - do not append checksum footer - this saves 72 bytes per written record. This also disables checksum.
bit 4 - do not check whether system has enough space for the new blob. If this bit is not set following checks are performed for every write call:
If blob_size_limit is specified in config and is not zero, total eblob size (sum of all blobs and indexes) plus size of the record to be written must be smaller than blob_size_limit
If blob_size_limit is not set (or is zero) eblob write checks available free space: if bit 0 is not set, then write call checks if available free space is large enough to hold 2 blobs (2 * blob_size bytes are reserved for eblob data sorting). If bit 0 is set at least 10% of the underlying storage must empty.
bit 5 - reserved for internal use, do not set.
bit 6 - use second hashing layer - reduces memory usage for in-memory eblob index (costs some IOPS).
bit 7 - auto data-sort. When set, eblob will perform data sorting as well as defragmentation and index sorting on startup and on blob close. This is prefered “set and forget” behaviour for small databases.
bit 8 - timed data-sort. When set eblob will run data-sort on every non-sorted blob each `defrag_timeout` seconds. This is legacy behaviour for compatibility with old setups.
bit 9 - scheduled data-sort. When set eblob will run data sort starting at `defrag_time` +/- `defrag_splay` hours. Probably one hould set this values to match clusters “mostly-idle” hours. This option was introduced to “spread” defragmentation load across nodes in cluster in time.
bit 10 - disables starting permanent threads (sync, defrag, periodic).
bit 11 - enables automatic index-only-sort which will kick-in on base's “close”.

This is prefered “set and forget” behaviour for not-so-big clusters. For very big clusters it's recommended to disable all `auto-data-sort` features and manually run 'dnet_ioclient … -d' command with external synchronization. Also data-sort will try to defragment even already sorted blobs if they've reached `defrag_percentage` fragmentation or they are way too small so there is good probability they will be merged into one.
iterate_thread_num = 4 Sets number of threads used to populate data into RAM at startup.
This greatly speeds up data-sort/defragmentation and somehow speeds up startup.
Also this threads are used for iterating by start_iterator request.
Default: 1
blob_size = “10G” Specifies maximum blob size.
New file will be opened after current one grows beyond `blob_size` limit.
Supports K, M and G modifiers.
records_in_blob = 10000000 Specifies maximum number of records in blob.
When number of records reaches this level, blob is closed and sorted index is generated. Its meaning is similar to above `blob_size`, except that it operates on records and not bytes.
Both parameters `blob_size` and `records_in_blob` can be used together or separately.
defrag_timeout = 3600 Specifies timeout for data-sort process to start.
Data-Sort/Defragmentation operation is rather costly (even if nothing is going to be copied, defragmentation still checks every index to determine number of removed keys).
It is recommended to set it to hours (it is in seconds) or more>
NB! Only works if bit 8 of config flags is set.
Default: -1 or none.
defrag_percentage = 25 Specifies percentage of removed entries (compared to number of all keys in blob) needed to start defragmentation.
If blob is already sorted and number of removed keys is less than (removed + not removed) * $defrag_percentage / 100 then defragmentation process will skip given blob.
defrag_time = 3
defrag_splay = 3
Schedules defragmentation start time and splay.
Both time and splay are specified in hours in local timezone, so that on big clusters defragmentation load could be spread in time at “mostly-idle” hours.
NB! Only works if bit 9 of config flags is set.
blob_size_limit = “10G” Specifies maximum size whole eblob can occupy on disk.
Basically, this is the maximum size eblob data directory can occupy on disk.
index_block_size = 40
index_block_bloom_length = “128 * 40”
Specifies bloom filter parameters:
index_block_size - number of records from index file, which are hashed into one bloom filter. Eblob splits all records from sorted index file into chunks, each chunk has start and finish keys only and bloom filter which says whether requested entry can be found in given chunk.
index_block_bloom_length - number of bits per chunk, it should be at least as twice as number of records in chunk.
Default values:
index_block_size = 40
index_block_bloom_length = 128 * 40
periodic_timeout Specifies timeout for periodic process to start.
Periodic process writes eblob statistics to 'data.stat' file and caches json statistics.
This parameter allows to specify how often these statistics should be updated. Any time consumer of these statistics has an access to 'data.stat' file and cached json statistics generated by the last run of periodic process.

Example of server node configuration file

Example of configuration file can be found at /usr/share/doc/elliptics/examples/ioserv.json.

Please, make note that currently only one backend may be enabled per time, so remove one of backends from configuration below:

ioserv.json
{
	"logger": {        
            "frontends": [
                {
                    "formatter": {
                        "type": "string",
                        "pattern": "%(timestamp)s %(request_id)s/%(lwp)s/%(pid)s %(severity)s: %(message)s %(...L)s"
                    },
                    "sink": {
                        "type": "files",
                        "path": "/dev/stdout",
                        "autoflush": true,
                        "rotation": {
                            "move": 0
                        }
                    }
                }
            ],
            "level": "debug"
	},
	"options": {
		"join": true,
		"flags": 20,
		"remote": [
			"localhost:1025:2",
			"localhost:1026:2",
			"localhost:1027:2",
			"localhost:1028:2"
		],
		"address": [
			"localhost:1025:2-0"
		],
		"wait_timeout": 60,
		"check_timeout": 60,
		"io_thread_num": 16,
		"nonblocking_io_thread_num": 16,
		"net_thread_num": 4,
		"daemon": false,
		"auth_cookie": "qwerty",
		"bg_ionice_class": 3,
		"bg_ionice_prio": 0,
		"server_net_prio": 1,
		"client_net_prio": 6,
		"cache": {
			"size": 68719476736
		},
		"indexes_shard_count": 2,
		"monitor": {
			"port": 20000,
			"call_tree_timeout": 0
		}
	},
	"backends": [
		{
			"backend_id": 1,
			"cache": {
				"size": 0
			},
			"type": "blob",
			"group": 2,
			"history": "/opt/elliptics/history.1",
			"data": "/opt/elliptics/eblob.1/data",
			"sync": "-1",
			"blob_flags": "0",
			"blob_size": "10G",
			"records_in_blob": "1000000",
			"periodic_timeout": 15,
			"defrag_percentage": 10,
			"defrag_timeout": 3600
		},{
			"backend_id": 2,
			"cache": {
				"size": 0
			},
			"type": "blob",
			"group": 2,
			"history": "/opt/elliptics/history.2",
			"data": "/opt/elliptics/eblob.2/data",
			"sync": "-1",
			"blob_flags": "0",
			"blob_size": "10G",
			"records_in_blob": "1000000",
			"periodic_timeout": 15,
			"defrag_percentage": 10,
			"defrag_timeout": 3600
		}
	]
}

Client node configuration


Client node uses just some parameters from elliptics-core section of server config file. Client node should get follow parameters:

log = /path/to/log/file Sets destination for elliptics logs.
Set to 'syslog' without inverted commas if you want elliptics to log through syslog. If you want write log to the file, create appropriate directory first.
log_level = 4 Sets verbosity level of elliptics logs.
It doesn't corresponds to syslog levels. When logging to syslog almost all messages will be written with INFO level. Elliptics supports follow levels:\\- DNET_LOG_DATA = 0\\- DNET_LOG_ERROR = 1\\- DNET_LOG_INFO= 2\\- DNET_LOG_NOTICE = 3\\- DNET_LOG_DEBUG = 4
join = 0 Specifies whether to join storage network.
Client nodes set it to 0. Server nodes should use 1.
remote = 1.2.3.4:1025:2 2.3.4.5:2345:2 List of remote nodes to connect.
Address should have follow format: `address:port:family` where family is either 2 (AF_INET) or 10 (AF_INET6) and address can be host name or IP of remote node.
wait_timeout = 60 Specifies number of seconds to wait for command completion
check_timeout = 60 Specifies number of seconds to wait before killing unacked transaction. This parameter is used also for some service operations such as update the routing server, check network connections an so on.
io_thread_num = 16 Specifies number of threads in processing IO pool.
Typically, value of this parameter should be comparable with the number of hardware processing cores.
Usually all input-output operations (like read, write, remove, indexes etc.) are executed by these threads. IO operation could be executed not by IO threads if they were requested with nolock IO flag therefore it will be executed by nonblocking processing pool.
nonblocking_io_thread_num = 16 Sets number of IO threads in processing pool dedicated to nonblocking operations.
They are invoked from recursive commands like DNET_CMD_EXEC, when script tries to read/write some data using the same id/key as in original exec command. Typically, value of this parameter should be comparable with the number of hardware processing cores. Nonblocking operations include follow operations:
- Lookup operations
- Gathering statistics (stat_log, stat_log_count)
- Gathering monitor statistics (monitor_stat)
- Different execs and replies
- Route list exchange
- Iterators
- Updating status
net_thread_num = 4 Specifies number of threads in network processing pool that would accepts new connections, checks and reads data from them.
elliptics/configuration.txt · Last modified: 2017/06/15 15:25 by zbr