Reverbrain wiki

Everything is built on the asynchronous API model.

Interfaces, classes and structures are described in the corresponding header files.

Node is the main controlling structure on both the server and the client side of Elliptics. A node is described by the dnet_node structure in C or by the ioremap::elliptics::node class in C++.

The connection between the client and the cluster in C++ is described separately.

The data structures and flags used during application operation are described in Data structures and flags.

dnet_node structure

The dnet_node structure is required by all functions (it is an analog of the ioremap::elliptics::node class in C++). dnet_node describes the client's network connection to the cluster. To connect to the cluster, create a node via the dnet_node_create() function; a logger is passed into this function. The most common logger is the file logger, and you can create your own logger in the same way.

A node with a logger is an abstract object that can write its state to the log. To connect this object to the cluster, add a network connection via the dnet_add_state() function, passing the address of a remote node:

int dnet_add_state(struct dnet_node *n, char *addr, int port, int family, int flags);

If none of the flags is specified, the client connects to the remote machine and downloads its routing table; it then connects to every IP address in that table. You can add any number of addresses.

The number of current connections can be obtained via the dnet_get_state() function, and the dnet_get_routes() function allows you to look up the routing table. If the connection to some node fails, or the routing table cannot be downloaded, an error message is returned.

dnet_session structure

The dnet_session structure is needed for further work; it is similar to the ioremap::elliptics::session class in C++.

dnet_session is a temporary object created to perform various operations. For example, to read data from some group, the user creates a new session, adds the groups to read from (via dnet_add_group()), and then calls a read function (for example, dnet_read_data_wait()), passing the current session into it.

You can also set I/O flags on the session (for read/write operations) and command flags (for example, to read/write data or to run operations without logging).

To read/write data or issue commands with different parameters, change the flags or the groups using the corresponding functions (for example, dnet_session_set_groups()). When the session is no longer needed, simply destroy it. Note that the dnet_node structure still exists, which means we remain connected to the cluster.

Node is a thread-safe object, so we can create one node for a common thread pool and then create a separate session in each thread for reading from or writing to a particular place. A session is not thread-safe, but this temporary object is not supposed to be long-lived anyway.

The dnet_config structure is passed to the dnet_node_create() function.


dnet_config structure

This structure is necessary for creating both the client node and the server node.

dnet_config parameters:

  • family defines the address family: 2 for IPv4 (AF_INET), 10 for IPv6 (AF_INET6).

  • wait_timeout defines how long every blocking operation waits (the default timeout is 5 seconds).

  • flags — the backend flags are described here; these flags do not apply on the client side.

  • check_timeout is the interval for a special dedicated thread that checks for transactions to kill by timeout, for reconnects to attempt, for network connections that went down, etc.

  • stall_count defines how many consecutive transactions must stall before the current network connection is considered failed and is terminated.

  • io_thread_num and nonblocking_io_thread_num. When we create a network connection, we want a thread pool that will handle it.

  • io_thread_num is the size of the pool whose threads pick commands from the queue and process them while holding a lock on the corresponding key (the id of the command).

In other words, io_thread_num is the size of the pool that processes all commands by default; its operations lock each other out, so two operations with the same key cannot be processed at the same time. nonblocking_io_thread_num is the size of the pool that handles commands without taking that lock. We recommend setting io_thread_num and nonblocking_io_thread_num to numbers comparable to the number of processors in the system (8, 16, 32, ...).

  • net_thread_num is the size of the thread pool that reads data from the network.

  • temp_meta_env — the path to the directory where the temporary metadata for verification will be stored. This directory should have enough free space.

  • history_env — the directory used to store the ids files, which contain the beginnings of the identifier ranges this node has to store.

  • *ns — a namespace.

  • bg_ionice_class, bg_ionice_pri (relevant only for the backend) — the class and the priority for background input/output operations (see man ionice).

  • removal_delay (measured in days). Data is deleted in two steps: first the data itself is deleted, while the metadata and the timestamp of the delete operation are kept. Later a check compares the timestamps (of the delete and of the last update) and picks the newest one; if the newest timestamp is the delete, the data is removed and a mark about it is left.

Metadata is deleted when removal_delay expires. For example, if removal_delay is 1 day and two of the three copies were removed within that period, the metadata is removed as well. If the removal did not run on the third machine holding the third copy (for example, it was turned off), then when that machine comes back online, recovery would restore the deleted data on the other two machines. That is why removal_delay is usually set to a period long enough for all nodes that may store a replica to come online, so that either the data or the marks about deleted data are propagated everywhere.

  • cookie[DNET_AUTH_COOKIE_SIZE] — the simplest authentication: a string that must be the same on all cluster nodes. If a node tries to connect to the cluster and its cookie differs, it will be unable to connect. This guards against configuration errors; for example, if the administrator wants to merge two clusters, they must share the same cookie.

  • server_prio — the priority of connections between servers; client_prio — the priority of connections between clients and servers (see man 7 socket for SO_PRIORITY; priorities are set separately for joined (server) and other (client) connections).

  • oplock_num — the size of the table of locks taken on a particular machine for performing locked operations. We recommend setting this parameter to a large number (a thousand or more); the default value is 10 000.

  • srw_init_ctl — this structure receives the configuration file (a path to the file that contains the server-side configuration).

  • cache_size — the maximum size of the in-memory LRU cache. If the size is 0, the cache does not exist; if it is non-zero, it is the maximum cache size in bytes that Elliptics maintains.

dnet_config flags

  • DNET_CFG_JOIN_NETWORK — this flag indicates that the node connecting to the cluster is not a regular client but a server: it becomes part of the routing table, executes commands, stores data itself, and so on. If a client sets this flag accidentally, it actually becomes a server, so clients usually do not set it.

  • DNET_CFG_NO_ROUTE_LIST — this flag means that we do not download the routing table from the remote machines and stay connected to one specific node. This is useful for gathering statistics.

  • DNET_CFG_MIX_STATES. Every network connection has a certain weight: we measure input/output times, and the less time an operation takes per byte, the greater the weight. With this flag, when reading data that exists in several copies, the group with the greatest weight is chosen first.

  • DNET_CFG_NO_CSUM is relevant to both the client and the server. With this flag, when Elliptics receives an input/output (read/write) command, it neither generates nor checks the corresponding checksum. A checksum equal to 0 is treated as no checksum at all.

  • DNET_CFG_RANDOMIZE_STATES means that reads are spread randomly across the groups. If neither DNET_CFG_MIX_STATES nor DNET_CFG_RANDOMIZE_STATES is set, groups are selected in the order in which they were added to the session.

These flags can be changed at run time via the dnet_ioclient utility:

dnet_ioclient -r remote.addr:port:family -U0 -F new-flags

A description of the dnet_config flags can be found here. When a node connects to a cluster, it announces the group it belongs to. A group is essentially a replica; the keys within a group are distributed between the machines by the rules of a distributed hash table.

dnet_log structure

struct dnet_log {
    int     log_level;
    void    *log_private;
    void    (* log)(void *priv, int level, const char *msg);
};

  • log_level — the higher the log_level, the greater the amount of logging. The levels are defined by enum dnet_log_level:

  • DNET_LOG_DATA = 0 means that only the data is output.

  • DNET_LOG_ERROR also writes the errors.

  • DNET_LOG_INFO produces even more logs; in particular, completed transactions are written: how long the operation took and what the key was.

  • DNET_LOG_NOTICE describes where the operations were written, which operations were executed, etc.

  • DNET_LOG_DEBUG writes a lot of details about the transactions. It should be enabled when there is a problem that needs to be debugged.

  • log_private — a private client pointer that will be passed into the logging function.

  • (* log) — the logging callback itself: the log_private pointer is passed as priv, the level at which the logger records the data is passed as level, and char *msg is the message itself.

dnet_backend_callbacks structure

The dnet_backend_callbacks structure is only needed on the server. You can write your own backend that will handle all operations not processed by the Elliptics core.


  • command_handler — the callback that processes the commands;

  • void *state — the network connection;

  • void *priv — the private data created for that backend;

  • struct dnet_cmd *cmd — describes a command;

  • void *data — the data itself;

  • command_private — the private data that the backend wants to keep and use for every command;

  • send is used to send data: you pass the id and the state to which the id info should be transferred;

  • storage_stat fills the statistics.

  • backend_cleanup calls the custom cleanup for the structure stored in command_private.

  • checksum — a callback that calculates the checksum for a given dnet_id; the checksum cannot be bigger than *csize bytes.


On the server side the cache is designed this way.

Cache performance is described here.

The API side of the cache is described here.

Cache flags are described here.

Server-side processing

Elliptics has the ability to process data on the server side using Cocaine. The client sends a request (the DNET_CMD_EXEC command) to the server, passing an id that determines exactly where the request will be processed. If an id in a certain group is specified, the request goes to a specific machine; if the id is not specified, the request goes to all machines in each group.

In other words, Elliptics supports both sending a request to a specific machine and broadcasting it to all machines. From a technical point of view, the client sends a request to a server, that server forwards the request to the required server, and that server passes the request into Cocaine, which finds the needed application by name and opens a connection to deliver the event to be processed.

The application, after recognizing the name of the event and the necessary data, starts processing and then sends a reply to the first machine. If this machine is unable to process the task, it can push the request on to other machines, which can push it even further, and each of them can send its reply back.

Thus several machines can take part in processing one event. When the data is fully processed, the last worker in the chain can send the final reply to the client with a message like "data processing finished".

Synchronous part

If you need synchronous calls as a wrapper over the asynchronous part, you can use dnet_wait (a wrapper over a condition variable): initialize it before the asynchronous call, call dnet_wakeup() in the asynchronous callback, and, after issuing the asynchronous function, just wait for the result using dnet_wait_event().

Asynchronous part

The asynchronous part consists of a control structure attached to the transaction (request-response: what data was sent and what data we expect to receive). This control structure stores the data collected by the client and the condition variable on which the client and other waiters "sleep".

elliptics/api-c.txt · Last modified: 2014/06/02 17:30 by zbr