The route table is essentially a control layer for network transport. It is a set of node addresses together with the ID ranges each node handles.
When Elliptics starts, it connects to all remote nodes specified in the ‘remote = ’ section of the config. Connecting to a remote node includes asking for the ID ranges that node maintains as well as its route table.
A route table may look like this (part of a log dump):
Server is now listening at 127.0.0.1:1026.
2012-08-23 22:29:26.917737 27694/27694 4: 2: 10fe9b14804c -> 127.0.0.1:1026
2012-08-23 22:29:26.917753 27694/27694 4: 2: 28901d6094ed -> 127.0.0.1:1025
2012-08-23 22:29:26.917767 27694/27694 4: 2: 2e03dfaa560c -> 127.0.0.1:1026
2012-08-23 22:29:26.917781 27694/27694 4: 2: 354782a7527a -> 127.0.0.1:1023
Here ‘2:’ is the group (or replica set) ID, and each hex string next to the group number is the start ID of a range associated with the node whose address follows the arrow.
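As a rough illustration of how such a table can be used, the sketch below looks up the node responsible for a given ID. It assumes (this is not stated in the log itself) that a range runs from its start ID up to the next start ID, that the last range wraps around to the beginning, and that IDs are compared as equal-length lowercase hex strings. The names ROUTES and find_node are made up for this example.

```python
from bisect import bisect_right

# Route table entries taken from the log dump: (start ID, node address).
ROUTES = sorted([
    ("10fe9b14804c", "127.0.0.1:1026"),
    ("28901d6094ed", "127.0.0.1:1025"),
    ("2e03dfaa560c", "127.0.0.1:1026"),
    ("354782a7527a", "127.0.0.1:1023"),
])

def find_node(routes, key_id):
    """Return the address owning key_id.

    Assumption: each range runs from its start ID up to the next start ID,
    and the last range wraps around to cover IDs below the first start ID.
    """
    starts = [start for start, _ in routes]
    # Index of the last start ID that is <= key_id; -1 wraps to the last entry.
    idx = bisect_right(starts, key_id) - 1
    return routes[idx][1]

node_for_key = find_node(ROUTES, "2a0000000000")
```

A sorted list with binary search is only one possible layout; it is used here because it keeps the lookup short.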
By default each Elliptics node connects to every node specified in the ‘remote’ part of the config. It then downloads and merges their route tables and connects to every node it finds there. These route tables are periodically refreshed from every group: Elliptics randomly selects a node from each group and asks for its route table, then the tables are downloaded and updated, and new nodes are connected if needed.
Thus every client and server node is connected to every other node in the cluster, and those connections are periodically checked.
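The merge step can be pictured with a small sketch. It models a route table as a dictionary mapping (group, start ID) to a node address, which is an assumption made for illustration and not the actual in-memory layout; merge_route_tables is a hypothetical helper name.

```python
def merge_route_tables(local, remote):
    """Merge a downloaded route table into the local one.

    Both tables map (group, start_id) -> address. The local table is updated
    in place; the return value is the set of addresses that were not known
    before, i.e. the nodes a new connection would be opened to.
    """
    known = set(local.values())
    new_nodes = set()
    for key, addr in remote.items():
        local[key] = addr
        if addr not in known:
            new_nodes.add(addr)
    return new_nodes

local_table = {(2, "10fe9b14804c"): "127.0.0.1:1026"}
downloaded = {
    (2, "28901d6094ed"): "127.0.0.1:1025",
    (2, "2e03dfaa560c"): "127.0.0.1:1026",
}
to_connect = merge_route_tables(local_table, downloaded)
```

Here only 127.0.0.1:1025 ends up in to_connect, because 127.0.0.1:1026 was already known.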
Each node has its own “ids” file, which specifies the node's IDs in the route table. On first start a node generates its “ids” file based on the hard disk size. By default, if the “ids” file is lost (for example because of a hardware issue), the node generates a new one and takes new ranges, which are likely to differ from the old ones.
To prevent losing the “ids” file, a node can keep a copy of it in the Elliptics cluster reachable through its “remotes”. In this case the node copies its local “ids” file to the cluster and restores it from there if the local one is missing. If both the local and the cluster copies are missing, a new file is generated. To enable this behaviour, set the DNET_CFG_KEEPS_IDS_IN_CLUSTER flag (6th bit) in the flags of the node's config file. By default this feature is turned off.
The node's “ids” file is kept in Elliptics under the key “elliptics_node_ids_%address%:%port%”, where %address% and %port% are the address and port of the node.
Using DNET_CFG_KEEPS_IDS_IN_CLUSTER may cause a problem in one specific case. The node 'NodeX' has IP 'IPX' and keeps its “ids” file under the key “elliptics_node_ids_IPX:PORT”. One day the administrator changes the IP of 'NodeX' to 'IPXX', and the node copies its “ids” file to the key “elliptics_node_ids_IPXX:port”. Some time later the administrator adds a new empty node 'NodeY' and gives it the IP 'IPX', which used to belong to 'NodeX'. At start 'NodeY' will load the old “ids” file of 'NodeX' and use it for its own routes.
TO PREVENT THIS, REMOVE THE NODE'S “IDS” FILE FROM THE ELLIPTICS CLUSTER AFTER YOU HAVE CHANGED ITS IP.
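The scenario can be replayed with a small sketch. The key format is the one given in the text; the dictionary standing in for the cluster, the helper name ids_key, and the example addresses 10.0.0.1 and 10.0.0.2 are all hypothetical.

```python
def ids_key(address, port):
    """Build the cluster key under which a node's "ids" file is stored."""
    return "elliptics_node_ids_%s:%d" % (address, port)

# In-memory stand-in for the cluster's key/value storage (hypothetical).
cluster = {}

# NodeX starts at the old address and backs up its "ids" file.
cluster[ids_key("10.0.0.1", 1025)] = "ids of NodeX"
# NodeX is moved to a new address and backs up again under a new key.
cluster[ids_key("10.0.0.2", 1025)] = "ids of NodeX"

# NodeY later starts empty at the old address and finds the stale backup.
restored_by_node_y = cluster.get(ids_key("10.0.0.1", 1025))

# Removing the old key right after the address change avoids the clash.
del cluster[ids_key("10.0.0.1", 1025)]
restored_after_cleanup = cluster.get(ids_key("10.0.0.1", 1025))
```

Because the key depends only on the address and port, nothing in it distinguishes the new node from the one that used to live at that address.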
The node's “ids” file can be removed by executing
dnet_ioclient -r host:port:family -g %groups% -u elliptics_node_ids_%address%:%port%
where %address% and %port% are the address and port of the node whose “ids” file should be removed and %groups% is a list of Elliptics cluster groups.
When a client wants to send a command, it uses its in-memory route table to determine the remote node. The route table may not yet be properly updated when a new command is sent; in this case the node that received the command may forward it to the server which has to handle the request (according to its own route table).
Since the process of refreshing and updating route tables is continuous across the whole cluster, new nodes connected to a subset of servers, or nodes that dropped out of the cluster, are detected rather quickly. This allows adding new servers without disrupting client connections or restarting servers.
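The forwarding case can be sketched as follows. The range semantics (a range runs from its start ID to the next one and wraps around) and the helper names owner and dispatch are assumptions made for this example; the real decision logic may differ.

```python
from bisect import bisect_right

def owner(routes, key_id):
    """Pick the address whose range contains key_id (ranges wrap around)."""
    routes = sorted(routes)
    starts = [start for start, _ in routes]
    return routes[bisect_right(starts, key_id) - 1][1]

def dispatch(self_addr, server_routes, key_id):
    """On the receiving server: handle the command locally or forward it."""
    target = owner(server_routes, key_id)
    if target == self_addr:
        return ("handle", self_addr)
    return ("forward", target)

# The client's table is stale: it does not yet know about 127.0.0.1:1025.
client_routes = [
    ("10fe9b14804c", "127.0.0.1:1026"),
    ("354782a7527a", "127.0.0.1:1023"),
]
server_routes = client_routes + [("28901d6094ed", "127.0.0.1:1025")]

key = "2a0000000000"
first_hop = owner(client_routes, key)             # where the client sends it
action = dispatch(first_hop, server_routes, key)  # what that server decides
```

In this run the client picks 127.0.0.1:1026, and that server, knowing the newer table, forwards the command to 127.0.0.1:1025.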
In large clusters adding new nodes becomes quite tedious: the admin has to create a config file in which a number of remote nodes must be specified, and those nodes (or at least one of them) have to be alive. The new node will then update routing tables on the remote servers, connect to other nodes and so on…
To simplify this process even further we implemented multicast autodiscovery. Every node configured to do so broadcasts information about itself; a client receives this and, if the authentication cookie matches, connects to those nodes. Multicast TTL is set to 3.
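A minimal sketch of the two pieces involved is shown below: a UDP socket with the multicast TTL set to 3, and a cookie check on a received announcement. The announcement layout (a dictionary with "cookie" and "addr" fields) and the function names are invented for this example and do not describe the real wire format.

```python
import hmac
import socket

MULTICAST_TTL = 3  # value stated in the text

def make_announce_socket():
    """Create a UDP socket whose multicast packets carry TTL 3."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, MULTICAST_TTL)
    return sock

def accept_announcement(local_cookie, announcement):
    """Return the announced address if the cookie matches, else None.

    `announcement` is a hypothetical dict: {"cookie": bytes, "addr": str}.
    """
    if hmac.compare_digest(local_cookie, announcement["cookie"]):
        return announcement["addr"]
    return None

# Example with made-up values: one matching and one foreign announcement.
cookie = b"shared-secret"
good = accept_announcement(cookie, {"cookie": b"shared-secret", "addr": "127.0.0.1:1025"})
bad = accept_announcement(cookie, {"cookie": b"other-cluster", "addr": "127.0.0.1:1030"})
```

A TTL of 3 limits how many router hops the announcements can cross, so discovery stays within a nearby part of the network.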
By using the reserved word ‘hostname’ instead of the local address (like ‘addr = hostname:1025:2’) together with this new feature, one can fully eliminate the need for the admin to edit any configuration file for new nodes. Autodiscovery is turned on by adding an ‘autodiscovery:address:port:family’ string to the list of remote nodes, like ‘remote = autodiscovery:126.96.36.199:1025:2’.