Swarm and TheVoid

Swarm

Swarm is a high-performance library for web crawling.

It depends on libcurl, uriparser and libxml2. It aims to simplify the development of URL fetchers and web crawlers. Usage examples are available on its GitHub page.

TheVoid

TheVoid is its counterpart: a library that aims to simplify web backend development. One may think of TheVoid as an asynchronous multiplexing FastCGI.

This tutorial explains how to set up the thevoid example application as a standalone web server or as an Nginx upstream. The application gets and puts files in Elliptics, stores secondary indexes and searches over them using a JSON interface.

Installation

Neither swarm nor thevoid depends on elliptics, but the example code does, so we will install it too. This tutorial was written for Ubuntu Precise server. We will use packages wherever they are available. We assume that you have added the reverbrain PPA to your sources list:

# add-apt-repository ppa:reverbrain/testing
# apt-get update

1. Installing dependencies

# apt-get install debhelper cmake liburiparser-dev libcurl4-openssl-dev libxml2-dev \
  libev-dev libboost-system-dev libboost-thread-dev libboost-program-options-dev elliptics elliptics-dev

This also installs elliptics, although you only need elliptics-client for the thevoid HTTP application. Our example runs the client and the server on the same machine, so the elliptics package (the server package) is required.

The packaged Elliptics is built with server-side processing support. If you do not need it, you can build elliptics manually.

2. Installing thevoid and swarm

# apt-get install libthevoid-dev libswarm-dev

Please note that the example applications shipped with swarm are not installed. We will grab the swarm source code and build them manually.

3. Downloading and building swarm example applications

$ git clone http://github.com/reverbrain/swarm
$ cd swarm
$ cmake -DBUILD_EXAMPLES=ON .
$ make

4. Configuring thevoid and elliptics

We will set up thevoid as a standalone web server that also accepts requests on a unix socket, so that it can be used as an Nginx upstream.

Example elliptics config: ioserv.conf.2

This example starts the elliptics server on localhost and example.com, assuming that both are locally accessible addresses, i.e. the server can bind to them. The server uses port 1025. Beware: logs are written to stderr.

Example swarm elliptics server config: swarm-elliptics.json

This example starts a high-performance web server which listens on port 8080 and on the /tmp/test.sock unix socket. You can specify either of them in the Nginx config as an upstream. You can also connect to port 8080 directly using wget or a similar tool.

Both configs are provided for educational purposes, so they write their output to stdout/stderr, use '/opt/' in the storage directory path and so on. Please create all directories mentioned in the configs. Neither server forks to the background, so you can watch the logs on your terminal at runtime; you can of course change that in the config files. An example elliptics server config file with all supported options can be found at /usr/share/doc/elliptics/examples/ioserv.conf if the elliptics package has been installed.

5. Starting servers

Elliptics:

$ dnet_ioserv -c /path/to/ioserv.conf.2

Thevoid:

$ cd swarm
$ ./elliptics-server -c /path/to/swarm-elliptics.json

The swarm elliptics connector was built at step 3 and is named elliptics-server. It supports the following commands:

/get?name=
/upload?name=
/ping
/find
/update

name= takes the name of the object to be read or written. The last two commands work with secondary indexes: one can tag an object with multiple indexes and find the objects which are present in all or any of the requested indexes.
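Before sending IO commands it can be handy to wait until the server is up. The following is a minimal sketch, assuming that /ping answers with a 2xx status once the server is ready (the tutorial does not describe its response); wait_for_server is a helper name made up for this illustration.

```shell
# Poll /ping until the server answers or the attempts run out.
# Assumption: /ping returns a 2xx status when the server is ready.
# Usage: wait_for_server HOST:PORT [TRIES] [DELAY_SECONDS]
wait_for_server() (
    base="$1"; tries="${2:-10}"; delay="${3:-1}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # wget exits non-zero on connection errors and on non-2xx responses.
        if wget -q -O /dev/null --tries=1 --timeout=2 "$base/ping"; then
            exit 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    exit 1
)

# Example: wait_for_server example.com:8080 && echo "server is up"
```

The function body runs in a subshell, so its variables do not leak into the calling shell.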

6. Running IO commands

Read and write some data:

$ wget --post-data="This is a test data" -O /dev/stdout -S -nv example.com:8080/upload?name=test.txt
  HTTP/1.1 200 OK
  Content-Length: 0
  Content-Type: text/html
  Connection: Keep-Alive

$ wget -O /dev/stdout -S -q example.com:8080/get?name=test.txt
  HTTP/1.1 200 OK
  Content-Length: 19
  Content-Type: text/plain
  Last-Modified: Mon, 01 Jul 2013 22:30:40 GMT
  Connection: Keep-Alive
This is a test data
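The two commands above can be combined into a small round-trip check. This is only a sketch: roundtrip_check is a hypothetical helper, and it assumes that /upload stores the POST body unchanged and /get returns it byte for byte, as the transcript above suggests.

```shell
# Upload FILE under NAME, read it back and compare the two copies.
# Usage: roundtrip_check HOST:PORT NAME FILE
roundtrip_check() (
    base="$1"; name="$2"; file="$3"
    tmp=$(mktemp) || exit 1
    # Each step runs only if the previous one succeeded.
    wget -q -O /dev/null --post-file="$file" "$base/upload?name=$name" &&
        wget -q -O "$tmp" "$base/get?name=$name" &&
        cmp -s "$file" "$tmp"
    rc=$?
    rm -f "$tmp"
    exit "$rc"
)

# Example: roundtrip_check example.com:8080 test.txt ./test.txt && echo OK
```

The exit status is zero only when the upload, the download and the comparison all succeed.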

7. Testing secondary indexes

Elliptics supports secondary indexes, which can also be described as tags. For example, one can upload a file named elliptics and attach multiple tags to it, such as distributed, fast and fault-tolerant. This means that the special objects distributed, fast and fault-tolerant will each contain a note that the object elliptics was written with that particular index or tag.

One can then search for all objects which have, for example, the indexes distributed and fast. In our example such a search has to return the object elliptics.

Let's try it in practice. The swarm example/ directory contains the files example/update-example.json and example/find-example.json, which do exactly that.

7.1. Updating indexes

Using the file example/update-example.json we will update 3 indexes: fast, distributed and fault-tolerant. This update will 'put' information about the object named elliptics into them. Elliptics allows attaching private data to each uploaded key and index pair. You can check this file to see the private data.

$ wget -q -O /dev/stdout -S --post-file=example/update-example.json example.com:8080/update
  HTTP/1.1 200 OK
  Content-Length: 0
  Content-Type: text/html
  Connection: Keep-Alive

7.2. Finding objects in multiple indexes

Let's suppose we want to find every object that is present in either of the indexes fast or distributed. This can be done by posting the example file example/find-example.json like this:

$ wget -q -O /dev/stdout -S --post-file=example/find-example.json example.com:8080/find
  HTTP/1.1 200 OK
  Content-Length: 293
  Content-Type: text/json
  Connection: Keep-Alive
{
    "8638fc0c8c025c72ca8995c933898d0e0bb534da80f21c4dcd4c659ed1c4b568859a3db101b099738b113b8fefd0a06c3f9b68cebea7ca6c0a97b9c7ad800587": {
        "distributed": "data for tag 'distributed'",
        "fast": "this is some 'private' data only for key 'elliptics' and index/tag 'fast'"
    }
}

This result shows that the obscure object 8638fc… is present in the indexes fast and distributed.

The object named 8638fc… is actually the name elliptics hashed using sha512. Elliptics neither uses nor stores original names; it operates on 512-bit hashes. If you previously uploaded a file named elliptics, you can read it either by name or by id:

$ wget --post-data="Elliptics test content" -O /dev/stdout -S -q example.com:8080/upload?name=elliptics
  HTTP/1.1 200 OK
  Content-Length: 0
  Content-Type: text/html
  Connection: Keep-Alive

$ wget -O /dev/stdout -S -q example.com:8080/get?name=elliptics
  HTTP/1.1 200 OK
  Content-Length: 22
  Content-Type: text/plain
  Last-Modified: Mon, 01 Jul 2013 22:45:23 GMT
  Connection: Keep-Alive
Elliptics test content

$ wget -O /dev/stdout -S -q example.com:8080/get?id=8638fc0c8c025c72ca8995c933898d0e0bb534da80f21c4dcd4c659ed1c4b568859a3db101b099738b113b8fefd0a06c3f9b68cebea7ca6c0a97b9c7ad800587
  HTTP/1.1 200 OK
  Content-Length: 22
  Content-Type: text/plain
  Last-Modified: Mon, 01 Jul 2013 22:45:23 GMT
  Connection: Keep-Alive
Elliptics test content
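The mapping from a name to an id can be reproduced locally. The sketch below assumes that the id is the plain SHA-512 of the object name with nothing else mixed in, as the text above suggests; name_to_id is a helper name made up for this illustration. Compare its output with the key returned by /find before relying on it.

```shell
# Print the SHA-512 hex digest (128 characters) of an object name.
# Assumption: the id is the hash of the bare name, without any prefix.
# printf is used instead of echo so that no trailing newline is hashed.
name_to_id() {
    printf '%s' "$1" | sha512sum | cut -d' ' -f1
}

name_to_id elliptics
```

The printed value can then be passed to /get?id= in place of the name.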

7.3. Searching for all or any indexes

The example above showed how to find an object which is present in 2 indexes. That request actually finds all objects which are present in the fast OR the distributed index. If you want to find all objects which are present in both the fast AND the distributed index at the same time, edit example/find-example.json and set type to and.
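The difference between the two search types can be illustrated locally with plain text lists. This sketch only demonstrates the semantics and says nothing about how Elliptics implements the search; the index contents (including the objects thevoid and swarm) and the find_any/find_all helpers are hypothetical.

```shell
# Hypothetical index contents: one object name per line, no duplicates.
fast=$(printf '%s\n' elliptics thevoid)
distributed=$(printf '%s\n' elliptics swarm)

# OR: objects present in at least one of the two indexes (set union).
find_any() { printf '%s\n%s\n' "$1" "$2" | sort -u; }

# AND: objects present in both indexes (set intersection).
# uniq -d keeps only the names that occur in both lists.
find_all() { printf '%s\n%s\n' "$1" "$2" | sort | uniq -d; }

find_any "$fast" "$distributed"   # prints: elliptics, swarm, thevoid (one per line)
find_all "$fast" "$distributed"   # prints: elliptics
```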

8. Any questions?

Feel free to contact us via email: info@reverbrain.com

Or via the Google group: https://groups.google.com/forum/#!forum/reverbrain

elliptics/client-tutorial.txt · Last modified: 2014/01/17 15:04 by masha