Rift

Rift is an HTTP frontend to elliptics with an extensive set of features such as S3-like buckets (each can have its own groups to store data and its own ACLs), persistent caching and so on. It is built on top of the thevoid HTTP framework. Although rift can be used as a stand-alone web server, it is usually better to put it behind a balancing web server like Nginx. You can check a small (a little bit outdated) tutorial on how to set up the former thevoid elliptics server behind Nginx.

Rift, like thevoid, is completely asynchronous, and because of that server extensions might look complex to write, but actually they are not. Below is the documentation of the currently supported features. They are described as a list of so-called handlers - URI parts which are detected by the server and then processed in the code.

Here is a small example (/get/ is handler here):

http://reverbrain.com/get/testns/example.txt

/get here is the handler's ID (or name); it will receive the whole URI and start doing its work. There are multiple matching algorithms supported by thevoid (including wildcard), but Rift supports only exact match. /get is thus quite different from /get-one or /get/, for example.

All operations supported by Rift (handlers documentation is below) are exposed via REST API.

Buckets, bucket directories, namespaces and authentication

Namespaces were created in Elliptics to support multiple objects with the same name from different users. For example, several clients may want to upload a file named example.txt into the storage, but they do not want to overwrite each other's data. Moreover, they might not even know that other users exist, but still want their own naming scheme.

Here come namespaces. Without a namespace, the elliptics key of the written data is just sha512(name): multiple clients would overwrite the same object in the storage. A namespace is a key extension which changes the elliptics key to sha512(namespace + '\0' + name). Different key, different location, no mixing of clients' data.
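
As a minimal sketch (not part of Rift itself), such a key could be computed by hand, assuming the hash is taken over the raw byte concatenation described above; the namespace testns and object name example.txt are only placeholders:

# hypothetical key computation for namespace "testns" and object "example.txt"
printf 'testns\0example.txt' | sha512sum
# without a namespace the key would simply be
printf 'example.txt' | sha512sum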

Namespaces are supported in Elliptics client API.

A bucket is Rift's control structure used to handle namespace support over HTTP. It is basically a control or metadata structure associated with the namespace.
A bucket not only changes the elliptics key, but also performs authentication with full ACL support (if configured to do so). A bucket also contains the list of Elliptics groups or replicas where your data will be stored.

Data access for a given namespace requires the bucket to be stored in elliptics. The bucket will be read, cached in the rift server and periodically (every 30 seconds by default) updated. Any operation which involves a namespace ends up going through bucket processing. For example, object fetching (the /get handler below) will check whether the given bucket (namespace) exists, check authentication (this can be bypassed if the bucket was configured to allow reads without an auth check), generate the elliptics key from namespace+object name and then read data from the storage. The object will be read from the replicas specified in the bucket (the data-groups bucket parameter). If the bucket metadata was not found in the elliptics storage, request processing returns a not-found error.
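
For example, a read through a bucket and the not-found case could look like this (host, port, bucket and key are placeholders):

curl -i "http://localhost:8080/get/testns/example.txt"
# if the bucket metadata does not exist in elliptics, a not-found error is returned
curl -i "http://localhost:8080/get/no-such-bucket/example.txt"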

A bucket can be created, modified and removed using the REST API.

When you delete a bucket, your keys are not deleted. If you create the same bucket again, all your keys will be available.

A bucket contains the following metadata:

  • ACL - this is a list of user/token/flags entries, where a secure string named token, which is used in HMAC signature generation, is associated with a username plus additional auth flags. More on how the signature is generated and checked can be found in the signature section. If the ACL is empty, authentication always succeeds. If someone changed the bucket and the authentication token associated with some user, but the rift server still holds cached data, authentication will fail, which forces the rift server to reread the bucket metadata from the storage and perform the auth check again. If it fails again, a bad-request error is returned to the client. It is possible to bypass security checks for GET requests only or for all requests. The following ACL flags are supported:
    • bit 0 - the given user can bypass all auth checks
    • bit 1 - the given user can write into this bucket; proper authorization is required if the bit-0 flag is not set
    • bit 2 - the given user has admin rights - he can modify bucket metadata and ACL; proper authorization is required if the bit-0 flag is not set
  • list of replicas (or groups) which should host data for given bucket
  • maximum object size (not supported yet)
  • maximum number of keys in the bucket (not supported yet)

All of the above options can be set through the bucket REST API:

curl -H "Expect:" -H "Authorization: riftv1 user-with-admin-rights:token" --data-binary @example/bucket_create.json http://localhost:8080/update-bucket/directory_name/bucket_name
curl -H "Expect:" -H "Authorization: riftv1 user-with-admin-rights:token" --data "unused" http://localhost:8080/delete-bucket/directory_name/bucket_name

Bucket directories

One can create multiple buckets and 'logically' combine them into a so-called bucket directory. One can list a bucket directory and get bucket metadata. When a bucket is removed, it is also removed from its bucket directory. Removing a bucket directory does not remove objects stored in the buckets.

directory_name in the example above is the directory where the bucket is created. Basically, directory_name is just a list containing all buckets you decided to put there.

Directory can be created using REST API:

curl -H "Expect:" --data-binary @example/bucket_directory.json http://localhost:8080/update-bucket-directory/bucket_directory_name
curl -H "Expect:" -H "Authorization: riftv1 user-with-admin-rights:token" --data "unused" http://localhost:8080/delete-bucket-directory/directory_name

Server-side ACL checks for each handler

Rift allows fine-grained access control to buckets based on the ACL security option. As stated above, an ACL is an access control list where each username is associated with a secure token used to check the Authorization header and with auth flags which allow creating read-only users, bypassing auth checks and maintaining admin access (bucket updates).

ACLs are set during bucket creation in the POSTed json, or you can update them later via the /update-bucket/ handler. The ACL format is rather straightforward: user:secure-token:flags

When a client issues a request (and the ACL check is enabled), it must contain an Authorization header in the following format: Authorization: riftv1 user:signature
Where riftv1 is one of the supported authorization methods (there will be various S3 flavors soon), user is a username and signature is a secure hash of the supported headers and URL; more details can be found in the appropriate authorization section.
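
For example, an authenticated read could be issued like this; the user name and signature are placeholders, and the signature must be computed as described in the authorization section:

# hypothetical signed request; replace the signature with a real HMAC over the request
curl -i -H "Authorization: riftv1 alice:9a8b7c6d..." "http://localhost:8080/get/testns/example.txt"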

Here is the whole state machine of the Rift's authentication checker (when bucket has been found and successfully read and parsed by the server):

  1. Authorization header is being looked for
  2. if auth header has been found, its auth method (riftv1 for example) is being checked, if there is no supported method with that name, forbidden is returned
  3. namespace/bucket with given name is being looked for (even if there is no auth header)
  4. if there is no bucket with given name (even in elliptics storage), forbidden error is returned, otherwise system checks the bucket
  5. if group list is empty, not found error is returned
  6. if ACL is empty, ok is returned - there is nothing to check against
  7. if there is no Authorization header, system considers user to be * wildcard
  8. user (the one from auth header or wildcard) is being searched in ACL, if no match has been found, forbidden error is returned
  9. if ACL contains 'no-auth' bit (bit 0) for given user, ok is returned
  10. if Authorization header does not contain secure signature (header format for example: riftv1 user:signature), unauthorized error is returned
  11. security data in Authorization header is being checked using secure token found in ACL entry - if auth data mismatch, forbidden is returned
  12. ok is returned

The result of this check can be found in the log with the verdict: prefix at ERROR/INFO (1/2) log level and higher. There may also be additional security checks which can prevent user access; for example, bucket reading requires admin rights and write permission.

Global auth configuration

Authentication can be globally bypassed if the rift server config does not include the bucket section.

The whole rift server configuration can be found in the appropriate section. Its application section intersects with the elliptics server config options, although it contains additional parameters.

Persistent caching

Rift allows you to store popular content in separate groups for caching. This is quite different from the elliptics cache, where data is stored in memory in segmented LRU lists. Persistent caching allows you to temporarily put your data into additional elliptics groups, which will serve IO requests. This is usually very useful for heavy content like big images or audio/video files, which are rather expensive to put into a memory cache.

One can update the list of objects to be cached as well as the per-object list of additional groups. There is a cache.py tool with extensive help for working with cached keys. This tool will grab the requested keys from the source groups and put them into the caching groups, as well as update a special elliptics cache list object which is periodically (the timeout option in the cache configuration block of Rift) checked by Rift. As soon as Rift finds new keys in the elliptics cache list object, it will start serving IO from those cached groups as well as from the original groups specified in the bucket. When using the cache.py tool please note that its file-namespace option is actually a bucket name.

To remove an object from the cache one should use the same cache.py tool - it will remove data from the caching groups (please note that physically removing objects from disk in elliptics may require running online eblob defragmentation) and update the special elliptics cache list object. This object will be reread some time in the future, so if a requested key cannot be found in the cache, it will be automatically served from the original bucket groups.

Timeouts

Rift supports separate read/write timeouts (in seconds) for client-server transactions. Read timeouts also apply to lookup/download-info/redirect requests. When buffered (chunked or /big/ URI handlers) IO is used, this timeout is set per chunk.

The read mechanics inside elliptics will switch between replicas when one or more groups do not have the data or are not accessible. If a group times out, the client will wait read-timeout seconds to determine that. If the second group times out too, the client will wait another read-timeout seconds and so on; this effectively multiplies the worst-case client request timeout by the number of groups.

Writes happen in parallel, which means the client will receive some reply within write-timeout seconds.
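
A fragment of the server configuration that sets these values might look roughly like the sketch below; the exact key names and their placement inside the application section are assumptions, so consult the configuration section for the real schema:

# NOTE: hypothetical config fragment, key names and placement are assumptions
{
    "application": {
        "read-timeout": 10,
        "write-timeout": 16
    }
}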

URI handlers

All handlers support the following format:

/handler_name/bucket_name/key?optional=params

where bucket_name is your unique bucket name which allows differentiating files with the same names coming from different users. The same key name in different buckets ends up as different elliptics keys.

If buckets are not turned on in the config (there is no bucket section), URLs must be written in the following format:

/handler_name/key?optional=params

Most handlers support the following parameters:

  • ioflags= - elliptics IO bitmap flags; for example, a cache request is 1024, a cache-only request is 3072 and so on
  • cflags= - elliptics command flags (rarely used)
  • trace_id= - a 63-bit long ID which will be sent with all elliptics commands and can be found in all elliptics client AND server logs; the highest (64th) bit tells the elliptics client and server to turn on full debug logging for the given transaction

Example:

$ curl -d "this is a test example" "http://example.com/upload/testns/xxx.jpg"
$ curl "http://example.com/get/testns/xxx.jpg"

/get/

This handler is invoked for a standard GET request and its job is to read a file from the storage. It supports the following URI parameters:

  • size=/offset= - read the specified number of bytes from the specified offset within the file. Zero size means read everything (from the given offset to the end of the file)

The HTTP Range header is fully supported according to RFC 2616 (see the curl examples after this list):

  • Range: bytes=0-49 to request first 50 bytes
  • Range: bytes=-50 to request 50 last bytes
  • Range: bytes=0-49,60-79 to request the first 50 bytes and 20 bytes starting at the 60th byte
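
Putting it together, reads with explicit offset/size parameters or with a Range header could look like this (host and key are placeholders):

$ curl "http://example.com/get/testns/xxx.jpg?offset=100&size=200"
$ curl -H "Range: bytes=0-49" "http://example.com/get/testns/xxx.jpg"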

/upload/

This handler is invoked for a POST request. The following parameters are supported:

  • offset= - write data at the specified offset into the remote object
  • prepare=/plain-write/commit= - if one of these flags is specified, the appropriate part of the IO operation is started (see the example sequence after this list).
    • prepare= reserves the specified number of bytes in a continuous region on disk; the key can not be read at this stage
    • commit= commits the specified number of bytes into the backend indexes. This step may also invoke on-disk checksum calculation and so on, depending on the elliptics low-level backend configuration. Only when this step has been completed can the object be read from the storage.
    • plain-write specifies that the usual write operation should not commit updates into the backend indexes. All IO operations are atomic on the backend (within a single group). Mixing it with a usual write (without the plain-write flag) is a bad idea: the usual write invokes the commit operation internally, so partially written data would be committed to disk.
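
A possible sequence using these flags is sketched below; the sizes, host, key and the exact way the plain-write flag is passed are assumptions for illustration, under the premise that each request's POST body is written at the given offset:

# reserve 1000 bytes for the object and write the first chunk; the key can not be read yet
curl --data-binary @part0.bin "http://example.com/upload/testns/big.bin?prepare=1000"
# write the next chunk without committing it into the backend indexes
curl --data-binary @part1.bin "http://example.com/upload/testns/big.bin?plain-write&offset=500"
# commit 1000 bytes into the backend indexes; only now does the object become readable
curl --data-binary @part2.bin "http://example.com/upload/testns/big.bin?commit=1000"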

/delete/

Removes the key from the storage and from the indexes associated with the given key.

/download-info/

This handler is used for getting actual information about where data is located on the storage servers. Its reply is a JSON document with the following structure:

{
    "id": "8a5f4640935...",
    "csum": "a15cf7eee7ba4fd90f...",
    "filename": "/tmp/blob3/data-0.0",
    "size": 9,
    "offset-within-data-file": 144,
    "mtime": {
        "time": "2013-12-05 MSK 19:40:35.731166",
        "time-raw": "1386258035.731166"
    },
    "server": "127.0.0.1:1027",
    "signature": "45ead3507828...",
    "url": "http://127.0.0.1/tmp/blob3/data-0.0%3A144%3A9"
    "time": "1386258035"
}

Where:

  • time - current unix timestamp at the server
  • signature - signature needed to download the file directly from the storage machine via nginx
  • id - hex-encoded 64-byte id of the entry
  • csum - sha512 checksum of the file data
  • filename - the file where the entry is really stored on the server; it may be omitted if the storage backend doesn't support it
  • size - size of the file entry
  • offset-within-data-file - offset of the entry's data within filename from its start
  • mtime - unix timestamp of the entry's last modification
  • server - address of the server where the file is stored
  • url - URL to the Nginx web server with streaming from elliptics enabled; it contains an offset:size part after the path

/redirect/

This handler looks up where the file is really located and redirects to the appropriate storage server's nginx so the file can be downloaded directly from it. Support for HTTP headers like Range is handled by Nginx.

You might be interested in the Nginx streaming from eblob tutorial - the redirect and download-info handlers are used there to find out the exact data location on disk, which is then streamed directly to the user by Nginx in a p2p manner.

/echo/

This handler just sends a reply with a 200 HTTP code and the same headers as were received in the request.

/ping/

This handler simply replies with a 200 HTTP code. That's all.

/stat/

This handler returns extensive elliptics statistics, which can be used by higher-level applications for smart IO routing.
