Reverbrain wiki

Backrunner

Backrunner is a Swiss-army-knife HTTP/HTTPS proxy for the Elliptics distributed storage. It supports ACLs, automatic bucket selection (based on disk and network speed, errors, and amount of free space), header extension, and local static file handling, and it provides a simple REST API for clients.

Available HTTP URI handlers can be found here: backrunner:uri

Build and installation instructions can be found in the microtutorial.

Terminology

Bucket

A bucket is an object which contains metadata about how replicas should be arranged. A bucket has a symbolic name like b123. Backrunner uses bucket names in URLs as simple mnemonics, but the object behind the name carries much more: replica groups, access control lists and limits.

Here is a simple JSON file used to create buckets via the bmeta tool described below:

{
	"generic": {
		"acl": [
			{
				"user": "*",
				"token": "secure unused wildcard token",
				"flags": 1
			},
			{
				"user": "writer",
				"token": "secure token",
				"flags": 2
			}
		],
		"flags": 0,
		"max-size": 0,
		"max-key-num": 0
	},
	"buckets": {
		"b1": { "groups": [1,2] },
		"b2": { "groups": [3,4] },
		"b3": { "groups": [5,6] },
		"b4": { "groups": [7,8, 9] },
		"b5": { "groups": [10, 11, 12] }
	}
}

The following options are supported in the bucket JSON file:

groups a list of Elliptics groups where copies of an object are written and searched for. A group is a single-replica storage, so listing multiple groups means that a write into the bucket stores multiple replicas in parallel, one in each of the specified Elliptics groups. More about replication in Elliptics can be found here: Replication and recovery in Elliptics
acl access control list, a set of user-token-flags tuples, each of which describes what the given user is allowed to do. More on ACL below.
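
As a minimal sketch, a bucket description with the same layout as the example above could be generated by a short script instead of being edited by hand. The helper name make_bucket_config and the rule that every bucket needs at least one group are assumptions made for illustration and are not part of bmeta; only the field names are taken from the example JSON.

```python
import json


def make_bucket_config(acl, buckets):
    """Build a bucket description with the same layout as the example JSON.

    acl     -- list of (user, token, flags) tuples
    buckets -- dict mapping a bucket name to a list of Elliptics group IDs
    """
    for name, groups in buckets.items():
        # Assumption: a bucket without groups has nowhere to store replicas.
        if not groups:
            raise ValueError(f"bucket {name!r} has no groups")

    return {
        "generic": {
            "acl": [
                {"user": user, "token": token, "flags": flags}
                for user, token, flags in acl
            ],
            "flags": 0,
            "max-size": 0,
            "max-key-num": 0,
        },
        "buckets": {
            name: {"groups": list(groups)} for name, groups in buckets.items()
        },
    }


config = make_bucket_config(
    acl=[("*", "secure unused wildcard token", 1), ("writer", "secure token", 2)],
    buckets={"b1": [1, 2], "b2": [3, 4]},
)
bucket_json = json.dumps(config, indent=2)
print(bucket_json)
```

The printed JSON can be saved to a file such as bucket.json and uploaded with bmeta as described in the Bucket upload section.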

ACL

An access control list is a set of user-token-flags tuples, each of which describes what the given user is allowed to do. The following tunables are supported:

user username specified in the Authorization header. If no header is specified, the wildcard * username is used
token secure token used to generate the URL signature. If the signature doesn't match, the request is not processed and a 403 error is returned. The signature placed into the Authorization header is generated according to the following rules: Authorization signature
flags flags specify how exactly requests from the given user have to be processed, for example whether the user is allowed to write data, whether its token should be checked, and so on
ACL flags
0 reserved for read requests; a flags value of zero means the given user can only read data
1 no further auth checks are performed for the given user; it is usually set for the * wildcard user to specify that reading can be done without authorization
2 the given user can perform write requests (data upload and deletion); combining both flags, 2+1 (setting the flags value to 3), allows a user to write data without authorization
4 control tasks, currently unused

The ACL example above says that the * wildcard user is only allowed to read data, and that reading requires no authorization. The user writer is also allowed to upload and delete data, but it has to provide an Authorization header generated using its token.
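
The flag values above form a bitmask, which can be illustrated with a small sketch. The bit values are taken from the ACL flags list; the constant names and the describe_acl_flags helper are hypothetical and are not backrunner's own identifiers.

```python
# Bit values come from the ACL flags list; the names are hypothetical.
ACL_NO_AUTH = 1   # no further auth checks for this user
ACL_WRITE = 2     # user may upload and delete data
ACL_CONTROL = 4   # control tasks, unused now


def describe_acl_flags(flags):
    """Return what a user with the given flags value may do."""
    return {
        "read": True,  # every ACL entry may read; flags == 0 means read-only
        "write": bool(flags & ACL_WRITE),
        "needs_auth": not (flags & ACL_NO_AUTH),
    }


for value in (0, 1, 2, 3):
    print(value, describe_acl_flags(value))
```

With the example bucket, the * user (flags 1) reads without authorization, while writer (flags 2) may write but must sign its requests.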

Bucket upload

Once the bucket control JSON has been created, it must be uploaded into the Elliptics storage. Backrunner reads bucket metadata from the Elliptics storage when new requests come from clients.

The main tool for operating on bucket metadata is bmeta; its build instructions can be found in the Backrunner installation tutorial.

To bulk-upload the metadata of all buckets stored in the bucket.json file into the Elliptics storage, use the bmeta tool:

$ bmeta -config config.json -upload bucket.json

To read the metadata of a given bucket, use:

$ bmeta -config config.json -bucket b1 | grep groups
backrunner: 2015/10/22 19:07:31.629724 b1: version: 1, groups: [1 2], acl: [*:secure unused wildcard token:0x1 writer:secure token:0x2], flags: 0x0, max-size: 0, max-key-num: 0

The config.json file is used by both the backrunner and bmeta tools to connect to the Elliptics cluster; it is described in the following section.

Backrunner config

The Backrunner config contains two parts: the elliptics config (used to connect backrunner and bmeta to the storage) and the backrunner proxy config.

{
	"elliptics": {
		"log-file": "/dev/stdout",
		"log-level": "info",
		"log-prefix": "backrunner: ",
		"remote": [
			"address1:1025:2",
			"address2:1025:2",
			"address3:1025:2"
		],
		"metadata-groups": [1,2,3]
	},
	"proxy": {
		"address": "0.0.0.0:9090",
		"idle-timeout": 60,
		"free-space-ratio-soft": 0.2,
		"free-space-ratio-hard": 0.15,
		"bucket-update-interval": 30,
		"bucket-stat-update-interval": 5,
		"redirect-port": 8080,
		"redirect-token": "secure token to sign redirect request",
		"redirect-signature-timeout": 60,
		"redirect-root": "/srv",
		"headers": {
			"X-Ell-Some-Header": "this is a header data",
			"XXXXXXX": "YYYYYY",
			"Access-Control-Allow-Origin": "*"
		},
		"root": "/srv/elliptics",
		"https_address": "0.0.0.0:443",
		"cert_file": "/etc/elliptics/cert.pem",
		"key_file": "/etc/elliptics/key.pem",
		"content-types": {
			"flv" : "video/x-flv",
			"mp4" : "video/mp4",
			"mp3" : "audio/mpeg"
		},
		"reader-io-flags": 256,
		"writer-io-flags": 0
	}
}

Elliptics section

This section describes how the backrunner proxy connects to the Elliptics storage.

log-file log file where elliptics and backrunner logs are written
log-level log level for elliptics messages; the higher the level, the more messages are printed. The following levels are supported, in increasing verbosity order: error, warning, info, notice, debug
log-prefix backrunner proxy log prefix, used to distinguish its messages from those of the elliptics client, which do not have this prefix
remote array of remote addresses in address:port:family format, where family is 2 for IPv4 and 10 for IPv6
metadata-groups elliptics groups where bucket metadata is stored
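
The address:port:family format of the remote entries can be sketched with a small parser. The function name parse_remote is hypothetical; splitting from the right is a design choice made here so that an IPv6 address, which itself contains colons, stays intact.

```python
def parse_remote(remote):
    """Split an 'address:port:family' string into (address, port, family name)."""
    # Split from the right so that colons inside an IPv6 address are preserved.
    address, port, family = remote.rsplit(":", 2)
    family = int(family)
    if family not in (2, 10):
        raise ValueError(f"unsupported address family: {family}")
    return address, int(port), "IPv4" if family == 2 else "IPv6"


print(parse_remote("address1:1025:2"))
print(parse_remote("::1:1025:10"))
```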

Backrunner proxy section

address address the proxy listens on for incoming HTTP requests
idle-timeout timeout in seconds for keeping an HTTP connection alive; if the client neither sends nor reads data for more than idle-timeout seconds, the connection is closed
free-space-ratio-soft, free-space-ratio-hard soft and hard limits for free space in a bucket. When a client uses the /nobucket_upload/ handler, i.e. leaves the choice of where to place the data to the proxy, backrunner analyzes all the buckets it has access to and selects the fastest one. It takes into account the write performance of the disks, the amount of free space, network speed, disk/network/elliptics errors and other parameters to select the bucket with the best storage nodes for the given request. When the amount of free space is below the hard limit, the bucket is not used for write requests at all. When the amount of free space is below the soft limit, the bucket is penalized, but can still be selected for a given write operation.
bucket-update-interval interval in seconds between bucket metadata updates. For example, after a secure token has been changed, the new token takes effect once the bucket has been updated
bucket-stat-update-interval interval in seconds between storage statistics requests. To know what happens with the underlying storage, the proxy has to request storage statistics, in particular the amount of free space, once in a while. Setting this interval to a large number of seconds is not recommended; 5-10 seconds is usually enough
headers a string-to-string map of headers added to every reply sent to the client, including error replies
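
The effect of the soft and hard free-space limits described in this section can be sketched as follows. This is only an illustration under stated assumptions: free_ratio is taken to be the fraction of free space in a bucket (0.0 to 1.0), the function name classify_bucket is hypothetical, and the actual penalty backrunner applies between the two limits is not reproduced here.

```python
def classify_bucket(free_ratio, soft=0.2, hard=0.15):
    """Classify a bucket for write requests by its fraction of free space."""
    if free_ratio < hard:
        return "excluded"   # below the hard limit: never used for writes
    if free_ratio < soft:
        return "penalized"  # below the soft limit: still eligible, ranked lower
    return "normal"


for ratio in (0.5, 0.18, 0.1):
    print(ratio, classify_bucket(ratio))
```

The default values mirror free-space-ratio-soft and free-space-ratio-hard from the example config.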

Redirect part

redirect-port when the /redirect/ URI handler is used, backrunner returns a 302 redirect to this port on one of the storage nodes hosting the requested data
redirect-token secure token used to sign the redirect URI (will be stored in headers)
redirect-signature-timeout number of seconds for which the generated secure token is valid; this value is used in signature generation and is stored in headers
redirect-root part of the absolute path returned by elliptics which is cut off from the URL to prevent access outside of the storage data directory. redirect-root must match the root option in the eblob streaming module config, which is the root directory where data is stored. For example, if an elliptics lookup says the given key lives in the /srv/elliptics/2/data/data-0.12 file and redirect-root is set to /srv/elliptics, then the redirect URL will only contain the 2/data/data-0.12 part, and the streaming eblob module will recover the correct path by prepending its root to the given path.
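
The redirect-root example above can be sketched in a few lines. Both function names are hypothetical, and the code only models the path arithmetic, not the redirect or the signature.

```python
import posixpath


def strip_redirect_root(path, redirect_root):
    """Remove the redirect-root prefix from an absolute storage path."""
    root = redirect_root.rstrip("/") + "/"
    if not path.startswith(root):
        raise ValueError(f"{path!r} is outside of {redirect_root!r}")
    return path[len(root):]


def restore_path(relative, root):
    """Model of the streaming side: prepend its own root to the relative path."""
    return posixpath.join(root, relative)


relative = strip_redirect_root("/srv/elliptics/2/data/data-0.12", "/srv/elliptics")
print(relative)
print(restore_path(relative, "/srv/elliptics"))
```

If the two root settings differ, the restored path no longer points at the original file, which is why they have to match.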

Common static files handler

If no handler prefix (see backrunner:uri) matches, backrunner returns files from its local root directory, specified by the root proxy config option. It is commonly used to serve files such as crossdomain.xml. If the root option is not specified and no handler prefix matches, a 503 error is returned.

HTTPS configuration

The Backrunner proxy can listen for both HTTP and HTTPS connections; this section describes how to set up the secure service.

https_address address to listen on for incoming HTTPS connections. It can be specified with or without the common address option; if the same address is used for both options, backrunner will fail to start.
cert_file path to the certificate file. Note that if the certificate is signed by a certificate authority, cert_file should be the concatenation of the server's certificate followed by the CA's certificate. A certificate file can be generated with the generate_cert.go utility.
key_file path to the private key file.
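
A pre-flight check of the HTTPS options might look like the sketch below. The function name check_https_config is hypothetical. The rule that address and https_address must differ comes from the text above; requiring both cert_file and key_file once https_address is set is an assumption, not a documented backrunner check.

```python
def check_https_config(proxy):
    """Return a list of problems found in the HTTPS part of the proxy config."""
    problems = []
    https_addr = proxy.get("https_address")
    if https_addr is None:
        return problems  # HTTPS is not configured, nothing to check

    if https_addr == proxy.get("address"):
        problems.append("address and https_address must differ")

    for option in ("cert_file", "key_file"):
        # Assumption: both files are needed once https_address is set.
        if not proxy.get(option):
            problems.append(f"{option} is not set")
    return problems


proxy = {
    "address": "0.0.0.0:9090",
    "https_address": "0.0.0.0:443",
    "cert_file": "/etc/elliptics/cert.pem",
    "key_file": "/etc/elliptics/key.pem",
}
print(check_https_config(proxy))
```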

Content-Type detection

By default, Backrunner tries to detect the content type by analyzing the data it sends to the client, using the algorithm described at http://mimesniff.spec.whatwg.org/. It is not always possible to detect the type correctly; for example, “audio/mpeg” falls into this category, probably because of MPEG patents. For such cases, backrunner has an explicit option to set the content type based on the requested key suffix.

content-types maps key suffixes to content type strings. If there is an exact match, the given content type is used; otherwise Backrunner tries to detect it. If detection fails, application/octet-stream is used.
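
The lookup order described above (configured suffix first, then detection, then application/octet-stream) can be sketched as follows. The function name choose_content_type is hypothetical, the detect argument is a stand-in for the real content sniffing, and treating the suffix as the text after the last dot in the key is an assumption.

```python
def choose_content_type(key, content_types, detect):
    """Pick a content type for a key.

    content_types -- suffix-to-type map, as in the content-types option
    detect        -- callable returning a sniffed type or None; a stand-in
                     for the real detection, which is not reproduced here
    """
    # Assumption: the suffix is the text after the last dot in the key.
    _, dot, suffix = key.rpartition(".")
    if dot and suffix in content_types:
        return content_types[suffix]
    return detect() or "application/octet-stream"


content_types = {"flv": "video/x-flv", "mp4": "video/mp4", "mp3": "audio/mpeg"}
print(choose_content_type("song.mp3", content_types, lambda: None))
print(choose_content_type("page.html", content_types, lambda: "text/html"))
print(choose_content_type("blob", content_types, lambda: None))
```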

Reader/Writer IO flags

The reader-io-flags and writer-io-flags options specify Elliptics IO flags. For example, 256 means 'do not perform checksum'; in the example config, the writer always generates a checksum and the reader does not verify it. All magic constants here are ORed constants from the Elliptics IO flag documentation.
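
Since the IO flags are ORed bit constants, a configured value can be tested bit by bit. The sketch below uses only the one value named in the text, 256; the constant name IO_FLAG_NO_CHECKSUM and the helper are hypothetical, and other flag values would have to be taken from the Elliptics IO flag documentation.

```python
# 256 is the only value named in the text; the constant name is hypothetical.
IO_FLAG_NO_CHECKSUM = 256  # equal to 1 << 8


def skips_checksum(io_flags):
    """Tell whether the 'do not perform checksum' bit is set in an IO flags value."""
    return bool(io_flags & IO_FLAG_NO_CHECKSUM)


reader_io_flags = 256  # reader-io-flags from the example config
writer_io_flags = 0    # writer-io-flags from the example config
print(skips_checksum(reader_io_flags))
print(skips_checksum(writer_io_flags))
```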

backrunner/backrunner.txt · Last modified: 2017/03/12 17:19 by zbr