Reverbrain wiki


blueprints:elliptics:recovery

Info

Assignee: rbtz@

Status: TESTING

DependsOn: metadata, iterators

WIP branches

Summary

Create a brand-new recovery mechanism for elliptics that is efficient, simple, and easy to use.

Whiteboard

This task is split into three separate tasks:

  • Implement new metadata

Historically, metadata (the timestamp) has been stored either in a separate eblob or in a different eblob column (type). From now on we will store metadata together with the data: every new record gets a fixed-size header, placed before the data itself, that holds the metadata.
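A minimal sketch of the fixed-size header idea, for illustration only. The field layout used here (seconds, nanoseconds and a flags word, each 8 bytes, little-endian) and the names HEADER, pack_record and unpack_record are assumptions made for this example, not the actual eblob or elliptics on-disk format.

```python
import struct

# Hypothetical header layout (an assumption, not the real format):
# timestamp seconds, timestamp nanoseconds, flags; 8 bytes each, little-endian.
HEADER = struct.Struct("<QQQ")


def pack_record(data, tsec, tnsec, flags=0):
    """Prepend the fixed-size metadata header to the raw data."""
    return HEADER.pack(tsec, tnsec, flags) + data


def unpack_record(blob):
    """Split a stored record into its (tsec, tnsec, flags) metadata and data."""
    if len(blob) < HEADER.size:
        raise ValueError("record is shorter than the metadata header")
    meta = HEADER.unpack_from(blob)
    return meta, blob[HEADER.size:]
```

Because the header size is fixed, a reader can fetch the metadata of a record without parsing or even reading the data that follows it.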

  • Implement new iterators.

Elliptics should accept a 'start iterator' command that runs an iterator over a given key/timestamp range and returns metadata (possibly along with the data).
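The intended iterator behaviour can be modelled in memory as follows. This is a sketch under assumptions: the Record type, the iterate function, the half-open key range and the inclusive timestamp range are all hypothetical choices for the example and do not describe the real elliptics command or its wire format.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    """Hypothetical stored record: key, timestamp and payload."""
    key: bytes
    timestamp: int
    data: bytes


def iterate(records: Iterable[Record],
            key_range: Tuple[bytes, bytes],
            time_range: Tuple[int, int],
            with_data: bool = False
            ) -> Iterator[Tuple[bytes, int, Optional[bytes]]]:
    """Yield (key, timestamp, data-or-None) for records inside both ranges.

    Assumed semantics: key_range is half-open [begin, end),
    time_range is inclusive [begin, end].
    """
    key_begin, key_end = key_range
    time_begin, time_end = time_range
    for rec in records:
        in_keys = key_begin <= rec.key < key_end
        in_time = time_begin <= rec.timestamp <= time_end
        if in_keys and in_time:
            yield rec.key, rec.timestamp, (rec.data if with_data else None)
```

Returning only metadata by default keeps the result small; the caller asks for data explicitly when it actually needs the payload.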

  • Implement new recovery.

Using the new metadata and iterators, we should be able to implement the new recovery mechanism. It should be efficient and admin-friendly.

Technical information

Recovery will proceed in the following steps:

  • Get routing table.
  • Compute the key ranges that the node to be recovered (the local node) has stolen from its neighbors (the remote nodes).
  • For each range:
    • Run two iterators over the range: one on the local node and one on the remote node.
    • Sort both iterator results.
    • Compute the difference between the two results (with respect to timestamps). The difference consists of the keys that are present on the remote node but are either missing or older (by timestamp) on the local node.
    • Restore the computed difference using bulk_read/bulk_write operations.
  • Show statistics, exit.
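The range and difference steps above can be sketched as follows. This is a simplified model, not the actual implementation: it assumes the routing table is a list of (ring_id, node) pairs, that a ring ID owns the keys from itself up to the next ID on the ring, and that each key appears at most once per iterator result. The names stolen_ranges, compute_diff and batches are hypothetical.

```python
def stolen_ranges(routes, local):
    """Return (begin, end, remote_node) for every range owned by `local`.

    Assumed model: routes is a list of (ring_id, node); an ID owns
    [ring_id, next_ring_id). The range was "stolen" from the node that
    owns the nearest preceding ID not belonging to `local`.
    If end <= begin, the range wraps around the ring.
    """
    ring = sorted(routes)
    n = len(ring)
    result = []
    for i, (ring_id, node) in enumerate(ring):
        if node != local:
            continue
        end = ring[(i + 1) % n][0]
        j = (i - 1) % n
        while ring[j][1] == local and j != i:
            j = (j - 1) % n
        if ring[j][1] == local:
            continue  # the local node is alone on the ring
        result.append((ring_id, end, ring[j][1]))
    return result


def compute_diff(local, remote):
    """Keys present in `remote` that are missing or older in `local`.

    Both arguments are iterables of (key, timestamp) pairs.
    """
    local = sorted(local)
    remote = sorted(remote)
    diff = []
    i = 0
    for key, remote_ts in remote:
        # Advance the local cursor to the first key >= the remote key.
        while i < len(local) and local[i][0] < key:
            i += 1
        if i < len(local) and local[i][0] == key:
            if local[i][1] < remote_ts:
                diff.append(key)  # local copy is older
        else:
            diff.append(key)      # key is missing locally
    return diff


def batches(keys, size):
    """Split keys into chunks suitable for bulk read/write requests."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]
```

Sorting both results first lets the difference be computed in a single merge-style pass, and the chunks from batches map naturally onto the bulk_read/bulk_write calls of the restore step.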

The new recovery will be implemented as a simple Python package.

Work Items

  • New metadata
  • New iterators
  • New recovery script

Future plans

For quite a while now, elliptics has had a server-side scripting framework called cocaine. Using it, we can build a more flexible recovery tool outside of the main elliptics codebase.

blueprints/elliptics/recovery.txt · Last modified: 2013/08/02 08:01 by savetherbtz