YAWNDB

YAWNDB is an in-memory circular array database written in Erlang. Erlang is built on a lightweight process model, which makes it possible to process large amounts of data without consuming extra system resources. In YAWNDB, data is stored in a circular buffer, called a conveyor in YAWNDB terminology. All incoming data is distributed among archives called buckets according to certain rules. A rule is a set of properties for a set of statistical data (e.g. data size, data collection frequency, etc.).

Data in YAWNDB is represented as triplets composed of a time, a value and a key. The key (also called a path in YAWNDB terminology) is a sequence of lowercase letters and digits that determines which conveyor the triplet is put on.
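The triplet model can be illustrated with a minimal Python sketch. It is not part of YAWNDB: the names `Triplet`, `is_valid_path` and `make_triplet` are hypothetical, and the validation rule simply mirrors the description above (lowercase letters and digits only); the real server may accept a wider character set.

```python
import re
from typing import NamedTuple

# Assumption: a path consists only of lowercase ASCII letters and digits,
# as the description above states; the real server may accept more.
PATH_RE = re.compile(r"[a-z0-9]+")


class Triplet(NamedTuple):
    time: int   # timestamp of the measurement
    value: int  # measured value
    key: str    # path that selects the conveyor


def is_valid_path(path: str) -> bool:
    """Return True if the path contains only lowercase letters and digits."""
    return PATH_RE.fullmatch(path) is not None


def make_triplet(time: int, value: int, key: str) -> Triplet:
    """Build a triplet, rejecting keys that do not look like a valid path."""
    if not is_valid_path(key):
        raise ValueError(f"invalid path: {key!r}")
    return Triplet(time, value, key)
```

For example, `make_triplet(63555098820, 10, "cpu0")` succeeds, while a key such as `"CPU 0"` is rejected by this sketch.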

This approach has the following benefits:

  • obsolete data is deleted without excessive consumption of resources;
  • memory usage is fixed for a fixed number of keys;
  • access time is fixed for any record.
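The benefits listed above follow from how a circular buffer works. The sketch below is a simplified Python illustration, not the Ecirca implementation: the class name `Conveyor` and the `period` parameter (seconds covered by one bucket) are assumptions made for the example, and a new value simply overwrites whatever the slot held before.

```python
from typing import List, Optional


class Conveyor:
    """Fixed-size circular buffer: memory is allocated once and never grows."""

    def __init__(self, limit: int, period: int) -> None:
        self.limit = limit    # number of buckets in the conveyor
        self.period = period  # seconds covered by one bucket (assumed parameter)
        self.values: List[Optional[int]] = [None] * limit
        self.times: List[Optional[int]] = [None] * limit

    def _slot(self, time: int) -> int:
        # The slot index is computed directly from the time: constant-time access.
        return (time // self.period) % self.limit

    def put(self, time: int, value: int) -> None:
        # Writing into a slot overwrites the obsolete value stored there,
        # so old data disappears without a separate cleanup pass.
        i = self._slot(time)
        self.times[i] = time - time % self.period
        self.values[i] = value

    def get(self, time: int) -> Optional[int]:
        # Return the value only if the slot still belongs to the requested bucket.
        i = self._slot(time)
        bucket_time = time - time % self.period
        return self.values[i] if self.times[i] == bucket_time else None
```

With `limit=3` and `period=60`, writing a fourth consecutive bucket reuses the slot of the first one, so the oldest value is no longer readable while the buffer still holds exactly three slots.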

Architecture

YAWNDB is based on a circular array algorithm implemented in the Ecirca library, which is written in C. The YAWNDB modules, written in Erlang, interact with Ecirca via native implemented functions (NIFs). Data is persisted to disk by the Bitcask application, written in Erlang. The REST API is based on the Cowboy web server. Data is written via a socket and read via the REST API.

Installation

Clone the repository:

$ git clone https://github.com/selectel/yawndb.git

and then execute the following commands:

$ cd yawndb
$ make all

then

$ ./start.sh

Copy the configuration file:

$ cp priv/yawndb.yml.example priv/yawndb.yml

or create an appropriate symlink.

Configuration

All YAWNDB settings are kept in the configuration file yawndb.yml.

Data Saving Settings

| Parameter            | Type    | Description                                                                        |
|----------------------|---------|------------------------------------------------------------------------------------|
| flush_enabled        | boolean | Enable/disable saving to disk                                                      |
| flush_dir            | string  | Directory to save data                                                             |
| flush_period         | integer | Data saving frequency (in seconds)                                                 |
| flush_late_threshold | float   | Write to log if saving takes more than flush_period * flush_late_threshold seconds |
| flush_cache          | integer | Read cache                                                                         |
| flush_write_buffer   | integer | Write cache                                                                        |
| flush_block_size     | integer | Block size for saving to disk                                                      |
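The `flush_late_threshold` rule is a simple product of two settings. The Python sketch below only illustrates the arithmetic described in the table; the function name `flush_is_late` and the example numbers are hypothetical and do not come from the YAWNDB code base.

```python
def flush_is_late(elapsed_seconds: float, flush_period: int,
                  flush_late_threshold: float) -> bool:
    """Return True if a flush took longer than the configured share of the period."""
    # A flush is reported as late once it exceeds flush_period * flush_late_threshold.
    return elapsed_seconds > flush_period * flush_late_threshold
```

With `flush_period = 60` and `flush_late_threshold = 0.5`, a flush that takes 40 seconds exceeds the 30-second limit and would be logged, while a 20-second flush would not.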

Obsolete data deletion settings

| Parameter       | Type    | Description                           |
|-----------------|---------|---------------------------------------|
| cleanup_enabled | boolean | Enable/disable obsolete data deletion |
| cleanup_period  | integer | Deletion frequency, in seconds        |

User API settings

| Parameter          | Type    | Description                                     |
|--------------------|---------|-------------------------------------------------|
| listen_jsonapi_req | boolean | Enable/disable the user API                     |
| jsonapi_iface      | string  | Network interface to use                        |
| jsonapi_port       | integer | Local port number to use                        |
| jsonapi_prespawned | integer | Number of pre-spawned workers (can be increased) |

Administrative API settings

| Parameter         | Type    | Description                                     |
|-------------------|---------|-------------------------------------------------|
| listen_tcpapi_req | boolean | Enable/disable the administrative API           |
| tcpapi_iface      | string  | Network interface to use                        |
| tcpapi_port       | integer | Local port number to use                        |
| tcpapi_prespawned | integer | Number of pre-spawned workers (can be increased) |

Data storing settings

| Parameter | Type    | Description                         |
|-----------|---------|-------------------------------------|
| max_slice | integer | Maximum size of the slice to choose |
| rules     | list    | Rules regulating data retention     |

Data storing rules

| Parameter         | Type    | Description |
|-------------------|---------|-------------|
| name              | string  | The name of the rule |
| prefix            | string  | The prefix the rule applies to |
| limit             | integer | Conveyor size |
| split             | string  | Defines how a value is divided when it does not match a bucket: proportional - the value is divided proportionally between the two nearest buckets; equal - the value is divided into halves between the two nearest buckets; forward/backward - the whole value will be put into the previous (or the nearest) bucket |
| type              | string  | Defines the value to be saved in buckets: last - only the last value is saved; max - the maximal incoming value is saved; min - the minimal incoming value is saved; sum - the sum of the incoming values is saved; avg - the average value is saved |
| value_size        | integer | Defines the value size for a bucket (16, 32 and 64 bits); possible values: small, medium, large |
| additional_values | list    | A list of "foobar: weak/strong" pairs from 0 to 14 symbols in length. Sometimes it is necessary to save additional values that are not numbers ('timeout', for example). The list of additional values is used for this purpose. The values can be either strong or weak. Weak values are updated if the bucket is updated, while strong values are never updated. |
| autoremove        | boolean | Enable/disable conveyor removal in case of data deterioration |
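The `split` and `type` options can be illustrated with a short Python sketch. It is an interpretation of the table above, not the Ecirca implementation: the function names, the `period` parameter (bucket width in seconds) and the exact weighting used for `proportional` are assumptions. The `forward` and `backward` modes are left out because their semantics are not spelled out precisely enough here.

```python
from typing import Callable, Dict, List, Tuple


def split_value(time: int, value: float, period: int,
                mode: str) -> List[Tuple[int, float]]:
    """Distribute a value between the two nearest bucket timestamps."""
    prev_bucket = time - time % period
    if prev_bucket == time:
        return [(time, value)]  # the time already matches a bucket
    next_bucket = prev_bucket + period
    if mode == "equal":
        return [(prev_bucket, value / 2), (next_bucket, value / 2)]
    if mode == "proportional":
        # Assumption: the closer bucket receives the larger share.
        frac = (time - prev_bucket) / period
        return [(prev_bucket, value * (1 - frac)), (next_bucket, value * frac)]
    raise ValueError(f"split mode not covered by this sketch: {mode}")


# One aggregation function per documented bucket type.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "last": lambda vs: vs[-1],
    "max": max,
    "min": min,
    "sum": sum,
    "avg": lambda vs: sum(vs) / len(vs),
}


def aggregate(values: List[float], kind: str) -> float:
    """Reduce all values that fell into one bucket to the single stored value."""
    return AGGREGATORS[kind](values)
```

Under these assumptions, a value of 8 arriving 15 seconds into a 60-second bucket is split as 6 and 2 in `proportional` mode and as 4 and 4 in `equal` mode.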

Get data API

Data is retrieved via the HTTP API. YAWNDB has two types of API: the administrative API and the user API. All API settings are specified in the configuration file, in the sections Admin JSON API and User JSON API.

A typical API response is given in the following form:

{
    "status": "ok",
    "code": "ok",
    "answer": {}
}

The status field takes the value ok or error. On success, the code field contains "ok" and the result is returned in the answer field. On error, the code field contains the error code and the answer field contains a human-readable description of the error.
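A client can unwrap this envelope in a few lines. The Python sketch below is an illustration only: `YawndbError` and `unwrap` are hypothetical names, and the error code used in the usage note is made up, since the actual error codes are not listed here.

```python
import json
from typing import Any


class YawndbError(Exception):
    """Hypothetical exception carrying the error code and its description."""

    def __init__(self, code: str, description: str) -> None:
        super().__init__(f"{code}: {description}")
        self.code = code
        self.description = description


def unwrap(body: str) -> Any:
    """Return the answer field of a response, or raise if status is not ok."""
    reply = json.loads(body)
    if reply["status"] == "ok":
        return reply["answer"]
    # On error, code holds the error code and answer holds the description.
    raise YawndbError(reply["code"], reply["answer"])
```

Calling `unwrap` on a successful response returns the content of `answer`; a response with `"status": "error"` and, say, a made-up code `"some_code"` raises `YawndbError` with that code attached.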

Get a list of rules for a selected path

GET /paths/:path/rules

Example of an answer:

{
    "status": "ok",
    "code": "ok",
    "answer": ["stat"]
}

Get a list of time-value tuples for several conveyors for a given period:

POST /paths/slice

The conveyors are indicated in the body of the request: paths=path1/rule,path2/rule2,path3/rule3.

Request parameters:

| Parameter | Type    | Description                 |
|-----------|---------|-----------------------------|
| from      | integer | Initial time, UTC timestamp |
| to        | integer | End time, UTC timestamp     |
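A request for several conveyors could be assembled as in the Python sketch below. Several details are assumptions rather than documented behaviour: that `from` and `to` travel in the query string, that the body is sent with a form content type, and the base URL used in the usage note. The function `build_slice_request` is hypothetical and only builds the request object without sending it.

```python
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import Request


def build_slice_request(base_url: str, conveyors: List[Tuple[str, str]],
                        start: int, end: int) -> Request:
    """Build a POST /paths/slice request for a list of (path, rule) pairs."""
    # Assumption: the time range is passed as query parameters.
    query = urlencode({"from": start, "to": end})
    # The conveyors go into the body as paths=path1/rule,path2/rule2,...
    body = "paths=" + ",".join(f"{path}/{rule}" for path, rule in conveyors)
    return Request(
        f"{base_url}/paths/slice?{query}",
        data=body.encode("ascii"),
        # Assumption: the server accepts a form-encoded content type.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The resulting object can be passed to `urllib.request.urlopen`, for example with a base URL such as `http://localhost:8081`, where the host and port must match `jsonapi_iface` and `jsonapi_port` from the configuration file.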

Example of an answer:

{
 "status": "ok",
 "code": "ok",
 "answer": [
    [63555098820, 10, 12],
    [63555098888, 16, "empty"],
    [63555098940, 12, 10],
    [63555099000, 3, 8],
    [63555099060, 10, 12],
    [63555099120, 2, 9],
    [63555099180, 7, 12],
    [63555099240, 3, 4],
    [63555099300, 20, 10],
    [63555099360, 2, 11]
 ]
}

Get a list of time-value pairs for a single conveyor for a given period:

GET /paths/:path/:rule/slice

Request parameters:

| Parameter | Type    | Description                 |
|-----------|---------|-----------------------------|
| from      | integer | Initial time, UTC timestamp |
| to        | integer | End time, UTC timestamp     |

Example of an answer:

{
 "status": "ok",
 "code": "ok",
 "answer": [
    [63555098820, 12],
    [63555098880, "empty"],
    [63555098940, 10],
    [63555099000, "empty"],
    [63555099060, 15],
    [63555099120, "empty"],
    [63555099180, 12],
    [63555099240, "empty"],
    [63555099300, 10],
    [63555099360, "empty"]
 ]
}

The answer contains a list of time-value pairs for a given period; if there is no value for a certain moment, the string empty is returned.
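Because a missing value is reported as the string empty rather than as a number, clients usually need to normalise the answer before doing arithmetic on it. The Python sketch below shows one way to do this; the function names `parse_pairs` and `present_values` are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_pairs(answer: List[list]) -> List[Tuple[int, Optional[int]]]:
    """Convert [time, value] pairs, mapping the "empty" marker to None."""
    return [(time, None if value == "empty" else value)
            for time, value in answer]


def present_values(pairs: List[Tuple[int, Optional[int]]]) -> List[int]:
    """Keep only the values that were actually recorded."""
    return [value for _, value in pairs if value is not None]
```

Applied to the first two pairs of the example above, `parse_pairs` yields `(63555098820, 12)` and `(63555098880, None)`.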

Get a list of the last time-value pairs for a conveyor:

Request parameters:

| Parameter | Type    | Description               |
|-----------|---------|---------------------------|
| n         | integer | The number of last values |

Example of an answer:

{
 "status": "ok",
 "code": "ok",
 "answer": [
    [63555103140, 11],
    [63555103200, 10],
    [63555103260, 10],
    [63555103320, 12],
    [63555103380, 10],
    [63555103440, 10],
    [63555103500, 11],
    [63555103560, 10],
    [63555103620, 10],
    [63555103680, 14]
 ]
}

The answer contains a list of the last n time-value pairs for the conveyor; if there is no value for a certain moment, the string empty is returned.

Administrative API

Get a server status:

GET /status

Example of an answer:

{
 "status": "ok",
 "code": "ok",
 "answer": {
    "read_rpm": 37,
    "write_rpm": 816042,
    "read_rps": 1,
    "write_rps": 13601,
    "processes_now": 1628