ErlCass
An Erlang Cassandra driver focused on performance, based on the [DataStax cpp driver][1].
Note for v4.0.0
- Starting with `erlcass` version v4.x, the native driver is based on DataStax cpp-driver > 2.10.0, which is a massive release that includes many new features as well as architectural and performance improvements.
- Some cluster configs were removed while other configs were added. For more info please see the [Changelog][5].
- This new version adds support for speculative execution. For certain applications it is of the utmost importance to minimize latency. Speculative execution is a way to minimize latency by preemptively executing several instances of the same query against different nodes. The fastest response is then returned to the client application, and the other requests are cancelled. Speculative execution is disabled by default (see `speculative_execution_policy`).
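The idea behind speculative execution can be illustrated with a small, self-contained sketch in plain Erlang. This is not how the DataStax driver implements it, and it does not use erlcass at all: the module name `speculative_sketch`, the function `first_reply/2` and the simulated node latencies are all hypothetical. It only shows the principle of sending the same request to several nodes, keeping the fastest reply and cancelling the rest.

```erlang
-module(speculative_sketch).
-export([first_reply/2, demo/0]).

%% Runs Fun(Node) in one process per node and returns {Node, Reply} for
%% the first attempt that answers. Sketch only: there is no timeout and
%% no handling for attempts that crash.
first_reply(Fun, [_ | _] = Nodes) ->
    Parent = self(),
    Ref = make_ref(),
    Workers = [spawn_monitor(fun() -> Parent ! {Ref, Node, Fun(Node)} end)
               || Node <- Nodes],
    receive
        {Ref, Winner, Reply} ->
            cancel(Workers),
            flush(Ref),
            {Winner, Reply}
    end.

%% Kill every attempt and wait until each one is confirmed down.
cancel(Workers) ->
    lists:foreach(fun({Pid, _MRef}) -> exit(Pid, kill) end, Workers),
    lists:foreach(fun({Pid, MRef}) ->
                          receive {'DOWN', MRef, process, Pid, _} -> ok end
                  end, Workers).

%% Drop replies from slower attempts that arrived before they were killed.
flush(Ref) ->
    receive
        {Ref, _Node, _Reply} -> flush(Ref)
    after 0 ->
        ok
    end.

%% Simulated nodes with different latencies (values are made up).
demo() ->
    Latency = #{node_a => 300, node_b => 20, node_c => 150},
    Query = fun(Node) -> timer:sleep(maps:get(Node, Latency)), {ok, Node} end,
    first_reply(Query, maps:keys(Latency)).
```

Calling `speculative_sketch:demo()` returns the reply of the fastest simulated node. The trade-off is the same as in the real feature: lower latency is bought with extra load, because several nodes work on the same query.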
Update from 2.x to 3.0
This update breaks compatibility with earlier versions. On success, query results are now returned as:
- `ok` instead of `{ok, []}` for all DDL and DML queries (because they never return any column or row)
- `{ok, Columns, Rows}` instead of `{ok, Rows}`, where each row is now returned as a list rather than as a tuple as before.
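Code that was written against 2.x usually needs a single place that understands the new result shapes. The following is a minimal sketch of such a helper. The module name `result_shapes` and the function `to_maps/1` are hypothetical. It assumes that `Columns` is a list of `{Name, Type}` pairs and that failures are reported as `{error, Reason}`; check both assumptions against the version you use.

```erlang
-module(result_shapes).
-export([to_maps/1]).

%% Normalises a 3.x-style result into {ok, ListOfMaps} | {error, Reason}.
%% Assumption: Columns is a list of {Name, Type} pairs.
to_maps(ok) ->
    %% DDL and DML queries carry no columns or rows.
    {ok, []};
to_maps({ok, Columns, Rows}) ->
    Names = [Name || {Name, _Type} <- Columns],
    %% Each row is a list with one value per column, in column order.
    {ok, [maps:from_list(lists:zip(Names, Row)) || Row <- Rows]};
to_maps({error, Reason}) ->
    {error, Reason}.
```

For example, `result_shapes:to_maps({ok, [{<<"domain">>, text}], [[<<"Domain_1">>]]})` yields `{ok, [#{<<"domain">> => <<"Domain_1">>}]}`.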
Implementation note
How ErlCass affects the Erlang schedulers
It's well known that NIFs can degrade the performance of the Erlang schedulers when a function does not return
within 1-2 ms and blocks the scheduler thread.
Because the DataStax cpp driver is asynchronous, ErlCass does not block the scheduler threads: all calls to the native
functions return immediately. The DataStax driver uses its own thread pool to manage requests.
Responses are received on these threads and sent back to the calling Erlang processes asynchronously
using enif_send.
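From the caller's point of view this produces a "submit now, receive a message later" pattern. The sketch below imitates that pattern in plain Erlang, with a spawned process standing in for the driver's thread pool. It does not call erlcass, and the names `async_reply_sketch`, `submit/1`, `await/2` and the `{reply, Tag, Result}` message shape are hypothetical, chosen only for illustration.

```erlang
-module(async_reply_sketch).
-export([submit/1, await/2, demo/0]).

%% Hands the work to another process and returns a tag immediately,
%% so the caller is not blocked while the request is in flight.
submit(Work) when is_function(Work, 0) ->
    Caller = self(),
    Tag = make_ref(),
    spawn(fun() -> Caller ! {reply, Tag, Work()} end),
    {ok, Tag}.

%% Waits for the reply that belongs to Tag, up to TimeoutMs milliseconds.
await(Tag, TimeoutMs) ->
    receive
        {reply, Tag, Result} -> Result
    after TimeoutMs ->
        {error, timeout}
    end.

demo() ->
    {ok, Tag} = submit(fun() -> timer:sleep(50), {ok, done} end),
    %% The caller is free to do other work here before collecting the reply.
    await(Tag, 1000).
```

Matching on the tag lets one process keep several requests in flight and pair each reply with the request that produced it.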
Features
List of supported features:
- Asynchronous API
- Synchronous API
- Simple, Prepared, and Batch statements
- [Avoid undesired tombstone while null binding][10] (only on protocol 4 or newer).
- Paged queries
- Asynchronous I/O, parallel execution, and request pipelining
- Connection pooling
- Automatic node discovery
- Automatic reconnection
- Configurable load balancing
- Works with any cluster size
- Authentication
- SSL
- Latency-aware routing
- Performance metrics
- Tuples and UDTs
- Nested collections
- Retry policies
- Support for materialized view and secondary index metadata
- Support for clustering key order, `frozen<>` and Cassandra version metadata
- Reverse DNS with SSL peer identity verification support
- Randomized contact points
- Speculative execution
Features of the DataStax driver that are not yet supported are listed in the [Todo List][9].
Benchmark comparing with other drivers
The benchmark (benchmarks/benchmark.erl) spawns N processes that send a total of X requests using the async
APIs and then waits to read X responses. benchmarks/benchmark.config contains the configuration for every driver
used in the tests. If a driver returns an unexpected result during a test, an error is logged to the console.
To run the benchmark yourself:
- change the cluster IP in `benchmark.config` for all drivers
- run `make setup_benchmark` (this will compile the app using the bench profile and create the necessary schema)
- use `make benchmark` as described below
The following test was run on Ubuntu 16.04 LTS (Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz, 4 cores), with the Cassandra cluster running on 3 other
physical machines in the same LAN. The schema is created using prepare_load_test_table from benchmarks/load_test.erl.
The schema contains all possible data types, and the query looks up a row by primary key. It returns the same
row every time, which is fine because the test measures the driver's performance, not the server's.
To create schema:
make setup_benchmark
To run the benchmark:
make benchmark MODULE=erlcass PROCS=100 REQ=100000
Where:
- `MODULE`: the driver to benchmark. Can be one of `erlcass` or `marina`.
- `PROCS`: the number of Erlang processes used to send the requests (concurrency level). Default 100.
- `REQ`: the number of requests to be sent. Default 100000.
Results for 100 concurrent processes sending 100k queries in total (average time over 3 runs):
| cassandra driver     | Time (ms) | Req/sec  |
|:--------------------:| ---------:|---------:|
| [erlcass][8] v4.0.0  | 947       | 105544   |
| [marina][7] 0.3.5    | 2360      | 42369    |
Changelog
Changelog is available [here][5].
Getting started:
The application is compatible with both rebar and rebar3.
If you get errors while compiling the DataStax driver, you can try running rebar with sudo so that
all dependencies can be installed. See also the [wiki section][2] for more details.
To speed up compilation, you can use ccache. Once installed, enable it by setting the ERLCASS_USE_CCACHE environment variable before running the build:
ERLCASS_USE_CCACHE=1 rebar3 compile
On subsequent builds of unchanged code, ccache will serve compiled objects from its cache, significantly reducing build times.
Data types
To see how Cassandra column types map to Erlang types, check this [wiki section][3].
Starting the application
application:start(erlcass).
Setting the log level
Erlcass uses the OTP logger for logging errors. In addition to setting the desired log level in logger,
you should also set it in erlcass; otherwise resources are wasted generating messages
that will be dropped anyway. The native driver's performance can also decrease
because of the time spent generating logs and sending them from C++ to Erlang.
Available Log levels are:
-define(CASS_LOG_DISABLED, 0).
-define(CASS_LOG_CRITICAL, 1).
-define(CASS_LOG_ERROR, 2).
-define(CASS_LOG_WARN, 3). % default
-define(CASS_LOG_INFO, 4).
-define(CASS_LOG_DEBUG, 5).
-define(CASS_LOG_TRACE, 6).
To change the log level of the native driver, set the log_level environment variable for
erlcass in your app config file, for example: {log_level, 3}.
Setting the cluster options
The cluster options can be set inside your app.config file under the cluster_options key:
{erlcass, [
    {log_level, 3},
    {keyspace, <<"keyspace">>},
    {cluster_options, [
        {contact_points, <<"172.17.3.129,172.17.3.130,172.17.3.131">>},
        {latency_aware_routing, true},
        {token_aware_routing, true},
        {number_threads_io, 4},
        {queue_size_io, 128000},
        {core_connections_host, 1},
        {tcp_nodelay, true},
        {tcp_keepalive, {true, 60}},
        {connect_timeout, 5000},
        {request_timeout, 5000},
        {retry_policy, {default, true}},
        {default_consistency_level, 6}
    ]}
]},
Tips for production environment:
- Use `token_aware_routing` and `latency_aware_routing`.
- Don't set `number_threads_io` higher than the number of your cores.
- Use `tcp_nodelay` and enable `tcp_keepalive`.
- Don't use large values for `core_connections_host`. The driver is system call bound and performs better with fewer I/O threads and connections because it can batch a larger number of writes into a single system call (the driver will naturally attempt to coalesce these operations). You may want to reduce the number of I/O threads to 2 or 3 and reduce the core connections to 1 (default).
All available options are described in the following [wiki section][4].
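The production tips can be turned into a small sanity check that runs before the application starts. The following sketch is not part of erlcass: the module `cluster_tips` and its functions are hypothetical. It flags only options that are explicitly set against the tips and skips missing ones, because the driver defaults are not listed here. It also leaves out `core_connections_host`, since the tips give no fixed threshold for "large", and it uses the number of online schedulers as an approximation of the core count.

```erlang
-module(cluster_tips).
-export([violations/1, violations/2]).

%% Checks a cluster_options proplist against the production tips,
%% approximating the core count by the number of online schedulers.
violations(Options) ->
    violations(Options, erlang:system_info(schedulers_online)).

%% Returns the keys of all options that are set and contradict a tip.
violations(Options, Cores) ->
    Rules = [
        {token_aware_routing,   fun(V) -> V =:= true end},
        {latency_aware_routing, fun(V) -> V =:= true end},
        {tcp_nodelay,           fun(V) -> V =:= true end},
        {tcp_keepalive,         fun({true, _Secs}) -> true; (_) -> false end},
        {number_threads_io,     fun(V) -> is_integer(V) andalso V =< Cores end}
    ],
    [Key || {Key, IsOk} <- Rules,
            {ok, Value} <- [lookup(Key, Options)],
            not IsOk(Value)].

%% Distinguishes a missing option from one that is set to false.
lookup(Key, Options) ->
    case lists:keyfind(Key, 1, Options) of
        {Key, Value} -> {ok, Value};
        false -> missing
    end.
```

With the example configuration shown earlier and 4 cores, `cluster_tips:violations(ClusterOptions, 4)` returns an empty list. Setting `{number_threads_io, 8}` on the same machine would return `[number_threads_io]`.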
Add a prepare statement
Example:
ok = erlcass:add_prepare_statement(select_blogpost,
    <<"select * from blogposts where domain = ? LIMIT 1">>).
To override the default consistency level for that prepared statement, use a tuple for the
query argument: {Query, ConsistencyLevelHere}
You can also use {Query, Options}, where Options is a proplist supporting the following options:
- `consistency_level` - If missing, the statement is executed using the default consistency level.
- `serial_consistency_level` - Can only be either `?CASS_CONSISTENCY_SERIAL` or `?CASS_CONSISTENCY_LOCAL_SERIAL`; if not present, it defaults to `?CASS_CONSISTENCY_SERIAL`. This option is ignored for anything other than a conditional update/insert.
- `null_binding` - Boolean (by default `true`). Provides a way to disable the binding of null values. [Binding null values][10] creates undesired tombstones in Cassandra.
Example:
ok = erlcass:add_prepare_statement(select_blogpost,
{<<"select * from blogposts where domain = ? LIMIT 1">>, ?CASS_CONSISTENCY_LOCAL_QUORUM}).
or
ok = erlcass:add_prepare_statement(insert_blogpost, {
    <<"UPDATE blogposts SET author = ? WHERE domain = ? IF EXISTS">>, [
        {consistency_level, ?CASS_CONSISTENCY_LOCAL_QUORUM},
        {serial_consistency_level, ?CASS_CONSISTENCY_LOCAL_SERIAL}]
}).
Run a prepared statement query
You can bind the parameters in 2 ways: by name and by index. Pass ?BIND_BY_INDEX or ?BIND_BY_NAME to
execute/3 to specify the desired method. The default is binding by index.
Example:
%bind by name
erlcass:execute(select_blogpost, ?BIND_BY_NAME, [{<<"domain">>, <<"Domain_1">>}]).
%bind by index
erlcass:execute(select_blogpost, [<<"Domain_1">>]).
%bind by index
erlcass:execute(select_blogpost
