Aerospike Certification Tool (ACT)
This project is maintained by Aerospike
Overview
ACT provides a pair of programs for testing and certifying flash/SSD devices' performance for Aerospike Database data and index storage. ACT measures latency during a mixed load of read and write operations while modeling the Aerospike Database server's I/O pattern as closely as practical.
ACT allows you to test a single device or multiple devices, using your actual connector/controller hardware.
There are two programs: act_storage models Aerospike Database data storage I/O patterns, and act_index models Aerospike Database index storage I/O patterns for Aerospike Database's "All Flash" mode.
The purpose of this certification is:
- Determine whether an SSD device (or devices) will stand up to the demands of a high-speed real-time database.
- Evaluate the upper limits of the throughput you can expect from the device(s).
Not all SSDs can handle the high volume of transactions required by high performance real-time databases like Aerospike Database. Many SSDs are rated for 100K+ reads/writes per second, but in production the actual load they can withstand for sustained periods of time is generally much lower. In the process of testing many common SSDs in high-throughput tests, Aerospike developed this certification tool, ACT, that you can use to test/certify an SSD for yourself.
We have found performance – especially latency – of SSDs to be highly dependent on the write load the SSD is subjected to. Over the first few hours of a test, performance can be excellent, but past the 4- to 12-hour mark (depending on the device), performance can suffer.
ACT allows you to test SSD devices yourself. In addition, Aerospike has tested a variety of SSDs and has specific recommendations. For more information, visit the Aerospike Database documentation at: http://www.aerospike.com/docs/operations/plan/ssd/ssd_certification.html.
What ACT Does
By default, act_storage performs a combination of large (128K) block reads and writes and small (1.5K) block reads, simulating standard real-time Aerospike Database data read/write and defragmentation loads.
By default, act_index performs a mixture of 4K reads and writes, simulating standard real-time Aerospike Database "All Flash" index device loads.
Latencies are measured for a long enough period of time (typically 24 hours) to evaluate device stability and overall performance.
Traffic/Loading
You can simulate:
- "Nx" load - 1x load (2000 reads/sec and 1000 writes/sec per device) times N
- any other stress load or high-performance load (custom configurable)
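As a rough illustration of the "Nx" arithmetic, the sketch below scales the 1x rates given above. The function name `nx_load` and the `num_devices` parameter are hypothetical helpers for this example and are not part of ACT.

```python
# Base "1x" rates per device, as stated in the list above.
BASE_READS_PER_SEC = 2000
BASE_WRITES_PER_SEC = 1000


def nx_load(n, num_devices=1):
    """Return (reads/sec, writes/sec) for an "Nx" load.

    num_devices is a hypothetical convenience parameter: the rates above
    are per device, so the aggregate rate scales with the device count.
    """
    if n <= 0 or num_devices <= 0:
        raise ValueError("n and num_devices must be positive")
    reads = BASE_READS_PER_SEC * n * num_devices
    writes = BASE_WRITES_PER_SEC * n * num_devices
    return reads, writes


print(nx_load(30))  # (60000, 30000)
print(nx_load(60))  # (120000, 60000)
```

For example, `nx_load(30, num_devices=4)` gives the aggregate rates for four devices each driven at 30x.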
Latency Rate Analysis
ACT's output shows latency broken into intervals of 2^n ms: 1, 2, 4, 8 ... ms (analysis program's display intervals are configurable).
For example, a test might indicate that 0.25% of requests failed to complete in 1 ms or less and 0.01% of requests failed to complete in 8 ms or less.
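To make the threshold percentages concrete, here is a minimal sketch that computes them from a list of latencies. It is not ACT's analysis program: the function name `pct_exceeding`, its parameters, and the sample data are made up for illustration.

```python
def pct_exceeding(latencies_ms, max_exponent=3):
    """Return {threshold_ms: percent of requests slower than the threshold}.

    Thresholds are 2^n ms for n = 0..max_exponent (1, 2, 4, 8 ms by default).
    """
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    total = len(latencies_ms)
    result = {}
    for n in range(max_exponent + 1):
        threshold = 2 ** n
        # Count requests that failed to complete within the threshold.
        slow = sum(1 for t in latencies_ms if t > threshold)
        result[threshold] = 100.0 * slow / total
    return result


# Made-up sample: 10,000 requests, 25 slower than 1 ms, 1 slower than 8 ms.
sample = [0.2] * 9975 + [1.5] * 24 + [9.0] * 1
for threshold, pct in pct_exceeding(sample).items():
    print(f"> {threshold} ms: {pct:.2f}%")
```

With this sample, 0.25% of requests exceed 1 ms and 0.01% exceed 8 ms, matching the example figures above.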
Methodology for act_storage
The small read operations model client read requests. Requests are done at the specified rate by a number of service threads.
The large-block read and write operations model the Aerospike server's write requests and defragmentation process. The operations occur at a rate determined by the specified write request rate, and are executed from one dedicated large-block read thread and one dedicated large-block write thread per device.
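As a back-of-the-envelope illustration of how a write request rate could translate into large-block operations, the sketch below assumes the large-block rate equals the bytes written per second divided by the large-block size, multiplied by a hypothetical defragmentation factor. The formula, the factor of 2, and the 1536-byte record size are assumptions made for this example; consult ACT's own configuration documentation for the actual calculation.

```python
def large_block_ops_per_sec(write_reqs_per_sec,
                            record_bytes=1536,             # assumed "1.5K" record
                            large_block_bytes=128 * 1024,  # 128K large block
                            defrag_factor=2.0):            # hypothetical multiplier
    """Estimate large-block operations per second for a given write rate.

    This is an illustrative estimate under the assumptions stated above,
    not ACT's implementation.
    """
    if min(write_reqs_per_sec, record_bytes, large_block_bytes, defrag_factor) <= 0:
        raise ValueError("all arguments must be positive")
    bytes_per_sec = write_reqs_per_sec * record_bytes
    return defrag_factor * bytes_per_sec / large_block_bytes


# 1x load: 1000 writes/sec per device.
print(large_block_ops_per_sec(1000))  # 23.4375
```

Under these assumptions, the small-record write rate becomes a much lower rate of large-block operations, which is why a single dedicated thread per device can plausibly carry them.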
Methodology for act_index
The 4K device reads model index element access that occurs during client read and write requests, and defragmentation. One device read is executed on service threads for each client read, and for each client write. In addition, more reads are executed in "cache threads" to model index element access during defragmentation.
The "cache threads" also execute all the 4K device writes, which model index element changes due to client write requests and defragmentation.
Unlike the Aerospike Database "All Flash" mode, act_index does not mmap files in mounted directories on the devices - it models the raw device I/O pattern, assuming no caching benefit from mmap. Therefore, to configure act_index, we simply specify the devices.
Process for Certifying Device(s) for 30x Performance
In general, we recommend that you certify a device for 30x performance. Many devices do not pass the 30x certification. If you do not have a high-volume application, you may find that a 10x or 20x certification will be sufficient. The instructions below describe the 30x certification process, but you may need to adjust the test based on your requirements.
Certifying a device or devices for 30x performance with Aerospike Database requires two stages:
- Test a single device to determine performance using the hardware configuration and connectors. The single-device certification will help you determine individual device performance.
- If you will be using multiple devices, you can then run ACT to test multiple devices to see how the results will be affected by the capacity of the bus or the throughput of the RAID controller that is managing your devices.
The test process with ACT is the same for both stages, but in the first stage you are testing a device and in the second stage, you are testing the linearity/scalability of your connector with multiple devices installed.
The single-device stage takes 48 hours. The multi-device stage takes an additional 48 hours.
The first stage is to certify a single device, to test the device itself and the connection.
Begin by installing your SSD device. Our website has more details about installing SSDs in different environments and configurations at http://www.aerospike.com/docs/operations/plan/ssd/ssd_setup.html.
Test 1: Test under high loads
Run ACT for 24 hrs using the 30x test (60000 reads/sec and 30000 writes/sec). The device passes this test if less than 5% of operations fail to complete in 1 ms or less.
Many devices fail this test and are unsuitable for use with Aerospike Database.
Test 2: Stress test to ensure the device does not fail under excessive loads
Run a 60x test for 24 hrs (120000 reads/sec and 60000 writes/sec). The device passes this test if ACT runs to completion, regardless of the error rate.
If you are testing a single device, then the device is certified when it passes Test 1 and Test 2.
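The pass criteria for Test 1 and Test 2 can be sketched as a small check. The function names and inputs below are hypothetical; they only restate the rules above (fewer than 5% of operations exceeding 1 ms at 30x, and completion of the 60x run regardless of error rate).

```python
MAX_PCT_OVER_1MS = 5.0  # Test 1 threshold, from the text above


def passes_high_load(pct_over_1ms):
    """Test 1: fewer than 5% of operations fail to complete in 1 ms or less."""
    return pct_over_1ms < MAX_PCT_OVER_1MS


def passes_stress(ran_to_completion):
    """Test 2: the run only has to finish; the error rate is ignored."""
    return bool(ran_to_completion)


def single_device_certified(pct_over_1ms_at_30x, stress_ran_to_completion):
    """A single device is certified when it passes both tests."""
    return (passes_high_load(pct_over_1ms_at_30x)
            and passes_stress(stress_ran_to_completion))


print(single_device_certified(3.2, True))   # True
print(single_device_certified(5.0, True))   # False: must be under 5%
print(single_device_certified(1.0, False))  # False: 60x run did not complete
```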
The second stage is to certify multiple devices, to make sure that performance scales linearly when you add devices.
Install the additional SSDs to be tested. Our website has more details about installing SSDs in different environments and configurations at http://www.aerospike.com/docs/operations/plan/ssd/ssd_setup.html.
Test 3: Repeat Test 1, with all devices installed: Test under high loads
Run ACT for 24 hrs using the 30x test (60000 reads/sec and 30000 writes/sec per device). The devices pass this test if less than 5% of operations fail to complete in 1 ms or less.
Test 4: Repeat Test 2, with all devices installed: Stress test to ensure the devices do not fail under excessive loads
Run a 60x test for 24 hrs (120000 reads/sec and 60000 writes/sec per device). The devices pass this test if ACT runs to completion, regardless of the error rate.
The devices are certified if they pass Test 3 and Test 4.
Once the device or devices have been certified, they can be used with Aerospike Database.
Determining Expected Performance at Higher Throughput
If your application is going to have high volumes of transactions and your device(s) passes the 30x certification, we recommend that you test your device to determine its upper limit on transaction processing latency. This will help you determine how many SSDs you will need to run your application when you are fully scaled up.
To certify a device(s) at higher levels of performance, follow the certification process described above, but use higher loads (80x, 100x, etc.). Test the device(s) at progressively higher rates until more than 5% of operations fail to complete in 1 ms or less.
For example, if you test at 60x and less than 5% of operations fail to complete in 1 ms, re-run the test at 80x, and so on. When the device completes the test at a particular load with more than 5% of operations failing to complete in 1 ms (i.e., it fails the test), the device is certified at the next lower level, where fewer than 5% of operations failed to complete in 1 ms.
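The stepping procedure can be sketched as follows, assuming you have already collected the percentage of operations exceeding 1 ms at each load level. The function name `certified_level` and the result values are hypothetical and only illustrate the rule.

```python
def certified_level(results, max_pct=5.0):
    """Return the highest load multiplier passed before the first failure.

    results is a list of (multiplier, pct_over_1ms) pairs, one per completed
    test run. Returns None if the lowest tested level already fails.
    """
    best = None
    for multiplier, pct_over_1ms in sorted(results):
        if pct_over_1ms >= max_pct:
            break  # first failing level: stop stepping up
        best = multiplier
    return best


# Made-up results: passes at 60x and 80x, fails at 100x.
runs = [(60, 1.2), (80, 3.9), (100, 7.5)]
print(certified_level(runs))  # 80
```

Stopping at the first failure, rather than taking the highest passing level overall, mirrors the description above: the device is certified at the level just below the one it fails.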
If your device is testing well at higher loads, you may want to shorten the test time. Running ACT for six hours will give you a good idea whether your device can pass ACT testing at a given traffic volume. Before certifying your device at a given traffic level, we recommend a full 24-hour test.
As before, test a single device first, and then test with multiple devices to make sure that the performance scales linearly with your connector/controller.
Getting Started
Download the ACT package through git:
$ git clone https://github.com/aerospike/act.git
This creates an act directory in the current directory.
Alternatively, you can download the ZIP or TAR file from the links at the left. When you unpack/untar the file, it creates an aerospike-act-<version> directory.
Install the Required Libraries
Before you can build ACT, you need to install some libraries.
For CentOS:
$ sudo yum install make gcc
For Debian or Ubuntu:
$ sudo apt-get install make gcc libc6-dev
Build the package.
$ cd act    # or: cd aerospike-act-<version>
$ make
This will create 3 binaries in a target/bin directory:
- act_prep: This executable prepares a device for ACT by writing zeroes on every sector of the disk and then filling it up with random data (salting). This simulates a normal production state.
- act_storage: The exe
