<h1 align="center">Unbranded Cloud Serving Benchmark</h1> <h3 align="center"> Yahoo Cloud Serving Benchmark for NoSQL Databases<br/> Refactored and Extended with Batch and Range Queries<br/> </h3> <br/> <p align="center"> <a href="https://discord.gg/AxsU9mctAn"><img height="25" src="https://github.com/unum-cloud/ustore/raw/main/assets/icons/discord.svg" alt="Discord"></a>     <a href="https://www.linkedin.com/company/unum-cloud/"><img height="25" src="https://github.com/unum-cloud/ustore/raw/main/assets/icons/linkedin.svg" alt="LinkedIn"></a>     <a href="https://twitter.com/unum_cloud"><img height="25" src="https://github.com/unum-cloud/ustore/raw/main/assets/icons/twitter.svg" alt="Twitter"></a>     <a href="https://unum.cloud/post"><img height="25" src="https://github.com/unum-cloud/ustore/raw/main/assets/icons/blog.svg" alt="Blog"></a>     <a href="https://github.com/unum-cloud/ucset"><img height="25" src="https://github.com/unum-cloud/ustore/raw/main/assets/icons/github.svg" alt="GitHub"></a> </p>

Unum Cloud Serving Benchmark is the grandchild of Yahoo Cloud Serving Benchmark, reimplemented in C++, with fewer mutexes or other bottlenecks, and with additional "batch" and "range" workloads, crafted specifically for the Big Data age!

|                         | Present in YCSB | Present in UCSB |
| :---------------------- | :-------------: | :-------------: |
| Size of the dataset     |       ✅        |       ✅        |
| DB configuration files  |       ✅        |       ✅        |
| Workload specifications |       ✅        |       ✅        |
| Tracking hardware usage |       ❌        |       ✅        |
| Workload Isolation      |       ❌        |       ✅        |
| Concurrency             |       ❌        |       ✅        |
| Batch Operations        |       ❌        |       ✅        |
| Bulk Operations         |       ❌        |       ✅        |
| Support of Transactions |       ❌        |       ✅        |

As you may know, benchmarking databases is very complex. There is too much control flow to tune, so instead of learning the names of a thousand CLI arguments, you launch the benchmarks with the run.py script. The outputs are placed in the bench/results/ folder.

git clone https://github.com/unum-cloud/ucsb.git && cd ucsb && ./run.py

Supported Databases

Key-Value Stores and NoSQL databases differ in the operations they support, including the ones queried by UCSB, like "batch" operations. When batches aren't natively supported, we simulate them with multiple single-entry operations.

|                         | Bulk Scan | Batch Read | Batch Write | Integer Keys |
| :---------------------- | :-------: | :--------: | :---------: | :----------: |
|                         |           |            |             |              |
| 💾 Embedded Databases    |           |            |             |              |
| WiredTiger              |    ✅     |     ❌     |     ❌      |      ✅      |
| LevelDB                 |    ✅     |     ❌     |     ✅      |      ❌      |
| RocksDB                 |    ✅     |     ✅     |     ✅      |      ❓      |
| LMDB                    |    ✅     |     ❌     |     ❌      |      ❌      |
| UDisk                   |    ✅     |     ✅     |     ✅      |      ✅      |
|                         |           |            |             |              |
| 🖥️ Standalone Databases |           |            |             |              |
| Redis                   |    ❌     |     ✅     |     ✅      |      ❌      |
| MongoDB                 |    ✅     |     ✅     |     ✅      |      ✅      |
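
For engines without a native batch read, the simulation with single-entry operations can be pictured roughly as follows. This is only a sketch: `SingleGetStore` and `batch_read` are hypothetical names made up for illustration, and the actual UCSB wrappers may handle ordering and missing keys differently.

```python
from typing import Dict, Iterable, List, Optional


class SingleGetStore:
    """Hypothetical store that only exposes one-key-at-a-time lookups."""

    def __init__(self) -> None:
        self._data: Dict[int, bytes] = {}

    def put(self, key: int, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: int) -> Optional[bytes]:
        return self._data.get(key)


def batch_read(store: SingleGetStore, keys: Iterable[int]) -> List[Optional[bytes]]:
    """Emulate a batch read with a loop of single-entry reads.

    Results keep the order of `keys`; missing keys yield None.
    """
    return [store.get(key) for key in keys]
```

Such a loop pays the per-call overhead once per key, which is exactly the cost a native batch interface is meant to amortize.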

There is also asymmetry elsewhere:

WiredTiger supports fixed-size integer keys.
LevelDB only supports variable length keys and values.
RocksDB has minimal support for fixed_key_len, incompatible with BlockBasedTable.
UDisk supports both fixed-size keys and values.

Just like YCSB, we use 8-byte integer keys and 1000-byte values. Both WiredTiger and UDisk were configured to use integer keys natively. The RocksDB wrapper reverses the byte order of keys so that the native comparator can be used. None of the DBs was set to use fixed-size values, as only UDisk supports that.
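
Why byte order matters for a byte-wise comparator can be shown with a few lines of Python. The sketch assumes the reversal amounts to storing keys in big-endian form on a little-endian machine. The helper names `encode_key` and `decode_key` are hypothetical and are not taken from the UCSB sources.

```python
import struct


def encode_key(key: int) -> bytes:
    """Pack an unsigned 64-bit key as 8 big-endian bytes."""
    return struct.pack(">Q", key)


def decode_key(raw: bytes) -> int:
    """Inverse of encode_key."""
    return struct.unpack(">Q", raw)[0]


keys = [1, 256, 2, 65536]

# Big-endian: byte-wise order equals numeric order.
big_endian_sorted = [decode_key(b) for b in sorted(encode_key(k) for k in keys)]

# Little-endian (native on x86): byte-wise order scrambles the keys.
little_endian_sorted = [
    struct.unpack("<Q", b)[0] for b in sorted(struct.pack("<Q", k) for k in keys)
]
```

Here `big_endian_sorted` comes out as `[1, 2, 256, 65536]`, while `little_endian_sorted` is `[65536, 256, 1, 2]`, so range scans over little-endian keys would not visit them in numeric order.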

Yet Another Benchmark?

Yes. In the DBMS world there are just 2 major benchmarks:

YCSB for NoSQL.
TPC for SQL.

With YCSB everything seems simple - clone the repo, pick a DBMS, run the benchmark. TPC suite seems more "enterprisey", and after a few years in the industry, I still don't understand the procedure. Moreover, most SQL databases these days are built on top of other NoSQL solutions, so NoSQL is more foundational. So naturally we used YCSB internally.

We were getting great numbers. All was fine until it wasn't. We looked under the hood and realized that the benchmark code itself was less efficient than the databases it was trying to evaluate, causing additional bottlenecks and affecting the measurements. So just like others, we decided to port it to C++, refactor it, and share with the world.

Preset Workloads

∅: imports monotonically increasing keys 🔄
A: 50% reads + 50% updates, all random
C: reads, all random
D: 95% reads + 5% inserts, all random
E: range scan 🔄
X: batch read 🆕
Y: batch insert 🆕
Z: scans 🆕

The ∅ was previously implemented as one-by-one inserts, but some KVS support the external construction of their internal representation files. The E was previously mixed with 5% insertions.
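
The percentage-based presets (A, C and D) boil down to a weighted choice between operation types. The sketch below illustrates that idea only. The `WORKLOAD_MIX` layout and the function names are assumptions made for this example, not UCSB's workload file format, and the scan and batch presets are left out because they are not defined by such proportions.

```python
import random
from collections import Counter
from typing import Dict, List

# Operation mixes for the point-operation presets listed above.
# The dictionary layout is an illustration, not UCSB's config format.
WORKLOAD_MIX: Dict[str, Dict[str, float]] = {
    "A": {"read": 0.50, "update": 0.50},
    "C": {"read": 1.00},
    "D": {"read": 0.95, "insert": 0.05},
}


def sample_operations(workload: str, count: int, seed: int = 0) -> List[str]:
    """Draw `count` operation names according to the workload's proportions."""
    mix = WORKLOAD_MIX[workload]
    rng = random.Random(seed)  # seeded, so a run can be reproduced
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


def observed_shares(operations: List[str]) -> Dict[str, float]:
    """Fraction of each operation type in a sampled sequence."""
    counts = Counter(operations)
    return {name: hits / len(operations) for name, hits in counts.items()}
```

For example, `observed_shares(sample_operations("D", 20000))` should land close to 95% reads and 5% inserts.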

Ways to Spoil a DBMS Benchmark

Unlike humans, ACID is one of the best things that can happen to DBMS 😁

Durability vs Write Speed

Like all good things, ACID is unreachable, because of at least one property - Durability. Absolute Durability is practically impossible and high Durability is expensive.

Many high-performance DBs are designed as Log-Structured Merge Trees. It's a design that essentially bans in-place file overwrites. Instead, it builds layers of immutable files arranged in a Tree-like order. The problem is that until you have enough content to populate an entire top-level file, you keep data in RAM, in structures often called MemTables.

LSM Tree
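
The interplay between the MemTable and the immutable layers can be sketched in a few lines. This is a toy model under loud assumptions: `ToyLSM` is a hypothetical class, the "files" are in-memory tuples, and compaction, deletions and on-disk formats are ignored entirely.

```python
from typing import Dict, List, Optional, Tuple


class ToyLSM:
    """Toy LSM: a mutable in-RAM MemTable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[int, bytes] = {}
        # Newest run first. Each run stands in for an immutable file.
        self.runs: List[Tuple[Tuple[int, bytes], ...]] = []

    def put(self, key: int, value: bytes) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        """Freeze the MemTable into a sorted, immutable run."""
        run = tuple(sorted(self.memtable.items()))
        self.runs.insert(0, run)
        self.memtable = {}

    def get(self, key: int) -> Optional[bytes]:
        # Freshest data first: MemTable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            for run_key, value in run:  # a real run would use binary search
                if run_key == key:
                    return value
        return None
```

Note that an update never touches an older run: the new value simply shadows the old one until some background merge, omitted here, drops the stale copy.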

If the lights go off, volatile memory will be discarded. So a copy of every incoming write is generally appended to a Write-Ahead-Log (WAL). Two problems here:

You can't have a full write confirmation before appending to WAL. It's still a write to disk. A system call. A context switch to kernel space. Want to avoid it with io_uring or SPDK? Then be ready to change all the above logic to work in an async manner, but fast enough not to create a new bottleneck. Hint: std::async will not cut it.
WAL is functionally stepping on the toes of higher-level logic. A wrapping DBMS generally implements such mechanisms itself, so it disables the WAL in the underlying KVS to avoid extra stalls and replication. Example: Yugabyte is a port of Postgres to RocksDB and disables the embedded WAL.

We generally disable WAL and benchmark the core. Still, you can tweak all of that in the UCSB configuration files yourself.
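
To make the trade-off concrete, here is a minimal sketch of a store with an optional WAL. Everything in it is an assumption for illustration: the `WalStore` class, its `use_wal` and `sync` switches, and the tab-separated record format (which requires keys and values without tabs or newlines) are hypothetical and do not mirror any engine's real log or UCSB's configuration keys.

```python
import os
import tempfile
from typing import Dict


class WalStore:
    """In-memory table with an optional append-only write-ahead log."""

    def __init__(self, wal_path: str, use_wal: bool = True, sync: bool = True) -> None:
        self.table: Dict[str, str] = {}
        self.use_wal = use_wal
        self.sync = sync
        self.wal_path = wal_path
        if use_wal:
            self._replay()
            self._log = open(wal_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        """Rebuild the in-memory table from records left by a previous run."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as log:
            for line in log:
                key, value = line.rstrip("\n").split("\t", 1)
                self.table[key] = value

    def put(self, key: str, value: str) -> None:
        if self.use_wal:
            # Log first, apply second: the record is written out
            # before the write is acknowledged.
            self._log.write(f"{key}\t{value}\n")
            self._log.flush()  # userspace buffer -> kernel
            if self.sync:
                os.fsync(self._log.fileno())  # ask the kernel to reach the device
        self.table[key] = value

    def close(self) -> None:
        if self.use_wal:
            self._log.close()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "wal.log")
    first = WalStore(path)
    first.put("k", "v")
    first.close()  # stand-in for a crash: the in-RAM table is gone
    second = WalStore(path)  # replays the log on startup
    recovered = dict(second.table)
    second.close()
```

With `use_wal=False` the `put` call is a plain dictionary update with no system calls, which is the configuration closest to benchmarking "the core".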

Furthermore, as widely discussed, flushing the data still may not guarantee its preservation on your SSD. So pick your ~~poison~~ hardware wisely and tune your benchmarks cautiously.

Strict vs Flexible RAM Limits

When users specify a RAM limit for a KVS, they expect all of the required in-memory state to fit into that many bytes. It would be too obvious for modern software, so here is one more problem.

Fast I/O is hard. The faster you want it, the more abstractions you will need to replace.

graph LR
    Application -->|libc| LIBC[Userspace Buffers]
    Application -->|mmap| PC[Page Cache]
    Application -->|mmap+O_DIRECT| BL[Block I/O Layer]
    Application -->|SPDK| DL[Device Layer]

    LIBC --> PC
    PC --> BL
    BL --> DL

Generally, the OS keeps copies of the requested pages in its RAM cache. To avoid that, enable O_DIRECT. It will slow down the app and will require some more engineering. For one, all disk I/O will have to be aligned to page sizes, generally 4 KB, which applies to both the offset in the file and the address of the userspace buffer. Split loads should also be handled with extra code on your side. So most KVS (except for UDisk, of course 😂) solutions don't bothe
