Sherman
Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory
Sherman is a B+Tree on disaggregated memory; it uses one-sided RDMA verbs to perform all index operations. Sherman includes three techniques to boost write performance:
- A hierarchical lock leveraging the on-chip memory of RDMA NICs
- Coalescing dependent RDMA commands
- Two-level version layout in leaf nodes
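As a rough illustration of the hierarchical lock idea, the sketch below serializes threads of one compute node on a local mutex before any of them touches a shared global lock word. It is a single-process stand-in, not Sherman's code: `GlobalLockWord`, `HierarchicalLock`, and the node-id encoding are hypothetical names, and `std::atomic` stands in for a lock word that would live in NIC on-chip memory and be updated with RDMA atomic verbs.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Stand-in for one lock word in NIC on-chip memory on a memory node.
// ASSUMPTION: a real client would update it with RDMA atomics, not std::atomic.
struct GlobalLockWord {
  std::atomic<uint64_t> owner{0};  // 0 means free; otherwise the holder's node id
};

// Hypothetical two-level lock: threads of one compute node first serialize on
// a local mutex, so at most one thread per node spins on the global word.
class HierarchicalLock {
 public:
  HierarchicalLock(GlobalLockWord& global, uint64_t node_id)
      : global_(global), node_id_(node_id) {
    assert(node_id != 0);  // 0 is reserved for "free"
  }

  void lock() {
    local_.lock();  // level 1: local lock, no network traffic
    uint64_t expected = 0;
    // level 2: compare-and-swap on the global word until it is free
    while (!global_.owner.compare_exchange_weak(expected, node_id_,
                                                std::memory_order_acquire)) {
      expected = 0;  // a failed CAS overwrote it with the current owner
    }
  }

  void unlock() {
    global_.owner.store(0, std::memory_order_release);  // release global first
    local_.unlock();
  }

 private:
  GlobalLockWord& global_;
  uint64_t node_id_;
  std::mutex local_;
};
```

Because the class exposes `lock()` and `unlock()`, it can be used with `std::lock_guard<HierarchicalLock>`. The point of the two levels is that contention among threads on the same node is resolved locally, so fewer retries reach the remote lock word.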
For more details, please refer to our paper:
[SIGMOD'22] Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory. Qing Wang, Youyou Lu, and Jiwu Shu.
Update (2024.10)
Please use Deft for evaluation, which improves on Sherman in both performance and synchronization correctness.
System Requirements
- Mellanox ConnectX-5 NICs and above
- RDMA Driver: MLNX_OFED_LINUX-4.7-3.2.9.0 (if you use MLNX_OFED_LINUX-5**, you should modify the code to resolve interface incompatibility)
- NIC Firmware: version 16.26.4012 and above (to support on-chip memory; you can use ibstat to obtain the version)
- memcached (to exchange QP information)
- cityhash
- boost 1.53 (to support boost::coroutines::symmetric_coroutine)
Setup about RDMA Network
1. RDMA NIC Selection.
You can modify this line according to the RDMA NIC you want to use, where ibv_get_device_name(deviceList[i]) is the name of the RNIC (e.g., mlx5_0)
https://github.com/thustorage/Sherman/blob/9bb950887cd066ebf4f906edbb43bae8e728548d/src/rdma/Resource.cpp#L28
2. Gid Selection.
If you use RoCE, modify gidIndex in this line according to the output of the shell command show_gids; the value is usually 3.
https://github.com/thustorage/Sherman/blob/c5ee9d85e090006df39c0afe025c8f54756a7aea/include/Rdma.h#L60
3. MTU Selection.
If you use RoCE and the MTU of your NIC is not equal to 4200 (check with ifconfig), modify the value of path_mtu in src/rdma/StateTrans.cpp.
4. On-Chip Memory Size Selection.
Change the constant kLockChipMemSize in include/Common.h so that it is <= the maximum size of the on-chip memory.
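For the MTU selection step, it can help to see how an interface MTU relates to the path MTU values that RDMA queue pairs accept. The sketch below picks the largest of the five InfiniBand MTU sizes (the payload sizes behind `IBV_MTU_256` through `IBV_MTU_4096` in libibverbs) that fits in a given Ethernet MTU. The function name and the header allowance `kAssumedHeaderBytes` are assumptions made for illustration; they are not taken from Sherman.

```cpp
#include <cstddef>
#include <cstdint>

// Payload sizes of the five InfiniBand path MTU settings
// (IBV_MTU_256 ... IBV_MTU_4096 in libibverbs).
static const uint32_t kIbMtuBytes[] = {256, 512, 1024, 2048, 4096};

// ASSUMPTION: a generous allowance for RoCE headers and CRC per packet.
// Illustrative only; not a value from Sherman or from a specification.
static const uint32_t kAssumedHeaderBytes = 100;

// Returns the largest InfiniBand MTU payload (in bytes) that fits in the
// given Ethernet MTU together with the assumed headers, or 0 if none fits.
inline uint32_t largestIbMtu(uint32_t ethernet_mtu) {
  uint32_t best = 0;
  for (std::size_t i = 0; i < sizeof(kIbMtuBytes) / sizeof(kIbMtuBytes[0]); ++i) {
    if (kIbMtuBytes[i] + kAssumedHeaderBytes <= ethernet_mtu) {
      best = kIbMtuBytes[i];  // sizes are ascending, so the last fit is the largest
    }
  }
  return best;
}
```

Under this assumption, an interface MTU of 4200 leaves room for a 4096-byte path MTU, while a default 1500-byte interface only fits 1024 bytes, which is why path_mtu needs adjusting when the interface MTU differs.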
Getting Started
- cd Sherman
- ./script/hugepage.sh to request huge pages from OS (use ./script/clear_hugepage.sh to return huge pages)
- mkdir build; cd build; cmake ..; make -j
- cp ../script/restartMemc.sh .
- configure ../memcached.conf, where the 1st line is memcached IP, the 2nd is memcached port
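The two-line layout of memcached.conf (IP on the first line, port on the second) can be checked with a small reader like the one below. This is a minimal sketch and not Sherman's own parsing code; `MemcachedEndpoint`, `trim`, and `parseMemcachedConf` are hypothetical names introduced here.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical holder for the two values stored in memcached.conf.
struct MemcachedEndpoint {
  std::string ip;
  int port = 0;
};

// Removes leading and trailing spaces, tabs, and line-ending characters.
inline std::string trim(const std::string& s) {
  const char* ws = " \t\r\n";
  const std::size_t b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  const std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Reads "IP on line 1, port on line 2". Returns false on a missing line,
// an empty value, a non-numeric port, or a port outside 1..65535.
inline bool parseMemcachedConf(std::istream& in, MemcachedEndpoint& out) {
  std::string ip_line, port_line;
  if (!std::getline(in, ip_line) || !std::getline(in, port_line)) return false;
  ip_line = trim(ip_line);
  port_line = trim(port_line);
  if (ip_line.empty() || port_line.empty()) return false;

  std::istringstream port_stream(port_line);
  int port = 0;
  char extra = 0;
  if (!(port_stream >> port) || (port_stream >> extra)) return false;
  if (port < 1 || port > 65535) return false;

  out.ip = ip_line;
  out.port = port;
  return true;
}
```

Passing a `std::ifstream` opened on the config file works the same way as the string streams used for testing; the IP string itself is not validated here.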
For each run with kNodeCount servers:
- ./restartMemc.sh (to initialize memcached server)
- In each server, execute ./benchmark kNodeCount kReadRatio kThreadCount
We emulate each server as one compute node and one memory node: in each server, as the compute node, we launch kThreadCount client threads; as the memory node, we launch one memory thread. kReadRatio is the ratio of get operations.
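To make the role of kReadRatio concrete, the sketch below shows one way a client thread could choose between get and put operations. It assumes the ratio is given as a percentage from 0 to 100; that interpretation, the `Op` enum, and `nextOp` are assumptions for illustration and may differ from what ./benchmark actually does.

```cpp
#include <random>

// Hypothetical operation type for a get/put workload mix.
enum class Op { Get, Put };

// ASSUMPTION: read_ratio_percent is a percentage in [0, 100].
// Draws a number in [0, 99] and issues a get when it falls below the ratio.
inline Op nextOp(std::mt19937_64& rng, int read_ratio_percent) {
  std::uniform_int_distribution<int> dist(0, 99);
  return dist(rng) < read_ratio_percent ? Op::Get : Op::Put;
}
```

With a ratio of 100 every operation is a get, with 0 every operation is a put, and with 50 the mix is roughly even over many operations.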
In ./test/benchmark.cpp, we can modify kKeySpace and zipfan to generate different workloads. In addition, we can enable the macro USE_CORO to bind kCoroCnt coroutines to each client thread.
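As an illustration of how a key-space size and a Zipfian skew parameter shape a workload, the sketch below draws keys from a Zipfian distribution by inverting a precomputed cumulative distribution. It is not the generator used in ./test/benchmark.cpp; `ZipfianKeys` and its parameters are hypothetical, and the table-based approach only suits small key spaces because it stores one double per key.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical Zipfian key generator over keys [0, key_space).
// Key 0 is the hottest; theta = 0 degenerates to a uniform distribution.
class ZipfianKeys {
 public:
  ZipfianKeys(uint64_t key_space, double theta) : cdf_(key_space) {
    assert(key_space > 0 && theta >= 0.0);
    double sum = 0.0;
    for (uint64_t rank = 1; rank <= key_space; ++rank) {
      sum += 1.0 / std::pow(static_cast<double>(rank), theta);
      cdf_[rank - 1] = sum;  // unnormalized cumulative weight up to this rank
    }
    for (double& c : cdf_) c /= sum;  // normalize so the last entry is 1.0
  }

  // Draws u in [0, 1) and returns the first key whose cumulative
  // probability is at least u (binary search over the table).
  uint64_t next(std::mt19937_64& rng) const {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double u = uniform(rng);
    std::vector<double>::const_iterator it =
        std::lower_bound(cdf_.begin(), cdf_.end(), u);
    if (it == cdf_.end()) --it;  // guard against floating-point rounding
    return static_cast<uint64_t>(it - cdf_.begin());
  }

 private:
  std::vector<double> cdf_;
};
```

A larger skew parameter concentrates accesses on a few hot keys, which increases write conflicts on the same leaf nodes; a larger key space spreads them out.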
Known bugs
- The two-level version may induce inconsistency in some concurrent cases. Refer to this SIGMOD'23 paper
TODO
- Re-write delete operations
