Tailslayer

Library for reducing tail latency in RAM reads

Install / Use

/learn @LaurieWired/Tailslayer

Tailslayer is a C++ library that reduces tail latency in RAM reads caused by DRAM refresh stalls.

It replicates data across multiple independent DRAM channels with uncorrelated refresh schedules, using (undocumented!) channel scrambling offsets that work on AMD, Intel, and Graviton. When a request comes in, Tailslayer issues hedged reads across all replicas, so the work is performed on whichever replica responds first.
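The hedging idea can be sketched in isolation. This is a minimal illustration with ordinary threads, not Tailslayer's implementation: `hedged_read` and the replica layout are hypothetical, and nothing here places replicas on separate DRAM channels or pins threads to cores. Some toolchains need `-pthread` to build it.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: reads the same logical index from every replica on its
// own thread and returns the value from whichever thread finishes first.
template <typename T, std::size_t N>
T hedged_read(const std::array<std::vector<T>, N>& replicas, std::size_t index) {
    static_assert(N > 0, "need at least one replica");
    std::atomic<bool> claimed{false};  // set by the first thread to finish
    T result{};
    std::vector<std::thread> workers;
    for (std::size_t r = 0; r < N; ++r) {
        assert(index < replicas[r].size());
        workers.emplace_back([&, r] {
            T value = replicas[r][index];   // the hedged load
            if (!claimed.exchange(true)) {  // only the first finisher stores
                result = value;
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // after join, the winner's write to `result` is visible
    }
    return result;
}
```

Because every replica holds the same data, the caller gets the same value whichever thread wins; only the latency differs.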

Usage

The library code is available in hedged_reader.cpp and the example using the library can be found in tailslayer_example.cpp. To use it, copy include/tailslayer into your project and #include <tailslayer/hedged_reader.hpp>. The library currently works with two channels (updates to come!), but full N-way usage is available in the benchmark.

You provide the value type and two functions as template parameters:

Signal function: Add the loop that waits for the external signal. This determines when to read. Return the desired index to read, and the read immediately fires.
Final work function: This receives the value immediately after it is read. Add the desired value processing code here.

#include <cstddef>
#include <cstdint>

#include <tailslayer/hedged_reader.hpp>

[[gnu::always_inline]] inline std::size_t my_signal() {
    // Wait for your event, then return the index to read
    std::size_t index_to_read = 0;  // placeholder: set this from your event
    return index_to_read;
}

template <typename T>
[[gnu::always_inline]] inline void my_work(T val) {
    // Use the value
}

int main() {
    using T = std::uint8_t;
    tailslayer::pin_to_core(tailslayer::CORE_MAIN);

    // Value type, signal function, and final work function
    tailslayer::HedgedReader<T, my_signal, my_work<T>> reader{};
    reader.insert(0x43);  // stored at logical index 0
    reader.insert(0x44);  // stored at logical index 1
    reader.start_workers();
}

Arguments can be passed to either function via ArgList:

tailslayer::HedgedReader<T, my_signal, my_work<T>,
    tailslayer::ArgList<1, 2>,   // args to signal function
    tailslayer::ArgList<2>       // args to final work function
> reader{};
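To show how a compile-time argument list can be forwarded to a function passed as a template parameter, here is a minimal sketch. The `sketch::ArgList` and `sketch::invoke_with` names are hypothetical stand-ins, not Tailslayer's own definitions, and the code assumes C++17 for `auto` template parameters.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical stand-in for an argument list: a pack of compile-time values.
template <auto... Values>
struct ArgList {};

// Calls Fn with the values carried by the ArgList type.
template <auto Fn, auto... Values>
auto invoke_with(ArgList<Values...>) {
    return Fn(Values...);
}

}  // namespace sketch

// Example signal-style function that takes two arguments.
inline std::size_t pick_index(int base, int offset) {
    assert(base + offset >= 0);  // the result is used as an index
    return static_cast<std::size_t>(base + offset);
}
```

With these definitions, `sketch::invoke_with<pick_index>(sketch::ArgList<1, 2>{})` calls `pick_index(1, 2)`. The arguments are part of the type, so they are fixed at compile time.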

You can optionally pass a different channel offset, channel bit, and number of replicas to the constructor. Note: each insert copies the element N times, where N is the number of replicas. The library handles the address calculation internally, so Tailslayer acts as a hedged vector addressed by logical indices. Each replica is also pinned to a separate core, where it spins according to the signal function until the read happens.
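How the library maps logical indices to replica addresses is not spelled out here. The following is a simplified sketch, assuming exactly two replicas and a single address bit (the channel bit) that selects the DRAM channel. `replica_offset` and its parameters are hypothetical, and real platforms scramble channel selection across more bits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mapping: byte offset of `index` within a buffer that interleaves
// two replicas in stripes of (1 << channel_bit) bytes. Replica 0 occupies the
// stripes where the channel bit is 0, replica 1 the stripes where it is 1.
inline std::size_t replica_offset(std::size_t index, std::size_t elem_size,
                                  unsigned channel_bit, unsigned replica) {
    assert(replica < 2);
    const std::size_t stripe_bytes = std::size_t{1} << channel_bit;
    // Elements must tile a stripe exactly so none straddles a channel boundary.
    assert(elem_size > 0 && stripe_bytes % elem_size == 0);
    const std::size_t linear = index * elem_size;      // offset in a plain array
    const std::size_t stripe = linear / stripe_bytes;  // which stripe of this replica
    const std::size_t within = linear % stripe_bytes;  // position inside the stripe
    return stripe * 2 * stripe_bytes + replica * stripe_bytes + within;
}
```

For one-byte elements and channel bit 8, index 0 lands at offset 0 in replica 0 and offset 256 in replica 1. The two copies of an element always differ only in bit 8 of the offset. This only maps to separate channels if the buffer's physical alignment preserves that bit, which the sketch assumes and does not enforce.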

Build the example

make
./tailslayer_example

Benchmarks and spike timing

The discovery/ directory contains supporting code used to characterize DRAM refresh behavior:

discovery/benchmark/: Channel-hedged read benchmark
discovery/trefi_probe.c: Spike timing probe for measuring the refresh cycle

cd discovery/benchmark
make
sudo chrt -f 99 ./hedged_read_cpp --all --channel-bit 8
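As an illustration of what a spike timing probe might do with its measurements, the sketch below estimates the interval between latency spikes from a list of timed loads. It is not taken from trefi_probe.c: the `Sample` struct, `mean_spike_interval_ns`, and the threshold parameter are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Sample {
    std::uint64_t timestamp_ns;  // when the load was issued
    std::uint64_t latency_ns;    // how long the load took
};

// Returns the mean gap between consecutive spikes (loads slower than the
// threshold), or 0 if fewer than two spikes were seen.
inline double mean_spike_interval_ns(const std::vector<Sample>& samples,
                                     std::uint64_t spike_threshold_ns) {
    std::uint64_t first = 0;
    std::uint64_t last = 0;
    std::size_t spikes = 0;
    for (const Sample& s : samples) {
        if (s.latency_ns > spike_threshold_ns) {
            // Spike timestamps must be non-decreasing for the gap to make sense.
            assert(spikes == 0 || s.timestamp_ns >= last);
            if (spikes == 0) {
                first = s.timestamp_ns;
            }
            last = s.timestamp_ns;
            ++spikes;
        }
    }
    if (spikes < 2) {
        return 0.0;
    }
    return static_cast<double>(last - first) / static_cast<double>(spikes - 1);
}
```

If the spikes come from DRAM refresh, the returned interval approximates the refresh cycle. A real probe would also need a stable clock source and a way to defeat caching, which this sketch leaves out.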
