SkillAgentSearch skills...

Tailslayer

Library for reducing tail latency in RAM reads

Install / Use

/learn @LaurieWired/Tailslayer
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

License GitHub stars GitHub forks GitHub contributors Follow @lauriewired

<img width="2490" height="1148" alt="tailslayer3" src="https://github.com/user-attachments/assets/35d6cd98-ab9d-4ea6-8804-dff3a1b8698b" />

Tailslayer

Tailslayer is a C++ library that reduces tail latency in RAM reads caused by DRAM refresh stalls.

It replicates data across multiple, independent DRAM channels with uncorrelated refresh schedules, using (undocumented!) channel scrambling offsets that works on AMD, Intel, and Graviton. Once the request comes in, Tailslayer issues hedged reads across all replicas, allowing the work to be performed on whichever result responds first.

<img width="4986" height="2796" alt="cross_platform_nway" src="https://github.com/user-attachments/assets/4b4a5614-00e4-4845-8a4b-f4adecef5b4d" />

Usage

The library code is available in hedged_reader.cpp and the example using the library can be found in tailslayer_example.cpp. To use it, copy include/tailslayer into your project and #include <tailslayer/hedged_reader.hpp>. The library currently works with two channels (updates to come!), but full N-way usage is available in the benchmark.

You provide the value type and two functions as template parameters:

  1. Signal function: Add the loop that waits for the external signal. This determines when to read. Return the desired index to read, and the read immediately fires.
  2. Final work function: This receives the value immediately after it is read. Add the desired value processing code here.
#include <tailslayer/hedged_reader.hpp>

[[gnu::always_inline]] inline std::size_t my_signal() {
    // Wait for your event, then return the index to read
    return index_to_read;
}

template <typename T>
[[gnu::always_inline]] inline void my_work(T val) {
    // Use the value
}

int main() {
    using T = uint8_t;
    tailslayer::pin_to_core(tailslayer::CORE_MAIN);

    tailslayer::HedgedReader<T, my_signal, my_work<T>> reader{};
    reader.insert(0x43);
    reader.insert(0x44);
    reader.start_workers();
}

Arguments can be passed to either function via ArgList:

tailslayer::HedgedReader<T, my_signal, my_work<T>,
    tailslayer::ArgList<1, 2>,   // args to signal function
    tailslayer::ArgList<2>       // args to final work function
> reader{};

You can also optionally pass in a different channel offset, channel bit, and number of replicas to the constructor. Note: Each insert copies the element N times where N is the number of replicas. It does the address calculation work on the backend, allowing tailslayer to act as a hedged vector that uses logical indices. Additionally, each replica is pinned to a separate core, and will spin on that core according to the signal function until the read happens.

Build the example

make
./tailslayer_example

Benchmarks and spike timing

The discovery/ directory contains supporting code used to characterize DRAM refresh behavior:

  • discovery/benchmark/: Channel-hedged read benchmark
  • discovery/trefi_probe.c: Spike timing probe for measuring the refresh cycle
cd discovery/benchmark
make
sudo chrt -f 99 ./hedged_read_cpp --all --channel-bit 8

Related Skills

View on GitHub
GitHub Stars523
CategoryDevelopment
Updated1m ago
Forks27

Languages

C++

Security Score

95/100

Audited on Apr 8, 2026

No findings