Jetstreamer

A Solana project geared towards realtime indexing, research, and backfilling with support for all epochs in the history of Solana mainnet, capable of 2.7M TPS+

Overview

Jetstreamer is a high-throughput Solana backfilling and research toolkit that streams historical chain data live over the network from Project Yellowstone's Old Faithful archive, a comprehensive open source archive of all Solana blocks and transactions from genesis to the current tip of the chain. Given the right hardware and network connection, Jetstreamer can stream data at over 2.7M TPS to a local Jetstreamer plugin or geyser plugin. The 2.7M TPS record was set on a 64-core CPU with a 30 Gbps+ network connection; higher speeds are possible with better hardware.

Jetstreamer is published as four crates, the main facade plus three companion sub-crates:

  • jetstreamer – the primary facade that wires firehose ingestion into your plugins through JetstreamerRunner.
  • jetstreamer-firehose – async helpers for downloading, compacting, and replaying Old Faithful CAR archives at scale.
  • jetstreamer-plugin – a trait-based framework for building structured observers with ClickHouse-friendly batching and runtime metrics.
  • jetstreamer-utils – shared utilities used across the Jetstreamer ecosystem.

Every crate ships with rich module-level documentation and runnable examples. Visit docs.rs/jetstreamer to explore the API surface in detail.

All three sub-crates are re-exported from the main jetstreamer crate:

  • jetstreamer::firehose
  • jetstreamer::plugin
  • jetstreamer::utils

Limitations

While Jetstreamer is able to play back all blocks, transactions, epochs, and rewards in the history of Solana mainnet, it is limited by what is in Old Faithful. Old Faithful does not contain account updates, so Jetstreamer at the moment also does not have them. Transaction logs are available in transaction_status_meta.log_messages for all epochs.

Note that Old Faithful, and thus Jetstreamer, stores transactions in their "already-executed" state, as they originally appeared to Geyser when they were first executed. So while Jetstreamer can replay ledger data, it does not execute transactions itself. When we say 2.7M TPS, we mean "2.7M transactions processed by a Jetstreamer or Geyser plugin locally, streamed over the internet from the Old Faithful archive."

Quick Start

To get an idea of what Jetstreamer is capable of, you can try out the demo CLI that runs Jetstreamer Runner with the Program Tracking plugin enabled. Pass --with-plugin instruction-tracking (or repeat the flag to run both built-ins) to change the default set:

Jetstreamer Runner CLI

# Replay all transactions in epoch 800, using the default number of multiplexing threads based on your system
cargo run --release -- 800

# The same as above, but tuning network capacity for 10 Gbps, resulting in a higher number of multiplexing threads
JETSTREAMER_NETWORK_CAPACITY_MB=10000 cargo run --release -- 800

# Do the same but for slots 358560000 through 367631999, which spans epochs 830 through 850 (slot ranges can be cross-epoch!)
# and using 8 threads explicitly instead of using automatic thread count
JETSTREAMER_THREADS=8 cargo run --release -- 358560000:367631999

# Replay epoch 800 with the instruction tracking plugin instead of the default
cargo run --release -- 800 --with-plugin instruction-tracking

If JETSTREAMER_THREADS is omitted, Jetstreamer auto-sizes the worker pool using the same hardware-aware heuristic exposed by jetstreamer_firehose::system::optimal_firehose_thread_count.

For sequential replay mode, enable --sequential (or JETSTREAMER_SEQUENTIAL=1). In this mode Jetstreamer uses a single firehose worker and reuses JETSTREAMER_THREADS as ripget parallel download concurrency:

# Sequential mode with CLI flag
JETSTREAMER_THREADS=4 cargo run --release -- 800 --sequential

# Sequential mode with explicit ripget window override
JETSTREAMER_SEQUENTIAL=1 JETSTREAMER_BUFFER_WINDOW=4GiB cargo run --release -- 800

JETSTREAMER_BUFFER_WINDOW defaults to min(4 GiB, 15% of available RAM) when unset.
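
The default above is simple arithmetic, and a minimal standalone sketch of it looks like the following. The function name default_buffer_window is hypothetical and not part of the Jetstreamer API, and the sketch assumes "GiB" means 2^30 bytes and that the 15% is computed with integer rounding toward zero; the real implementation may differ on both points.

```rust
/// One gibibyte in bytes (assumption: "GiB" is the binary unit, 2^30).
const GIB: u64 = 1024 * 1024 * 1024;

/// Hypothetical helper mirroring the documented default:
/// min(4 GiB, 15% of available RAM), in bytes.
fn default_buffer_window(available_ram_bytes: u64) -> u64 {
    let cap = 4 * GIB;
    // Widen to u128 so the multiplication cannot overflow for large inputs.
    let fifteen_percent = (available_ram_bytes as u128 * 15 / 100) as u64;
    cap.min(fifteen_percent)
}

fn main() {
    for ram_gib in [8u64, 16, 64] {
        let window = default_buffer_window(ram_gib * GIB);
        println!(
            "{} GiB available -> {:.2} GiB window",
            ram_gib,
            window as f64 / GIB as f64
        );
    }
}
```

Under these assumptions, machines with less than roughly 26.7 GiB of available RAM get a window below the 4 GiB cap, and larger machines are capped at 4 GiB.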

The built-in program and instruction tracking plugins now record vote and non-vote activity separately. program_invocations includes an is_vote flag per row, while slot_instructions stores separate vote/non-vote instruction and transaction counts.

The CLI accepts either <start>:<end> slot ranges or a single epoch on the command line. See JetstreamerRunner::parse_cli_args for the precise rules.
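
To make the relationship between the two argument forms concrete, here is a rough standalone sketch of the epoch/slot arithmetic. It is not the logic of JetstreamerRunner::parse_cli_args: the names epoch_slot_bounds, slot_to_epoch, and parse_range are hypothetical, and the sketch assumes 432,000 slots per epoch (consistent with the slot numbers in the examples above) and an inclusive <end> slot.

```rust
/// Slots per epoch on Solana mainnet (assumed constant for all epochs).
const SLOTS_PER_EPOCH: u64 = 432_000;

/// Inclusive (first_slot, last_slot) bounds of an epoch.
fn epoch_slot_bounds(epoch: u64) -> (u64, u64) {
    let start = epoch * SLOTS_PER_EPOCH;
    (start, start + SLOTS_PER_EPOCH - 1)
}

/// Epoch that contains the given slot.
fn slot_to_epoch(slot: u64) -> u64 {
    slot / SLOTS_PER_EPOCH
}

/// Hypothetical parser: accepts "<start>:<end>" or a single epoch number
/// and returns inclusive slot bounds, or None if the input is malformed.
fn parse_range(arg: &str) -> Option<(u64, u64)> {
    match arg.split_once(':') {
        Some((start, end)) => {
            let start: u64 = start.trim().parse().ok()?;
            let end: u64 = end.trim().parse().ok()?;
            if start <= end { Some((start, end)) } else { None }
        }
        None => arg.trim().parse::<u64>().ok().map(epoch_slot_bounds),
    }
}

fn main() {
    let (start, end) = parse_range("358560000:367631999").expect("valid range");
    println!(
        "slots {}..={} cover epochs {} through {}",
        start,
        end,
        slot_to_epoch(start),
        slot_to_epoch(end)
    );
    println!("epoch 800 -> {:?}", parse_range("800"));
}
```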

ClickHouse Integration

Jetstreamer Runner has a built-in ClickHouse integration. By default, a ClickHouse server is spawned from the bin directory in the repo.

To manage the ClickHouse integration with ease, the following bundled Cargo aliases are provided when within the jetstreamer workspace:

cargo clickhouse-server
cargo clickhouse-client

cargo clickhouse-server launches the same ClickHouse binary that Jetstreamer Runner spawns in bin/, while cargo clickhouse-client connects to the local instance so you can inspect tables populated by the runner or plugin runner.

While Jetstreamer is running, you can use cargo clickhouse-client to connect directly to the ClickHouse instance that Jetstreamer has spawned. If you want to access data after a run has finished, you can run cargo clickhouse-server to bring up that server again using the data that is currently in the bin directory. It is also possible to copy a bin directory from one system to another as a way of migrating data.

Writing Jetstreamer Plugins

Jetstreamer Plugins are plugins that can be run by the Jetstreamer Runner.

Implement the Plugin trait to observe epoch/block/transaction/reward/entry events. The LoggingPlugin example later in this section mirrors the crate-level documentation and demonstrates how to react to both transactions and blocks.

Note that Jetstreamer's firehose and underlying interface emit BlockData::PossibleLeaderSkipped events whenever they observe a slot gap. These represent either leader-skipped slots or blocks that have not arrived yet; when the real block eventually shows up, BlockData::Block is emitted for it, just like in normal geyser streams.
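
As a standalone illustration of the slot-gap behavior just described (separate from the plugin example in this section), the sketch below turns a sequence of observed block slots into a stream of events. The SlotEvent enum and fill_slot_gaps function are hypothetical stand-ins; they do not reproduce the real BlockData type or the firehose's internal logic.

```rust
/// Hypothetical, simplified stand-in for the two BlockData cases.
#[derive(Debug, PartialEq)]
enum SlotEvent {
    Block(u64),
    PossibleLeaderSkipped(u64),
}

/// Walks the observed block slots in arrival order and emits a
/// PossibleLeaderSkipped event for every slot in a gap.
fn fill_slot_gaps(start_slot: u64, observed: &[u64]) -> Vec<SlotEvent> {
    let mut events = Vec::new();
    let mut next_expected = start_slot;
    for &slot in observed {
        // Slots between the next expected slot and this one had no block yet.
        for missing in next_expected..slot {
            events.push(SlotEvent::PossibleLeaderSkipped(missing));
        }
        // A late block (slot < next_expected) still produces a Block event.
        events.push(SlotEvent::Block(slot));
        next_expected = next_expected.max(slot + 1);
    }
    events
}

fn main() {
    // Slot 11 arrives late, after it was already reported as possibly skipped.
    for event in fill_slot_gaps(10, &[10, 13, 11]) {
        println!("{:?}", event);
    }
}
```

A consumer that records skipped slots should therefore be prepared to overwrite a "possibly skipped" entry when a Block event for the same slot arrives later.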

Also note that Jetstreamer spawns parallel threads that process different subranges of the overall slot range at the same time. Each thread sees a purely sequential view of transactions, but downstream services such as databases that consume this data will see writes in a fairly arbitrary order, so design your database tables and shared data structures accordingly.

use std::sync::Arc;

use clickhouse::Client;
use jetstreamer::{
    JetstreamerRunner,
    firehose::firehose::{BlockData, TransactionData},
    firehose::epochs,
    plugin::{Plugin, PluginFuture},
};

struct LoggingPlugin;

impl Plugin for LoggingPlugin {
    fn name(&self) -> &'static str {
        "logging"
    }

    fn on_transaction<'a>(
        &'a self,
        _thread_id: usize,
        _db: Option<Arc<Client>>,
        tx: &'a TransactionData,
    ) -> PluginFuture<'a> {
        Box::pin(async move {
            println!("tx {} landed in slot {}", tx.signature, tx.slot);
            Ok(())
        })
    }

    fn on_block<'a>(
        &'a self,
        _thread_id: usize,
        _db: Option<Arc<Client>>,
        block: &'a BlockData,
    ) -> PluginFuture<'a> {
        Box::pin(async move {
            if block.was_skipped() {
                println!("slot {} was skipped", block.slot());
            } else {
                println!("processed block at slot {}", block.slot());
            }
            Ok(())
        })
    }
}

fn main() {
    // Resolve epoch 800 to its first and last (inclusive) slots.
    let (start_slot, end_inclusive) = epochs::epoch_to_slot_range(800);

    JetstreamerRunner::new()
        .with_plugin(Box::new(LoggingPlugin))
        .with_threads(4)
        // Pass end_inclusive + 1, treating the upper bound as exclusive.
        .with_slot_range_bounds(start_slot, end_inclusive + 1)
        .with_clickhouse_dsn("https://clickhouse.example.com")
        .run()
        .expect("runner completed");
}
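
Because parallel workers deliver events out of global slot order, shared state is easiest to get right when updates are keyed by slot and commutative. The sketch below is one possible pattern using only the standard library; the names record and aggregate_in_parallel are hypothetical, and the plain threads here only imitate Jetstreamer's workers rather than using the Plugin trait.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Adds `tx_count` to the running total for `slot`. Addition is
/// commutative, so the final totals do not depend on arrival order.
fn record(totals: &Mutex<BTreeMap<u64, u64>>, slot: u64, tx_count: u64) {
    let mut guard = totals.lock().expect("totals mutex poisoned");
    *guard.entry(slot).or_insert(0) += tx_count;
}

/// Each inner Vec plays the role of one worker's (slot, tx_count) events.
fn aggregate_in_parallel(subranges: Vec<Vec<(u64, u64)>>) -> BTreeMap<u64, u64> {
    let totals: Arc<Mutex<BTreeMap<u64, u64>>> = Arc::new(Mutex::new(BTreeMap::new()));
    let handles: Vec<_> = subranges
        .into_iter()
        .map(|events| {
            let totals = Arc::clone(&totals);
            thread::spawn(move || {
                for (slot, tx_count) in events {
                    record(&totals, slot, tx_count);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    // Clone the map out while holding the lock, then release it.
    let result = totals.lock().expect("totals mutex poisoned").clone();
    result
}

fn main() {
    let totals = aggregate_in_parallel(vec![
        vec![(3, 5), (1, 2)], // worker 0
        vec![(1, 1), (2, 7)], // worker 1
    ]);
    // BTreeMap iterates in slot order regardless of write order.
    for (slot, count) in &totals {
        println!("slot {} -> {} transactions", slot, count);
    }
}
```

The same principle carries over to database tables: key rows by slot (and thread or signature where needed) rather than relying on insertion order.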

If you prefer to configure Jetstreamer via the command line, keep using JetstreamerRunner::parse_cli_args to hydrate the runner from process arguments and environment variables.

When JETSTREAMER_CLICKHOUSE_MODE is auto (the default), Jetstreamer inspects the DSN to decide whether to launch the bundled ClickHouse helper or connect to an external cluster.

Alternate Archive Mirrors

Jetstreamer defaults to the public Old Faithful mirror (https://files.old-faithful.net), but the firehose can also stream CARs and compact indexes directly from authenticated S3-compatible storage. Configure the backend via the following environment variables:

  • JETSTREAMER_ARCHIVE_BACKEND (default http): set to s3 to force the S3 client.
  • JETSTREAMER_HTTP_BASE_URL: base URL or s3://bucket/prefix for CAR files.
  • JETSTREAMER_COMPACT_INDEX_BASE_URL: optional override for compact indexes (also accepts s3:// URIs).
  • JETSTREAMER_ARCHIVE_BASE: single knob that applies to both cars and indexes when the more specific variables are unset.
  • JETSTREAMER_S3_BUCKET, JETSTREAMER_S3_PREFIX, JETSTREAMER_S3_INDEX_PREFIX: bucket/prefix overrides when not encoded in the s3:// URL.
  • JETSTREAMER_S3_REGION and JETSTREAMER_S3_ENDPOINT: region plus optional custom endpoint (e.g. https://s3.eu-central-003.backblazeb2.com).
  • JETSTREAMER_S3_ACCESS_KEY, JETSTREAMER_S3_SECRET_KEY, JETSTREAMER_S3_SESSION_TOKEN: credentials used for signing requests (falls back to AWS standard env vars).
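
The fallback order among the base-URL variables above can be sketched as follows. This is an illustration of the documented precedence only, not Jetstreamer's actual configuration code: the ArchiveLocations struct and resolve function are hypothetical, the environment is passed in as a map so the logic is easy to test, and the sketch assumes compact indexes fall back to the same public mirror as CAR files when nothing is set.

```rust
use std::collections::HashMap;

/// Public Old Faithful mirror used when nothing is configured.
const DEFAULT_MIRROR: &str = "https://files.old-faithful.net";

/// Hypothetical container for the resolved settings.
#[derive(Debug, PartialEq)]
struct ArchiveLocations {
    backend: String,
    car_base: String,
    index_base: String,
}

/// Resolves each location as: specific variable, then
/// JETSTREAMER_ARCHIVE_BASE, then the default mirror (assumed for both).
fn resolve(env: &HashMap<&str, &str>) -> ArchiveLocations {
    let shared = env.get("JETSTREAMER_ARCHIVE_BASE").copied();
    let pick = |specific: &str| -> String {
        env.get(specific)
            .copied()
            .or(shared)
            .unwrap_or(DEFAULT_MIRROR)
            .to_string()
    };
    ArchiveLocations {
        // The backend defaults to "http" when the variable is unset.
        backend: env
            .get("JETSTREAMER_ARCHIVE_BACKEND")
            .copied()
            .unwrap_or("http")
            .to_string(),
        car_base: pick("JETSTREAMER_HTTP_BASE_URL"),
        index_base: pick("JETSTREAMER_COMPACT_INDEX_BASE_URL"),
    }
}

fn main() {
    let mut env: HashMap<&str, &str> = HashMap::new();
    env.insert("JETSTREAMER_ARCHIVE_BASE", "s3://bucket/prefix");
    env.insert("JETSTREAMER_HTTP_BASE_URL", "https://mirror.example.com");
    println!("{:#?}", resolve(&env));
}
```

In this example the CAR base comes from the specific variable, while the compact index base falls back to JETSTREAMER_ARCHIVE_BASE.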

S3 support is compiled behind the s3-backend Cargo feature. Enable it when running or depending on jetstreamer.
