Jetstreamer
A Solana project for realtime indexing, research, and backfilling, supporting every epoch in the history of Solana mainnet at over 2.7M TPS.
Overview
Jetstreamer is a high-throughput Solana backfilling and research toolkit that streams historical chain data live over the network from Project Yellowstone's Old Faithful archive, a comprehensive open-source archive of every Solana block and transaction from genesis to the current tip of the chain. Given the right hardware and network connection, Jetstreamer can stream data at over 2.7M TPS into a local Jetstreamer or Geyser plugin. Higher speeds are possible with better hardware (our 2.7M TPS record used a 64-core CPU and a 30 Gbps+ network).
Jetstreamer ships as a main crate plus three companion crates:

- `jetstreamer` – the primary facade that wires firehose ingestion into your plugins through `JetstreamerRunner`.
- `jetstreamer-firehose` – async helpers for downloading, compacting, and replaying Old Faithful CAR archives at scale.
- `jetstreamer-plugin` – a trait-based framework for building structured observers with ClickHouse-friendly batching and runtime metrics.
- `jetstreamer-utils` – utilities used by the Jetstreamer ecosystem.
Every crate ships with rich module-level documentation and runnable examples. Visit docs.rs/jetstreamer to explore the API surface in detail.
All three sub-crates are re-exported from the main `jetstreamer` crate:

- `jetstreamer::firehose`
- `jetstreamer::plugin`
- `jetstreamer::utils`
Limitations
While Jetstreamer can play back every block, transaction, epoch, and reward in the history of Solana mainnet, it is limited to what Old Faithful contains. Old Faithful does not contain account updates, so Jetstreamer currently does not provide them either. Transaction logs are available in `transaction_status_meta.log_messages` for all epochs.
It is worth noting that Old Faithful (and thus Jetstreamer) stores transactions in their "already-executed" state, exactly as they originally appeared to Geyser when they were first executed. Jetstreamer replays ledger data rather than executing transactions directly, so when we say 2.7M TPS, we mean "2.7M transactions per second processed locally by a Jetstreamer or Geyser plugin, streamed over the internet from the Old Faithful archive."
Quick Start
To get an idea of what Jetstreamer is capable of, try the demo CLI, which runs the Jetstreamer Runner with the Program Tracking plugin enabled. Pass `--with-plugin instruction-tracking` (or repeat the flag to run both built-ins) to change the default set:
Jetstreamer Runner CLI
```shell
# Replay all transactions in epoch 800, using the default number of multiplexing threads based on your system
cargo run --release -- 800

# The same as above, but tuning network capacity for 10 Gbps, resulting in a higher number of multiplexing threads
JETSTREAMER_NETWORK_CAPACITY_MB=10000 cargo run --release -- 800

# Do the same but for slots 358560000 through 367631999, which is epochs 830-850 (slot ranges can be cross-epoch!),
# using 8 threads explicitly instead of the automatic thread count
JETSTREAMER_THREADS=8 cargo run --release -- 358560000:367631999

# Replay epoch 800 with the instruction tracking plugin instead of the default
cargo run --release -- 800 --with-plugin instruction-tracking
```
If `JETSTREAMER_THREADS` is omitted, Jetstreamer auto-sizes the worker pool using the same hardware-aware heuristic exposed by `jetstreamer_firehose::system::optimal_firehose_thread_count`.
For sequential replay mode, enable `--sequential` (or `JETSTREAMER_SEQUENTIAL=1`). In this mode Jetstreamer uses a single firehose worker and reuses `JETSTREAMER_THREADS` as the ripget parallel-download concurrency:
```shell
# Sequential mode with CLI flag
JETSTREAMER_THREADS=4 cargo run --release -- 800 --sequential

# Sequential mode with explicit ripget window override
JETSTREAMER_SEQUENTIAL=1 JETSTREAMER_BUFFER_WINDOW=4GiB cargo run --release -- 800
```
JETSTREAMER_BUFFER_WINDOW defaults to min(4 GiB, 15% of available RAM) when unset.
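That default is easy to picture with a short sketch. This is an illustrative reimplementation of the documented formula only; `available_ram_bytes` is a stand-in for however you probe free memory, and the real logic lives inside Jetstreamer:

```rust
/// Illustrative sketch of the documented JETSTREAMER_BUFFER_WINDOW default:
/// min(4 GiB, 15% of available RAM). Not Jetstreamer's actual code.
fn default_buffer_window(available_ram_bytes: u64) -> u64 {
    const FOUR_GIB: u64 = 4 * 1024 * 1024 * 1024;
    FOUR_GIB.min(available_ram_bytes * 15 / 100)
}

fn main() {
    // On a 16 GiB machine, 15% of RAM (~2.4 GiB) is below the 4 GiB cap.
    println!("{}", default_buffer_window(16 * 1024 * 1024 * 1024));
    // On a 64 GiB machine, the 4 GiB cap wins.
    println!("{}", default_buffer_window(64 * 1024 * 1024 * 1024));
}
```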
The built-in program and instruction tracking plugins now record vote and non-vote activity separately. `program_invocations` includes an `is_vote` flag per row, while `slot_instructions` stores separate vote/non-vote instruction and transaction counts.
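The vote/non-vote split can be pictured with a small sketch. The struct below is hypothetical, illustrating the shape of the counters described above rather than the actual `slot_instructions` schema:

```rust
/// Hypothetical per-slot counters mirroring the vote/non-vote split described
/// above; the real slot_instructions schema lives in the built-in plugins.
#[derive(Default, Debug)]
struct SlotInstructionCounts {
    vote_instructions: u64,
    nonvote_instructions: u64,
    vote_transactions: u64,
    nonvote_transactions: u64,
}

impl SlotInstructionCounts {
    /// Bucket a transaction's instructions by its vote/non-vote flag.
    fn record_transaction(&mut self, is_vote: bool, instruction_count: u64) {
        if is_vote {
            self.vote_transactions += 1;
            self.vote_instructions += instruction_count;
        } else {
            self.nonvote_transactions += 1;
            self.nonvote_instructions += instruction_count;
        }
    }
}

fn main() {
    let mut counts = SlotInstructionCounts::default();
    counts.record_transaction(true, 1); // a vote transaction
    counts.record_transaction(false, 5); // a non-vote transaction
    println!("{counts:?}");
}
```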
The CLI accepts either `<start>:<end>` slot ranges or a single epoch on the command line. See `JetstreamerRunner::parse_cli_args` for the precise rules.
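As a rough sketch of those rules — a hypothetical reimplementation assuming mainnet's 432,000 slots per epoch, so a lone number expands to that epoch's full slot range while `<start>:<end>` passes through; consult the actual parser for edge cases:

```rust
/// Hypothetical sketch of the CLI argument rules. The real parser is
/// JetstreamerRunner::parse_cli_args; Solana mainnet has 432,000 slots per epoch.
const SLOTS_PER_EPOCH: u64 = 432_000;

fn parse_range(arg: &str) -> Option<(u64, u64)> {
    match arg.split_once(':') {
        // "<start>:<end>" is an explicit slot range.
        Some((start, end)) => Some((start.parse().ok()?, end.parse().ok()?)),
        // A lone number is treated as an epoch and expanded to its slot range.
        None => {
            let epoch: u64 = arg.parse().ok()?;
            let start = epoch * SLOTS_PER_EPOCH;
            Some((start, start + SLOTS_PER_EPOCH - 1))
        }
    }
}

fn main() {
    println!("{:?}", parse_range("800"));
    println!("{:?}", parse_range("358560000:367631999"));
}
```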
ClickHouse Integration
Jetstreamer Runner has a built-in ClickHouse integration (by default a clickhouse server is
spawned running out of the bin directory in the repo)
To manage the ClickHouse integration with ease, the following bundled Cargo aliases are available from within the jetstreamer workspace:

```shell
cargo clickhouse-server
cargo clickhouse-client
```
`cargo clickhouse-server` launches the same ClickHouse binary that Jetstreamer Runner spawns in `bin/`, while `cargo clickhouse-client` connects to the local instance so you can inspect tables populated by the runner or plugin runner.
While Jetstreamer is running, you can use `cargo clickhouse-client` to connect directly to the ClickHouse instance that Jetstreamer has spawned. If you want to access data after a run has finished, run `cargo clickhouse-server` to bring the server back up using the data currently in the `bin` directory. It is also possible to copy a `bin` directory from one system to another as a way of migrating data.
Writing Jetstreamer Plugins
Jetstreamer plugins are observers driven by the Jetstreamer Runner as it replays chain data.
Implement the `Plugin` trait to observe epoch/block/transaction/reward/entry events. The example below mirrors the crate-level documentation and demonstrates how to react to both transactions and blocks.
Note that Jetstreamer's firehose and underlying interface emit `BlockData::PossibleLeaderSkipped` events whenever a slot gap is observed. These represent either leader-skipped slots or blocks that have not arrived yet; when the real block eventually shows up, `BlockData::Block` will be emitted for it, just like normal Geyser streams.
Also note that Jetstreamer spawns parallel threads that process different subranges of the overall slot range at the same time. Each thread sees a purely sequential view of its transactions, but downstream services such as databases that consume this data will see writes arrive in a fairly arbitrary order, so design your database tables and shared data structures accordingly.
```rust
use std::sync::Arc;

use clickhouse::Client;
use jetstreamer::{
    JetstreamerRunner,
    firehose::epochs,
    firehose::firehose::{BlockData, TransactionData},
    plugin::{Plugin, PluginFuture},
};

struct LoggingPlugin;

impl Plugin for LoggingPlugin {
    fn name(&self) -> &'static str {
        "logging"
    }

    fn on_transaction<'a>(
        &'a self,
        _thread_id: usize,
        _db: Option<Arc<Client>>,
        tx: &'a TransactionData,
    ) -> PluginFuture<'a> {
        Box::pin(async move {
            println!("tx {} landed in slot {}", tx.signature, tx.slot);
            Ok(())
        })
    }

    fn on_block<'a>(
        &'a self,
        _thread_id: usize,
        _db: Option<Arc<Client>>,
        block: &'a BlockData,
    ) -> PluginFuture<'a> {
        Box::pin(async move {
            if block.was_skipped() {
                println!("slot {} was skipped", block.slot());
            } else {
                println!("processed block at slot {}", block.slot());
            }
            Ok(())
        })
    }
}

fn main() {
    let (start_slot, end_inclusive) = epochs::epoch_to_slot_range(800);
    JetstreamerRunner::new()
        .with_plugin(Box::new(LoggingPlugin))
        .with_threads(4)
        .with_slot_range_bounds(start_slot, end_inclusive + 1)
        .with_clickhouse_dsn("https://clickhouse.example.com")
        .run()
        .expect("runner completed");
}
```
If you prefer to configure Jetstreamer via the command line, keep using `JetstreamerRunner::parse_cli_args` to hydrate the runner from process arguments and environment variables.
When `JETSTREAMER_CLICKHOUSE_MODE` is `auto` (the default), Jetstreamer inspects the DSN to decide whether to launch the bundled ClickHouse helper or connect to an external cluster.
Alternate Archive Mirrors
Jetstreamer defaults to the public Old Faithful mirror (https://files.old-faithful.net), but
the firehose can also stream CARs and compact indexes directly from authenticated
S3-compatible storage. Configure the backend via the following environment variables:
- `JETSTREAMER_ARCHIVE_BACKEND` (default `http`): set to `s3` to force the S3 client.
- `JETSTREAMER_HTTP_BASE_URL`: base URL or `s3://bucket/prefix` for CAR files.
- `JETSTREAMER_COMPACT_INDEX_BASE_URL`: optional override for compact indexes (also accepts `s3://` URIs).
- `JETSTREAMER_ARCHIVE_BASE`: single knob that applies to both CARs and indexes when the more specific variables are unset.
- `JETSTREAMER_S3_BUCKET`, `JETSTREAMER_S3_PREFIX`, `JETSTREAMER_S3_INDEX_PREFIX`: bucket/prefix overrides when not encoded in the `s3://` URL.
- `JETSTREAMER_S3_REGION` and `JETSTREAMER_S3_ENDPOINT`: region plus optional custom endpoint (e.g. `https://s3.eu-central-003.backblazeb2.com`).
- `JETSTREAMER_S3_ACCESS_KEY`, `JETSTREAMER_S3_SECRET_KEY`, `JETSTREAMER_S3_SESSION_TOKEN`: credentials used for signing requests (falls back to AWS standard env vars).
S3 support is compiled behind the `s3-backend` Cargo feature. Enable it when running or depending on `jetstreamer`.
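The precedence between the generic and specific variables can be sketched like this — an illustrative helper following the documented fallback order, not Jetstreamer's actual resolution code:

```rust
use std::env;

/// Illustrative sketch of the documented precedence: a specific variable such
/// as JETSTREAMER_HTTP_BASE_URL wins, JETSTREAMER_ARCHIVE_BASE fills in when
/// it is unset, and the public Old Faithful mirror is the final fallback.
/// This is not Jetstreamer's actual resolution code.
fn resolve_base_url(specific_var: &str) -> String {
    env::var(specific_var)
        .or_else(|_| env::var("JETSTREAMER_ARCHIVE_BASE"))
        .unwrap_or_else(|_| "https://files.old-faithful.net".to_string())
}

fn main() {
    println!("CARs from: {}", resolve_base_url("JETSTREAMER_HTTP_BASE_URL"));
    println!(
        "indexes from: {}",
        resolve_base_url("JETSTREAMER_COMPACT_INDEX_BASE_URL")
    );
}
```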