<p align="left"> <a href="https://tonbo.io"> <picture> <img width="680" src="https://github.com/user-attachments/assets/f6949c40-012f-425e-8ad8-6b3fe11ce672" /> </picture> </a> </p>

Tonbo

Website | Rust Doc | Blog | Community

Tonbo is an embedded database for serverless and edge runtimes. Your data is stored as Parquet on S3, coordination happens through a manifest, and compute stays fully stateless.

Why Tonbo?

Serverless compute is stateless, but your data isn't. Tonbo bridges this gap:

  • Async-first: The entire storage and query engine is fully async, built for serverless and edge environments.
  • No server to manage: Data lives on S3, coordination happens through a manifest, and compute stays stateless.
  • Arrow-native: Define rich data types with declarative schemas and query with zero-copy RecordBatches.
  • Runs anywhere: Tokio, WASM, edge runtimes, or as a storage engine for building your own data infrastructure.
  • Open formats: Standard Parquet files readable by any tool.

When to use Tonbo?

  • Build serverless or edge applications that need a durable state layer without running a database.
  • Store append-heavy or event-like data directly in S3 and query it with low overhead.
  • Embed a lightweight MVCC + Parquet storage engine inside your own data infrastructure.
  • Run workloads in WASM or Cloudflare Workers that require structured persistence.

Quick Start

use tonbo::{db::{AwsCreds, ObjectSpec, S3Spec}, prelude::*};

#[derive(Record)]
struct User {
    #[metadata(k = "tonbo.key", v = "true")]
    id: String,
    name: String,
    score: Option<i64>,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open on S3
    let s3 = S3Spec::new("my-bucket", "data/users", AwsCreds::from_env()?);
    let db = DbBuilder::from_schema(User::schema())?
        .object_store(ObjectSpec::s3(s3))?
        .open()
        .await?;

    // Insert
    let users = vec![User { id: "u1".into(), name: "Alice".into(), score: Some(100) }];
    let mut builders = User::new_builders(users.len());
    builders.append_rows(users);
    db.ingest(builders.finish().into_record_batch()).await?;

    // Query
    let filter = Expr::gt("score", ScalarValue::from(80_i64));
    let results = db.scan().filter(filter).collect().await?;

    Ok(())
}

For local development, use .on_disk("/tmp/users")? instead. See examples/ for more.

Installation

cargo add tonbo@0.4.0-a0 tokio

Or add to Cargo.toml:

[dependencies]
tonbo = "0.4.0-a0"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }

Examples

Run with cargo run --example <name>:

  • 01_basic: Define schema, insert, and query in 30 lines
  • 02_transaction: MVCC transactions with upsert, delete, and read-your-writes
  • 02b_snapshot: Consistent point-in-time reads while writes continue
  • 03_filter: Filter predicates (eq, gt, in, is_null, and, or, not)
  • 04_s3: Store Parquet files on S3/R2/MinIO with zero server config
  • 05_scan_options: Projection pushdown reads only the columns you need
  • 06_composite_key: Multi-column keys for time-series and partitioned data
  • 07_streaming: Process millions of rows without loading into memory
  • 08_nested_types: Deep struct nesting + Lists stored as Arrow StructArray
  • 09_time_travel: Query historical snapshots via MVCC timestamps
  • 10_dynamic: Dynamic schemas without #[derive(Record)] (basic, metadata, composite, transactions)
  • 11_observability: Tracing spans and structured logging with OpenTelemetry
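
The snapshot and time-travel examples (02b, 09) rest on MVCC commit timestamps. A minimal, self-contained sketch of timestamp-versioned reads (a toy store, not Tonbo's actual API) shows why a snapshot keeps seeing a consistent view while newer writes land:

```rust
use std::collections::BTreeMap;

/// Toy MVCC store: every write is kept as a new version tagged with a
/// monotonically increasing commit timestamp.
struct MvccStore {
    // (key, commit_ts) -> value; BTreeMap ordering lets us scan versions per key.
    versions: BTreeMap<(String, u64), i64>,
    clock: u64,
}

impl MvccStore {
    fn new() -> Self {
        Self { versions: BTreeMap::new(), clock: 0 }
    }

    /// Commit a write at the next timestamp and return that timestamp.
    fn put(&mut self, key: &str, value: i64) -> u64 {
        self.clock += 1;
        self.versions.insert((key.to_string(), self.clock), value);
        self.clock
    }

    /// Snapshot read: the newest version with commit_ts <= snapshot_ts.
    /// This is what makes point-in-time reads and time travel possible.
    fn get_at(&self, key: &str, snapshot_ts: u64) -> Option<i64> {
        self.versions
            .range((key.to_string(), 0)..=(key.to_string(), snapshot_ts))
            .next_back()
            .map(|(_, v)| *v)
    }
}

fn main() {
    let mut store = MvccStore::new();
    let t1 = store.put("u1", 100); // score = 100 at t1
    let t2 = store.put("u1", 250); // score = 250 at t2

    // A snapshot taken at t1 keeps seeing the old value even after t2's write.
    assert_eq!(store.get_at("u1", t1), Some(100));
    assert_eq!(store.get_at("u1", t2), Some(250));
}
```

In the real engine the timestamps come from commits recorded in the manifest, and versions live in Parquet SSTables rather than one in-memory map.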

Architecture

Tonbo implements a merge-tree optimized for object storage: writes go to WAL → MemTable → Parquet SSTables, with MVCC for snapshot isolation and a manifest for coordination via compare-and-swap:

  • Stateless compute: A worker only needs to read and update the manifest; no long-lived coordinator is required.
  • Object storage CAS: The manifest is committed using compare-and-swap on S3, so any function can safely participate in commits.
  • Immutable data: Data files are write-once Parquet SSTables, which matches the strengths of S3 and other object stores.
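
The compare-and-swap commit protocol above can be sketched with an in-memory atomic standing in for S3's conditional PUT (illustrative only; Tonbo's manifest types look different):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the manifest's version counter. On S3 this would be a
/// conditional write (If-Match) on the manifest object, not an atomic.
struct Manifest {
    version: AtomicU64,
}

impl Manifest {
    /// Optimistic commit: publish the next version only if nobody else
    /// committed since we read `expected`. On failure, returns the version
    /// that actually won, so the caller can re-read state and retry.
    fn try_commit(&self, expected: u64) -> Result<u64, u64> {
        let next = expected + 1;
        self.version
            .compare_exchange(expected, next, Ordering::SeqCst, Ordering::SeqCst)
            .map(|_| next)
    }
}

fn main() {
    let manifest = Manifest { version: AtomicU64::new(0) };

    // First writer commits against version 0 and wins.
    assert_eq!(manifest.try_commit(0), Ok(1));

    // A second writer still holding version 0 loses and must retry at 1.
    assert_eq!(manifest.try_commit(0), Err(1));
    assert_eq!(manifest.try_commit(1), Ok(2));
}
```

Because every commit goes through one conditional update, any stateless worker can participate without a coordinator: losers simply re-read the manifest and try again.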

See docs/overview.md for the full design.
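
The WAL → MemTable → SSTable write path can also be illustrated with a tiny self-contained merge-tree (conceptual only, none of these are Tonbo's types):

```rust
use std::collections::BTreeMap;

/// Toy write path: writes land in a sorted in-memory memtable; when it
/// fills up it is flushed as an immutable sorted run -- the role that
/// Parquet SSTables on object storage play in the real engine.
struct MiniLsm {
    memtable: BTreeMap<String, i64>,
    sstables: Vec<Vec<(String, i64)>>, // newest run last
    memtable_limit: usize,
}

impl MiniLsm {
    fn new(limit: usize) -> Self {
        Self { memtable: BTreeMap::new(), sstables: Vec::new(), memtable_limit: limit }
    }

    fn put(&mut self, key: &str, value: i64) {
        // A real engine appends to the WAL here before acknowledging.
        self.memtable.insert(key.to_string(), value);
        if self.memtable.len() >= self.memtable_limit {
            // Flush: the memtable becomes an immutable sorted run.
            let run: Vec<_> = std::mem::take(&mut self.memtable).into_iter().collect();
            self.sstables.push(run);
        }
    }

    /// Reads check the memtable first, then runs from newest to oldest,
    /// so newer writes shadow older flushed values.
    fn get(&self, key: &str) -> Option<i64> {
        if let Some(v) = self.memtable.get(key) {
            return Some(*v);
        }
        for run in self.sstables.iter().rev() {
            if let Ok(i) = run.binary_search_by(|(k, _)| k.as_str().cmp(key)) {
                return Some(run[i].1);
            }
        }
        None
    }
}

fn main() {
    let mut db = MiniLsm::new(2);
    db.put("a", 1);
    db.put("b", 2); // triggers a flush
    db.put("a", 3); // newer value shadows the flushed one
    assert_eq!(db.get("a"), Some(3));
    assert_eq!(db.get("b"), Some(2));
}
```

Compaction (covered by the benchmarks below) merges these immutable runs back together so reads touch fewer files.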

Observability

Tonbo uses the tracing crate for structured logging and distributed tracing. As a library, Tonbo emits spans and events but never initializes a global subscriber; your application controls the observability configuration.

Quick setup for development:

tracing_subscriber::fmt()
    .with_env_filter("info,tonbo=debug")
    .init();

Production with JSON output:

tracing_subscriber::fmt()
    .json()
    .with_env_filter("info,tonbo=info")
    .init();

OpenTelemetry integration:

use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .install_batch(opentelemetry_sdk::runtime::Tokio)?;

tracing_subscriber::registry()
    .with(tracing_opentelemetry::layer().with_tracer(tracer))
    .with(tracing_subscriber::fmt::layer())
    .init();

See examples/11_observability.rs for complete examples and docs/rfcs/0012-logs-traces.md for design details.

Development

Coverage

Install coverage tooling once:

rustup component add llvm-tools-preview
cargo install cargo-llvm-cov --version 0.6.12 --locked

Run coverage locally:

cargo llvm-cov --workspace --lcov --output-path lcov.info --summary

Generate an HTML report:

cargo llvm-cov --workspace --html

Compaction Benchmark (Local FS)

Default run:

cargo bench --bench compaction_local

Example tuned run:

TONBO_COMPACTION_BENCH_INGEST_BATCHES=768 \
TONBO_COMPACTION_BENCH_ROWS_PER_BATCH=96 \
TONBO_COMPACTION_BENCH_KEY_SPACE=4096 \
TONBO_COMPACTION_BENCH_ARTIFACT_ITERATIONS=64 \
TONBO_COMPACTION_BENCH_CRITERION_SAMPLE_SIZE=30 \
TONBO_COMPACTION_BENCH_WAL_SYNC=always \
cargo bench --bench compaction_local

Benchmark configuration is supplied entirely through TONBO_COMPACTION_BENCH_* environment variables. For benchmark setup, scenario wiring, and the JSON artifact schema/output, see:

  • benches/compaction_local.rs
  • benches/compaction/common.rs
  • docs/benchmark_results.md for the consolidated PR-facing summary
  • benches/compaction/results/compaction_local_baseline.md for raw baseline evidence

Project status

Tonbo is currently in alpha. APIs may change, and we're actively iterating based on feedback. We recommend starting with development and non-critical workloads before moving to production.

Features

Storage

  • [x] Parquet files on object storage (S3, R2) or local filesystem
  • [x] Manifest-driven coordination (CAS commits, no server needed)
  • [ ] (in-progress) Remote compaction (offload to serverless functions)
  • [ ] (in-progress) Branching (git-like fork and merge for datasets)
  • [ ] Time-window compaction strategy

Schema & Query

  • [x] Arrow-native schemas (#[derive(Record)] or dynamic Schema)
  • [x] Projection pushdown (read only needed columns)
  • [x] Zero-copy reads via Arrow RecordBatch
  • [ ] (in-progress) Filter pushdown (predicates evaluated at storage layer)

Backends

  • [x] Local filesystem
  • [x] S3 / S3-compatible (R2, MinIO)
  • [ ] (in-progress) OPFS (browser storage)

Runtime

  • [x] Async-first (Tokio)
  • [x] Edge runtimes (Deno, Cloudflare Workers)
  • [x] WebAssembly
  • [ ] (in-progress) io_uring
  • [ ] (in-progress) Python async bindings
  • [ ] JavaScript/TypeScript async bindings

Integrations

  • [ ] DataFusion TableProvider
  • [ ] Postgres Foreign Data Wrapper

Observability

  • [x] Structured logging via tracing crate
  • [x] OpenTelemetry compatible (via tracing-opentelemetry)
  • [ ] (in-progress) Async-aware spans for storage operations

License

Apache License 2.0. See LICENSE for details.
