SkillAgentSearch skills...

Frigatebird

πŸͺΆ columnar SQL database built from scratch

Install / Use

/learn @Frigatebird-db/Frigatebird
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

<div align="center"> <img src="./assets/frigatebird.svg" alt="Frigatebird" width="25%"> <p>Frigatebird: A columnar SQL database with push-based query execution</p> <sub>Named after the <a href="https://en.wikipedia.org/wiki/Frigatebird">frigatebird</a>, one of the fastest birds that can stay aloft for weeks with minimal effort</sub>

License: MIT

</div>

Demo

Overview

Frigatebird is an embedded columnar database that implements a push-based Volcano execution model with morsel-driven parallelism. Queries compile into pipelines where operators push batches downstream through channels, enabling parallel execution across multiple workers.

Architecture

<div> <img src="./assets/architecture.png" alt="architecture" width="65%"> </div> <!-- ![architecture](./assets/architecture.png) -->

Key Features

  • Push-based execution - Operators push data downstream instead of pulling, enabling natural parallelism
  • Morsel-driven parallelism - Data processed in 50k-row morsels that flow through pipelines independently
  • Late materialization - Columns loaded only when needed, reducing I/O for selective queries
  • Three-tier caching - Uncompressed (hot) β†’ Compressed (warm) β†’ Disk (cold)
  • Vectorized filtering - Bitmap operations process 64 rows per CPU instruction
  • Dictionary encoding - Automatic compression for low-cardinality string columns
  • WAL durability - Write-ahead logging with three-phase commit for crash recovery
  • io_uring + O_DIRECT - Batched async I/O bypassing OS page cache (Linux)

Quick Start

# Start the interactive CLI
make frigatebird
sql> CREATE TABLE events (id TEXT, ts TIMESTAMP, data TEXT) ORDER BY id
OK

sql> INSERT INTO events (id, ts, data) VALUES ('sensor-1', '2024-01-15 10:30:00', 'temperature=23.5')
OK

sql> SELECT * FROM events WHERE id = 'sensor-1'
id       | ts                  | data
sensor-1 | 2024-01-15 10:30:00 | temperature=23.5
(1 rows)

note: ORDER BY is mandatory while creating a table

How It Works

Query Pipeline

A query like SELECT name FROM users WHERE age > 25 AND city = 'NYC' compiles into a pipeline:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Step 0    β”‚     β”‚   Step 1    β”‚     β”‚   Step 2    β”‚
β”‚  column:age │────▢│ column:city │────▢│ column:name │────▢ Output
β”‚ filter:>25  β”‚     β”‚ filter:=NYC β”‚     β”‚ (project)   β”‚
β”‚   (root)    β”‚     β”‚             β”‚     β”‚             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   187k rows           31k rows            8k rows

Each step loads only its column and filters, passing surviving row IDs downstream. The final step materializes projection columns only for rows that passed all filters.

Late Materialization

Without late materialization:  Load ALL columns Γ— 187k rows = 750k values
With late materialization:     age: 187k + city: 31k + name: 8k = 226k values
                               ─────────────────────────────────────────────
                               70% less data loaded

Parallel Execution

Workers grab morsels (page groups) using lock-free atomic compare-and-swap:

Time ──────────────────────────────────────────────▢

Worker 0  β•‘β–“β–“β–“ M0:S0 β–“β–“β–“β•‘     β•‘β–“β–“ M0:S1 β–“β–“β•‘    β•‘β–“ M0:S2 β–“β•‘
Worker 1  β•‘β–“β–“β–“ M1:S0 β–“β–“β–“β•‘   β•‘β–“β–“ M1:S1 β–“β–“β•‘   β•‘β–“ M1:S2 β–“β•‘
Worker 2    β•‘β–“β–“β–“ M2:S0 β–“β–“β–“β•‘   β•‘β–“β–“ M2:S1 β–“β–“β•‘    β•‘β–“ M2:S2 β–“β•‘
Worker 3      β•‘β–“β–“ M3:S0 β–“β–“β•‘   β•‘β–“ M3:S1 β–“β•‘    β•‘β–“ M3:S2 β–“β•‘

Legend: M0:S0 = Morsel 0, Step 0

Multiple morsels execute concurrently. Channels buffer batches between steps.

CLI Commands

| Command | Description | |---------|-------------| | \d | List all tables | | \d <table> | Describe table schema | | \dt | List all tables | | \i <file> | Execute SQL from file | | \q | Quit |

SQL Support

DDL

  • CREATE TABLE <name> (...columns...) ORDER BY <col[, ...]>

DML

  • INSERT INTO <table> (...) VALUES (...)
  • UPDATE <table> SET ... [WHERE ...]
  • DELETE FROM <table> [WHERE ...]

Queries

  • Single table SELECT with WHERE, ORDER BY, LIMIT, OFFSET
  • Predicates: AND, OR, comparisons, BETWEEN, LIKE, ILIKE, RLIKE, IN, IS NULL
  • Aggregates: COUNT, SUM, AVG, MIN, MAX, VARIANCE, STDDEV, PERCENTILE_CONT
  • Aggregate modifiers: FILTER (WHERE ...), DISTINCT
  • Conditional aggregates: sumIf, avgIf, countIf
  • GROUP BY with HAVING, QUALIFY, DISTINCT
  • Window functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM, FIRST_VALUE, LAST_VALUE
  • Time functions: TIME_BUCKET, DATE_TRUNC
  • Scalar functions: ABS, ROUND, CEIL, FLOOR, EXP, LN, LOG, POWER, WIDTH_BUCKET

Data Types

  • TEXT / VARCHAR / STRING
  • INT / INTEGER / BIGINT
  • FLOAT / DOUBLE / REAL
  • BOOL / BOOLEAN
  • TIMESTAMP / DATETIME
  • UUID
  • INET / IP

Storage

Data is stored in columnar format with one file per column:

storage/
β”œβ”€β”€ data.00000          # Page data (4GB max per file, auto-rotates)
└── data.00001

wal_files/
β”œβ”€β”€ frigatebird/        # WAL for durability
└── frigatebird-meta/   # Metadata journal

Pages are compressed with LZ4 and aligned to 4KB boundaries for O_DIRECT I/O.

Running Tests

cargo test

Documentation

See the docs/ directory for detailed documentation:

License

MIT License - see LICENSE for details.

View on GitHub
GitHub Stars105
CategoryData
Updated17d ago
Forks2

Languages

Rust

Security Score

95/100

Audited on Mar 15, 2026

No findings