csvdb
Version-control your relational data like code.
Note: This is beta software. The API and file format may change. Use with caution in production.
SQLite and DuckDB files are binary — git can't diff them, reviewers can't read them, and merges are impossible. csvdb converts your database into a directory of plain-text CSV files + schema.sql, fully diffable and round-trip lossless. Convert back to SQLite, DuckDB, or Parquet when you need query performance.
# git diff myapp.csvdb/rates.csv
"date","rate"
"2024-01-01","4.50"
-"2024-04-01","4.25"
+"2024-04-01","3.75"
+"2024-07-01","3.50"
Every change is a readable, reviewable line in a PR. No binary blobs, no "file changed" with no context.
Use cases:
- Seed data and test fixtures committed alongside code
- Config and lookup tables reviewed in PRs before deploy
- CI integrity checks: `csvdb checksum data.csvdb/ | grep $EXPECTED`
- Migrating between SQLite, DuckDB, and Parquet without ETL scripts
- Manual edits in a spreadsheet or text editor, rebuild with one command
- Audit trail: `git blame` on any CSV file shows who changed a row and when
Directory Layouts
A .csvdb directory contains:
mydb.csvdb/
csvdb.toml # format version, export settings
schema.sql # CREATE TABLE, CREATE INDEX, CREATE VIEW
users.csv # one file per table
orders.csv
A .parquetdb directory has the same structure with Parquet files instead of CSVs:
mydb.parquetdb/
csvdb.toml # format version, export settings
schema.sql # CREATE TABLE, CREATE INDEX, CREATE VIEW
users.parquet # one file per table
orders.parquet
The schema defines the structure. The data files hold the data. csvdb.toml records the format version and the settings used to produce the export.
Why csvdb
CSV format works with standard tools:
- Edit with any text editor or spreadsheet
- Diff and merge with git
- Process with awk, pandas, Excel
SQLite/DuckDB format provides fast access:
- Indexed lookups without scanning entire files
- Views for complex joins and computed columns
- Full SQL query support
- Single-file distribution
Parquet format provides columnar storage:
- Efficient compression and encoding
- Fast analytical queries
- Wide ecosystem support (Spark, pandas, DuckDB, etc.)
- Per-table `.parquet` files in a `.parquetdb` directory
csvdb lets you store data as CSV (human-readable, git-friendly) and convert to SQLite, DuckDB, or Parquet when you need query performance.
Installation
# Rust (via cargo)
cargo install csvdb
# Python library (import csvdb)
pip install csvdb-py
# Standalone binary (via uv/pipx)
uv tool install csvdb-cli # then: csvdb ...
pipx install csvdb-cli # then: csvdb ...
Quick Start
# Convert an existing SQLite database to csvdb
csvdb to-csvdb mydb.sqlite
git add mydb.csvdb/
git commit -m "Track data in csvdb format"
# Edit data
vim mydb.csvdb/users.csv
# Rebuild database
csvdb to-sqlite mydb.csvdb/
# Or export to Parquet
csvdb to-parquetdb mydb.csvdb/
Commands
init — Create csvdb from raw CSV files
# From a directory of CSV files
csvdb init ./raw_csvs/
# From a single CSV file
csvdb init data.csv
Creates a .csvdb directory by:
- Inferring schema from CSV headers and data types
- Detecting primary keys (columns named `id` or `<table>_id`)
- Detecting foreign keys (columns like `user_id` referencing `users.id`)
- Copying CSV files
Options:
- `-o, --output <dir>` - Custom output directory
- `--force` - Overwrite existing output directory
- `--no-pk-detection` - Disable automatic primary key detection
- `--no-fk-detection` - Disable automatic foreign key detection
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
to-csvdb — Export database to csvdb
# From SQLite
csvdb to-csvdb mydb.sqlite
# From DuckDB
csvdb to-csvdb mydb.duckdb
# From Parquet
csvdb to-csvdb mydb.parquetdb/
csvdb to-csvdb single_table.parquet
Creates mydb.csvdb/ containing:
- `schema.sql` - table definitions, indexes, views
- `*.csv` - one file per table, sorted by primary key
Supports multiple input formats:
- SQLite (`.sqlite`, `.sqlite3`, `.db`)
- DuckDB (`.duckdb`)
- parquetdb (`.parquetdb` directory)
- Parquet (`.parquet` single file)
Options:
- `-o, --output <dir>` - Custom output directory
- `--order <mode>` - Row ordering mode (see below)
- `--null-mode <mode>` - NULL representation in CSV (see below)
- `--natural-sort` - Sort string PKs naturally (e.g. "item2" before "item10")
- `--order-by <clause>` - Custom ORDER BY clause (e.g. "created_at DESC")
- `--compress` - Compress CSV files with gzip (produces `.csv.gz` files)
- `--incremental` - Only re-export tables whose data has changed
- `--pipe` - Write to temp directory, output only path (for piping)
- `--force` - Overwrite existing output directory
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
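The ordering `--natural-sort` describes ("item2" before "item10") can be reproduced with a sort key that compares embedded digit runs numerically. A sketch of the idea, not csvdb's code:

```python
import re

def natural_key(s):
    """Split a string into text and number chunks so 'item2' < 'item10'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", s)]

pks = ["item10", "item2", "item1"]
print(sorted(pks))                   # ['item1', 'item10', 'item2'] (lexicographic)
print(sorted(pks, key=natural_key))  # ['item1', 'item2', 'item10'] (natural)
```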
to-sqlite — Build SQLite database
csvdb to-sqlite mydb.csvdb/
csvdb to-sqlite mydb.parquetdb/
Creates mydb.sqlite from a csvdb or parquetdb directory.
Options:
- `--force` - Overwrite existing output file
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
to-duckdb — Build DuckDB database
csvdb to-duckdb mydb.csvdb/
csvdb to-duckdb mydb.parquetdb/
Creates mydb.duckdb from a csvdb or parquetdb directory.
Options:
- `--force` - Overwrite existing output file
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
to-parquetdb — Convert any format to Parquet
# From SQLite
csvdb to-parquetdb mydb.sqlite
# From DuckDB
csvdb to-parquetdb mydb.duckdb
# From csvdb
csvdb to-parquetdb mydb.csvdb/
# From a single Parquet file
csvdb to-parquetdb users.parquet
Creates mydb.parquetdb/ containing:
- `schema.sql` - table definitions, indexes, views
- `csvdb.toml` - format version and export settings
- `*.parquet` - one Parquet file per table
Supports multiple input formats:
- SQLite (`.sqlite`, `.sqlite3`, `.db`)
- DuckDB (`.duckdb`)
- csvdb (`.csvdb` directory)
- parquetdb (`.parquetdb` directory)
- Parquet (`.parquet` single file)
Options:
- `-o, --output <dir>` - Custom output directory
- `--order <mode>` - Row ordering mode (see below)
- `--null-mode <mode>` - NULL representation (see below)
- `--pipe` - Write to temp directory, output only path (for piping)
- `--force` - Overwrite existing output directory
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
validate — Check structural integrity
csvdb validate mydb.csvdb/
csvdb validate mydb.parquetdb/
Checks that a .csvdb or .parquetdb directory is structurally valid:
- `schema.sql` exists and parses correctly
- Every table in the schema has a corresponding data file
- No orphan data files without schema entries
Returns exit code 0 if valid, 1 if errors found.
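The three checks can be approximated in a few lines of Python. This is a hedged sketch: `validate_dir` is a hypothetical helper, and real schema parsing is more involved than a regex:

```python
import re
import tempfile
from pathlib import Path

def validate_dir(path):
    """Return a list of problems with a .csvdb directory (simplified)."""
    path = Path(path)
    schema = path / "schema.sql"
    if not schema.exists():
        return ["schema.sql is missing"]
    # Crude table-name extraction; csvdb parses the SQL properly.
    tables = set(re.findall(r'CREATE TABLE\s+"?(\w+)"?', schema.read_text(), re.I))
    csvs = {p.stem for p in path.glob("*.csv")}
    errors = [f"missing data file: {t}.csv" for t in sorted(tables - csvs)]
    errors += [f"orphan data file: {t}.csv" for t in sorted(csvs - tables)]
    return errors

# Demo: a directory with a schema entry, its CSV, and one stray CSV.
with tempfile.TemporaryDirectory() as d:
    d = Path(d)
    (d / "schema.sql").write_text("CREATE TABLE users (id INTEGER);")
    (d / "users.csv").write_text("id\n1\n")
    (d / "stray.csv").write_text("x\n")
    print(validate_dir(d))  # ['orphan data file: stray.csv']
```

Mirroring the CLI, an exit code would be `1 if validate_dir(path) else 0`.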
sql — Run read-only SQL queries
csvdb sql "SELECT name, score FROM users ORDER BY score DESC" mydb.csvdb/
csvdb sql "SELECT * FROM orders WHERE total > 100" mydb.sqlite
csvdb sql "SELECT COUNT(*) FROM events" mydb.duckdb
Runs a read-only SQL query against any supported format. The query is executed in an in-memory SQLite database loaded from the input.
Options:
- `--format <csv|table>` - Output format (default: table for TTY, csv for pipe)
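Since queries run in an in-memory SQLite database loaded from the input, the same pattern is easy to reproduce with Python's stdlib. An illustration of the mechanism, not csvdb's loader (the inline rows stand in for a `users.csv`):

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, score REAL)")

# Load rows the way an exported users.csv would provide them.
rows = [(name, float(score))
        for name, score in csv.reader(io.StringIO("alice,9.5\nbob,7.0\n"))]
con.executemany("INSERT INTO users VALUES (?, ?)", rows)

# Read-only query, like: csvdb sql "SELECT ..." mydb.csvdb/
result = con.execute(
    "SELECT name, score FROM users ORDER BY score DESC").fetchall()
print(result)  # [('alice', 9.5), ('bob', 7.0)]
```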
watch — Auto-rebuild on changes
csvdb watch mydb.csvdb/ --target sqlite
csvdb watch mydb.csvdb/ --target duckdb
csvdb watch mydb.csvdb/ --target parquetdb
Monitors a .csvdb directory for file changes and automatically rebuilds the target database. Does an initial build, then watches for modifications to CSV files or schema.sql.
Options:
- `--target <sqlite|duckdb|parquetdb>` - Target format to build (required)
- `--debounce <ms>` - Debounce interval in milliseconds (default: 500)
- `--order <mode>` - Row ordering (for parquetdb target)
- `--null-mode <mode>` - NULL representation (for parquetdb target)
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
hooks — Git hooks for csvdb
csvdb hooks install # Install pre-commit and post-merge hooks
csvdb hooks install --force # Overwrite existing hooks
csvdb hooks uninstall # Remove csvdb git hooks
Installs git hooks that automatically rebuild databases when .csvdb files are committed or merged.
checksum — Verify data integrity
csvdb checksum mydb.sqlite
csvdb checksum mydb.csvdb/
csvdb checksum mydb.duckdb
csvdb checksum mydb.parquetdb/
csvdb checksum users.parquet
Computes a SHA-256 checksum of the database content. The checksum is:
- Format-independent: Same data produces same hash regardless of format
- Deterministic: Same data always produces same hash
- Content-based: Includes schema structure and all row data
Use checksums to verify roundtrip conversions:
csvdb checksum original.sqlite # a1b2c3...
csvdb to-csvdb original.sqlite
csvdb to-duckdb original.csvdb/
csvdb checksum original.duckdb # a1b2c3... (same!)
csvdb to-parquetdb original.csvdb/
csvdb checksum original.parquetdb/ # a1b2c3... (same!)
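Format independence falls out of hashing a canonical serialization of the content rather than the file bytes. A conceptual sketch of how the three properties above can be realized, not csvdb's actual algorithm:

```python
import hashlib
import json

def content_checksum(schema, tables):
    """SHA-256 over canonical content: schema text plus sorted rows per table."""
    h = hashlib.sha256()
    h.update(schema.strip().encode())
    for name in sorted(tables):           # table iteration order never matters
        h.update(name.encode())
        for row in sorted(tables[name]):  # on-disk row order never matters
            h.update(json.dumps(row).encode())
    return h.hexdigest()

a = content_checksum("CREATE TABLE t (x INT);", {"t": [[1], [2]]})
b = content_checksum("CREATE TABLE t (x INT);", {"t": [[2], [1]]})  # reordered
print(a == b)  # True: same content, same hash
```

Because only canonical content is hashed, any format that round-trips the same schema and rows yields the same digest.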
diff — Compare two databases
csvdb diff v1.csvdb/ v2.csvdb/
csvdb diff base.sqlite modified.csvdb/
csvdb diff --format json v1.csvdb/ v2.csvdb/
Compares two databases or csvdb directories and reports row-level differences: added, deleted, and modified rows with column-level detail. Works across formats (e.g. SQLite vs csvdb).
Returns exit code 0 if identical, 1 if differences are found.
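Row-level comparison keyed by primary key can be sketched as follows; `diff_rows` is a hypothetical helper illustrating the idea, using the rates data from the intro:

```python
def diff_rows(old, new):
    """Compare two {pk: row_dict} mappings; report added/deleted/modified."""
    added   = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    modified = {
        k: {col: (old[k][col], new[k][col])
            for col in old[k] if old[k][col] != new[k][col]}
        for k in old.keys() & new.keys() if old[k] != new[k]
    }
    return added, deleted, modified

v1 = {"2024-01-01": {"rate": "4.50"}, "2024-04-01": {"rate": "4.25"}}
v2 = {"2024-01-01": {"rate": "4.50"}, "2024-04-01": {"rate": "3.75"},
      "2024-07-01": {"rate": "3.50"}}
added, deleted, modified = diff_rows(v1, v2)
print(added)     # {'2024-07-01': {'rate': '3.50'}}
print(modified)  # {'2024-04-01': {'rate': ('4.25', '3.75')}}
```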