csvdb
Version-control your relational data like code.
Note: This is beta software. The API and file format may change. Use with caution in production.
SQLite and DuckDB files are binary — git can't diff them, reviewers can't read them, and merges are impossible. csvdb converts your database into a directory of plain-text CSV files + schema.sql, fully diffable and round-trip lossless. Convert back to SQLite, DuckDB, or Parquet when you need query performance.
# git diff myapp.csvdb/rates.csv
"date","rate"
"2024-01-01","4.50"
-"2024-04-01","4.25"
+"2024-04-01","3.75"
+"2024-07-01","3.50"
Every change is a readable, reviewable line in a PR. No binary blobs, no "file changed" with no context.
Use cases:
- Seed data and test fixtures committed alongside code
- Config and lookup tables reviewed in PRs before deploy
- CI integrity checks: `csvdb checksum data.csvdb/ | grep $EXPECTED`
- Migrating between SQLite, DuckDB, and Parquet without ETL scripts
- Manual edits in a spreadsheet or text editor, rebuild with one command
- Audit trail: `git blame` on any CSV file shows who changed a row and when
Directory Layouts
A .csvdb directory contains:
mydb.csvdb/
csvdb.toml # format version, export settings
schema.sql # CREATE TABLE, CREATE INDEX, CREATE VIEW
users.csv # one file per table
orders.csv
A .parquetdb directory has the same structure with Parquet files instead of CSVs:
mydb.parquetdb/
csvdb.toml # format version, export settings
schema.sql # CREATE TABLE, CREATE INDEX, CREATE VIEW
users.parquet # one file per table
orders.parquet
The schema defines the structure. The data files hold the data. csvdb.toml records the format version and the settings used to produce the export.
Why csvdb
CSV format works with standard tools:
- Edit with any text editor or spreadsheet
- Diff and merge with git
- Process with awk, pandas, Excel
SQLite/DuckDB format provides fast access:
- Indexed lookups without scanning entire files
- Views for complex joins and computed columns
- Full SQL query support
- Single-file distribution
Parquet format provides columnar storage:
- Efficient compression and encoding
- Fast analytical queries
- Wide ecosystem support (Spark, pandas, DuckDB, etc.)
- Per-table `.parquet` files in a `.parquetdb` directory
csvdb lets you store data as CSV (human-readable, git-friendly) and convert to SQLite, DuckDB, or Parquet when you need query performance.
Installation
# Rust (via cargo)
cargo install csvdb
# Python library (import csvdb)
pip install csvdb-py
# Standalone binary (via uv/pipx)
uv tool install csvdb-cli # then: csvdb ...
pipx install csvdb-cli # then: csvdb ...
Quick Start
# Convert an existing SQLite database to csvdb
csvdb to-csvdb mydb.sqlite
git add mydb.csvdb/
git commit -m "Track data in csvdb format"
# Edit data
vim mydb.csvdb/users.csv
# Rebuild database
csvdb to-sqlite mydb.csvdb/
# Or export to Parquet
csvdb to-parquetdb mydb.csvdb/
Commands
init — Create csvdb from raw CSV files
# From a directory of CSV files
csvdb init ./raw_csvs/
# From a single CSV file
csvdb init data.csv
Creates a .csvdb directory by:
- Inferring schema from CSV headers and data types
- Detecting primary keys (columns named `id` or `<table>_id`)
- Detecting foreign keys (columns like `user_id` referencing `users.id`)
- Copying CSV files
Options:
- `-o, --output <dir>` - Custom output directory
- `--force` - Overwrite existing output directory
- `--no-pk-detection` - Disable automatic primary key detection
- `--no-fk-detection` - Disable automatic foreign key detection
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
to-csvdb — Export database to csvdb
# From SQLite
csvdb to-csvdb mydb.sqlite
# From DuckDB
csvdb to-csvdb mydb.duckdb
# From Parquet
csvdb to-csvdb mydb.parquetdb/
csvdb to-csvdb single_table.parquet
Creates mydb.csvdb/ containing:
- `schema.sql` - table definitions, indexes, views
- `*.csv` - one file per table, sorted by primary key
Supports multiple input formats:
- SQLite (`.sqlite`, `.sqlite3`, `.db`)
- DuckDB (`.duckdb`)
- parquetdb (`.parquetdb` directory)
- Parquet (`.parquet` single file)
Options:
- `-o, --output <dir>` - Custom output directory
- `--order <mode>` - Row ordering mode (see below)
- `--null-mode <mode>` - NULL representation in CSV (see below)
- `--natural-sort` - Sort string PKs naturally (e.g. "item2" before "item10")
- `--order-by <clause>` - Custom ORDER BY clause (e.g. "created_at DESC")
- `--compress` - Compress CSV files with gzip (produces `.csv.gz` files)
- `--incremental` - Only re-export tables whose data has changed
- `--pipe` - Write to temp directory, output only path (for piping)
- `--force` - Overwrite existing output directory
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
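The ordering `--natural-sort` describes ("item2" before "item10") can be reproduced with a sort key that compares embedded digit runs numerically. A sketch of the idea, not csvdb's code:

```python
import re

def natural_key(s):
    """Split a string into text and number chunks so 'item2' < 'item10'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", s)]

pks = ["item10", "item2", "item1"]
print(sorted(pks))                   # ['item1', 'item10', 'item2'] (lexicographic)
print(sorted(pks, key=natural_key))  # ['item1', 'item2', 'item10'] (natural)
```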
to-sqlite — Build SQLite database
csvdb to-sqlite mydb.csvdb/
csvdb to-sqlite mydb.parquetdb/
Creates mydb.sqlite from a csvdb or parquetdb directory.
Options:
- `--force` - Overwrite existing output file
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
to-duckdb — Build DuckDB database
csvdb to-duckdb mydb.csvdb/
csvdb to-duckdb mydb.parquetdb/
Creates mydb.duckdb from a csvdb or parquetdb directory.
Options:
- `--force` - Overwrite existing output file
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
to-parquetdb — Convert any format to Parquet
# From SQLite
csvdb to-parquetdb mydb.sqlite
# From DuckDB
csvdb to-parquetdb mydb.duckdb
# From csvdb
csvdb to-parquetdb mydb.csvdb/
# From a single Parquet file
csvdb to-parquetdb users.parquet
Creates mydb.parquetdb/ containing:
- `schema.sql` - table definitions, indexes, views
- `csvdb.toml` - format version and export settings
- `*.parquet` - one Parquet file per table
Supports multiple input formats:
- SQLite (`.sqlite`, `.sqlite3`, `.db`)
- DuckDB (`.duckdb`)
- csvdb (`.csvdb` directory)
- parquetdb (`.parquetdb` directory)
- Parquet (`.parquet` single file)
Options:
- `-o, --output <dir>` - Custom output directory
- `--order <mode>` - Row ordering mode (see below)
- `--null-mode <mode>` - NULL representation (see below)
- `--pipe` - Write to temp directory, output only path (for piping)
- `--force` - Overwrite existing output directory
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
validate — Check structural integrity
csvdb validate mydb.csvdb/
csvdb validate mydb.parquetdb/
Checks that a .csvdb or .parquetdb directory is structurally valid:
- `schema.sql` exists and parses correctly
- Every table in the schema has a corresponding data file
- No orphan data files without schema entries
Returns exit code 0 if valid, 1 if errors found.
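The three checks can be approximated in a few lines of Python. This is a hedged sketch: `validate_dir` is a hypothetical helper, and real schema parsing is more involved than a regex:

```python
import re
import tempfile
from pathlib import Path

def validate_dir(path):
    """Return a list of problems with a .csvdb directory (simplified)."""
    path = Path(path)
    schema = path / "schema.sql"
    if not schema.exists():
        return ["schema.sql is missing"]
    # Crude table-name extraction; csvdb parses the SQL properly.
    tables = set(re.findall(r'CREATE TABLE\s+"?(\w+)"?', schema.read_text(), re.I))
    csvs = {p.stem for p in path.glob("*.csv")}
    errors = [f"missing data file: {t}.csv" for t in sorted(tables - csvs)]
    errors += [f"orphan data file: {t}.csv" for t in sorted(csvs - tables)]
    return errors

# Demo: a directory with a schema entry, its CSV, and one stray CSV.
with tempfile.TemporaryDirectory() as d:
    d = Path(d)
    (d / "schema.sql").write_text("CREATE TABLE users (id INTEGER);")
    (d / "users.csv").write_text("id\n1\n")
    (d / "stray.csv").write_text("x\n")
    print(validate_dir(d))  # ['orphan data file: stray.csv']
```

Mirroring the CLI, an exit code would be `1 if validate_dir(path) else 0`.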
sql — Run read-only SQL queries
csvdb sql "SELECT name, score FROM users ORDER BY score DESC" mydb.csvdb/
csvdb sql "SELECT * FROM orders WHERE total > 100" mydb.sqlite
csvdb sql "SELECT COUNT(*) FROM events" mydb.duckdb
Runs a read-only SQL query against any supported format. The query is executed in an in-memory SQLite database loaded from the input.
Options:
- `--format <csv|table>` - Output format (default: table for TTY, csv for pipe)
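Since queries run in an in-memory SQLite database loaded from the input, the same pattern is easy to reproduce with Python's stdlib. An illustration of the mechanism, not csvdb's loader (the inline rows stand in for a `users.csv`):

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, score REAL)")

# Load rows the way an exported users.csv would provide them.
rows = [(name, float(score))
        for name, score in csv.reader(io.StringIO("alice,9.5\nbob,7.0\n"))]
con.executemany("INSERT INTO users VALUES (?, ?)", rows)

# Read-only query, like: csvdb sql "SELECT ..." mydb.csvdb/
result = con.execute(
    "SELECT name, score FROM users ORDER BY score DESC").fetchall()
print(result)  # [('alice', 9.5), ('bob', 7.0)]
```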
watch — Auto-rebuild on changes
csvdb watch mydb.csvdb/ --target sqlite
csvdb watch mydb.csvdb/ --target duckdb
csvdb watch mydb.csvdb/ --target parquetdb
Monitors a .csvdb directory for file changes and automatically rebuilds the target database. Does an initial build, then watches for modifications to CSV files or schema.sql.
Options:
- `--target <sqlite|duckdb|parquetdb>` - Target format to build (required)
- `--debounce <ms>` - Debounce interval in milliseconds (default: 500)
- `--order <mode>` - Row ordering (for parquetdb target)
- `--null-mode <mode>` - NULL representation (for parquetdb target)
- `--tables <list>` - Only include these tables (comma-separated)
- `--exclude <list>` - Exclude these tables (comma-separated)
hooks — Git hooks for csvdb
csvdb hooks install # Install pre-commit and post-merge hooks
csvdb hooks install --force # Overwrite existing hooks
csvdb hooks uninstall # Remove csvdb git hooks
Installs git hooks that automatically rebuild databases when .csvdb files are committed or merged.
checksum — Verify data integrity
csvdb checksum mydb.sqlite
csvdb checksum mydb.csvdb/
csvdb checksum mydb.duckdb
csvdb checksum mydb.parquetdb/
csvdb checksum users.parquet
Computes a SHA-256 checksum of the database content. The checksum is:
- Format-independent: Same data produces same hash regardless of format
- Deterministic: Same data always produces same hash
- Content-based: Includes schema structure and all row data
Use checksums to verify roundtrip conversions:
csvdb checksum original.sqlite # a1b2c3...
csvdb to-csvdb original.sqlite
csvdb to-duckdb original.csvdb/
csvdb checksum original.duckdb # a1b2c3... (same!)
csvdb to-parquetdb original.csvdb/
csvdb checksum original.parquetdb/ # a1b2c3... (same!)
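Format independence falls out of hashing a canonical serialization of the content rather than the file bytes. A conceptual sketch of how the three properties above can be realized, not csvdb's actual algorithm:

```python
import hashlib
import json

def content_checksum(schema, tables):
    """SHA-256 over canonical content: schema text plus sorted rows per table."""
    h = hashlib.sha256()
    h.update(schema.strip().encode())
    for name in sorted(tables):           # table iteration order never matters
        h.update(name.encode())
        for row in sorted(tables[name]):  # on-disk row order never matters
            h.update(json.dumps(row).encode())
    return h.hexdigest()

a = content_checksum("CREATE TABLE t (x INT);", {"t": [[1], [2]]})
b = content_checksum("CREATE TABLE t (x INT);", {"t": [[2], [1]]})  # reordered
print(a == b)  # True: same content, same hash
```

Because only canonical content is hashed, any format that round-trips the same schema and rows yields the same digest.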
diff — Compare two databases
csvdb diff v1.csvdb/ v2.csvdb/
csvdb diff base.sqlite modified.csvdb/
csvdb diff --format json v1.csvdb/ v2.csvdb/
Compares two databases or csvdb directories and reports row-level differences: added, deleted, and modified rows with column-level detail. Works across formats (e.g. SQLite vs csvdb).
Returns exit code 0 if identical, 1 if differences are found.
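Row-level comparison keyed by primary key can be sketched as follows; `diff_rows` is a hypothetical helper illustrating the idea, using the rates data from the intro:

```python
def diff_rows(old, new):
    """Compare two {pk: row_dict} mappings; report added/deleted/modified."""
    added   = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    modified = {
        k: {col: (old[k][col], new[k][col])
            for col in old[k] if old[k][col] != new[k][col]}
        for k in old.keys() & new.keys() if old[k] != new[k]
    }
    return added, deleted, modified

v1 = {"2024-01-01": {"rate": "4.50"}, "2024-04-01": {"rate": "4.25"}}
v2 = {"2024-01-01": {"rate": "4.50"}, "2024-04-01": {"rate": "3.75"},
      "2024-07-01": {"rate": "3.50"}}
added, deleted, modified = diff_rows(v1, v2)
print(added)     # {'2024-07-01': {'rate': '3.50'}}
print(modified)  # {'2024-04-01': {'rate': ('4.25', '3.75')}}
```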