Ledgerdb
LedgerDB is a Git-native document database that stores each write as an immutable commit, replicates via push/pull, serves reads from a SQLite sidecar index, and resolves conflicts with CAS.
LedgerDB: Git-Native Distributed Database
1. Introduction & Vision
LedgerDB is a high-reliability, immutable document store built directly upon the Git object model. It bridges the gap between application versioning and data storage by treating database transactions as immutable blobs within a standard Git repository.
Unlike traditional databases that manage opaque binary storage files, LedgerDB leverages the Git Merkle DAG to provide:
- Tamper-Evident History: Every state change is cryptographically linked to its parent.
- Decentralized Replication: Any clone of the repository is a valid read/write replica.
- Offline-First: Writes can occur locally and be synchronized later via standard Git protocols.
This document serves as the Master Specification, outlining the system's architecture and indexing the detailed technical specifications located in the /docs directory.
2. Get Started (CLI)
Install
# macOS / Linux / Windows (Git Bash): build from source and install into your PATH
curl -fsSL https://raw.githubusercontent.com/osvaldoandrade/ledgerdb/main/install.sh | sh
# npm (downloads a prebuilt binary from GitHub Releases)
npm i -g @osvaldoandrade/ledgerdb@latest
Installer knobs:
- LEDGERDB_REF: git ref (branch/tag/commit), default main
- LEDGERDB_BIN_DIR: install directory (defaults to first writable dir on PATH)
- LEDGERDB_BIN_NAME: installed binary name (defaults to detected ./cmd/<name>)
- LEDGERDB_PKG: Go package to build (defaults to auto-detect under ./cmd)
# Build the CLI
make build
# Initialize a bare repo (recommended layout + history mode)
ledgerdb init --name "LedgerDB" --repo ./ledgerdb.git --layout sharded --history-mode append
# Apply a collection schema (example)
ledgerdb collection apply tasks --schema ./schemas/task.json --indexes "status,assignee"
# Write and read documents
ledgerdb doc put tasks "task_0001" --payload '{"title":"Ship v1","status":"todo","priority":"high"}'
ledgerdb doc get tasks "task_0001"
# Watch the SQLite sidecar index (state-based, O(changes))
ledgerdb index watch --db ./index.db --mode state --interval 1s --fast --batch-commits 200
Notes
- Sharded layout: --layout sharded spreads documents across deep directories for stable filesystem performance.
- State mode indexing: --mode state reads from state/ and only applies changed docs. It is the recommended mode for near real-time indexing.
- History modes: append preserves full audit history; amend keeps only the latest state per document.
3. Architecture Overview
The system is architected as a "Smart Client, Dumb Storage" engine. The complexity of concurrency control, validation, and query execution resides in the client (CLI/SDK), while the storage layer is a pure, dumb Git Bare Repository.
graph TD
Client[Client SDK / CLI] -->|Put/Patch| Logic[Logic Layer]
Logic -->|Validate| Schema[JSON Schema]
Logic -->|Serialize| TxV3[TxV3 Protobuf]
Logic -->|CAS| Git[Git Bare Repo]
Git -->|Replication| Remote[Remote Origin]
4. The Storage Engine
The storage layer is responsible for mapping logical keys to physical files without performance degradation.
- Log-Structured Merge (LSM) on Git: We treat Git blobs as append-only logs. Data is never overwritten.
- Hierarchical Sharding: Keys are hashed (SHA-256) and mapped to deep directory structures to ensure $O(1)$ file system access and prevent directory bloating.
- TxV3 Protocol: A deterministic Protobuf binary format ensures that transaction hashes are reproducible and verifiable.
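As a rough illustration of hierarchical sharding, the sketch below hashes a key with SHA-256 and uses the leading hex characters as directory levels. The fan-out (two levels of two hex characters), the documents/ prefix, and the function name shardPath are assumptions made for this example; the actual layout is defined by the specifications in /docs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// shardPath maps a logical key to a nested directory path derived from the
// SHA-256 digest of the key. The two-level fan-out and the
// "documents/<collection>/..." prefix are hypothetical, chosen only to show
// how hashing keeps every directory small regardless of key distribution.
func shardPath(collection, key string) string {
	sum := sha256.Sum256([]byte(key))
	h := hex.EncodeToString(sum[:])
	return path.Join("documents", collection, h[0:2], h[2:4], h)
}

func main() {
	// The same key always yields the same path, so no lookup table is needed.
	fmt.Println(shardPath("tasks", "task_0001"))
}
```

With two hex characters per level, each directory holds at most 256 children at the first two levels, which is the property that keeps directory listings cheap as the collection grows.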
5. Partitioning & Distribution Strategy
To scale to millions of documents, LedgerDB uses a deterministic partitioning scheme.
- Content Addressing: The location of a document is mathematically determined by its key, eliminating the need for a central lookup table.
- Node Independence: Since partitioning is algorithmic, any client can locate any data segment without coordination.
6. Data Versioning & Causality
LedgerDB abandons "Wall-Clock Time" in favor of Causal History.
- Merkle DAG: Transactions form a Directed Acyclic Graph. A write is valid only if it references the correct parent hash.
- Branching: Concurrent writes create branches (divergent histories).
- Semantic Merging: The system supports CRDT-inspired merging strategies for JSON documents to resolve branches automatically where possible.
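To make the idea of automatic branch resolution concrete, here is a minimal field-level three-way merge over flat documents. It is one possible strategy, not necessarily the one LedgerDB implements: the function name mergeFields, the flat map[string]string document shape, and the rule of keeping the base value on conflict are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeFields merges two divergent versions (left, right) of a flat document
// against their common ancestor (base). A field changed on only one side is
// taken from that side. A field changed differently on both sides is reported
// as a conflict and keeps its base value (a hypothetical policy).
func mergeFields(base, left, right map[string]string) (map[string]string, []string) {
	keys := map[string]struct{}{}
	for _, m := range []map[string]string{base, left, right} {
		for k := range m {
			keys[k] = struct{}{}
		}
	}

	merged := map[string]string{}
	var conflicts []string
	for k := range keys {
		b, inB := base[k]
		l, inL := left[k]
		r, inR := right[k]
		leftChanged := inL != inB || l != b
		rightChanged := inR != inB || r != b

		switch {
		case !leftChanged: // only right changed, or nobody did
			if inR {
				merged[k] = r
			}
		case !rightChanged: // only left changed
			if inL {
				merged[k] = l
			}
		case inL == inR && l == r: // both sides made the same change
			if inL {
				merged[k] = l
			}
		default: // divergent edits: cannot resolve automatically
			conflicts = append(conflicts, k)
			if inB {
				merged[k] = b
			}
		}
	}
	sort.Strings(conflicts)
	return merged, conflicts
}

func main() {
	base := map[string]string{"status": "todo", "priority": "high"}
	left := map[string]string{"status": "done", "priority": "high"}
	right := map[string]string{"status": "todo", "priority": "low"}
	merged, conflicts := mergeFields(base, left, right)
	fmt.Println(merged, conflicts)
}
```

Edits to different fields merge cleanly, while two writers changing the same field to different values surface as a conflict that has to be resolved some other way.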
7. Execution Model: The Write Path
High availability for writes is achieved through Optimistic Concurrency Control.
- Compare-and-Swap (CAS): We utilize OS-level atomic operations on the refs/heads/main file to serialize writes.
- No Global Locks: Writers do not block readers. Contention is handled via retries (Exponential Backoff) at the client level.
- Durability: The fsync of Git objects precedes the reference update, guaranteeing no data loss on crash.
8. Querying & Indexing
While the primary access pattern is Key-Value, LedgerDB supports secondary indexing.
- Materialized Views: Indexes are derived views of the immutable ledger.
- External Indexers: Because the ledger is open, external systems (like Elasticsearch or SQLite) can tail the Git log to build rich, queryable projections without affecting write performance.
- SQLite Sidecar: ledgerdb index sync --db ./index.db materializes per-collection tables for local querying (--batch-commits, --fast, --mode reduce SQLite overhead).
- Polling: ledgerdb index watch --db ./index.db --interval 5s keeps the index fresh (--only-changes, --once, --jitter, --quiet, --batch-commits, --fast, --mode are available).
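The notion of an index as a derived view can be illustrated by replaying a transaction log into a key-value table and remembering how far the replay got. This is a simplified in-memory sketch, not the SQLite sidecar itself: the tx type, its fields, and the project function are assumptions for the example.

```go
package main

import "fmt"

// tx is a hypothetical, simplified transaction record.
type tx struct {
	Collection string
	Key        string
	Deleted    bool
	Payload    string
}

// project applies the transactions in ledger starting at offset from to the
// view and returns the new offset. Calling it again with the returned offset
// processes only transactions appended since the previous call.
func project(view map[string]string, ledger []tx, from int) int {
	if from > len(ledger) {
		from = len(ledger)
	}
	for _, t := range ledger[from:] {
		id := t.Collection + "/" + t.Key
		if t.Deleted {
			delete(view, id)
			continue
		}
		view[id] = t.Payload
	}
	return len(ledger)
}

func main() {
	ledger := []tx{
		{Collection: "tasks", Key: "task_0001", Payload: `{"status":"todo"}`},
		{Collection: "tasks", Key: "task_0001", Payload: `{"status":"done"}`},
	}
	view := map[string]string{}
	offset := project(view, ledger, 0)
	fmt.Println(offset, view)
}
```

Since the view is rebuilt purely from the ledger, it can be dropped and regenerated at any time without touching the write path.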
9. Integrity & Security
Security is built-in, not bolted on.
- Cryptographic Chaining: A VERIFY operation recomputes the hash of every transaction from Genesis to Head. Any bit-rot or tampering breaks the chain.
- Signed Commits: Support for GPG/SSH signing of commits ensures non-repudiation of writes.
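A minimal sketch of the chaining idea follows: each entry stores the hash of its parent, and verification recomputes every hash from the first entry forward. The entry type, the hashing scheme (SHA-256 over parent, a separator byte, and payload), and the function names are assumptions for illustration; LedgerDB relies on Git object hashes and the TxV3 format rather than this scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// entry is a hypothetical ledger record linked to its parent by hash.
type entry struct {
	Parent  string // hex hash of the previous entry, "" for genesis
	Payload []byte
	Hash    string
}

// hashEntry derives an entry hash from its parent hash and payload.
func hashEntry(parent string, payload []byte) string {
	h := sha256.New()
	h.Write([]byte(parent))
	h.Write([]byte{0}) // separator between parent and payload
	h.Write(payload)
	return hex.EncodeToString(h.Sum(nil))
}

// appendEntry links a new payload to the current head of the chain.
func appendEntry(chain []entry, payload []byte) []entry {
	parent := ""
	if len(chain) > 0 {
		parent = chain[len(chain)-1].Hash
	}
	return append(chain, entry{Parent: parent, Payload: payload, Hash: hashEntry(parent, payload)})
}

// verify walks the chain from genesis to head and returns the index of the
// first entry whose parent link or recomputed hash does not match, or -1 if
// the whole chain is intact.
func verify(chain []entry) int {
	parent := ""
	for i, e := range chain {
		if e.Parent != parent || e.Hash != hashEntry(e.Parent, e.Payload) {
			return i
		}
		parent = e.Hash
	}
	return -1
}

func main() {
	var chain []entry
	chain = appendEntry(chain, []byte(`{"status":"todo"}`))
	chain = appendEntry(chain, []byte(`{"status":"done"}`))
	fmt.Println(verify(chain))

	chain[0].Payload = []byte(`{"status":"hacked"}`) // simulate tampering
	fmt.Println(verify(chain))
}
```

Changing any payload invalidates that entry's hash, and rewriting the hash would in turn break the parent link of the next entry, which is why tampering cannot go unnoticed without rewriting everything after it.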
10. Replication & Synchronization
LedgerDB delegates replication to the robust Git protocol.
- Push/Pull: Nodes synchronize via standard git fetch and git push.
- Eventual Consistency: The system is CP (Consistent/Partition-tolerant) during a write to a single master, but AP (Available/Partition-tolerant) across the distributed mesh.
11. Operational Tooling (CLI)
The ledgerdb CLI is the primary operator interface.
- Repo Management: init, clone, status.
- Data Ops: put, get, patch, delete.
- Indexing: index sync to build SQLite projections.
- Debug: inspect, verify, log.
- Maintenance: maintenance gc, maintenance snapshot.
- Logging: --log-level and --log-format flags (or LEDGERDB_LOG_LEVEL, LEDGERDB_LOG_FORMAT env vars) to control verbosity and JSON output.
- Signing: --sign and --sign-key (or LEDGERDB_GIT_SIGN, LEDGERDB_GIT_SIGN_KEY) to sign Git commits.
- Sync: writes auto-fetch and auto-push by default (--sync=false to disable; LEDGERDB_AUTO_SYNC=false).
- Dev Checks: go test ./..., go test -race ./..., go vet ./..., golangci-lint run (uses .golangci.yml).
11.1 CLI Quick Start
# Build
go build ./cmd/ledgerdb
# Initialize a bare repo (layout + history mode are configurable)
ledgerdb init --name "LedgerDB" --repo ./ledgerdb.git --layout sharded --history-mode append
# Compact mode (single tx per doc, no history)
ledgerdb init --name "LedgerDB" --repo ./ledgerdb.git --layout sharded --history-mode amend
# Apply a collection schema
ledgerdb collection apply users --schema ./schemas/user.json --indexes "email,role"
# Write/read documents
ledgerdb doc put users "usr_123" --payload '{"name":"Alice","role":"admin"}'
ledgerdb doc get users "usr_123"
ledgerdb doc patch users "usr_123" --ops '[{"op":"replace","path":"/role","value":"viewer"}]'
ledgerdb doc delete users "usr_123"
ledgerdb doc log users "usr_123"
# Disable autosync (offline mode)
ledgerdb --sync=false doc put users "usr_123" --payload '{"name":"Alice","role":"admin"}'
# Sign commits (requires git signing configured)
ledgerdb --sign doc put users "usr_123" --payload '{"name":"Alice","role":"admin"}'
# Verify integrity (deep rehydrate)
ledgerdb integrity verify --deep
# Sync SQLite index (per-collection tables)
ledgerdb index sync --db ./index.db --batch-commits 200 --fast --mode state
# Watch SQLite index (poll for new commits)
ledgerdb index watch --db ./index.db --interval 5s --batch-commits 200 --fast --mode state
# One-shot sync with watch command (no loop)
ledgerdb index watch --db ./index.db --once --batch-commits 200 --fast --mode state
# Poll with jitter and silence no-op output
ledgerdb index watch --db ./index.db --interval 5s --jitter 1s --only-changes --quiet --batch-commits 200 --fast --mode state
# Inspect a tx blob by git object hash
ledgerdb inspect blob <object_hash>
# Maintenance
ledgerdb maintenance gc --prune=now
ledgerdb maintenance snapshot --threshold 50
11.2 Build & Install (Makefile)
make build
make install
- Override paths: make install PREFIX=/usr/local
- Shared libs: make build-core-shared (falls back to archive if unsupported).
11.3 Go SDK (Core)
The Go SDK (github.com/osvaldoandrade/ledgerdb/pkg/ledgerdbsdk) uses the core services directly (no CLI dependency). It configures the SQLite watch and exposes SQL + key-value reads.
cfg := ledgerdbsdk.DefaultConfig("/path/to/ledgerdb.git")
cfg.AutoWatch = true
ctx := context.Background()
client, err := led
