SparrowDB

Embedded graph database — Cypher queries, no server, no subscription. Rust-native with Python, Node.js & Ruby bindings.

<p align="center"> <img src="docs/logo.png" alt="SparrowDB" width="260" /> </p> <h1 align="center">SparrowDB</h1> <p align="center"><strong>The SQLite of graph databases. Embedded, Cypher-native, zero infrastructure.</strong></p> <p align="center"> <a href="https://github.com/ryaker/SparrowDB/actions"><img src="https://github.com/ryaker/SparrowDB/actions/workflows/ci.yml/badge.svg" alt="CI" /></a> <a href="https://crates.io/crates/sparrowdb"><img src="https://img.shields.io/crates/v/sparrowdb.svg" alt="crates.io" /></a> <a href="https://docs.rs/sparrowdb"><img src="https://docs.rs/sparrowdb/badge.svg" alt="docs.rs" /></a> <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT" /></a> <img src="https://img.shields.io/badge/status-pre--1.0%20%7C%20building%20in%20public-orange.svg" alt="Status" /> <img src="https://img.shields.io/badge/bindings-Python%20%7C%20Node.js%20%7C%20Ruby-blue.svg" alt="Bindings" /> <a href="https://deepwiki.com/ryaker/SparrowDB"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki" /></a> </p>

SparrowDB is an embedded graph database. It links directly into your process — Rust, Python, Node.js, or Ruby — and gives you a real Cypher query interface backed by a WAL-durable store on disk. No server. No JVM. No cloud subscription. No daemon to babysit.

If your data is fundamentally about relationships (recommendations, social graphs, dependency trees, fraud rings, knowledge graphs) and you want to query it with multi-hop traversals instead of JOIN chains, SparrowDB is built for that.

Quick Start

use sparrowdb::GraphDb;

fn main() -> sparrowdb::Result<()> {
    let db = GraphDb::open(std::path::Path::new("social.db"))?;

    db.execute("CREATE (alice:Person {name: 'Alice', age: 30})")?;
    db.execute("CREATE (bob:Person   {name: 'Bob',   age: 25})")?;
    db.execute("CREATE (carol:Person {name: 'Carol'})")?;
    db.execute("MATCH (a:Person {name:'Alice'}), (b:Person {name:'Bob'}) CREATE (a)-[:KNOWS]->(b)")?;
    db.execute("MATCH (b:Person {name:'Bob'}), (c:Person {name:'Carol'}) CREATE (b)-[:KNOWS]->(c)")?;

    // Who does Alice know? Who do *they* know?
    let fof = db.execute("MATCH (a:Person {name:'Alice'})-[:KNOWS*1..2]->(f) RETURN DISTINCT f.name")?;
    // -> [["Bob"], ["Carol"]]  (Carol is a friend-of-friend)
    let _ = fof;
    Ok(())
}

That's it. The database is a directory on disk. Ship it.

Performance: Faster Than Neo4j Where It Counts

Benchmarked against Neo4j 5.x on the SNAP Facebook dataset (4,039 nodes, 88,234 edges). All figures are p50 latency, v0.1.15.

| Query | SparrowDB | Neo4j | vs Neo4j |
|-------|-----------|-------|---------|
| Point Lookup (indexed) | 103µs | 321µs | 3x faster |
| Global COUNT(*) | 2.2µs | 202µs | 93x faster |
| Top-10 by Degree | 401µs | 17,588µs | 44x faster |
| Mutual Friends (Q8) | 0.72ms | 352µs | 2x faster |

Point lookups, aggregations, and mutual-neighbor queries beat a running Neo4j server — with no JVM, no server process, no network hop.

Q8 dropped from 153ms → 0.67ms (−99.6%) in v0.1.15. Deep traversal (Q3/Q4/Q5) is slower than a warmed Neo4j server — that's expected for an embedded engine without parallel execution. The target workload is agents, CLIs, and apps that need a graph database without operating one.

Cold start: ~27ms — viable for serverless and short-lived processes where Neo4j's server startup is disqualifying.

Built for AI Agents and MCP

SparrowDB ships with a first-class MCP server (sparrowdb-mcp) — the only embedded graph database with native MCP support. It speaks JSON-RPC 2.0 over stdio and plugs directly into Claude Desktop and any MCP-compatible AI client.

cargo install sparrowdb --bin sparrowdb-mcp --locked

On macOS, add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "sparrowdb": {
      "command": "/absolute/path/to/sparrowdb-mcp",
      "args": []
    }
  }
}

Your AI assistant can now query and write to your graph database using natural tool calls:

| Tool | Description |
|------|-------------|
| execute_cypher | Execute any Cypher statement; returns result rows |
| create_entity | Create a node with a label and properties |
| add_property | Set a property on nodes matching a filter |
| checkpoint | Flush WAL and compact |
| info | Database metadata |

Full setup: docs/mcp-setup.md
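
For a sense of what travels over stdio, here is a minimal sketch of the JSON-RPC 2.0 line an MCP client might write to the server's stdin to invoke execute_cypher. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the `query` argument key and the helper names `json_escape` and `tool_call_request` are assumptions for illustration, so check docs/mcp-setup.md for the actual tool schema. A real client also performs the MCP `initialize` handshake before calling tools.

```rust
/// Escape a string for use inside a JSON string literal.
/// Covers only what this sketch needs: quote, backslash, newline.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Build one `tools/call` request as a single line of JSON.
/// ASSUMPTION: the Cypher text is passed under an argument named `query`.
fn tool_call_request(id: u64, tool: &str, cypher: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"query":"{}"}}}}}}"#,
        id,
        json_escape(tool),
        json_escape(cypher)
    )
}
```

The stdio transport frames messages with newlines, which is why the request is built as one line and embedded newlines are escaped.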

Why this matters for agent builders: Multi-agent systems need shared, persistent graph state. SparrowDB gives your agents a knowledge graph they can read and write without spinning up a server. Pair it with SparrowOntology for schema-enforced agent memory and governance.

Why SparrowDB

The graph database landscape has a gap.

Neo4j is powerful, but it requires a running server, a JVM, and a license the moment you need production features. Dgraph is horizontally scalable, but you don't need horizontal scale; you need to ship your app. Every existing option assumes you want to operate a database cluster, not embed a graph engine.

SparrowDB fills the same role SQLite fills for relational data: zero infrastructure, full capability, open source, MIT licensed.

| Question | Answer |
|---|---|
| Does it need a server? | No. It's a library. |
| Does it need a cloud account? | No. It's a file on disk. |
| Can it survive kill -9? | Yes. WAL + crash recovery. |
| Can multiple threads read at once? | Yes. SWMR: readers never block writers. |
| Does the Python binding release the GIL? | Yes. Every call into the engine releases it. |
| Can I use it from an AI assistant? | Yes. Built-in MCP server. |
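
The single-writer, multiple-reader answer can be pictured with a small snapshot-publishing sketch in plain Rust. This is a generic illustration of the SWMR idea, not SparrowDB's implementation: `SnapshotStore`, `add_edge`, and `concurrent_degree_reads` are hypothetical names, and copying the whole map on every write is only acceptable for a toy example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

type Adjacency = HashMap<u32, Vec<u32>>;

/// Readers grab an immutable snapshot; one writer at a time publishes a new one.
struct SnapshotStore {
    current: RwLock<Arc<Adjacency>>,
    write_gate: Mutex<()>, // serializes writers
}

impl SnapshotStore {
    fn new() -> Self {
        SnapshotStore {
            current: RwLock::new(Arc::new(HashMap::new())),
            write_gate: Mutex::new(()),
        }
    }

    /// The lock is held only long enough to clone the Arc pointer.
    fn snapshot(&self) -> Arc<Adjacency> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Copy the current version, modify the copy, then swap it in.
    fn add_edge(&self, from: u32, to: u32) {
        let _gate = self.write_gate.lock().unwrap();
        let mut next: Adjacency = (*self.snapshot()).clone();
        next.entry(from).or_default().push(to);
        *self.current.write().unwrap() = Arc::new(next);
    }
}

/// Spawn `readers` threads that each read the out-degree of `node`.
fn concurrent_degree_reads(store: Arc<SnapshotStore>, node: u32, readers: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || store.snapshot().get(&node).map_or(0, Vec::len))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

A snapshot taken before a write keeps seeing the old version, so long-running readers never observe a half-applied change and the writer never waits for them to finish.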

When to Use SparrowDB

SparrowDB is the right choice when:

Your data has structure that's hard to flatten. Social follows, product recommendations, dependency graphs, org charts, bill-of-materials, knowledge graphs — these are terrible in SQL and natural in graphs.
You're building an application, not operating a database. You want to cargo add sparrowdb and ship, not provision instances.
You need multi-hop queries. MATCH (a)-[:FOLLOWS*1..3]->(b) is one query. In SQL it's recursive CTEs all the way down.
You're embedding into a CLI, desktop app, agent, or edge service. SparrowDB opens in milliseconds and has no runtime overhead when idle.
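
To make the multi-hop point concrete, here is a minimal sketch in plain Rust of the result set that a bounded pattern such as [:FOLLOWS*1..3] combined with RETURN DISTINCT describes: expand the frontier one hop at a time and collect every node reached. It is an illustration only, not SparrowDB's engine; `graph` and `reachable_within` are hypothetical helper names, and the lower bound is fixed at one hop.

```rust
use std::collections::{BTreeSet, HashMap};

/// Build a directed adjacency map from (source, target) pairs.
fn graph(pairs: &[(&str, &str)]) -> HashMap<String, Vec<String>> {
    let mut edges: HashMap<String, Vec<String>> = HashMap::new();
    for (from, to) in pairs {
        edges.entry(from.to_string()).or_default().push(to.to_string());
    }
    edges
}

/// Distinct nodes reachable from `start` in 1..=max_hops hops.
fn reachable_within<'a>(
    edges: &'a HashMap<String, Vec<String>>,
    start: &'a str,
    max_hops: usize,
) -> BTreeSet<&'a str> {
    let mut found: BTreeSet<&'a str> = BTreeSet::new();
    let mut frontier: BTreeSet<&'a str> = BTreeSet::from([start]);
    for _ in 0..max_hops {
        // Everything exactly one hop beyond the current frontier.
        let mut next: BTreeSet<&'a str> = BTreeSet::new();
        for node in &frontier {
            if let Some(targets) = edges.get(*node) {
                next.extend(targets.iter().map(String::as_str));
            }
        }
        found.extend(next.iter().copied());
        frontier = next;
    }
    found
}
```

With edges Alice→Bob→Carol→Dave, `reachable_within(&g, "Alice", 2)` yields Bob and Carol, and raising the bound to 3 adds Dave. The Cypher pattern states the same thing declaratively in one line.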

SparrowDB is not the right choice when:

Deep multi-hop traversal on large high-fanout graphs is your primary workload. If you're running 5-hop queries across a billion-edge social graph, use Neo4j. SparrowDB is a single-process embedded engine — it's not trying to win that race.
You need distributed writes across many nodes, or your graph has billions of edges and requires horizontal sharding. Use Neo4j Aura or Dgraph for that.

Install

Node.js

npm install sparrowdb

Rust

[dependencies]
sparrowdb = "0.1"

Python

# Build from source (requires Rust toolchain):
cd crates/sparrowdb-python && maturin develop

PyPI package coming soon. Pre-built wheels are on the roadmap.

Ruby

# Build from source (requires Rust toolchain):
cd crates/sparrowdb-ruby && bundle install && rake compile

RubyGems package coming soon.

CLI

cargo install sparrowdb --bin sparrowdb

MCP Server (Claude Desktop integration)

cargo install sparrowdb --bin sparrowdb-mcp --locked

Features

Cypher Support

| Feature | Status |
|---------|--------|
| CREATE, MATCH, SET, DELETE | ✅ |
| WHERE: =, <>, <, <=, >, >= | ✅ |
| WHERE n.prop CONTAINS str / STARTS WITH str | ✅ |
| WHERE n.prop IS NULL / IS NOT NULL | ✅ |
| 1-hop and multi-hop edges (a)-[:R]->()-[:R]->(c) | ✅ |
| Undirected edges (a)-[:R]-(b) | ✅ |
| Reverse-arrow pattern (a)->()<-(c) | ✅ |
| Variable-length paths [:R*1..N] | ✅ |
| Multi-label nodes (n:A:B) | ✅ |
| RETURN DISTINCT, ORDER BY, LIMIT, SKIP | ✅ |
| COUNT(*), COUNT(expr), COUNT(DISTINCT expr) | ✅ |
| SUM, AVG, MIN, MAX | ✅ |
| collect(): aggregate into list | ✅ |
| coalesce(expr1, expr2, …): first non-null | ✅ |
| WITH … WHERE pipeline (filter mid-query) | ✅ |
| WITH … MATCH pipeline (chain traversals) | ✅ |
| WITH … UNWIND pipeline | ✅ |
| UNWIND list AS var MATCH (n {id: var}) | ✅ |
| OPTIONAL MATCH | ✅ |
| UNION / UNION ALL | ✅ |
| MERGE: upsert node with ON CREATE SET / ON MATCH SET | ✅ |
| MATCH (a),(b) MERGE (a)-[:R]->(b): idempotent edge | ✅ |
| CREATE (a)-[:REL]->(b): directed edge | ✅ |
| CASE WHEN … THEN … ELSE … END | ✅ |
| EXISTS { (n)-[:REL]->(:Label) } | ✅ |
| EXISTS in WITH … WHERE | ✅ |
| shortestPath((a)-[:R*]->(b)) | ✅ |
| ANY / ALL / NONE / SINGLE list predicates | ✅ |
| id(n), labels(n), type(r) | ✅ |
| size(), range(), toInteger(), toString() | ✅ |
| toUpper(), toLower(), trim(), replace(), substring() | ✅ |
| abs(), ceil(), floor(), sqrt(), sign() | ✅ |
| Parameters $param | ✅ |
| CALL db.index.fulltext.queryNodes: scored full-text search | ✅ |
| CALL db.schema() | ✅ |
| Subqueries CALL { … } | ⚠️ Partial |

Engine & Storage

WAL durability: write-ahead log with crash recovery; survives hard kills
SWMR concurrency: single-writer, multiple-reader; readers never block writers
Chunked vectorized pipeline: 4-phase chunked execution engine for multi-hop traversals; FrontierScratch arena eliminates per-hop allocation; SlotIntersect for mutual-neighbor queries
Factorized execution: multi-hop traversals avoid materializing O(N²) intermediate rows
B-tree property index: equality lookups in O(log n), not full label scans; persisted to disk
Inverted text index: CONTAINS / STARTS WITH routed through an index
Full-text search: relevance-scored queryNodes without Elasticsearch
External merge sort: ORDER BY on large results spills to disk
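
As a rough picture of what write-ahead logging with crash recovery means, the sketch below appends each change to a log file, forces it to disk before acknowledging, and rebuilds state by replaying complete records. It is a generic illustration in plain Rust, not SparrowDB's on-disk format: `wal_append`, `wal_replay`, and the tab-separated record layout are assumptions made for the example, and keys and values must not contain tabs or newlines.

```rust
use std::collections::BTreeMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// Append one record and force it to stable storage before returning.
fn wal_append(path: &Path, key: &str, value: &str) -> io::Result<()> {
    let mut log = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(log, "{}\t{}", key, value)?;
    // Durability point: the write counts only once the fsync has succeeded.
    log.sync_all()
}

/// Rebuild in-memory state by replaying every complete record in order.
/// A final record without a trailing newline is treated as a torn write
/// from a crash and ignored.
fn wal_replay(path: &Path) -> io::Result<BTreeMap<String, String>> {
    let mut text = String::new();
    File::open(path)?.read_to_string(&mut text)?;
    let mut state = BTreeMap::new();
    for record in text.split_inclusive('\n') {
        if !record.ends_with('\n') {
            break; // incomplete tail: the process died mid-write
        }
        if let Some((key, value)) = record.trim_end_matches('\n').split_once('\t') {
            state.insert(key.to_string(), value.to_string());
        }
    }
    Ok(state)
}
```

Later records win over earlier ones for the same key, so replay after a hard kill converges on the last acknowledged state. A production log would also add checksums and periodic checkpoints to bound replay time.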
