<p align="center"> <img src="docs/logo.png" alt="FerrisSearch" width="400"> </p>

FerrisSearch

<p align="center"> <strong>A Rust-native distributed search and search-aware analytics engine with Raft consensus, hybrid vector search, OpenSearch-compatible APIs, and SQL over matched docs — powered by <a href="https://github.com/quickwit-oss/tantivy">Tantivy</a></strong> </p> <p align="center"> <a href="#getting-started">Getting Started</a> &middot; <a href="#api-reference">API Reference</a> &middot; <a href="#how-hybrid-sql-works">Architecture</a> &middot; <a href="#benchmarks">Benchmarks</a> &middot; <a href="#replication">Replication</a> &middot; <a href="#testing">Testing</a> &middot; <a href="#roadmap">Roadmap</a> </p>

FerrisSearch is a high-performance, Rust-native distributed search engine with OpenSearch-compatible REST APIs, hybrid vector retrieval, and a search-aware SQL layer for querying matched documents as a dataset. It is built for teams that want the familiar OpenSearch interface with the performance and safety of Rust, without giving up on structured analytics over search results.

⚡ Performance: 2M documents — ingestion at 10,402 docs/sec, search at 142.4 queries/sec (p50 = 24.8ms), zero errors — see benchmarks

Highlights

  • OpenSearch-compatible REST API — drop-in PUT /{index}, POST /_doc, GET /_search endpoints
  • Raft consensus — cluster state managed by openraft; quorum-based leader election, linearizable writes, automatic failover, persistent log storage via redb
  • Vector search — k-NN approximate nearest neighbor search via USearch (HNSW algorithm); hybrid full-text + vector queries
  • Search-aware SQL — SQL over matched docs with pushdown-aware planning, local fast-field execution when possible, planner metadata, and grouped analytics over the matched result set
  • Distributed clustering — multi-node clusters with shard-based data distribution
  • Synchronous replication — primary-replica replication over gRPC; writes acknowledged only after all in-sync replicas confirm
  • Scatter-gather search — queries fan out across shards, results merged and returned
  • Crash recovery — binary write-ahead log (WAL) with configurable durability (request fsync-per-write or async timer-based); sequence number checkpointing and translog-based replica recovery
  • Zero external dependencies — no JVM, no ZooKeeper, just a single binary
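The two translog durability modes from the crash-recovery bullet can be sketched in a few lines. This is a toy model, not FerrisSearch's actual WAL layout: it assumes a simple length-prefixed binary record format, and `MiniWal`/`replay` are illustrative names.

```python
import os
import struct

class MiniWal:
    """Toy write-ahead log with length-prefixed binary records."""

    def __init__(self, path: str, fsync_per_write: bool = True):
        # fsync_per_write=True models "request" durability (fsync on every
        # append); False models "async" durability, where a background timer
        # would call flush() periodically instead.
        self.f = open(path, "ab")
        self.fsync_per_write = fsync_per_write

    def append(self, payload: bytes) -> None:
        self.f.write(struct.pack(">I", len(payload)) + payload)
        if self.fsync_per_write:
            self.flush()  # record is durable before the write is acknowledged

    def flush(self) -> None:
        self.f.flush()
        os.fsync(self.f.fileno())

    def close(self) -> None:
        self.flush()
        self.f.close()

def replay(path: str) -> list[bytes]:
    """Crash recovery: re-read every intact record from the log."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack(">I", header)
            records.append(f.read(length))
    return records
```

With `request` durability a crash loses no acknowledged writes; with `async`, writes since the last timer flush may be lost — the trade-off `translog_sync_interval_ms` controls.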

Why FerrisSearch Is Different

FerrisSearch does more than expose SQL on top of a search API. The current direction is a true hybrid execution model:

  • Tantivy executes search-native work first: full-text matching, scoring, and pushdown-friendly structured filters
  • Fast fields stay in the hot path: when the query is eligible and shards are local, structured columns are read directly from Tantivy fast fields instead of materializing _source
  • SQL runs on matched docs, not the whole index: Arrow and DataFusion operate on the narrowed result set or merged partial states
  • Planner metadata is visible: responses show execution_mode and a planner block so you can see what was pushed down vs. what stayed residual

That makes FerrisSearch useful for workflows like:

  • relevance debugging with score and structured filters in one query
  • grouped analytics over matched docs
  • internal dashboards over live search results
  • interactive search + analysis without moving logic into client code

How Hybrid SQL Works

flowchart LR
  A[SQL Query] --> B[Hybrid Planner]
  B --> C[Tantivy Search-Native Work]
  B --> D[Residual SQL Work]
  C --> E[Matched docs plus score]
  C --> F[Fast-field reads or shard-local partials]
  E --> G[Arrow RecordBatch]
  F --> G
  D --> H[DataFusion]
  G --> H
  H --> I[Rows plus planner metadata]

The intended execution order is:

  1. Tantivy handles text_match(...), scoring, and pushdown-friendly structured filters.
  2. Eligible structured columns are read directly from fast fields.
  3. Arrow batches represent matched docs or merged partial states.
  4. DataFusion executes only the remaining relational work.
  5. The API returns rows plus execution_mode and planner metadata.
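The split between pushdown and residual work in the steps above can be illustrated with a toy planner. The predicate shapes and the `execution_mode` values here are assumptions for illustration, not FerrisSearch's actual planner output.

```python
# Pushdown-friendly comparison operators, per the "Search-Aware SQL" section.
PUSHDOWN_OPS = {"=", ">", ">=", "<", "<="}

def plan(predicates: list[tuple]) -> dict:
    """Split predicates into search-native (Tantivy) and residual (DataFusion) work.

    Each predicate is a tuple whose first element is either "text_match"
    or an operator name.
    """
    pushed, residual = [], []
    for pred in predicates:
        if pred[0] == "text_match" or pred[0] in PUSHDOWN_OPS:
            pushed.append(pred)    # executed inside the Tantivy search
        else:
            residual.append(pred)  # left for DataFusion after matching
    mode = "pushdown_only" if not residual else "hybrid"
    return {"pushed": pushed, "residual": residual, "execution_mode": mode}
```

A `LIKE` pattern, for example, is not in the pushdown set, so it stays residual while the text match and range filter ride along with the search.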

Getting Started

Prerequisites

  • Rust (2024 edition)
  • Protobuf compiler (protoc)

Single node

cargo run

Docker

docker build -t ferrissearch .
docker run -p 9200:9200 -p 9300:9300 ferrissearch
curl http://localhost:9200/
{"name": "ferrissearch-node", "version": "0.1.0", "engine": "tantivy"}

Multi-node cluster

# Terminal 1
./dev_cluster.sh 1    # HTTP 9200 · Transport 9300 · Raft ID 1

# Terminal 2
./dev_cluster.sh 2    # HTTP 9201 · Transport 9301 · Raft ID 2

# Terminal 3
./dev_cluster.sh 3    # HTTP 9202 · Transport 9302 · Raft ID 3
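Three nodes is the smallest cluster that survives a node failure: openraft, like any Raft implementation, needs a strict majority for leader election and log commit. The arithmetic is simple enough to state directly:

```python
def quorum(nodes: int) -> int:
    """Votes required for Raft leader election and log commit."""
    return nodes // 2 + 1

def tolerated_failures(nodes: int) -> int:
    """Nodes that can be lost while the cluster can still commit writes."""
    return nodes - quorum(nodes)
```

A 3-node cluster commits with 2 acknowledgements and tolerates 1 failure; 5 nodes tolerate 2 at the cost of a larger commit quorum.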

Configuration

Configure via config/ferrissearch.yml or FERRISSEARCH_* environment variables:

| Option | Default | Description |
|--------|---------|-------------|
| node_name | node-1 | Node identifier |
| cluster_name | ferrissearch | Cluster name |
| http_port | 9200 | REST API port |
| transport_port | 9300 | gRPC transport port |
| data_dir | ./data | Data storage directory |
| seed_hosts | ["127.0.0.1:9300"] | Seed nodes for discovery |
| raft_node_id | 1 | Unique Raft consensus node ID |
| translog_durability | request | Translog fsync mode: request (per-write) or async (timer) |
| translog_sync_interval_ms | (unset) | Background fsync interval when durability is async (default: 5000) |

API Reference

Indices

# Create an index
curl -X PUT 'http://localhost:9200/my-index' \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"number_of_shards": 1, "number_of_replicas": 1}}'

# Create an index with field mappings
curl -X PUT 'http://localhost:9200/movies' \
  -H 'Content-Type: application/json' \
  -d '{
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {
      "properties": {
        "title":     {"type": "text"},
        "genre":     {"type": "keyword"},
        "year":      {"type": "integer"},
        "rating":    {"type": "float"},
        "embedding": {"type": "knn_vector", "dimension": 3}
      }
    }
  }'

# Delete an index
curl -X DELETE 'http://localhost:9200/my-index'

# Get index settings
curl 'http://localhost:9200/my-index/_settings'

# Update dynamic settings (refresh_interval, number_of_replicas)
curl -X PUT 'http://localhost:9200/my-index/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "2s", "number_of_replicas": 2}}'

Supported field types: text (analyzed), keyword (exact match), integer, float, boolean, knn_vector. Unmapped fields are indexed into a catch-all "body" field for backward compatibility.
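The catch-all behavior in the last sentence amounts to a small routing step before indexing. A sketch, where folding unmapped values into a single space-joined string is an illustrative assumption:

```python
def route_fields(doc: dict, mappings: dict) -> dict:
    """Keep mapped fields as-is; fold unmapped fields into a catch-all "body"."""
    routed, body_parts = {}, []
    for field, value in doc.items():
        if field in mappings:
            routed[field] = value
        else:
            body_parts.append(str(value))
    if body_parts:
        routed["body"] = " ".join(body_parts)
    return routed
```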

Documents

# Index a document (auto-generated ID)
curl -X POST 'http://localhost:9200/my-index/_doc' \
  -H 'Content-Type: application/json' \
  -d '{"title": "Hello World", "tags": "rust search"}'

# Index a document with explicit ID
curl -X PUT 'http://localhost:9200/my-index/_doc/1' \
  -H 'Content-Type: application/json' \
  -d '{"title": "Hello World", "year": 2024}'

# Get a document
curl 'http://localhost:9200/my-index/_doc/{id}'

# Delete a document
curl -X DELETE 'http://localhost:9200/my-index/_doc/{id}'

# Partial update a document (merge fields)
curl -X POST 'http://localhost:9200/my-index/_update/1' \
  -H 'Content-Type: application/json' \
  -d '{"doc": {"rating": 9.5, "genre": "scifi"}}'

# Bulk index
curl -X POST 'http://localhost:9200/my-index/_bulk' \
  -H 'Content-Type: application/json' \
  -d '[
    {"_doc_id": "doc-1", "_source": {"name": "Alice"}},
    {"_doc_id": "doc-2", "_source": {"name": "Bob"}}
  ]'
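Note that this bulk body is a plain JSON array of `_doc_id`/`_source` objects, not OpenSearch's newline-delimited action/source pairs. A tiny helper for building it (`bulk_payload` is an illustrative name, not part of any client library):

```python
import json

def bulk_payload(docs: dict) -> str:
    """Serialize {doc_id: source_dict} into the JSON-array bulk body shown above."""
    return json.dumps(
        [{"_doc_id": doc_id, "_source": source} for doc_id, source in docs.items()]
    )
```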

Search

# Match all
curl 'http://localhost:9200/my-index/_search'

# Query string with pagination
curl 'http://localhost:9200/my-index/_search?q=rust&from=0&size=10'

# Count all documents (fast — uses metadata, no search)
curl 'http://localhost:9200/my-index/_count'

# Count matching documents
curl -X POST 'http://localhost:9200/my-index/_count' \
  -H 'Content-Type: application/json' \
  -d '{"query": {"term": {"brand": "Apple"}}}'

# DSL: match query
curl -X POST 'http://localhost:9200/my-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"title": "search engine"}}}'

# DSL: bool query (must + must_not)
curl -X POST 'http://localhost:9200/my-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "must": [{"match": {"title": "rust"}}],
        "must_not": [{"match": {"title": "web"}}]
      }
    }
  }'

# DSL: bool query (should = OR)
curl -X POST 'http://localhost:9200/my-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "should": [
          {"match": {"title": "rust"}},
          {"match": {"title": "python"}}
        ]
      }
    },
    "from": 0,
    "size": 5
  }'
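The bool-query semantics in the two examples above (must = AND, must_not = NOT, should = OR) can be modeled with a toy in-memory evaluator. Whitespace tokenization here is a stand-in for real text analysis, and scoring is ignored:

```python
def matches(doc: dict, query: dict) -> bool:
    """Toy evaluator for match and bool queries over an in-memory document."""
    if "match" in query:
        field, text = next(iter(query["match"].items()))
        terms = str(doc.get(field, "")).lower().split()
        return all(t in terms for t in text.lower().split())
    if "bool" in query:
        clause = query["bool"]
        if not all(matches(doc, q) for q in clause.get("must", [])):
            return False
        if any(matches(doc, q) for q in clause.get("must_not", [])):
            return False
        should = clause.get("should", [])
        if should and not clause.get("must"):
            # with no must clauses, at least one should clause must match
            return any(matches(doc, q) for q in should)
        return True
    return False
```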

# Fuzzy query (typo-tolerant search)
curl -X POST 'http://localhost:9200/my-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{"query": {"fuzzy": {"title": {"value": "rsut", "fuzziness": 2}}}}'
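`fuzziness` bounds the edit distance between the query term and indexed terms. A plain Levenshtein sketch shows why `rsut` matches `rust` at fuzziness 2 (two substitutions; a transposition-aware distance would count it as 1 — which variant applies here is not specified above):

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert/delete/substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]
```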

Search-Aware SQL

POST /{index}/_sql runs a SQL query over the matched document set. Tantivy still handles text matching, relevance scoring, and pushed-down structured filters; Arrow and DataFusion handle the residual SQL-style projection, ordering, grouping, and aggregation after search-aware planning.

Current behavior:

  • text_match(field, 'query') is pushed into Tantivy
  • simple =, >, >=, <, <= predicates on structured fields are pushed into Tantivy filters
  • score is exposed as a normal SQL column
  • projection, ORDER BY score, and aggregations such as avg(field) run as residual work over the matched result set
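Putting these pieces together, a request mixing text matching, a pushdown range filter, the score column, and ordering might look as follows. The `{"query": "..."}` body shape is an assumption for illustration; the SQL itself uses only constructs listed above, against a hypothetical movies index:

```python
import json

# text_match(...) and year >= 2000 are pushdown candidates per the bullets
# above; the projection and LIMIT are relational work over the matched set.
sql = (
    "SELECT title, year, score FROM movies "
    "WHERE text_match(title, 'space opera') AND year >= 2000 "
    "ORDER BY score DESC LIMIT 10"
)
body = json.dumps({"query": sql})
# POST this body to http://localhost:9200/movies/_sql
```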
