Vectra: local, file‑backed vector database for Node.js
Overview
Vectra is a file‑backed, in‑memory vector database for Node.js. It works like a local Pinecone or Qdrant: each index is just a folder on disk with an index.json file containing vectors and any metadata fields you choose to index; all other metadata is stored per‑item as separate JSON files. Queries use a Pinecone‑compatible subset of MongoDB‑style operators for filtering, then rank matches by cosine similarity. Because the entire index is loaded into memory, lookups are extremely fast (often <1 ms for small indexes, commonly 1–2 ms for larger local sets). It’s ideal when you want simple, zero‑infrastructure retrieval over a small, mostly static corpus. Pinecone‑style namespaces aren’t built‑in, but you can mimic them by using separate folders (indexes).
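The ranking math is plain cosine similarity. As a rough stand-alone illustration (not Vectra's internal code): pre-normalizing every stored vector reduces each query-time comparison to a dot product.

```typescript
// Illustration of the ranking math only -- not Vectra's implementation.
// If every stored vector is normalized to unit length up front, cosine
// similarity at query time is just a dot product.
function normalize(v: number[]): number[] {
  const len = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / len);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine similarity of the raw vectors == dot product of the normalized ones.
const query = normalize([1, 2, 3]);
const stored = normalize([2, 4, 6]); // same direction -> similarity ~1
console.log(dot(query, stored).toFixed(4)); // 1.0000
```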
Typical use cases:
- Prompt augmentation over a small, mostly static corpus
- Infinite few‑shot example libraries
- Single‑document or small multi‑document Q&A
- Local/dev workflows where hosted vector DBs are overkill
Table of contents
- Why Vectra
- When to use (and when not)
- Requirements
- Install
- Quick Start
- CLI in 60 seconds
- Core concepts
- File-backed vs in-memory usage
- Best practices
- Performance and limits
- Troubleshooting (quick)
- Next steps
- License
- Project links
Why Vectra
- Zero infrastructure: everything lives in a local folder; no servers, clusters, or managed services required.
- Predictable local performance: full in‑memory scans with pre‑normalized cosine similarity deliver sub‑millisecond to low‑millisecond latency for small/medium corpora.
- Simple mental model: one folder per index; index.json holds vectors and indexed fields, while non‑indexed metadata is stored as per‑item JSON.
- Easy portability: because the format is file‑based and language‑agnostic, indexes can be written in one language and read in another.
- Pinecone‑style filtering: use a familiar subset of MongoDB query operators to filter by metadata before similarity ranking.
- Great for prompt engineering: quickly assemble and retrieve few‑shot examples or small static corpora without external dependencies.
When to use (and when not)
Use Vectra when:
- You have a small, mostly static corpus (e.g., a few hundred to a few thousand chunks).
- You want zero‑infrastructure local retrieval with fast, predictable latency.
- You’re assembling “infinite few‑shot” example libraries or single/small document Q&A.
- You need portable, file‑based indexes that other languages can read/write.
- You want simple “namespaces” by using separate folders per dataset.
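The folder-per-namespace pattern needs nothing more than a path convention. A minimal sketch (`namespaceFolder` is a hypothetical helper, not part of Vectra; each resulting folder would be opened as its own index):

```typescript
import path from 'node:path';

// Hypothetical helper (not part of Vectra): map a namespace name to its own
// index folder. Each folder is then opened as an independent LocalIndex.
function namespaceFolder(baseDir: string, namespace: string): string {
  const safe = namespace.replace(/[^A-Za-z0-9_-]/g, '_'); // keep names filesystem-safe
  return path.join(baseDir, safe);
}

console.log(namespaceFolder('/data/indexes', 'tenant a')); // /data/indexes/tenant_a (POSIX)
```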
Avoid Vectra when:
- You need long‑term, ever‑growing chat memory or very large corpora (the entire index loads into RAM).
- You require multi‑tenant, networked, or horizontally scalable serving.
- You need advanced vector DB features like HNSW/IVF indexing, sharding/replication, or distributed operations.
Notes and tips:
- Mimic namespaces via separate index folders.
- Index only the metadata fields you’ll filter on; keep everything else in per‑item JSON.
- Rough sizing: a 1536‑dim float32 vector is ~6 KB, plus JSON/metadata overhead; size indexes according to your RAM budget.
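As a sanity check on that rule of thumb, the raw vector payload is easy to estimate. This is back-of-envelope only: index.json stores numbers as JSON text and JavaScript holds them as 8‑byte doubles in memory, so real usage is higher.

```typescript
// Back-of-envelope estimate of raw vector memory: 4 bytes per float32 dimension.
function estimateVectorBytes(items: number, dims: number): number {
  return items * dims * 4;
}

console.log(estimateVectorBytes(1, 1536));      // 6144 bytes ~= 6 KB per item
console.log(estimateVectorBytes(10_000, 1536)); // 61,440,000 bytes ~= 61 MB raw
```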
Requirements
- Node.js 20.x or newer
- A package manager (npm or yarn)
- An embeddings provider for similarity search:
- OpenAI (API key + model, e.g., text-embedding-3-large or compatible)
- Azure OpenAI (endpoint, deployment name, API key)
- OpenAI‑compatible OSS endpoint (model name + base URL)
- If you plan to ingest web pages via the CLI or API, outbound network access to those URLs
- Sufficient RAM to hold your entire index in memory during queries (see “Performance and limits”)
Install
- npm:
  npm install vectra
- yarn:
  yarn add vectra
CLI usage
- Run without installing globally:
  npx vectra --help
- Optional global install:
  npm install -g vectra (then use vectra --help)
Quick Start
Two common paths:
- Path A: you already have vectors (or can generate them) and want to store items + metadata.
- Path B: you have raw text documents; Vectra will chunk, embed, and retrieve relevant spans.
Path A: LocalIndex (items + metadata)
- Create a folder‑backed index
- Choose which metadata fields to index (others are stored per‑item on disk)
- Insert items (vector + metadata)
- Query by vector with optional metadata filters
TypeScript example:
import path from 'node:path';
import { LocalIndex } from 'vectra';
import { OpenAI } from 'openai';
// 1) Create an index folder
const index = new LocalIndex(path.join(process.cwd(), 'my-index'));
// 2) Create the index (set which metadata fields you want searchable)
if (!(await index.isIndexCreated())) {
await index.createIndex({
version: 1,
metadata_config: { indexed: ['category'] }, // only these fields live in index.json; others go to per-item JSON
});
}
// 3) Prepare an embeddings helper (use any provider you like)
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
async function getVector(text: string): Promise<number[]> {
const resp = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return resp.data[0].embedding;
}
// 4) Insert items
await index.insertItem({
vector: await getVector('apple'),
metadata: { text: 'apple', category: 'food', note: 'stored on disk if not indexed' },
});
await index.insertItem({
vector: await getVector('blue'),
metadata: { text: 'blue', category: 'color' },
});
// 5) Query by vector, optionally filter by metadata
async function query(text: string) {
const v = await getVector(text);
// Signature: queryItems(vector, queryString, topK, filter?)
const results = await index.queryItems(v, '', 3, { category: { $eq: 'food' } });
for (const r of results) {
console.log(r.score.toFixed(4), r.item.metadata.text);
}
}
await query('banana'); // should surface 'apple' in top results
Supported filter operators (subset): $and, $or, $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin. Only fields listed in metadata_config.indexed are stored inline and should be used for filtering (everything else is kept per‑item on disk).
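To make the filter semantics concrete, here is a stand-alone sketch of how such a filter could be evaluated against an item's indexed metadata. It is illustrative only (not Vectra's implementation), and the item shown is hypothetical.

```typescript
// Illustrative evaluator for the Pinecone-style operator subset -- not Vectra's code.
type Metadata = Record<string, string | number | boolean>;
type Filter = Record<string, unknown>;

function matchesFilter(metadata: Metadata, filter: Filter): boolean {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === '$and') return (cond as Filter[]).every((f) => matchesFilter(metadata, f));
    if (key === '$or') return (cond as Filter[]).some((f) => matchesFilter(metadata, f));
    const value = metadata[key];
    // A bare value is shorthand for { $eq: value }.
    const ops = (typeof cond === 'object' && cond !== null ? cond : { $eq: cond }) as Record<string, any>;
    return Object.entries(ops).every(([op, target]) => {
      switch (op) {
        case '$eq': return value === target;
        case '$ne': return value !== target;
        case '$gt': return value > target;
        case '$gte': return value >= target;
        case '$lt': return value < target;
        case '$lte': return value <= target;
        case '$in': return (target as unknown[]).includes(value);
        case '$nin': return !(target as unknown[]).includes(value);
        default: return false;
      }
    });
  });
}

console.log(matchesFilter(
  { text: 'apple', category: 'food', calories: 95 }, // hypothetical item
  { $and: [{ category: { $eq: 'food' } }, { calories: { $lt: 100 } }] }
)); // true
```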
Path B: LocalDocumentIndex (documents + chunking + retrieval)
- Create a document index backed by an embeddings model
- Add documents (raw strings, files, or URLs)
- Query by text; Vectra returns the most relevant chunks grouped by document
- Render top sections for direct drop‑in to prompts
- Optional hybrid retrieval: add BM25 keyword matches alongside semantic matches
TypeScript example:
import path from 'node:path';
import { LocalDocumentIndex, OpenAIEmbeddings } from 'vectra';
// 1) Configure embeddings (OpenAI, Azure OpenAI, or OpenAI‑compatible OSS)
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY!,
model: 'text-embedding-3-small',
maxTokens: 8000, // batching limit for chunked requests
});
// 2) Create the index
const docs = new LocalDocumentIndex({
folderPath: path.join(process.cwd(), 'my-doc-index'),
embeddings,
// optional: customize chunking
// chunkingConfig: { chunkSize: 512, chunkOverlap: 0, keepSeparators: true }
});
if (!(await docs.isIndexCreated())) {
await docs.createIndex({ version: 1 });
}
// 3) Add a document (string); you can also add files/URLs via FileFetcher/WebFetcher or the CLI
const uri = 'doc://welcome';
const text = `
Vectra is a file-backed, in-memory vector DB for Node.js. It supports Pinecone-like metadata filtering
and fast local retrieval. It’s ideal for small, mostly static corpora and prompt augmentation.
`;
await docs.upsertDocument(uri, text, 'md'); // optional docType hints chunking
// 4) Query and render sections for your prompt
const results = await docs.queryDocuments('What is Vectra best suited for?', {
maxDocuments: 5,
maxChunks: 20,
// isBm25: true, // turn on hybrid (semantic + keyword) retrieval
});
// Take top document and render spans of text
if (results.length > 0) {
const top = results[0];
console.log('URI:', top.uri, 'score:', top.score.toFixed(4));
const sections = await top.renderSections(2000, 1, true); // max tokens per section, number of sections, include overlapping context
for (const s of sections) {
console.log('Section score:', s.score.toFixed(4), 'tokens:', s.tokenCount, 'bm25:', s.isBm25);
console.log(s.text);
}
}
Notes:
- queryDocuments returns LocalDocumentResult objects, each with scored chunks. renderSections merges adjacent chunks, keeps within your token budget, and can optionally add overlapping context for readability.
- Hybrid retrieval: set isBm25: true in queryDocuments to include keyword‑based chunks (Okapi‑BM25) alongside semantic chunks. Each rendered section includes isBm25 to help you distinguish them.
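For background, Okapi‑BM25 is a classic keyword-scoring function over term frequencies. A minimal self-contained sketch of the idea (illustrative only; Vectra's internal scorer may differ):

```typescript
// Minimal Okapi-BM25 sketch (illustrative; not Vectra's implementation).
// docs are pre-tokenized; higher scores mean stronger keyword matches.
function bm25Scores(docs: string[][], query: string[], k1 = 1.2, b = 0.75): number[] {
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.length, 0) / N; // average doc length
  const df = new Map<string, number>(); // how many docs contain each term
  for (const d of docs) for (const t of new Set(d)) df.set(t, (df.get(t) ?? 0) + 1);
  return docs.map((d) => {
    let score = 0;
    for (const term of new Set(query)) {
      const f = d.filter((t) => t === term).length; // term frequency in this doc
      if (f === 0) continue;
      const n = df.get(term) ?? 0;
      const idf = Math.log((N - n + 0.5) / (n + 0.5) + 1); // smoothed inverse doc frequency
      score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * d.length) / avgdl));
    }
    return score;
  });
}

const scores = bm25Scores([['apple', 'pie'], ['blue', 'sky']], ['apple']);
console.log(scores[0] > scores[1]); // true: the doc containing 'apple' ranks higher
```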
CLI in 60 seconds
Three steps: create → add → query. No servers, just a folder.
- Create an index folder
npx vectra create ./my-d
