# Momo
> "You, my friend, are just a few plums short of a fruit pie." — Momo
Momo is a self-hostable AI memory system written in Rust — inspired by SuperMemory. It provides long-term memory for AI agents using LibSQL's native vector search capabilities — no external vector database required. Single binary, runs anywhere.
## Table of Contents
- Quick Start
- Features
- Documentation
- SDKs
- Docker
- Process Isolation
- Development
- Maintainers
- Contributing
- Credits
- License
## Quick Start
The fastest way to get Momo running is via Docker:

```bash
docker run --name momo -d --restart unless-stopped -p 3000:3000 -e MOMO_API_KEYS=dev-key -v momo-data:/data ghcr.io/momomemory/momo:latest
export API_KEY=dev-key
```
Then open:

- Web console: http://localhost:3000/
- API docs: http://localhost:3000/api/v1/docs
- MCP endpoint: http://localhost:3000/mcp
### Add a Memory

```bash
curl -X POST http://localhost:3000/api/v1/conversations:ingest \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "I prefer dark mode"}], "containerTag": "user_1"}'
```
### Search Everything

```bash
curl -X POST http://localhost:3000/api/v1/search \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"q": "What are the user preferences?", "containerTags": ["user_1"], "scope": "hybrid"}'
```
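The two curl calls above can also be wrapped in a small client. A minimal TypeScript sketch, assuming only the endpoint paths and field names shown in the examples — the class and method names, and the injectable fetch function, are illustrative:

```typescript
// Minimal client sketch for the two endpoints above. The paths and JSON
// fields mirror the curl examples; everything else is an assumption.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => unknown;

class MomoClient {
  constructor(
    private baseUrl: string,
    private apiKey: string,
    // Injectable so the client can be exercised without a running server.
    private fetchFn: FetchLike = (globalThis as any).fetch,
  ) {}

  private post(path: string, payload: unknown): unknown {
    return this.fetchFn(`${this.baseUrl}${path}`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    });
  }

  // Mirrors the "Add a Memory" example.
  ingest(messages: Array<{ role: string; content: string }>, containerTag: string): unknown {
    return this.post("/api/v1/conversations:ingest", { messages, containerTag });
  }

  // Mirrors the "Search Everything" example.
  search(q: string, containerTags: string[], scope = "hybrid"): unknown {
    return this.post("/api/v1/search", { q, containerTags, scope });
  }
}
```

With the default `fetch`, the same calls hit the server started in the Quick Start.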
### Use As MCP Server
Momo includes a built-in MCP server (streamable HTTP) compatible with Supermemory-style workflows.
```json
{
  "mcpServers": {
    "momo": {
      "url": "http://localhost:3000/mcp",
      "headers": {
        "Authorization": "Bearer your_momo_api_key",
        "x-sm-project": "default"
      }
    }
  }
}
```
Provided MCP primitives:

- Tools: `memory`, `recall`, `listProjects`, `whoAmI`
- Resources: `supermemory://profile`, `supermemory://projects`
- Prompt: `context`
For handshake details, manual curl testing, and troubleshooting, see MCP Guide.
## Features
- Vector Search: Native LibSQL vector embeddings (no external vector DB needed)
- Embedded Web Console: Bun + Preact UI embedded in the binary and served from `/`
- Local AI Pipeline: Built-in support for local embeddings (FastEmbed) and transcription (Whisper)
- External Embeddings: Support for OpenAI, OpenRouter, Ollama, and LM Studio APIs
- Universal Ingestion: Extract from URLs, PDFs, HTML, DOCX, Images (OCR), and Audio/Video
- Memory Versioning: Track memory updates with parent/child relationships
- Contradiction Management: Automatically detect and resolve conflicting memories
- Knowledge Graph: Memory relationships, graph traversal, container-level graphs
- User Profiling: Auto-generated user profiles from accumulated memories
- Autonomous Synthesis: Optional background engine that derives new insights from existing data
- Intelligent Decay: Relevance scoring that automatically prunes stale or irrelevant memories
- AST-Aware Code Chunking: Tree-sitter based chunking for multiple programming languages
- Multi-Tenant by Design: Scalable container-based isolation for multi-user applications
- Reranking: Improved search relevance using cross-encoder models
## Documentation
For detailed documentation, see the docs directory.
## SDKs
| Language | Package | Status |
|----------|---------|--------|
| TypeScript | @momomemory/sdk | Stable |
| Python | momomemory-sdk | Coming Soon |
| Rust | momo-sdk | Coming Soon |
| Go | momo-go | Coming Soon |
## Docker
Use the published container image:

```bash
docker run --name momo -d --restart unless-stopped -p 3000:3000 -e MOMO_API_KEYS=dev-key -v momo-data:/data ghcr.io/momomemory/momo:latest
export API_KEY=dev-key
```
To follow logs:

```bash
docker logs -f momo
```
## Process Isolation
Momo can run API and background ingestion/workers in separate processes.
### Runtime Modes
- `MOMO_RUNTIME_MODE=all` (default): supervisor mode; launches API and worker subprocesses
- `MOMO_RUNTIME_MODE=api`: HTTP/API only (no background workers)
- `MOMO_RUNTIME_MODE=worker`: background workers only (no HTTP server)
To force legacy single-process behavior when `MOMO_RUNTIME_MODE=all`:

```bash
./momo --single-process
# or
MOMO_SINGLE_PROCESS=true ./momo
```
Example split deployment:

```bash
# API process
MOMO_RUNTIME_MODE=api MOMO_PORT=3000 ./momo

# Worker process
MOMO_RUNTIME_MODE=worker ./momo
```
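For container deployments, the same split can be expressed in Compose. A sketch assuming the published image and the environment variables documented above — the service names and shared volume layout are illustrative, not project conventions:

```yaml
# Illustrative Compose sketch for the split deployment above.
# Assumes both processes reach the same database (here via a shared
# /data volume; a remote DATABASE_URL would also work).
services:
  momo-api:
    image: ghcr.io/momomemory/momo:latest
    environment:
      MOMO_RUNTIME_MODE: api
      MOMO_PORT: "3000"
      MOMO_API_KEYS: dev-key
    ports:
      - "3000:3000"
    volumes:
      - momo-data:/data
  momo-worker:
    image: ghcr.io/momomemory/momo:latest
    environment:
      MOMO_RUNTIME_MODE: worker
      MOMO_API_KEYS: dev-key
    volumes:
      - momo-data:/data
volumes:
  momo-data:
```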
### Read/Write Split (LibSQL Replica Strategy)
Use a dedicated read backend while keeping writes on the primary DB:

```bash
DATABASE_URL=libsql://primary.turso.io
DATABASE_AUTH_TOKEN=primary-token
DATABASE_READ_URL=libsql://read-replica.turso.io
DATABASE_READ_AUTH_TOKEN=read-token
DATABASE_READ_LOCAL_PATH=local-read-replica.db
DATABASE_READ_SYNC_INTERVAL_SECS=2
```
Search reads use the read backend; writes continue through the primary backend.
## Development

```bash
# From monorepo root: run backend + frontend with auto-reload
just dev

# Build frontend bundle (embedded in binary)
just build-frontend

# Build server (includes frontend bundle)
just build

# Run tests
cargo test

# Check for issues
cargo clippy

# Format code
cargo fmt
```
Note: when frontend assets are missing, the Rust build uses `momo/build.rs` to run `bun install` and `bun run build`.
## Maintainers
## Contributing
See CONTRIBUTING.md for details on how to get involved. PRs accepted.
## Credits
- Inspired by Supermemory
- Named after Momo, Aang's loyal flying lemur companion 🦇
- Built with LibSQL, FastEmbed, and Axum
## License
MIT © Momo Contributors
