Rememex

semantic search for your local files. find by meaning, not keywords. 120+ file types, OCR, and an MCP server for AI agents. 100% private.


<img src="src/assets/rememex.png" width="300" />
<h1 align="center">Rememex</h1>
<a href="https://github.com/illegal-instruction-co/rememex/releases"><img src="https://img.shields.io/github/v/release/illegal-instruction-co/rememex?style=flat-square" alt="GitHub release" /></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=flat-square" alt="License: MIT" /></a>
<a href="https://github.com/illegal-instruction-co/rememex/releases"><img src="https://img.shields.io/badge/platform-Windows%2010%2B-0078D4?style=flat-square&logo=windows" alt="Platform: Windows" /></a>
<a href="https://github.com/illegal-instruction-co/rememex/stargazers"><img src="https://img.shields.io/github/stars/illegal-instruction-co/rememex?style=flat-square" alt="GitHub stars" /></a>
<a href="https://github.com/illegal-instruction-co/rememex"><img src="https://img.shields.io/badge/price-free%20forever-brightgreen?style=flat-square" alt="Free" /></a>

a semantic upgrade to your file system. you type meaning, it finds files. nothing leaves your machine. named after Vannevar Bush's <a href="https://en.wikipedia.org/wiki/Memex">Memex</a> (1945), a vision of a device that stores and retrieves all human knowledge.

windows 10+ only for now. uses the built-in UWP OCR engine and the Mica backdrop.

why rememex?

| | rememex | ripgrep | Everything | Sourcegraph | Microsoft Recall |
|---|---|---|---|---|---|
| search type | semantic + keyword hybrid | regex / literal text | filename (content via content:) | keyword + symbol + semantic | screenshots your entire life every 5 seconds |
| understands meaning | ✅ | ❌ | ❌ | ✅ | ✅ (it saw everything. literally everything.) |
| local & private | ✅ everything on your machine | ✅ | ✅ | cloud or self-hosted | "local" (pinky promise) |
| file types | 120+ (code, docs, images, configs) | text files | all files (index by name) | code repos | your screen. all of it. always. |
| image OCR | ✅ built-in | ❌ | ❌ | ❌ | ✅ (it OCRs your passwords too) |
| EXIF / GPS | ✅ reverse geocodes to city names | ❌ | ❌ | ❌ | knows where you are anyway |
| MCP server | ✅ built-in for AI agents | ❌ | ❌ | ? | no, but copilot watches you type |
| price | free, open source | free, open source | free | starts at $49/user/mo | free\* (\*costs your dignity) |
| vibes | finds what you mean | finds what you type | finds filenames | enterprise™ | big brother as a feature |

what it does

indexes 120+ file types (code, docs, images, configs, whatever)
OCR on images via windows built-in engine
reads EXIF → reverse geocodes GPS to city names. search "photos from istanbul" and it works
EXIF dates → human words. "summer morning" finds a photo from july at 8am
hybrid search: vector + full-text + JINA cross-encoder reranker
smart chunking per language (rust at fn/struct, python at def/class, etc.)
semantic containers for isolation (work/personal/research)
MCP server for AI agents. details → · agent instructions →
annotations: attach searchable notes to any file, from the UI or via MCP. agents and humans share the same knowledge layer
optional cloud embeddings: plug in OpenAI, Gemini, Cohere, or any compatible API. the default is still 100% local
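the two EXIF features above (GPS → city names, dates → words like "summer morning") amount to turning raw metadata into words that the vector and full-text indexes can match. here is a minimal sketch of that idea, assuming a tiny in-memory city table in place of whatever geocoding data rememex actually ships, northern-hemisphere seasons, and made-up hour buckets. `City`, `season`, `time_of_day`, `nearest_city`, and `exif_words` are hypothetical names, not rememex's code.

```rust
/// A city in a tiny, hypothetical lookup table (stand-in for real geocoding data).
pub struct City {
    pub name: &'static str,
    pub lat: f64,
    pub lon: f64,
}

/// Season for an EXIF month (1-12). Assumes the northern hemisphere.
pub fn season(month: u32) -> Option<&'static str> {
    match month {
        12 | 1 | 2 => Some("winter"),
        3..=5 => Some("spring"),
        6..=8 => Some("summer"),
        9..=11 => Some("autumn"),
        _ => None,
    }
}

/// Coarse time-of-day bucket for an EXIF hour (0-23). Bucket edges are assumptions.
pub fn time_of_day(hour: u32) -> Option<&'static str> {
    match hour {
        5..=11 => Some("morning"),
        12..=16 => Some("afternoon"),
        17..=20 => Some("evening"),
        21..=23 | 0..=4 => Some("night"),
        _ => None,
    }
}

/// Great-circle distance in km (haversine formula, mean Earth radius 6371 km).
pub fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (p1, p2) = (lat1.to_radians(), lat2.to_radians());
    let dp = (lat2 - lat1).to_radians();
    let dl = (lon2 - lon1).to_radians();
    let a = (dp / 2.0).sin().powi(2) + p1.cos() * p2.cos() * (dl / 2.0).sin().powi(2);
    // Clamp guards against rounding pushing sqrt(a) slightly above 1.
    2.0 * 6371.0 * a.sqrt().min(1.0).asin()
}

/// Nearest city to a GPS fix, or None if the table is empty.
pub fn nearest_city(lat: f64, lon: f64, cities: &[City]) -> Option<&City> {
    cities.iter().min_by(|a, b| {
        haversine_km(lat, lon, a.lat, a.lon).total_cmp(&haversine_km(lat, lon, b.lat, b.lon))
    })
}

/// Turn EXIF fields into searchable words, e.g. "summer morning Istanbul".
pub fn exif_words(month: u32, hour: u32, gps: Option<(f64, f64)>, cities: &[City]) -> String {
    let mut words: Vec<&str> = Vec::new();
    words.extend(season(month));
    words.extend(time_of_day(hour));
    if let Some((lat, lon)) = gps {
        words.extend(nearest_city(lat, lon, cities).map(|c| c.name));
    }
    words.join(" ")
}
```

a linear scan over cities is fine for a small table; a real offline geocoder with thousands of places would more likely use a spatial index.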

architecture

indexing

```mermaid
graph LR
    W[file watcher] -->|change event| SI[index single file]
    WB[WalkBuilder] -->|bulk scan| B[collect files]
    B --> C{image?}
    C -->|yes| D[UWP OCR + EXIF]
    C -->|no| E[file_io reader]
    D --> F[git context]
    E --> F
    F --> G["semantic chunking (per-language)"]
    G --> H[embedding provider]
    H -->|local ONNX or remote API| I[(lancedb)]
    I --> J[ANN + FTS index build]
```
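the "semantic chunking (per-language)" step, together with the "rust at fn/struct, python at def/class" note, suggests splitting source files at top-level definitions. below is a minimal line-based sketch under that assumption. rememex's real chunking.rs may use a parser instead, and the prefix lists, `chunk_by_definitions`, and the rule that only unindented lines start a new chunk are all hypothetical.

```rust
/// Hypothetical boundary markers per language. Only unindented lines are
/// checked, so methods nested inside a class or impl stay with their parent.
const RUST: &[&str] = &[
    "fn ", "pub fn ", "async fn ", "pub async fn ", "struct ", "pub struct ",
    "enum ", "pub enum ", "impl ", "impl<", "trait ", "pub trait ", "mod ", "pub mod ",
];
const PYTHON: &[&str] = &["def ", "async def ", "class "];
const NONE: &[&str] = &[];

fn boundary_prefixes(lang: &str) -> &'static [&'static str] {
    match lang {
        "rust" => RUST,
        "python" => PYTHON,
        _ => NONE, // unknown language: the whole file becomes one chunk
    }
}

/// Split `source` into chunks, starting a new chunk at each top-level definition.
pub fn chunk_by_definitions(source: &str, lang: &str) -> Vec<String> {
    let prefixes = boundary_prefixes(lang);
    let mut chunks = Vec::new();
    let mut current = String::new();
    for line in source.lines() {
        let is_boundary = prefixes.iter().any(|p| line.starts_with(p));
        // Close the running chunk before a new definition, unless it's empty.
        if is_boundary && !current.trim().is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.trim().is_empty() {
        chunks.push(current);
    }
    chunks
}
```

one known weakness of this sketch: doc comments and attributes sitting directly above a definition end up in the previous chunk. a production splitter would also cap chunk length and fall back to fixed-size windows for very long definitions.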

search

```mermaid
graph LR
    Q[query] --> QR[query router]
    QR -->|weights + hyde flag| HYDE{hyde?}
    HYDE -->|conceptual| LLM[LLM hypothetical doc]
    HYDE -->|other| EMB[embed query]
    LLM --> EMB
    Q --> EXP[expand query variants]
    EMB --> VS[vector search]
    EXP --> FTS[full-text search]
    VS --> HM["hybrid merge (RRF)"]
    FTS --> HM
    EMB --> AS[annotation search]
    AS --> AM[merge annotations]
    HM --> AM
    AM --> RR[JINA reranker]
    RR --> SC[score normalization]
    SC --> MMR[MMR diversity]
    MMR --> R[ranked results]

    UI[tauri UI] --> Q
    MCP[MCP server] -->|stdio| Q
```
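the "hybrid merge (RRF)" node combines the vector and full-text result lists with reciprocal rank fusion: each document scores the sum over lists of w / (k + rank), with rank starting at 1. here is a minimal sketch, assuming each retriever returns ranked chunk ids and that the router's "weights" simply scale each list's contribution (our reading of the graph, not confirmed). k = 60 is the constant commonly used for RRF; rememex's actual value isn't stated, and `rrf_merge` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked id lists, each with a weight.
/// Returns (id, fused score) sorted by score descending, ties broken by id.
pub fn rrf_merge(lists: &[(&[&str], f64)], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for &(ids, weight) in lists {
        for (i, id) in ids.iter().enumerate() {
            let rank = (i + 1) as f64; // ranks are 1-based
            *scores.entry(id.to_string()).or_insert(0.0) += weight / (k + rank);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged
}
```

RRF only looks at ranks, so it doesn't need the raw cosine distances and BM25 scores to share a scale. a document that appears in both lists (like "b" in the test) outranks one that tops only a single list.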

run it

```bash
npm install
npm run tauri dev        # dev build (slow)
npm run tauri build      # release build, use this for real speed
```

Alt+Space to toggle. config & docs → CONFIG.md

RAM usage peaks during initial indexing. this is expected: once indexing completes, it drops and stays stable.

try it with real data

we ship a test dataset so you can see what semantic search actually feels like. 2,483 resume PDFs across 24 professions, accountants to teachers.

```bash
# unzip test-set/data.zip somewhere
# create a new container in rememex, point it at the unzipped folder
# wait for indexing (~30 min on local embeddings)
```

we indexed it and ran these queries. all results below used the most basic config (no cloud APIs, no fine-tuning):

| setting | value |
|---------|-------|
| embedding model | Multilingual-E5-Base (local ONNX, ~170MB) |
| reranker | off |
| chunk size | 512 tokens, 64 overlap |
| query router | on |
| MMR diversity | on (~65% balance) |
| HyDE | off |
| embedding provider | local (zero API calls) |
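the "MMR diversity" setting refers to maximal marginal relevance: results are picked greedily, and each pick trades relevance against similarity to what's already been picked, so near-duplicate chunks don't crowd the top. here is a minimal sketch, assuming "~65% balance" means λ ≈ 0.65 weight on relevance and 0.35 on novelty (our interpretation) and that every candidate carries a relevance score and an embedding. `mmr` and `cosine` are hypothetical names.

```rust
/// Cosine similarity; returns 0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Greedy MMR: repeatedly pick the candidate maximizing
/// lambda * relevance - (1 - lambda) * (max similarity to already-picked items).
/// Returns candidate indices in pick order.
pub fn mmr(relevance: &[f32], embeddings: &[Vec<f32>], lambda: f32, top_k: usize) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..relevance.len()).collect();
    while selected.len() < top_k && !remaining.is_empty() {
        let mut best_pos = 0;
        let mut best_score = f32::NEG_INFINITY;
        for (pos, &cand) in remaining.iter().enumerate() {
            // 0 when nothing is selected yet; the fold also clamps negative similarities to 0.
            let redundancy = selected
                .iter()
                .map(|&s| cosine(&embeddings[cand], &embeddings[s]))
                .fold(0.0_f32, f32::max);
            let score = lambda * relevance[cand] - (1.0 - lambda) * redundancy;
            if score > best_score {
                best_score = score;
                best_pos = pos;
            }
        }
        selected.push(remaining.remove(best_pos));
    }
    selected
}
```

with λ = 1.0 this reduces to plain relevance order. lowering λ makes an exact duplicate of the top hit lose to a less relevant but different result, which is what the test checks.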

real results, real scores:

| query | top result | score | why it's interesting |
|-------|-----------|-------|---------------------|
| "software engineer who knows Python and machine learning" | AGRICULTURE/62994611.pdf: Python, TensorFlow, Keras, Scikit-learn, Pandas | 55.2 | filed under AGRICULTURE. rememex found it anyway |
| "nurse with emergency room experience" | ADVOCATE/46772262.pdf: Certified Emergency Nurse, Trauma Nurse Specialist | 58.2 | filed under ADVOCATE. wrong folder, right person |
| "someone who can cook Italian and French cuisine" | CHEF/10276858.pdf: Italian cuisine, fine dining, ethnic foods preparation | 34.7 | the query said "French" too; the top result has Italian, the #3 result has "French cuisine talent". it splits the match across candidates |
| "MBA graduate with sales leadership" | DIGITAL-MEDIA/20330739.pdf: built $25MM sales teams, Exec Director of Sales | 55.5 | MBA + sales, found in DIGITAL-MEDIA folder. categories don't matter |
| "graphic designer with Photoshop and Illustrator" | DESIGNER/29147100.pdf: Adobe Photoshop, Illustrator, InDesign, portfolio link | 66.1 | highest score. exact skill match + portfolio |
| "civil engineer with AutoCAD and project management" | CONSTRUCTION/32025286.pdf: AutoCAD Civil 3D, cost analysis, full project admin | 59.4 | construction admin, not "civil engineer" by title. meaning > title |

the point: grep needs the exact keyword. rememex finds meaning, even when the words are different and even when the file is in the wrong folder.

agentic benchmark

same 5 tasks, same codebase. grep vs rememex MCP:

| task | grep | rememex |
|------|------|---------|
| "find where GPS coords become city names" | grep "GPS" → 0. grep "geocode" → found file, need to open. 3 steps | 1 step |
| "find the quality filter threshold" | grep "threshold" → 0 (code says >= 25.0). failed | 1 step |
| "find dedup logic for best chunk per file" | grep "dedup" → 0. grep "best" → noise. 3-5 steps | 1 step |
| "find config migration handling" | grep "legacy" → wrong file. wrong answer | 1 step |
| "find embedding batch size constant" | grep "batch_size" → 0 (it's EMBED_BATCH_SIZE). failed | 1 step |

grep needs the exact keyword. rememex needs the idea.

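agents using rememex are expected to use 5-10x fewer tokens and finish tasks significantly faster: fewer search attempts, fewer wrong files opened, fewer round-trips. the token figure is an estimate, since the benchmark above counts steps rather than tokens. it shows 1 step vs 3-5 steps (or an outright failure), and every step saved costs both time and tokens.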

project structure

```
rememex/
├── src/                          # react/ts frontend
│   ├── components/               # UI components
│   │   ├── Sidebar.tsx           # sidebar: containers, annotations, filters
│   │   ├── SearchBar.tsx         # search input
│   │   ├── ResultsList.tsx       # virtualized search results
│   │   ├── StatusBar.tsx         # indexing status bar
│   │   ├── TitleBar.tsx          # custom window title bar
│   │   ├── Settings.tsx          # settings panel
│   │   └── settings/             # modular settings sub-panels
│   ├── locales/                  # i18n translations (en, tr)
│   ├── Modal.tsx                 # modal dialog component
│   ├── i18n.tsx                  # internationalization setup
│   ├── types.ts                  # shared TypeScript types
│   └── App.tsx                   # main app shell
├── src-tauri/
│   └── src/
│       ├── indexer/              # core engine
│       │   ├── mod.rs            # indexer orchestration, batch embed, reranker
│       │   ├── chunking.rs       # per-language semantic splitting
│       │   ├── embedding.rs      # fastembed ONNX inference
│       │   ├── embedding_provider.rs  # local/remote provider trait
│       │   ├── search.rs         # hybrid vector + full-text + reranker
│       │   ├── pipeline.rs       # search pipeline scoring
│       │   ├── annotations.rs    # annotation CRUD operations
│       │   ├── ocr.rs            # UWP OCR bridge
│       │   ├── file_io.rs        # file reading (text, pdf, binary)
│       │
```
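the tree lists embedding_provider.rs as a "local/remote provider trait", and mod.rs handles batch embedding (the benchmark mentions an EMBED_BATCH_SIZE constant). here is a minimal sketch of that shape. the trait name, method signatures, `embed_all`, and the toy provider are hypothetical, not rememex's actual API. the batch size is a parameter because the real constant's value isn't given here.

```rust
/// Hypothetical provider trait: a local ONNX model and a remote HTTP API
/// could both implement this, so the indexer doesn't care which one it gets.
pub trait EmbeddingProvider {
    fn dimensions(&self) -> usize;
    fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// Embed all texts in fixed-size batches, checking that each batch
/// returns exactly one vector per input.
pub fn embed_all(
    provider: &dyn EmbeddingProvider,
    texts: &[&str],
    batch_size: usize,
) -> Result<Vec<Vec<f32>>, String> {
    let mut out = Vec::with_capacity(texts.len());
    for batch in texts.chunks(batch_size.max(1)) {
        let vectors = provider.embed_batch(batch)?;
        if vectors.len() != batch.len() {
            return Err(format!("expected {} vectors, got {}", batch.len(), vectors.len()));
        }
        out.extend(vectors);
    }
    Ok(out)
}

/// Toy stand-in for a local model: sums byte values into `dims` buckets.
/// Useful only for testing the plumbing, not for semantic search.
pub struct ToyLocal {
    pub dims: usize,
}

impl EmbeddingProvider for ToyLocal {
    fn dimensions(&self) -> usize {
        self.dims
    }

    fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        if self.dims == 0 {
            return Err("dims must be > 0".to_string());
        }
        Ok(texts
            .iter()
            .map(|t| {
                let mut v = vec![0.0_f32; self.dims];
                for (i, b) in t.bytes().enumerate() {
                    v[i % self.dims] += b as f32;
                }
                v
            })
            .collect())
    }
}
```

batching keeps a local model's memory bounded and cuts the number of HTTP round-trips for a remote API. the length check catches a provider that silently drops inputs.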

repo: illegal-instruction-co/rememex · language: Rust · category: Development · GitHub stars: 56 · forks: 4 · last updated: 5d ago · security score: 85/100 (audited Mar 16, 2026, no findings)