Rememex
semantic search for your local files. find by meaning, not keywords. 120+ file types, OCR, MCP server for AI agents. 100% private.
windows 10+ only for now. uses UWP OCR and mica backdrop.
<p align="center"> <img src="assets/search.gif" width="700" /> </p>
why rememex?
| | rememex | ripgrep | Everything | Sourcegraph | Microsoft Recall |
|---|---|---|---|---|---|
| search type | semantic + keyword hybrid | regex / literal text | filename (content via content:) | keyword + symbol + semantic | screenshots your entire life every 5 seconds |
| understands meaning | ✅ | ❌ | ❌ | ✅ | ✅ (it saw everything. literally everything.) |
| local & private | ✅ everything on your machine | ✅ | ✅ | cloud or self-hosted | "local" (pinky promise) |
| file types | 120+ (code, docs, images, configs) | text files | all files (index by name) | code repos | your screen. all of it. always. |
| image OCR | ✅ built-in | ❌ | ❌ | ❌ | ✅ (it OCRs your passwords too) |
| EXIF / GPS | ✅ reverse geocodes to city names | ❌ | ❌ | ❌ | knows where you are anyway |
| MCP server | ✅ built-in for AI agents | ❌ | ❌ | ? | no but copilot watches you type |
| price | free, open source | free, open source | free | starts at $49/user/mo | free* (*costs your dignity) |
| vibes | finds what you mean | finds what you type | finds filenames | enterprise™ | big brother as a feature |
what it does
- indexes 120+ file types (code, docs, images, configs, whatever)
- OCR on images via windows built-in engine
- reads EXIF → reverse geocodes GPS to city names. search "photos from istanbul" and it works
- EXIF dates → human words. "summer morning" finds a photo from july at 8am
- hybrid search: vector + full-text + JINA cross-encoder reranker
- smart chunking per language (rust at `fn`/`struct`, python at `def`/`class`, etc)
- semantic containers for isolation (work/personal/research)
- MCP server for AI agents. details → · agent instructions →
- annotations: attach searchable notes to any file, from the UI or via MCP. agents and humans share the same knowledge layer
- optional cloud embeddings -- plug in OpenAI, Gemini, Cohere, or any compatible API. default is still 100% local
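
a minimal sketch of the "EXIF dates → human words" idea from the list above, in rust. this is not rememex's actual code: the function name `describe_exif_datetime`, the northern-hemisphere season mapping, and the hour cut-offs are all assumptions made for illustration. the only hard fact used is the standard EXIF timestamp layout, `YYYY:MM:DD HH:MM:SS`.

```rust
/// Turn an EXIF timestamp ("YYYY:MM:DD HH:MM:SS") into searchable words.
/// Hypothetical helper: season boundaries assume the northern hemisphere and
/// the hour buckets are arbitrary illustrative cut-offs.
fn describe_exif_datetime(raw: &str) -> Option<String> {
    let (date, time) = raw.trim().split_once(' ')?;
    let mut date_parts = date.split(':');
    let _year: u32 = date_parts.next()?.parse().ok()?;
    let month: u32 = date_parts.next()?.parse().ok()?;
    let hour: u32 = time.split(':').next()?.parse().ok()?;

    let season = match month {
        12 | 1 | 2 => "winter",
        3..=5 => "spring",
        6..=8 => "summer",
        9..=11 => "autumn",
        _ => return None, // month outside 1..=12
    };
    let part_of_day = match hour {
        5..=11 => "morning",
        12..=16 => "afternoon",
        17..=20 => "evening",
        0..=4 | 21..=23 => "night",
        _ => return None, // hour outside 0..=23
    };
    Some(format!("{season} {part_of_day}"))
}
```

with these cut-offs, `describe_exif_datetime("2023:07:14 08:05:00")` returns `Some("summer morning")`. text like that can be embedded next to the file so a query such as "summer morning" has something to match.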
architecture
indexing
graph LR
W[file watcher] -->|change event| SI[index single file]
WB[WalkBuilder] -->|bulk scan| B[collect files]
B --> C{image?}
C -->|yes| D[UWP OCR + EXIF]
C -->|no| E[file_io reader]
D --> F[git context]
E --> F
F --> G["semantic chunking (per-language)"]
G --> H[embedding provider]
H -->|local ONNX or remote API| I[(lancedb)]
I --> J[ANN + FTS index build]
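
the "semantic chunking (per-language)" step can be pictured with a very rough sketch: start a new chunk at every line that opens a top-level item. this is a line-prefix heuristic written for illustration, not the logic in `chunking.rs`. the function name `chunk_at_markers` and the marker lists are assumptions, and a real splitter would also need to handle `pub fn`, `impl` blocks, nested items, and so on.

```rust
/// Split source text into chunks, starting a new chunk at every line that
/// begins (at column 0) with one of `markers`. Anything before the first
/// marker (imports, comments) becomes its own leading chunk.
/// Hypothetical helper for illustration only.
fn chunk_at_markers(source: &str, markers: &[&str]) -> Vec<String> {
    let mut groups: Vec<Vec<&str>> = Vec::new();
    for line in source.lines() {
        let starts_item = markers.iter().any(|m| line.starts_with(*m));
        if starts_item || groups.is_empty() {
            groups.push(Vec::new());
        }
        if let Some(current) = groups.last_mut() {
            current.push(line);
        }
    }
    // Re-join each group of lines into one chunk of text.
    groups.into_iter().map(|lines| lines.join("\n")).collect()
}
```

for rust you might call it with `&["fn ", "struct "]`, for python with `&["def ", "class "]`. each chunk then goes to the embedding provider as one unit, so a function body is not cut in half by a fixed-size window.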
search
graph LR
Q[query] --> QR[query router]
QR -->|weights + hyde flag| HYDE{hyde?}
HYDE -->|conceptual| LLM[LLM hypothetical doc]
HYDE -->|other| EMB[embed query]
LLM --> EMB
Q --> EXP[expand query variants]
EMB --> VS[vector search]
EXP --> FTS[full-text search]
VS --> HM["hybrid merge (RRF)"]
FTS --> HM
EMB --> AS[annotation search]
AS --> AM[merge annotations]
HM --> AM
AM --> RR[JINA reranker]
RR --> SC[score normalization]
SC --> MMR[MMR diversity]
MMR --> R[ranked results]
UI[tauri UI] --> Q
MCP[MCP server] -->|stdio| Q
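
the "hybrid merge (RRF)" node refers to reciprocal rank fusion. a minimal sketch of the general technique, assuming each search returns an ordered list of file ids: every list contributes `1 / (k + rank)` per item and the sums are sorted. the function name `rrf_merge` and the choice of `k = 60` (a commonly used constant) are assumptions here, not values taken from rememex.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per item,
/// with rank starting at 1. Returns (id, fused score) pairs, best first.
/// Hypothetical helper; rememex's own merge may weight lists differently.
fn rrf_merge(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (index, id) in ranking.iter().enumerate() {
            let rank = index as f64 + 1.0;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending; break ties by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

with a vector ranking of `["a.rs", "b.rs", "c.rs"]` and a full-text ranking of `["b.rs", "d.rs"]`, `b.rs` comes out on top because it appears in both lists, even though neither list ranked it first in isolation from the other. RRF only looks at ranks, so the vector and full-text scores never need to be on the same scale.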
run it
npm install
npm run tauri dev # dev is slow
npm run tauri build # release build, use this for real speed
Alt+Space to toggle. config & docs → CONFIG.md
RAM usage peaks during initial indexing — this is expected. once indexing completes, it drops and stays stable.
try it with real data
we ship a test dataset so you can see what semantic search actually feels like. 2,483 resume PDFs across 24 professions, accountants to teachers.
# unzip test-set/data.zip somewhere
# create a new container in rememex, point it at the unzipped folder
# wait for indexing (~30 min on local embeddings)
we indexed it and ran these queries. all results below used the most basic config — no cloud APIs, no fine-tuning:
| setting | value |
|---------|-------|
| embedding model | Multilingual-E5-Base (local ONNX, ~170MB) |
| reranker | off |
| chunk size | 512 tokens, 64 overlap |
| query router | on |
| MMR diversity | on (~65% balance) |
| HyDE | off |
| embedding provider | local (zero API calls) |
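
for context on the "MMR diversity" setting: maximal marginal relevance picks results greedily, trading relevance against similarity to what was already picked. the sketch below shows the general technique only. how rememex maps "~65% balance" onto a parameter is not stated here, so treating it as `lambda = 0.65` is an assumption, as are the names `cosine` and `mmr_select`.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Greedy maximal marginal relevance. `relevance[i]` is candidate i's score
/// against the query; `lambda` trades relevance (1.0) against diversity (0.0).
/// Returns the indices of the chosen candidates in selection order.
fn mmr_select(relevance: &[f32], embeddings: &[Vec<f32>], lambda: f32, top_n: usize) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..relevance.len()).collect();
    while selected.len() < top_n && !remaining.is_empty() {
        let mut best_pos = 0;
        let mut best_score = f32::NEG_INFINITY;
        for (pos, &cand) in remaining.iter().enumerate() {
            // Redundancy = similarity to the closest already-selected item
            // (0.0 when nothing is selected yet; negative similarity is ignored).
            let redundancy = selected
                .iter()
                .map(|&s| cosine(&embeddings[cand], &embeddings[s]))
                .fold(0.0_f32, f32::max);
            let score = lambda * relevance[cand] - (1.0 - lambda) * redundancy;
            if score > best_score {
                best_score = score;
                best_pos = pos;
            }
        }
        selected.push(remaining.remove(best_pos));
    }
    selected
}
```

given three candidates where the second is a near-duplicate of the first, `lambda = 0.65` skips the duplicate in favour of the less relevant but different third one, while `lambda = 1.0` falls back to plain relevance order.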
real results, real scores:
| query | top result | score | why it's interesting |
|-------|-----------|-------|---------------------|
| "software engineer who knows Python and machine learning" | AGRICULTURE/62994611.pdf — Python, TensorFlow, Keras, Scikit-learn, Pandas | 55.2 | filed under AGRICULTURE. rememex found it anyway |
| "nurse with emergency room experience" | ADVOCATE/46772262.pdf — Certified Emergency Nurse, Trauma Nurse Specialist | 58.2 | filed under ADVOCATE. wrong folder, right person |
| "someone who can cook Italian and French cuisine" | CHEF/10276858.pdf — Italian cuisine, fine dining, ethnic foods preparation | 34.7 | query said "French" too — top result has Italian, #3 result has "French cuisine talent". it splits the match across candidates |
| "MBA graduate with sales leadership" | DIGITAL-MEDIA/20330739.pdf — built $25MM sales teams, Exec Director of Sales | 55.5 | MBA + sales, found in DIGITAL-MEDIA folder. categories don't matter |
| "graphic designer with Photoshop and Illustrator" | DESIGNER/29147100.pdf — Adobe Photoshop, Illustrator, InDesign, portfolio link | 66.1 | highest score. exact skill match + portfolio |
| "civil engineer with AutoCAD and project management" | CONSTRUCTION/32025286.pdf — AutoCAD Civil 3D, cost analysis, full project admin | 59.4 | construction admin, not "civil engineer" by title. meaning > title |
the point: grep needs the exact keyword. rememex finds meaning — even when the words are different, even when the file is in the wrong folder.
agentic benchmark
same 5 tasks, same codebase. grep vs rememex MCP:
| task | grep | rememex |
|------|------|---------|
| "find where GPS coords become city names" | grep "GPS" → 0. grep "geocode" → found file, need to open. 3 steps | 1 step |
| "find the quality filter threshold" | grep "threshold" → 0 (code says >= 25.0). failed | 1 step |
| "find dedup logic for best chunk per file" | grep "dedup" → 0. grep "best" → noise. 3-5 steps | 1 step |
| "find config migration handling" | grep "legacy" → wrong file. wrong answer | 1 step |
| "find embedding batch size constant" | grep "batch_size" → 0 (it's EMBED_BATCH_SIZE). failed | 1 step |
grep needs the exact keyword. rememex needs the idea.
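
the failure mode in the table is easy to reproduce with a literal, case-sensitive substring match, which is roughly what a plain `grep PATTERN` does per line. the sample lines below are hypothetical stand-ins for the codebase (only `EMBED_BATCH_SIZE` and `>= 25.0` come from the table; the constant's value is made up), and `literal_hits` is an illustrative helper.

```rust
/// Case-sensitive literal search over lines, returning the lines that match.
/// Hypothetical helper mimicking a plain `grep PATTERN`.
fn literal_hits<'a>(lines: &[&'a str], pattern: &str) -> Vec<&'a str> {
    lines
        .iter()
        .copied()
        .filter(|line| line.contains(pattern))
        .collect()
}

/// Hypothetical sample lines standing in for the indexed code.
fn sample_lines() -> Vec<&'static str> {
    vec![
        "const EMBED_BATCH_SIZE: usize = 32; // value is made up",
        "if score >= 25.0 { keep(result); }",
    ]
}
```

`literal_hits(&sample_lines(), "batch_size")` and `literal_hits(&sample_lines(), "threshold")` both come back empty, while `"EMBED_BATCH_SIZE"` matches one line. the agent has to already know the identifier to find it.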
agents using rememex are expected to use 5-10x fewer tokens and complete tasks significantly faster: fewer search attempts, fewer wrong files opened, fewer round-trips. the benchmark above shows 1 step vs 3-5 (or an outright miss) with grep, which affects both speed and cost.
<p align="center"> <img src="assets/mcp.gif" width="300" /> </p>
project structure
rememex/
├── src/ # react/ts frontend
│   ├── components/ # UI components
│   │   ├── Sidebar.tsx # sidebar: containers, annotations, filters
│   │   ├── SearchBar.tsx # search input
│   │   ├── ResultsList.tsx # virtualized search results
│   │   ├── StatusBar.tsx # indexing status bar
│   │   ├── TitleBar.tsx # custom window title bar
│   │   ├── Settings.tsx # settings panel
│   │   └── settings/ # modular settings sub-panels
│   ├── locales/ # i18n translations (en, tr)
│   ├── Modal.tsx # modal dialog component
│   ├── i18n.tsx # internationalization setup
│   ├── types.ts # shared TypeScript types
│   └── App.tsx # main app shell
├── src-tauri/
│   └── src/
│       ├── indexer/ # core engine
│       │   ├── mod.rs # indexer orchestration, batch embed, reranker
│       │   ├── chunking.rs # per-language semantic splitting
│       │   ├── embedding.rs # fastembed ONNX inference
│       │   ├── embedding_provider.rs # local/remote provider trait
│       │   ├── search.rs # hybrid vector + full-text + reranker
│       │   ├── pipeline.rs # search pipeline scoring
│       │   ├── annotations.rs # annotation CRUD operations
│       │   ├── ocr.rs # UWP OCR bridge
│       │   ├── file_io.rs # file reading (text, pdf, binary)
│       │
