
Rememex

semantic search for your local files: find by meaning, not keywords. 120+ file types, OCR, MCP server for AI agents. 100% private.


<p align="center"> <img src="src/assets/rememex.png" width="300" /> </p> <h1 align="center">Rememex</h1> <p align="center"> <a href="https://github.com/illegal-instruction-co/rememex/releases"><img src="https://img.shields.io/github/v/release/illegal-instruction-co/rememex?style=flat-square" alt="GitHub release" /></a> <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=flat-square" alt="License: MIT" /></a> <a href="https://github.com/illegal-instruction-co/rememex/releases"><img src="https://img.shields.io/badge/platform-Windows%2010%2B-0078D4?style=flat-square&logo=windows" alt="Platform: Windows" /></a> <a href="https://github.com/illegal-instruction-co/rememex/stargazers"><img src="https://img.shields.io/github/stars/illegal-instruction-co/rememex?style=flat-square" alt="GitHub stars" /></a> <a href="https://github.com/illegal-instruction-co/rememex"><img src="https://img.shields.io/badge/price-free%20forever-brightgreen?style=flat-square" alt="Free" /></a> </p> <p align="center"> a semantic upgrade to your file system. you type meaning, it finds files. nothing leaves your machine. </p> <p align="center"> <em>named after Vannevar Bush's <a href="https://en.wikipedia.org/wiki/Memex">Memex</a> (1945), a vision of a device that stores and retrieves all human knowledge.</em> </p>

windows 10+ only for now. uses UWP OCR and mica backdrop.

<p align="center"> <img src="assets/search.gif" width="700" /> </p>

why rememex?

| | rememex | ripgrep | Everything | Sourcegraph | Microsoft Recall |
|---|---|---|---|---|---|
| search type | semantic + keyword hybrid | regex / literal text | filename (content via content:) | keyword + symbol + semantic | screenshots your entire life every 5 seconds |
| understands meaning | ✅ | ❌ | ❌ | ✅ | ✅ (it saw everything. literally everything.) |
| local & private | ✅ everything on your machine | ✅ | ✅ | cloud or self-hosted | "local" (pinky promise) |
| file types | 120+ (code, docs, images, configs) | text files | all files (index by name) | code repos | your screen. all of it. always. |
| image OCR | ✅ built-in | ❌ | ❌ | ❌ | ✅ (it OCRs your passwords too) |
| EXIF / GPS | ✅ reverse geocodes to city names | ❌ | ❌ | ❌ | knows where you are anyway |
| MCP server | ✅ built-in for AI agents | ❌ | ❌ | ? | no but copilot watches you type |
| price | free, open source | free, open source | free | starts at $49/user/mo | free* (*costs your dignity) |
| vibes | finds what you mean | finds what you type | finds filenames | enterprise™ | big brother as a feature |


what it does

  • indexes 120+ file types (code, docs, images, configs, whatever)
  • OCR on images via windows built-in engine
  • reads EXIF → reverse geocodes GPS to city names. search "photos from istanbul" and it works
  • EXIF dates → human words. "summer morning" finds a photo from july at 8am
  • hybrid search: vector + full-text + JINA cross-encoder reranker
  • smart chunking per language (rust at fn/struct, python at def/class, etc)
  • semantic containers for isolation (work/personal/research)
  • MCP server for AI agents. details → · agent instructions →
  • annotations: attach searchable notes to any file, from the UI or via MCP. agents and humans share the same knowledge layer
  • optional cloud embeddings -- plug in OpenAI, Gemini, Cohere, or any compatible API. default is still 100% local
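
to make the "EXIF dates → human words" bullet concrete, here is a minimal sketch of how a timestamp could be turned into searchable words. this is not rememex's actual code: the season mapping (northern hemisphere), the hour buckets, and the function names are all assumptions made for illustration. only the EXIF timestamp format (`YYYY:MM:DD HH:MM:SS`) is standard.

```python
from datetime import datetime

# Assumed month-to-season mapping (northern hemisphere); hypothetical, not from rememex.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def part_of_day(hour: int) -> str:
    """Bucket an hour (0-23) into a coarse label; the cut-offs are assumptions."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


def describe_exif_date(raw: str) -> str:
    """Turn an EXIF timestamp ('YYYY:MM:DD HH:MM:SS') into words that can be indexed."""
    taken = datetime.strptime(raw, "%Y:%m:%d %H:%M:%S")
    return f"{SEASONS[taken.month]} {part_of_day(taken.hour)}"


print(describe_exif_date("2023:07:15 08:00:00"))  # summer morning
```

once words like these sit next to the file's other text, a query such as "summer morning" can match the photo through the normal embedding and full-text paths.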

architecture

indexing

graph LR
    W[file watcher] -->|change event| SI[index single file]
    WB[WalkBuilder] -->|bulk scan| B[collect files]
    B --> C{image?}
    C -->|yes| D[UWP OCR + EXIF]
    C -->|no| E[file_io reader]
    D --> F[git context]
    E --> F
    F --> G["semantic chunking (per-language)"]
    G --> H[embedding provider]
    H -->|local ONNX or remote API| I[(lancedb)]
    I --> J[ANN + FTS index build]
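
the "semantic chunking (per-language)" step splits code at definition boundaries (rust at fn/struct, python at def/class). a rough sketch of that idea follows. the regex patterns, the `BOUNDARIES` table, and `chunk_source` are hypothetical stand-ins, and the real splitter in `chunking.rs` may work quite differently.

```python
import re

# Assumed boundary patterns: a new chunk starts at each unindented definition.
BOUNDARIES = {
    "python": re.compile(r"^(?:async\s+def|def|class)\s", re.MULTILINE),
    "rust": re.compile(r"^(?:pub\s+)?(?:fn|struct|enum|impl|trait)\s", re.MULTILINE),
}


def chunk_source(text: str, language: str) -> list[str]:
    """Split source text so that each top-level definition becomes its own chunk."""
    pattern = BOUNDARIES.get(language)
    if pattern is None:
        # Unknown language: fall back to a single chunk.
        return [text] if text.strip() else []
    starts = [m.start() for m in pattern.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble (imports, comments) as its own chunk
    ends = starts[1:] + [len(text)]
    chunks = [text[s:e] for s, e in zip(starts, ends)]
    return [c for c in chunks if c.strip()]


src = "use std::fs;\n\nstruct Point { x: i32 }\n\nfn main() {\n    let p = Point { x: 1 };\n}\n"
for chunk in chunk_source(src, "rust"):
    print(repr(chunk))
```

indented lines never match, so nested functions and method bodies stay inside their parent chunk. a regex is only an approximation here; decorators, attributes, and doc comments above a definition would land in the previous chunk.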

search

graph LR
    Q[query] --> QR[query router]
    QR -->|weights + hyde flag| HYDE{hyde?}
    HYDE -->|conceptual| LLM[LLM hypothetical doc]
    HYDE -->|other| EMB[embed query]
    LLM --> EMB
    Q --> EXP[expand query variants]
    EMB --> VS[vector search]
    EXP --> FTS[full-text search]
    VS --> HM["hybrid merge (RRF)"]
    FTS --> HM
    EMB --> AS[annotation search]
    AS --> AM[merge annotations]
    HM --> AM
    AM --> RR[JINA reranker]
    RR --> SC[score normalization]
    SC --> MMR[MMR diversity]
    MMR --> R[ranked results]

    UI[tauri UI] --> Q
    MCP[MCP server] -->|stdio| Q
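
the "hybrid merge (RRF)" node combines the vector and full-text result lists. below is a small sketch of reciprocal rank fusion under stated assumptions: `k=60` is the conventional constant from the RRF literature, not a value confirmed for rememex, and the idea that the query router's weights scale each list is a guess based on the graph. file names are made up.

```python
def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Merge ranked lists of ids; each id scores sum(weight / (k + rank))."""
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a.rs", "b.rs", "c.rs"]  # hypothetical vector search order
fts_hits = ["b.rs", "d.rs", "a.rs"]     # hypothetical full-text search order
print(reciprocal_rank_fusion([vector_hits, fts_hits]))
# ['b.rs', 'a.rs', 'd.rs', 'c.rs']
```

RRF only looks at ranks, never raw scores, which is why it can merge a cosine-similarity list with a BM25-style list without first putting them on the same scale.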

run it

npm install
npm run tauri dev        # dev is slow
npm run tauri build      # release build, use this for real speed

Alt+Space to toggle. config & docs → CONFIG.md

RAM usage peaks during initial indexing — this is expected. once indexing completes, it drops and stays stable.


try it with real data

we ship a test dataset so you can see what semantic search actually feels like. 2,483 resume PDFs across 24 professions, accountants to teachers.

# unzip test-set/data.zip somewhere
# create a new container in rememex, point it at the unzipped folder
# wait for indexing (~30 min on local embeddings)

we indexed it and ran these queries. all results below used the most basic config — no cloud APIs, no fine-tuning:

| setting | value |
|---------|-------|
| embedding model | Multilingual-E5-Base (local ONNX, ~170MB) |
| reranker | off |
| chunk size | 512 tokens, 64 overlap |
| query router | on |
| MMR diversity | on (~65% balance) |
| HyDE | off |
| embedding provider | local (zero API calls) |
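
for a sense of what "512 tokens, 64 overlap" means in practice, here is a sketch of a fixed-size sliding window. it assumes the text is already tokenized and that the window advances by `size - overlap` tokens each step; how rememex combines this with its per-language splitting is not shown here, and `window_chunks` is a hypothetical name.

```python
def window_chunks(tokens, size=512, overlap=64):
    """Return fixed-size token windows where neighbours share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    stride = size - overlap  # 448 with the settings above
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


tokens = list(range(1200))  # stand-in for 1,200 token ids
print([(c[0], c[-1]) for c in window_chunks(tokens)])
# [(0, 511), (448, 959), (896, 1199)]
```

the overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of embedding a few tokens twice.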

real results, real scores:

| query | top result | score | why it's interesting |
|-------|-----------|-------|---------------------|
| "software engineer who knows Python and machine learning" | AGRICULTURE/62994611.pdf: Python, TensorFlow, Keras, Scikit-learn, Pandas | 55.2 | filed under AGRICULTURE. rememex found it anyway |
| "nurse with emergency room experience" | ADVOCATE/46772262.pdf: Certified Emergency Nurse, Trauma Nurse Specialist | 58.2 | filed under ADVOCATE. wrong folder, right person |
| "someone who can cook Italian and French cuisine" | CHEF/10276858.pdf: Italian cuisine, fine dining, ethnic foods preparation | 34.7 | query said "French" too. top result has Italian, #3 result has "French cuisine talent". it splits the match across candidates |
| "MBA graduate with sales leadership" | DIGITAL-MEDIA/20330739.pdf: built $25MM sales teams, Exec Director of Sales | 55.5 | MBA + sales, found in DIGITAL-MEDIA folder. categories don't matter |
| "graphic designer with Photoshop and Illustrator" | DESIGNER/29147100.pdf: Adobe Photoshop, Illustrator, InDesign, portfolio link | 66.1 | highest score. exact skill match + portfolio |
| "civil engineer with AutoCAD and project management" | CONSTRUCTION/32025286.pdf: AutoCAD Civil 3D, cost analysis, full project admin | 59.4 | construction admin, not "civil engineer" by title. meaning > title |

the point: grep needs the exact keyword. rememex finds meaning — even when the words are different, even when the file is in the wrong folder.


agentic benchmark

same 5 tasks, same codebase. grep vs rememex MCP:

| task | grep | rememex |
|------|------|---------|
| "find where GPS coords become city names" | grep "GPS" → 0. grep "geocode" → found file, need to open. 3 steps | 1 step |
| "find the quality filter threshold" | grep "threshold" → 0 (code says >= 25.0). failed | 1 step |
| "find dedup logic for best chunk per file" | grep "dedup" → 0. grep "best" → noise. 3-5 steps | 1 step |
| "find config migration handling" | grep "legacy" → wrong file. wrong answer | 1 step |
| "find embedding batch size constant" | grep "batch_size" → 0 (it's EMBED_BATCH_SIZE). failed | 1 step |

grep needs the exact keyword. rememex needs the idea.

agents using rememex are expected to use 5-10x fewer tokens and complete tasks significantly faster: fewer search attempts, fewer wrong files opened, fewer round-trips. the benchmark above shows 1 step vs 3-5 for the tasks grep could finish at all, and that's both speed and cost.
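
for context, a "1 step" search from an agent is a single MCP tool call. MCP messages are JSON-RPC 2.0, and over stdio each message is one line of JSON. the sketch below builds such a request. the `tools/call` method and the `name` / `arguments` params come from the MCP spec, but the tool name `search` and the `query` argument are hypothetical, so check rememex's MCP docs for the real tool names.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Serialize one JSON-RPC 2.0 `tools/call` request as a single line for stdio."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, so the message stays on one line.
    return json.dumps(message) + "\n"


# Hypothetical tool name and argument, for illustration only.
line = build_tool_call(1, "search", {"query": "where GPS coords become city names"})
print(line, end="")
```

a real client also performs the `initialize` handshake before calling tools. that part is omitted to keep the sketch short.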

<p align="center"> <img src="assets/mcp.gif" width="300" /> </p>

project structure

rememex/
├── src/                          # react/ts frontend
│   ├── components/               # UI components
│   │   ├── Sidebar.tsx           # sidebar: containers, annotations, filters
│   │   ├── SearchBar.tsx         # search input
│   │   ├── ResultsList.tsx       # virtualized search results
│   │   ├── StatusBar.tsx         # indexing status bar
│   │   ├── TitleBar.tsx          # custom window title bar
│   │   ├── Settings.tsx          # settings panel
│   │   └── settings/            # modular settings sub-panels
│   ├── locales/                  # i18n translations (en, tr)
│   ├── Modal.tsx                 # modal dialog component
│   ├── i18n.tsx                  # internationalization setup
│   ├── types.ts                  # shared TypeScript types
│   └── App.tsx                   # main app shell
├── src-tauri/
│   └── src/
│       ├── indexer/              # core engine
│       │   ├── mod.rs            # indexer orchestration, batch embed, reranker
│       │   ├── chunking.rs       # per-language semantic splitting
│       │   ├── embedding.rs      # fastembed ONNX inference
│       │   ├── embedding_provider.rs  # local/remote provider trait
│       │   ├── search.rs         # hybrid vector + full-text + reranker
│       │   ├── pipeline.rs       # search pipeline scoring
│       │   ├── annotations.rs    # annotation CRUD operations
│       │   ├── ocr.rs            # UWP OCR bridge
│       │   ├── file_io.rs        # file reading (text, pdf, binary)
│       │   
