Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

Official code of Memento.
<p align="center"> <b>Planner–Executor Architecture</b> • <b>Case-Based Reasoning</b> • <b>MCP Tooling</b> • <b>Memory-Augmented Learning</b> </p>

A memory-based, continual-learning framework that helps LLM agents improve from experience without updating model weights.
<table> <tr> <td align="center" width="50%"> <img src="Figure/f1_val_test.jpg" width="90%"/> <br/> <sub><b>Memento vs. Baselines on GAIA validation and test sets.</b></sub> </td> <td align="center" width="50%"> <img src="Figure/f1_tasks.jpg" width="90%"/> <br/> <sub><b>Ablation study of Memento across benchmarks.</b></sub> </td> </tr> <tr> <td align="center" width="50%"> <img src="Figure/f1_iteration.jpg" width="90%"/> <br/> <sub><b>Continual learning curves across memory designs.</b></sub> </td> <td align="center" width="50%"> <img src="Figure/f1_ood.jpg" width="90%"/> <br/> <sub><b>Memento’s accuracy improvement on OOD datasets.</b></sub> </td> </tr> </table>
📰 News
- [2025.10.05] We’re excited to announce that our parametric Case-Based Reasoning inference code is now officially open-sourced! 🎉
- [2025.09.05] We’ve added support for deploying a local LLM as the executor using vLLM; see client/agent_local_server.py. 🎉
- [2025.09.03] We’ve set up a WeChat group to make collaboration and idea exchange on this project easier. Join to share your thoughts, ask questions, or contribute ideas! 🔥 🔥 🔥 Join our WeChat Group Now!
- [2025.08.30] We’re excited to announce that our non-parametric Case-Based Reasoning inference code is now officially open-sourced! 🎉
- [2025.08.28] We’ve created a Discord server to make discussions and collaboration around this project easier. Feel free to join and share your thoughts, ask questions, or contribute ideas! 🔥 🔥 🔥 Join our Discord!
- [2025.08.27] Thanks for your interest in our work! We’ll release our CBR code next week and our Parametric Memory code next month, and we’ll keep posting updates as development continues.
- [2025.08.27] We added a new Crawler MCP in server/ai_crawler.py for web crawling and query-aware content compression to reduce token cost.
- [2025.08.26] We added the SerpAPI (https://serpapi.com/search-api) MCP tool to help you avoid using the search Docker and speed up development.
🔥 Key Features
- No LLM weight updates. Memento reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. A neural case-selection policy guides actions; experiences are stored and reused via efficient Read/Write operations.
- Two-stage planner–executor loop. A CBR-driven Planner decomposes tasks and retrieves relevant cases; an Executor runs each subtask as an MCP client, orchestrating tools and writing back outcomes.
- Comprehensive tool ecosystem. Built-in support for web search, document processing, code execution, image/video analysis, and more through a unified MCP interface.
- Strong benchmark performance. Achieves competitive results across GAIA, DeepResearcher, SimpleQA, and HLE benchmarks.
🧠 Core Concept
Learn from experiences, not gradients. Memento logs successful & failed trajectories into a Case Bank and retrieves by value to steer planning and execution—enabling low-cost, transferable, and online continual learning.
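The core loop can be sketched as a tiny Case Bank: successful and failed final-step cases are written in, and the most similar past cases are read back to steer the next plan. This is an illustrative sketch, not the repository's implementation; the word-overlap similarity and the `Case` layout are assumptions standing in for the learned retriever:

```python
from dataclasses import dataclass

@dataclass
class Case:
    state: str      # task description at the final step
    action: str     # plan / action that was taken
    reward: float   # 1.0 for success, 0.0 for failure

class CaseBank:
    def __init__(self) -> None:
        self.cases: list[Case] = []

    def write(self, state: str, action: str, reward: float) -> None:
        self.cases.append(Case(state, action, reward))

    def read(self, query: str, k: int = 4) -> list[Case]:
        # Toy similarity: word overlap between the query and stored state.
        def sim(c: Case) -> float:
            q, s = set(query.lower().split()), set(c.state.lower().split())
            return len(q & s) / max(len(q | s), 1)
        # Rank by similarity; prefer successful cases on ties.
        return sorted(self.cases, key=lambda c: (sim(c), c.reward), reverse=True)[:k]
```

In the actual system, the toy similarity is replaced by a neural case-selection policy, but the Read/Write interface is the same shape.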
🏗️ Architecture
Core Components
- Meta-Planner: Breaks down high-level queries into executable subtasks using GPT-4.1
- Executor: Executes individual subtasks using o3 or other models via MCP tools
- Case Memory: Stores final-step tuples (s_T, a_T, r_T) for experience replay
- MCP Tool Layer: Unified interface for external tools and services
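The components above fit together in a two-stage loop, sketched below. The `plan` and `execute` callables stand in for the actual GPT-4.1 / o3 calls and MCP tool dispatch; all names and the fixed-window retrieval are illustrative assumptions:

```python
from typing import Callable

def run_task(query: str,
             plan: Callable[[str, list], list[str]],
             execute: Callable[[str], str],
             case_memory: list[tuple]) -> list[str]:
    """One planner-executor episode over a memory-augmented loop."""
    # 1. Planner retrieves past cases and decomposes the query into subtasks.
    retrieved = case_memory[-4:]             # stand-in for learned case retrieval
    subtasks = plan(query, retrieved)
    # 2. Executor runs each subtask via tools and writes the outcome back.
    results = []
    for sub in subtasks:
        out = execute(sub)
        results.append(out)
        case_memory.append((sub, out, 1.0))  # final-step tuple (s_T, a_T, r_T)
    return results
```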
Tool Ecosystem
- Web Research: Live search and controlled crawling via SearxNG
- Document Processing: Multi-format support (PDF, Office, images, audio, video)
- Code Execution: Sandboxed Python workspace with security controls
- Data Analysis: Excel processing, mathematical computations
- Media Analysis: Image captioning, video narration, audio transcription
🚀 Quick Start
Prerequisites
- Python 3.11+
- OpenAI API key (or compatible API endpoint)
- SearxNG instance for web search
- FFmpeg (system-level binary required for video processing)
- PyTorch 2.0+ with CUDA support (for Parametric Memory)
📖 For detailed installation instructions, see INSTALL.md
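As a quick sanity check before installing, the prerequisites above can be verified with a short script (a minimal sketch; the FFmpeg check assumes the binary is discoverable on your PATH):

```python
import shutil
import sys

def check_prereqs() -> list[str]:
    """Return a list of missing prerequisites (empty means all good)."""
    missing = []
    if sys.version_info < (3, 11):
        missing.append("Python 3.11+")
    if shutil.which("ffmpeg") is None:
        missing.append("FFmpeg (install via conda/brew/apt)")
    return missing

if __name__ == "__main__":
    problems = check_prereqs()
    print("All prerequisites found." if not problems else f"Missing: {problems}")
```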
Installation
Method 1: Using uv (Recommended - Fast & Modern)
# Clone repository
git clone https://github.com/Agent-on-the-Fly/Memento
cd Memento
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Sync dependencies and create virtual environment automatically
uv sync
# Activate the virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
Method 2: Using pip with requirements.txt
# Clone repository
git clone https://github.com/Agent-on-the-Fly/Memento
cd Memento
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
PyTorch Installation
For GPU support (Recommended for Parametric Memory):
# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# CPU only
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
For more PyTorch installation options, visit: https://pytorch.org/get-started/locally/
System Dependencies Installation
FFmpeg Installation (Required)
FFmpeg is required for video processing functionality. The ffmpeg-python package in our dependencies requires a system-level FFmpeg binary.
Windows:
# Option 1: Using Conda (Recommended for isolated environment)
conda install -c conda-forge ffmpeg
# Option 2: Download from official website
# Visit https://ffmpeg.org/download.html and add to PATH
macOS:
# Using Homebrew
brew install ffmpeg
Linux:
# Debian/Ubuntu
sudo apt-get update && sudo apt-get install ffmpeg
Web Scraping & Search Setup
# Install and setup crawl4ai
crawl4ai-setup
crawl4ai-doctor
# Install playwright browsers
playwright install
Environment Variables Configuration
After creating the .env file, you need to configure the following API keys and service endpoints:
#===========================================
# OpenAI API Configuration
#===========================================
USE_AZURE_OPENAI=False
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1 # or your custom endpoint
AZURE_OPENAI_API_KEY=your_azure_openai_api_key_here
AZURE_OPENAI_API_VERSION=your_azure_openai_api_version_here
AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint_here
#===========================================
# Tools & Services API
#===========================================
# Chunkr API (https://chunkr.ai/)
CHUNKR_API_KEY=your_chunkr_api_key_here
# Jina API
JINA_API_KEY=your_jina_api_key_here
# ASSEMBLYAI API
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
Note: Replace your_*_api_key_here with your actual API keys. Some services are optional depending on which tools you plan to use.
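To see how these variables reach the agent, here is a minimal, dependency-free sketch of loading a .env file into the process environment (the actual code may use a library such as python-dotenv; the variable names match the template above):

```python
import os

def load_env(path: str = ".env") -> None:
    """Parse KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop inline comments and surrounding whitespace from the value.
            value = value.split("#", 1)[0].strip()
            os.environ.setdefault(key.strip(), value)

if os.path.exists(".env"):
    load_env()

api_key = os.getenv("OPENAI_API_KEY", "")
base_url = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")
```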
SearxNG Setup
For web search capabilities, set up a SearxNG instance. You can follow https://github.com/searxng/searxng-docker/ to set up the Docker deployment, or use our bundled configuration:
# In a new terminal
cd ./Memento/searxng-docker
docker compose up -d
Basic Usage
Interactive Mode
python client/agent.py
Parametric Memory Mode (Advanced - With Memory Retriever)
Parametric Memory enables the agent to learn from past experiences using a trained neural retriever model.
Step 1: Train the Memory Retriever
First, you need to train the retriever model with initial training data:
cd memory
# Train the retriever model
python train_memory_retriever.py \
--train training_data.jsonl \
--output_dir ./ckpts/retriever \
--use_plan \
--val_ratio 0.1 \
--batch_size 32 \
--lr 2e-5 \
--epochs 10 \
--save_best
Step 2: Configure Environment Variables
Add the following to your .env file:
# Memory Configuration
MEMORY_JSONL_PATH=../memory/memory.jsonl
TRAINING_DATA_PATH=../memory/training_data.jsonl
RETRIEVER_MODEL_PATH=../memory/ckpts/retriever/best.pt
MEMORY_TOP_K=8
MEMORY_MAX_POS_EXAMPLES=8
MEMORY_MAX_NEG_EXAMPLES=8
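MEMORY_TOP_K controls how many cases the retriever returns per query. Top-K retrieval over embedding vectors can be sketched as follows (cosine similarity over plain lists; the embedding source is an assumption, standing in for the trained retriever checkpoint):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_cases(query_vec: list[float],
                memory: list[tuple[list[float], dict]],
                k: int = 8) -> list[dict]:
    """Return the k memory entries whose vectors are closest to the query."""
    scored = sorted(memory, key=lambda m: cosine(query_vec, m[0]), reverse=True)
    return [case for _, case in scored[:k]]
```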
Step 3: Run Parametric Memory Agent
cd client
python parametric_memory.py
🔧 Configuration
Model Selection
- Planner Model: Defaults to gpt-4.1 for task decomposition
- Executor Model: Defaults to o3 for task execution
- Custom Models: Support for any OpenAI-compatible API
Tool Configuration
- Search: Configure SearxNG instance URL
- Code Execution: Customize import whitelist and security settings
- Document Processing: Set cache directories and processing limits
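For the code-execution whitelist, a minimal static check can be sketched with the standard `ast` module (the whitelist contents here are illustrative; the real security settings live in the executor's configuration):

```python
import ast

ALLOWED_IMPORTS = {"math", "json", "statistics", "itertools"}  # illustrative set

def check_imports(source: str) -> list[str]:
    """Return the names of any top-level imports not on the whitelist."""
    tree = ast.parse(source)
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        violations += [n for n in names if n not in ALLOWED_IMPORTS]
    return violations
```

A sandbox would reject any submitted snippet for which `check_imports` returns a non-empty list before executing it.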
📊 Performance
Benchmark Results
- GAIA: 87.88% (Val, Pass@3 Top-1) and 79.40% (Test)
- DeepResearcher: 66.6% F1 / 80.4% PM, with +4.7–9.6 absolute gains on OOD datasets
- SimpleQA: 95.0%
- HLE: 24.4% PM (close to GPT-5 at 25.32%)
Key Insights
- Small, high-quality memory works best: Retrieval K=4 yields peak F1/PM
- Planning + CBR consistently improves performance
- Concise, structured planning outperforms verbose deliberation
🛠️ Development
Project Structure
Memento