Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

Official code of Memento.
<p align="center"> <b>Planner–Executor Architecture</b> • <b>Case-Based Reasoning</b> • <b>MCP Tooling</b> • <b>Memory-Augmented Learning</b> </p>

A memory-based, continual-learning framework that helps LLM agents improve from experience without updating model weights.
<table> <tr> <td align="center" width="50%"> <img src="Figure/f1_val_test.jpg" width="90%"/> <br/> <sub><b>Memento vs. Baselines on GAIA validation and test sets.</b></sub> </td> <td align="center" width="50%"> <img src="Figure/f1_tasks.jpg" width="90%"/> <br/> <sub><b>Ablation study of Memento across benchmarks.</b></sub> </td> </tr> <tr> <td align="center" width="50%"> <img src="Figure/f1_iteration.jpg" width="90%"/> <br/> <sub><b>Continual learning curves across memory designs.</b></sub> </td> <td align="center" width="50%"> <img src="Figure/f1_ood.jpg" width="90%"/> <br/> <sub><b>Memento’s accuracy improvement on OOD datasets.</b></sub> </td> </tr> </table>
📰 News
- [2025.10.05] We’re excited to announce that our parametric Case-Based Reasoning inference code is now officially open-sourced! 🎉
- [2025.09.05] We’ve added support for deploying a local LLM as the executor using vLLM; see client/agent_local_server.py. 🎉
- [2025.09.03] We’ve set up a WeChat group to make collaboration and idea exchange on this project easier. Join to share your thoughts, ask questions, or contribute ideas! 🔥 🔥 🔥 Join our WeChat Group Now!
- [2025.08.30] We’re excited to announce that our non-parametric Case-Based Reasoning inference code is now officially open-sourced! 🎉
- [2025.08.28] We’ve created a Discord server to make discussions and collaboration around this project easier. Feel free to join and share your thoughts, ask questions, or contribute ideas! 🔥 🔥 🔥 Join our Discord!
- [2025.08.27] Thanks for your interest in our work! We’ll release our CBR code next week and our Parametric Memory code next month, and we’ll keep posting updates as development continues.
- [2025.08.27] We added a new Crawler MCP in server/ai_crawler.py for web crawling and query-aware content compression to reduce token cost.
- [2025.08.26] We added the SerpAPI (https://serpapi.com/search-api) MCP tool to help you avoid using the search Docker and speed up development.
🔥 Key Features
- No LLM weight updates. Memento reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. A neural case-selection policy guides actions; experiences are stored and reused via efficient Read/Write operations.
- Two-stage planner–executor loop. A CBR-driven Planner decomposes tasks and retrieves relevant cases; an Executor runs each subtask as an MCP client, orchestrating tools and writing back outcomes.
- Comprehensive tool ecosystem. Built-in support for web search, document processing, code execution, image/video analysis, and more through a unified MCP interface.
- Strong benchmark performance. Achieves competitive results across GAIA, DeepResearcher, SimpleQA, and HLE benchmarks.
🧠 Core Concept
Learn from experiences, not gradients. Memento logs successful & failed trajectories into a Case Bank and retrieves by value to steer planning and execution—enabling low-cost, transferable, and online continual learning.
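The core loop can be sketched as a tiny Case Bank: successful and failed final-step cases are written in, and the most similar past cases are read back to steer the next plan. This is an illustrative sketch, not the repository's implementation; the word-overlap similarity and the `Case` layout are assumptions standing in for the learned retriever:

```python
from dataclasses import dataclass

@dataclass
class Case:
    state: str      # task description at the final step
    action: str     # plan / action that was taken
    reward: float   # 1.0 for success, 0.0 for failure

class CaseBank:
    def __init__(self) -> None:
        self.cases: list[Case] = []

    def write(self, state: str, action: str, reward: float) -> None:
        self.cases.append(Case(state, action, reward))

    def read(self, query: str, k: int = 4) -> list[Case]:
        # Toy similarity: word overlap between the query and stored state.
        def sim(c: Case) -> float:
            q, s = set(query.lower().split()), set(c.state.lower().split())
            return len(q & s) / max(len(q | s), 1)
        # Rank by similarity; prefer successful cases on ties.
        return sorted(self.cases, key=lambda c: (sim(c), c.reward), reverse=True)[:k]
```

In the actual system, the toy similarity is replaced by a neural case-selection policy, but the Read/Write interface is the same shape.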
🏗️ Architecture
Core Components
- Meta-Planner: Breaks down high-level queries into executable subtasks using GPT-4.1
- Executor: Executes individual subtasks using o3 or other models via MCP tools
- Case Memory: Stores final-step tuples (s_T, a_T, r_T) for experience replay
- MCP Tool Layer: Unified interface for external tools and services
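The components above fit together in a two-stage loop, sketched below. The `plan` and `execute` callables stand in for the actual GPT-4.1 / o3 calls and MCP tool dispatch; all names and the fixed-window retrieval are illustrative assumptions:

```python
from typing import Callable

def run_task(query: str,
             plan: Callable[[str, list], list[str]],
             execute: Callable[[str], str],
             case_memory: list[tuple]) -> list[str]:
    """One planner-executor episode over a memory-augmented loop."""
    # 1. Planner retrieves past cases and decomposes the query into subtasks.
    retrieved = case_memory[-4:]             # stand-in for learned case retrieval
    subtasks = plan(query, retrieved)
    # 2. Executor runs each subtask via tools and writes the outcome back.
    results = []
    for sub in subtasks:
        out = execute(sub)
        results.append(out)
        case_memory.append((sub, out, 1.0))  # final-step tuple (s_T, a_T, r_T)
    return results
```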
Tool Ecosystem
- Web Research: Live search and controlled crawling via SearxNG
- Document Processing: Multi-format support (PDF, Office, images, audio, video)
- Code Execution: Sandboxed Python workspace with security controls
- Data Analysis: Excel processing, mathematical computations
- Media Analysis: Image captioning, video narration, audio transcription
🚀 Quick Start
Prerequisites
- Python 3.11+
- OpenAI API key (or compatible API endpoint)
- SearxNG instance for web search
- FFmpeg (system-level binary required for video processing)
- PyTorch 2.0+ with CUDA support (for Parametric Memory)
📖 For detailed installation instructions, see INSTALL.md
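As a quick sanity check before installing, the prerequisites above can be verified with a short script (a minimal sketch; the FFmpeg check assumes the binary is discoverable on your PATH):

```python
import shutil
import sys

def check_prereqs() -> list[str]:
    """Return a list of missing prerequisites (empty means all good)."""
    missing = []
    if sys.version_info < (3, 11):
        missing.append("Python 3.11+")
    if shutil.which("ffmpeg") is None:
        missing.append("FFmpeg (install via conda/brew/apt)")
    return missing

if __name__ == "__main__":
    problems = check_prereqs()
    print("All prerequisites found." if not problems else f"Missing: {problems}")
```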
Installation
Method 1: Using uv (Recommended - Fast & Modern)
# Clone repository
git clone https://github.com/Agent-on-the-Fly/Memento
cd Memento
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Sync dependencies and create virtual environment automatically
uv sync
# Activate the virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
Method 2: Using pip with requirements.txt
# Clone repository
git clone https://github.com/Agent-on-the-Fly/Memento
cd Memento
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
PyTorch Installation
For GPU support (Recommended for Parametric Memory):
# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# CPU only
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
For more PyTorch installation options, visit: https://pytorch.org/get-started/locally/
System Dependencies Installation
FFmpeg Installation (Required)
FFmpeg is required for video processing functionality. The ffmpeg-python package in our dependencies requires a system-level FFmpeg binary.
Windows:
# Option 1: Using Conda (Recommended for isolated environment)
conda install -c conda-forge ffmpeg
# Option 2: Download from official website
# Visit https://ffmpeg.org/download.html and add to PATH
macOS:
# Using Homebrew
brew install ffmpeg
Linux:
# Debian/Ubuntu
sudo apt-get update && sudo apt-get install ffmpeg
Web Scraping & Search Setup
# Install and setup crawl4ai
crawl4ai-setup
crawl4ai-doctor
# Install playwright browsers
playwright install
Environment Variables Configuration
After creating the .env file, you need to configure the following API keys and service endpoints:
#===========================================
# OpenAI API Configuration
#===========================================
USE_AZURE_OPENAI=False
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1 # or your custom endpoint
AZURE_OPENAI_API_KEY=your_azure_openai_api_key_here
AZURE_OPENAI_API_VERSION=your_azure_openai_api_version_here
AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint_here
#===========================================
# Tools & Services API
#===========================================
# Chunkr API (https://chunkr.ai/)
CHUNKR_API_KEY=your_chunkr_api_key_here
# Jina API
JINA_API_KEY=your_jina_api_key_here
# ASSEMBLYAI API
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
Note: Replace your_*_api_key_here with your actual API keys. Some services are optional depending on which tools you plan to use.
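To see how these variables reach the agent, here is a minimal, dependency-free sketch of loading a .env file into the process environment (the actual code may use a library such as python-dotenv; the variable names match the template above):

```python
import os

def load_env(path: str = ".env") -> None:
    """Parse KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop inline comments and surrounding whitespace from the value.
            value = value.split("#", 1)[0].strip()
            os.environ.setdefault(key.strip(), value)

if os.path.exists(".env"):
    load_env()

api_key = os.getenv("OPENAI_API_KEY", "")
base_url = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")
```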
SearxNG Setup
For web search capabilities, set up a SearxNG instance. You can follow https://github.com/searxng/searxng-docker/ to set up the Docker deployment, or use our bundled configuration:
# In a new terminal
cd ./Memento/searxng-docker
docker compose up -d
Basic Usage
Interactive Mode
python client/agent.py
Parametric Memory Mode (Advanced - With Memory Retriever)
Parametric Memory enables the agent to learn from past experiences using a trained neural retriever model.
Step 1: Train the Memory Retriever
First, you need to train the retriever model with initial training data:
cd memory
# Train the retriever model
python train_memory_retriever.py \
--train training_data.jsonl \
--output_dir ./ckpts/retriever \
--use_plan \
--val_ratio 0.1 \
--batch_size 32 \
--lr 2e-5 \
--epochs 10 \
--save_best
Step 2: Configure Environment Variables
Add the following to your .env file:
# Memory Configuration
MEMORY_JSONL_PATH=../memory/memory.jsonl
TRAINING_DATA_PATH=../memory/training_data.jsonl
RETRIEVER_MODEL_PATH=../memory/ckpts/retriever/best.pt
MEMORY_TOP_K=8
MEMORY_MAX_POS_EXAMPLES=8
MEMORY_MAX_NEG_EXAMPLES=8
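MEMORY_TOP_K controls how many cases the retriever returns per query. Top-K retrieval over embedding vectors can be sketched as follows (cosine similarity over plain lists; the embedding source is an assumption, standing in for the trained retriever checkpoint):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_cases(query_vec: list[float],
                memory: list[tuple[list[float], dict]],
                k: int = 8) -> list[dict]:
    """Return the k memory entries whose vectors are closest to the query."""
    scored = sorted(memory, key=lambda m: cosine(query_vec, m[0]), reverse=True)
    return [case for _, case in scored[:k]]
```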
Step 3: Run Parametric Memory Agent
cd client
python parametric_memory.py
🔧 Configuration
Model Selection
- Planner Model: Defaults to gpt-4.1 for task decomposition
- Executor Model: Defaults to o3 for task execution
- Custom Models: Support for any OpenAI-compatible API
Tool Configuration
- Search: Configure SearxNG instance URL
- Code Execution: Customize import whitelist and security settings
- Document Processing: Set cache directories and processing limits
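For the code-execution whitelist, a minimal static check can be sketched with the standard `ast` module (the whitelist contents here are illustrative; the real security settings live in the executor's configuration):

```python
import ast

ALLOWED_IMPORTS = {"math", "json", "statistics", "itertools"}  # illustrative set

def check_imports(source: str) -> list[str]:
    """Return the names of any top-level imports not on the whitelist."""
    tree = ast.parse(source)
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        violations += [n for n in names if n not in ALLOWED_IMPORTS]
    return violations
```

A sandbox would reject any submitted snippet for which `check_imports` returns a non-empty list before executing it.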
📊 Performance
Benchmark Results
- GAIA: 87.88% (Val, Pass@3 Top-1) and 79.40% (Test)
- DeepResearcher: 66.6% F1 / 80.4% PM, with +4.7–9.6 absolute gains on OOD datasets
- SimpleQA: 95.0%
- HLE: 24.4% PM (close to GPT-5 at 25.32%)
Key Insights
- Small, high-quality memory works best: Retrieval K=4 yields peak F1/PM
- Planning + CBR consistently improves performance
- Concise, structured planning outperforms verbose deliberation
🛠️ Development
Project Structure
Memento