
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents


<div align="center"> <img src="assets/logo.png" alt="MemSkill Logo" width="300"> </div> <h1 align="center">MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents</h1> <div align="center"> <p> <a href='https://viktoraxelsen.github.io/MemSkill/'><img src='https://img.shields.io/badge/Project-Page-00d9ff?style=for-the-badge&logo=github&logoColor=white'></a> <a href='https://arxiv.org/abs/2602.02474'><img src='https://img.shields.io/badge/arXiv-2602.02474-ff6b6b?style=for-the-badge&logo=arxiv&logoColor=white'></a> <a href="https://huggingface.co/papers/2602.02474"><img src="https://img.shields.io/badge/HuggingFace-Paper-FFD21E?style=for-the-badge&logo=huggingface&logoColor=FFD21E" alt="HuggingFace Paper"></a> <a href="https://huggingface.co/collections/XaiverZ/memskill"><img src="https://img.shields.io/badge/HuggingFace-Collection-FFD21E?style=for-the-badge&logo=huggingface&logoColor=FFD21E" alt="HuggingFace Collection"></a> <br> <a href="https://github.com/ViktorAxelsen/MemSkill/stargazers"><img src='https://img.shields.io/github/stars/ViktorAxelsen/MemSkill?color=f1e05a&style=for-the-badge&logo=star&logoColor=white' /></a> <a href="https://github.com/ViktorAxelsen/MemSkill/forks"><img src='https://img.shields.io/github/forks/ViktorAxelsen/MemSkill?color=2ea44f&style=for-the-badge&logo=git&logoColor=white' /></a> <a href="https://github.com/ViktorAxelsen/MemSkill/issues"><img src='https://img.shields.io/github/issues/ViktorAxelsen/MemSkill?color=d73a49&style=for-the-badge&logo=github&logoColor=white' /></a> <a href="https://deepwiki.com/ViktorAxelsen/MemSkill"><img src="https://img.shields.io/badge/DeepWiki-MemSkill-6B4FBB?style=for-the-badge&logo=readthedocs&logoColor=white" alt="DeepWiki"></a>   <a href="LICENSE"><img src="https://img.shields.io/badge/LICENSE-Apache-2EA44F?style=for-the-badge" alt="License"></a> </p> </div>

🧩 Overview

MemSkill is a framework for learning and evolving memory skills for long-horizon agents. It replaces static, hand-designed memory operations with a data-driven loop where skills are learned, refined, and reused from task feedback, enabling more adaptive memory construction across settings.

❗The skills evolved by MemSkill are NOT experiential or procedural memories, nor insights, themselves. Rather, they are a form of meta-memory that governs what kinds of memory to extract, how to remember and where to focus, and what to preserve or forget. This is why we call them memory skills: they capture the way, or skill, of remembering rather than the remembered content itself.
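To make the distinction concrete, here is a minimal sketch under our own assumptions, not MemSkill's actual code: a skill is an instruction about how to remember, while the memory entries are what an LLM produces when the selected skills are applied to a span. It mirrors the per-span flow this README describes: split the interaction into spans (`chunk_size`), keep the `action_top_k` highest-scoring skills, and compose them into one extraction prompt. The `MemorySkill` class, the helper functions, and the score table (standing in for the learned controller) are hypothetical, and we assume `chunk_size` counts dialogue turns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySkill:
    """Meta-memory: guidance on what to extract and how, not remembered content."""
    name: str
    instruction: str


def chunk_turns(turns: list[str], chunk_size: int) -> list[list[str]]:
    """Split a dialogue into consecutive spans of at most `chunk_size` turns (assumed unit)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [turns[i:i + chunk_size] for i in range(0, len(turns), chunk_size)]


def select_top_k(scores: dict[str, float], action_top_k: int) -> list[str]:
    """Return the names of the `action_top_k` highest-scoring skills (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:action_top_k]]


def build_extraction_prompt(span: list[str], skills: list[MemorySkill]) -> str:
    """Compose the selected skills and one span into a single-pass extraction prompt."""
    guidance = "\n".join(f"- {s.name}: {s.instruction}" for s in skills)
    dialogue = "\n".join(span)
    return (
        "Apply the following memory skills to the dialogue span.\n"
        f"Skills:\n{guidance}\n\n"
        f"Span:\n{dialogue}\n\n"
        "Output the memory entries to store."
    )


# Hypothetical skill bank and dialogue.
bank = {
    "track_preferences": MemorySkill("track_preferences", "Record stated likes, dislikes, and habits."),
    "capture_events": MemorySkill("capture_events", "Record dated events with who, what, and when."),
    "drop_chitchat": MemorySkill("drop_chitchat", "Skip greetings and small talk."),
}
turns = ["A: Hi!", "B: Hey.", "A: I moved to Oslo in May.", "B: Nice, I prefer cities too."]

# A fixed, made-up score table stands in for the per-span scores a learned controller would produce.
controller_scores = {"track_preferences": 0.4, "capture_events": 0.9, "drop_chitchat": 0.6}
chosen = [bank[name] for name in select_top_k(controller_scores, action_top_k=2)]
prompts = [build_extraction_prompt(span, chosen) for span in chunk_turns(turns, chunk_size=2)]
```

Only the skills change how memories are written; the prompt text itself carries the span, which is why the same skill bank can be reused across datasets.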

Highlights

Skill-conditioned memory construction: Compose a small set of relevant skills for each span and construct memories in one pass.
Skill evolution from hard cases: Periodically mine challenging examples to refine existing skills and propose new ones.
Reusable skill bank: Maintain a shared, evolving skill bank that supports transfer across datasets and base models.
High-throughput evaluation: Multi-API-key round-robin for stable, parallel calls.
Scalable training and runs: Multi-threading and multi-processing for large-scale training and evaluation.
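The multi-key round-robin and the two-level concurrency (samples in parallel, and chunks in parallel within each sample, which the News section ties to `--inference-workers` and `--inference-session-workers`) can be sketched with threads as follows. This is a minimal illustration under our own assumptions, not MemSkill's implementation (which also uses multiprocessing); `RoundRobinKeyPool`, `call_llm`, and the key strings are hypothetical.

```python
from __future__ import annotations

import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class RoundRobinKeyPool:
    """Hands out API keys in a fixed cyclic order; safe to share across threads."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(list(keys))
        self._lock = threading.Lock()

    def next_key(self) -> str:
        # The lock stops concurrent callers from racing on the shared iterator.
        with self._lock:
            return next(self._cycle)


def call_llm(chunk: str, api_key: str) -> str:
    # Placeholder for a real API request (hypothetical); just echoes its inputs.
    return f"memory({chunk})@{api_key}"


def extract_sample(chunks: list[str], pool: RoundRobinKeyPool, session_workers: int) -> list[str]:
    # Inner level: the chunks/spans of one sample are processed concurrently.
    # Executor.map returns results in input order, so chunk order is preserved.
    with ThreadPoolExecutor(max_workers=session_workers) as ex:
        return list(ex.map(lambda c: call_llm(c, pool.next_key()), chunks))


def extract_all(samples: list[list[str]], pool: RoundRobinKeyPool,
                inference_workers: int, session_workers: int) -> list[list[str]]:
    # Outer level: samples run concurrently, each with its own inner executor.
    with ThreadPoolExecutor(max_workers=inference_workers) as ex:
        return list(ex.map(lambda s: extract_sample(s, pool, session_workers), samples))
```

The outer pool bounds how many samples are in flight and the inner pool bounds chunks per sample, so up to `inference_workers * session_workers` calls can run at once; spreading those calls evenly across keys is what the round-robin pool is for.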

📰 News

🚀 [2026-03]: --locomo-train-query-sampling-ratio is now available for training-time stratified test-query sampling on LoCoMo. It significantly reduces evaluation cost during training by sampling LoCoMo test queries by category, while leaving the full-evaluation protocol unchanged for eval-only and formal testing. For more details, please refer to Commonly Used Configs.
🛠️ [2026-03]: We have added support for interrupted training recovery. You can now resume training in the train_*.sh scripts by passing --load-checkpoint, which restores key training state such as the controller/optimizer, operation bank, designer state (for example, the rolling failure-case pool), and other resume-critical metadata. At the moment, recovery is supported only from checkpoints saved at outer-epoch boundaries. By default, resumed runs continue logging to the original W&B run; if you prefer a fresh run for logging, use --resume-new-wandb-run instead. For more details, please refer to Commonly Used Configs.
🚀 [2026-03]: We have improved the parallel memory extraction pipeline for evaluation and cache building, making MemSkill noticeably faster in large-scale runs. We also added clearer controls for concurrency with --inference-workers at the sample level and --inference-session-workers within each sample at the chunk/span level, which together can significantly accelerate memory extraction. For more details, please refer to Commonly Used Configs.
⭐ [2026-03]: We have released the MemSkill controller weights in our Hugging Face collection, which can now be used directly for inference on suitable datasets. Please note that differences in experimental environments and settings may require some adaptation; when necessary, we recommend retraining and tuning key hyperparameters on a held-out validation split, especially chunk_size and the number of selected skills during inference (action_top_k), to ensure reliable performance. We hope these resources help advance self-evolving agent memory systems, and we'd be glad to hear from the community.
🔥 [2026-02]: We are honored to be featured as the #3 Paper of the Day on 🤗 Hugging Face.
🚀 [2026-02]: MemSkill is officially released — a new paradigm for agent memory that learns reusable skills 🔁 and evolves them from data over time 🧠, improving memory quality and generalization across long, open-ended interactions ✨.
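The resume behavior described above (state saved only at outer-epoch boundaries, including the designer's rolling failure-case pool) can be sketched as follows. This is an illustration under our own assumptions, not MemSkill's checkpoint format: `TrainState`, `train`, and the JSON layout are hypothetical, and a real checkpoint would also hold controller and optimizer state (for example, PyTorch state dicts). The failure-case pool is modeled as a bounded `deque`, so the oldest hard cases drop out as new ones arrive.

```python
from __future__ import annotations

import json
from collections import deque
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TrainState:
    next_outer_epoch: int = 0
    skill_bank: list[str] = field(default_factory=list)
    failure_pool: deque = field(default_factory=lambda: deque(maxlen=3))

    def save(self, path: Path) -> None:
        payload = {
            "next_outer_epoch": self.next_outer_epoch,
            "skill_bank": self.skill_bank,
            "failure_pool": list(self.failure_pool),
            "failure_pool_maxlen": self.failure_pool.maxlen,
        }
        # Write to a temporary file, then rename, so a crash mid-write cannot corrupt the checkpoint.
        tmp = path.parent / (path.name + ".tmp")
        tmp.write_text(json.dumps(payload))
        tmp.replace(path)

    @classmethod
    def load(cls, path: Path) -> TrainState:
        payload = json.loads(path.read_text())
        return cls(
            next_outer_epoch=payload["next_outer_epoch"],
            skill_bank=payload["skill_bank"],
            failure_pool=deque(payload["failure_pool"], maxlen=payload["failure_pool_maxlen"]),
        )


def train(ckpt: Path, num_outer_epochs: int, load_checkpoint: bool) -> TrainState:
    state = TrainState.load(ckpt) if load_checkpoint and ckpt.exists() else TrainState()
    for epoch in range(state.next_outer_epoch, num_outer_epochs):
        # ... inner controller updates would run here (omitted) ...
        state.failure_pool.append(f"hard-case-{epoch}")  # mined hard case (placeholder)
        state.skill_bank.append(f"skill-{epoch}")  # evolved skill (placeholder)
        state.next_outer_epoch = epoch + 1
        state.save(ckpt)  # checkpoint only at the outer-epoch boundary
    return state
```

Here `load_checkpoint=True` plays the role of `--load-checkpoint`: the loop restarts at the first unfinished outer epoch, and any progress inside an interrupted epoch is recomputed, which is the trade-off of saving only at epoch boundaries.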

🚀 Get Started

Installation

# Clone the repository
git clone https://github.com/ViktorAxelsen/MemSkill
cd MemSkill

# Create and activate a conda environment
conda create -n memskill python=3.10
conda activate memskill

# vllm
pip install vllm==0.6.3
# PyTorch
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
# Flash-Attn (or you can specify `--disable-flash-attn` in the .sh scripts to disable it)
pip install flash-attn --no-build-isolation
# Others
pip install -r requirements.txt
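After installation, a quick sanity check can confirm which versions of the pinned packages ended up in the environment, since later `pip install` commands can upgrade or downgrade packages installed earlier. This helper is our own, not part of MemSkill; it uses only the standard library's `importlib.metadata`.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the installation commands above.
PINNED = {"vllm": "0.6.3", "torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def check_versions(pinned: dict[str, str]) -> dict[str, tuple[str | None, bool]]:
    """Map each package to (installed version or None, whether it matches the pin)."""
    report = {}
    for pkg, want in pinned.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        # CUDA wheels report local versions such as "2.6.0+cu124"; compare the public part only.
        ok = have is not None and have.split("+")[0] == want
        report[pkg] = (have, ok)
    return report


if __name__ == "__main__":
    for pkg, (have, ok) in check_versions(PINNED).items():
        print(f"{pkg:12} {have or 'missing':20} {'OK' if ok else 'MISMATCH'}")
```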

📊 Preparing Training Data

MemSkill builds training and evaluation data from the datasets below. Please download data from the official sources and place them under data/. Unless otherwise noted, splits are already configured in our codebase.

1) LoCoMo

Download LoCoMo from the official repo: LoCoMo
Splits: LoCoMo splits are already configured in main.py (no extra split file needed).
Put the downloaded files under:
- data/locomo10.json
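The `--locomo-train-query-sampling-ratio` option mentioned in the News samples LoCoMo test queries by category during training. A minimal sketch of such stratified sampling follows, assuming each query dict carries a `category` field; `stratified_sample` is our own name, and MemSkill's exact rounding and seeding may differ.

```python
from __future__ import annotations

import math
import random
from collections import defaultdict
from typing import Any


def stratified_sample(queries: list[dict[str, Any]], ratio: float,
                      seed: int = 0, key: str = "category") -> list[dict[str, Any]]:
    """Sample about `ratio` of the queries from each category independently."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    by_cat: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for q in queries:
        by_cat[q[key]].append(q)
    rng = random.Random(seed)  # fixed seed -> the same subset on every evaluation
    sampled = []
    for cat in sorted(by_cat):
        group = by_cat[cat]
        k = math.ceil(len(group) * ratio)  # ceil keeps at least one query per non-empty category
        sampled.extend(rng.sample(group, k))
    return sampled
```

Sampling within each category, rather than uniformly over all queries, keeps small categories represented in the cheaper training-time evaluation.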

2) LongMemEval

We use LongMemEval-S from: LongMemEval
Important: LongMemEval-S is used for transfer evaluation only. That is, skills trained on LoCoMo are directly evaluated on LongMemEval-S without additional training.
Put the downloaded files under:
- data/longmemeval_s_cleaned.json
Use our split file:
- data/longmemeval_s_splits.json (we use only the test split)
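Since only the test split is used, evaluation has to filter LongMemEval-S down to the test questions. The sketch below assumes (we have not verified the schema) that the split file maps split names to lists of question IDs and that each record in data/longmemeval_s_cleaned.json has a `question_id` field; adapt the keys if the files differ. `load_test_split` is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_test_split(data_path: Path, splits_path: Path, split: str = "test") -> list[dict[str, Any]]:
    """Return the records whose question_id is listed under `split` in the split file."""
    records = json.loads(data_path.read_text())
    splits = json.loads(splits_path.read_text())
    wanted = set(splits[split])  # assumed layout: {"test": [question_id, ...], ...}
    # Keep the original record order so results line up with the source file.
    return [r for r in records if r["question_id"] in wanted]
```

Usage would be `load_test_split(Path("data/longmemeval_s_cleaned.json"), Path("data/longmemeval_s_splits.json"))`.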

3) HotpotQA

Download HotpotQA from: HotpotQA-Modified (Source: HotpotQA)
We evaluate on three test files:
- data/eval_50.json
- data/eval_100.json
- data/eval_200.json

These correspond to increasing context lengths: each query context is constructed by concatenating 50, 100, or 200 documents, respectively (following the long-context evaluation protocol adopted in our experiments).
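To illustrate how the 50/100/200-document settings scale the context, the sketch below concatenates the first n documents of a pool into one context string. The `(title, text)` document format, the separator, and `build_context` are our assumptions; the HotpotQA-Modified files may already store the concatenated context, and the protocol may select or order documents differently.

```python
from __future__ import annotations


def build_context(docs: list[tuple[str, str]], n_docs: int) -> str:
    """Concatenate the first n_docs (title, text) pairs into one long context."""
    if n_docs > len(docs):
        raise ValueError(f"need {n_docs} documents, only {len(docs)} available")
    return "\n\n".join(f"Title: {title}\n{text}" for title, text in docs[:n_docs])


# Hypothetical pool of equally sized documents.
pool = [(f"Doc {i}", "lorem ipsum " * 20) for i in range(200)]
lengths = {n: len(build_context(pool, n)) for n in (50, 100, 200)}
```

With equally sized documents, context length grows roughly linearly with the number of documents, so the three files probe progressively longer contexts.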

4) ALFWorld

Please follow the official instructions to install dependencies and download assets: ALFWorld

We use offline expert trajectories as the interaction corpus for memory construction. We provide a script that collects and saves the trajectories for each split with a single command:

# Collect expert trajectories for train / seen / unseen splits
python alfworld_replay.py --split train --output ./data/alfworld_train_offline.json
python alfworld_replay.py --split eval_in_distribution --output ./data/alfworld_expert_eval_in_distribution.json
python alfworld_replay.py --split eval_out_of_distribution --output ./data/alfworld_expert_eval_out_of_distribution.json

Note that:

We collect seen and unseen expert plans only to keep data formats consistent and make evaluation easier. They are not used for training.
By default, the trajectories are saved under data/.
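Before training or evaluation, a quick check can confirm that every file named in this section is in place under data/. The helper below is our own, not part of the codebase; the file list is copied from the paths given above.

```python
from __future__ import annotations

from pathlib import Path

EXPECTED_FILES = (
    "locomo10.json",
    "longmemeval_s_cleaned.json",
    "longmemeval_s_splits.json",
    "eval_50.json",
    "eval_100.json",
    "eval_200.json",
    "alfworld_train_offline.json",
    "alfworld_expert_eval_in_distribution.json",
    "alfworld_expert_eval_out_of_distribution.json",
)


def missing_files(data_dir: Path, expected: tuple[str, ...] = EXPECTED_FILES) -> list[str]:
    """Return the expected file names that are not present in data_dir."""
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path("data"))
    print("All data files present." if not missing else "Missing: " + ", ".join(missing))
```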

ALFWorld Training Data Preparation Workflow

We separate data into two batches with different roles.

Batch A: Offline expert trajectories (memory construction batch)
We first collect expert rollouts (the
