AgentCPM
An End-to-End Infrastructure for Training and Evaluating Various LLM Agents
Install / Use
/learn @OpenBMB/AgentCPMREADME
Latest News
-
[2026-01-20] 🚀🚀🚀 We have open-sourced AgentCPM-Report, built on MiniCPM4.1-8B, which can rival top closed-source commercial systems for report generation such as Gemini-2.5-pro-DeepResearch.
-
[2026-01-12] 🚀🚀🚀 We have open-sourced AgentCPM-Explore—an agent LLM with only 4B parameters—along with all code for training, inference, and the tool sandbox environment. It successfully made it onto eight classic long-horizon and challenging agent leaderboards, including GAIA, HLE, and BrowseComp. Its SOTA performance at this scale enables longer action chains and more accurate Deep Research, breaking the performance barrier for on-device agents.
Table of Contents
Overview
AgentCPM is a series of open-source LLM agents jointly developed by THUNLP (Tsinghua NLP Lab), Renmin University of China, ModelBest, and the OpenBMB community. To address challenges faced by agents in real-world applications—such as limited long-horizon capability, autonomy, and generalization—we propose a series of model-building approaches. Recently, the team has focused on comprehensively building deep research capabilities for agents, releasing AgentCPM-Explore, a deep-search LLM agent, and AgentCPM-Report, a deep-research LLM agent.
Model List
| Model | Download Links | Open-Sourced Content | Technical Report | How to Use | |------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|------------|-----------|-----------| | AgentCPM-Explore | 🤗 Hugging Face<br> 🤖 ModelScope | AgentDock: unified tool sandbox management & scheduling platform <br> AgentRL: fully asynchronous agent reinforcement learning framework <br> AgentToLeaP: one-click evaluation framework for agent tool-learning capability | AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents | README.md | AgentCPM-Report | 🤗 Hugging Face<br> 🤖 ModelScope | UltraRAG: low-code RAG framework | AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research | README.md
AgentCPM-Explore
The AgentCPM team has focused on systematically building agents’ deep research capabilities and released AgentCPM-Explore, a deep-search LLM agent. AgentCPM-Explore is the first open-source agent model with 4B parameters to appear on eight widely used long-horizon agent benchmarks such as GAIA, XBench, etc.
Key highlights:
-
SOTA at 4B Scale: Best-in-class among same-size models, matches or surpasses 8B models, rivals some 30B+ and closed-source LLMs.
-
Deep Exploration: 100+ turns of continuous interaction with multi-source cross-validation and dynamic strategy adjustment.
-
End-to-End Open Source: Complete training and evaluation infrastructure for community development and custom extensions.
Demo
Demo examples (speed up):
https://github.com/user-attachments/assets/f2b3bb20-ccd5-4b61-8022-9f6e90992baa
QuickStart
-
Multi-model, multi-tool collaborative environment setup: First, start the AgentDock tool sandbox platform to provide unified MCP (Model Context Protocol) tool services. When working with API-based models, configure the model’s
BASE_URLandAPI_KEY. When working with locally hosted models, ensure the model service is accessible. Configure the required tool parameters in theconfig.tomlfile. -
Launch the environment: Out of the box, one-click startup. The AgentDock unified tool sandbox platform supports launching all services with a single
docker compose up -dcommand, including the management dashboard, database, and tool nodes. -
Run execution: Quickly experience the core capabilities of the framework via the QuickStart script, allowing you to run a complete Agent task without complex configuration.
-
Prepare Evaluation Environment (Recommended):
We provide a Docker image with all evaluation dependencies pre-installed. It is recommended to pull the image and run it directly:# 1. Enter the project folder cd AgentCPM-Explore # 2. Pull the image (Supports amd64/arm64 architectures) docker pull yuyangfu/agenttoleap-eval:v2.0 # 3. Start the container (Adjust the -v path as needed) docker run -dit --name agenttoleap --gpus all --network host -v $(pwd):/workspace yuyangfu/agenttoleap-eval:v2.0 # 4. Enter the container docker exec -it agenttoleap /bin/bash cd /workspace -
Configure and run:
Openquickstart.pyand make simple configurations in the[USER CONFIGURATION]section:
- Custom task: Modify the
QUERYvariable to the instruction you want to test (e.g., “Check the results of last night’s UEFA Champions League matches”). - Model information: Provide your LLM
API_KEY,MODEL_NAME, andBASE_URL. - Tool service: Set
MANAGER_URLto the address of your MCP tool server (e.g.,http://localhost:8000; make sure the service is already running).
After configuration, run:
python quickstart.py
The script will automatically create a demo task (by default, querying today’s arXiv computer science papers), generate the execution workflow, and start the evaluation process.
- View Results
After execution completes, results will be saved under the outputs/quickstart_results/ directory. You can inspect dialog.json to obtain the full interaction trace, including tool calls and reasoning chains.
Note: In QuickStart mode, automatic scoring is skipped by default and is intended only to demonstrate the Agent’s execution capabilities.
AgentCPM-Report
Introduction
AgentCPM-Report is built on the 8B-parameter base model MiniCPM4.1. It takes user instructions as input and autonomously generates long-form reports. Highlights include:
- Extreme Performance, Minimal Footprint: Through an average of 40 rounds of deep retrieval and nearly 100 rounds of chain-of-thought reasoning, it achieves comprehensive information mining and restructuring, enabling edge-side models to produce logically rigorous, deeply insightful long-form articles with tens of thousands of words. With just 8 billion parameters, it delivers performance on par with top-tier closed-source systems in deep research tasks.
- Physical Isolation, Local Security: Specifically designed for high-privacy scenarios, it supports fully offline and agile local deployment, completely eliminating the risk of cloud data leaks. Leveraging our UltraRAG framework, it efficiently mounts and understands your local private knowledge base, securely transforming core confidential data into highly valuable professional decision-making reports without ever leaving its domain.
Demo
<div align="center"> <a href="https://www.youtube.com/watch?v=d5XWONt0PWo"><img src="https://img.youtube.com/vi/d5XWONt0PWo/0.jpg", width=70%></a> </div>QuickStart
Docker Deployment
<div align="center"> <a href="https://www.youtube.com/watch?v=ze8qJRrass4"><img src="https://img.youtube.com/vi/ze8qJRrass4/0.jpg", width=70%></a> </div>We provide a minimal one-click docker-compose deployment integrated into UltraRAG, which includes the RAG framework UltraRAG2.0, the model inference framework vllm, and the Milvus vector database. If you want CPU inference, we also provide a llama.cpp-based version for GGUF-format models—simply replace docker-compose.yml with docker-compose.cpu.yml.
git clone git@github.com:OpenBMB/UltraRAG.git
cd UltraRAG
git checkout agentcpm-report-demo
cd agentcpm-report-demo
cp env.example .env
docker-compose -f docker-compose.yml up -d --build
docker-compose -f docker-compose.yml logs -f ultrarag-ui
The first startup needs to pull images, download models, and set up the environment, which may take about 30 minutes.
Then open http://localhost:5050. If you can see the GUI, the deployment is successful.
Follow the UI instructions to upload local files, chunk them, and build the index. Then, in the Chat panel, select AgentCPM-Report in th
Related Skills
tmux
338.0kRemote-control tmux sessions for interactive CLIs by sending keystrokes and scraping pane output.
blogwatcher
338.0kMonitor blogs and RSS/Atom feeds for updates using the blogwatcher CLI.
Unla
2.1k🧩 MCP Gateway - A lightweight gateway service that instantly transforms existing MCP Servers and APIs into MCP servers with zero code changes. Features Docker deployment and management UI, requiring no infrastructure modifications.
github-trending
Multi-agent orchestration system for infrastructure monitoring, incident response, and load testing with autonomous AI agents
