RoboBrain2.5
RoboBrain 2.5: Advanced version of RoboBrain. Depth in Sight, Time in Mind. 🎉🎉🎉
💬 If you have any questions, feel free to contact us via WeChat or RedNote.
<div align="center"> <img src="./assets/wechat.png" width="750" /> </div>

🔥 Overview
RoboBrain-2.5 is a next-generation embodied AI foundation model that advances general perception, spatial reasoning, and temporal modeling through extensive training on high-quality spatiotemporal supervision. Building upon its predecessor, RoboBrain 2.5 introduces two major capability upgrades. Specifically, it unlocks Precise 3D Spatial Reasoning by shifting from 2D pixel-relative grounding to depth-aware coordinate prediction and absolute metric constraint comprehension, generating complete 3D manipulation traces as ordered keypoint sequences under physical constraints. Complementing this spatial precision, the model establishes Dense Temporal Value Estimation that provides dense, step-aware progress prediction and execution state understanding across varying viewpoints, producing stable feedback signals for downstream learning. Together, these upgrades extend the framework toward more physically grounded and execution-aware embodied intelligence for complex, fine-grained manipulation.
<div align="center"> <img src="./assets/teasor.png" /> </div>

RoboBrain 2.0 was previously the most powerful open-source embodied brain model. Compared to its predecessor, RoboBrain 1.0, it is designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain 2.0 achieves strong performance across a wide spectrum of embodied reasoning tasks. On both spatial and temporal benchmarks, the 32B variant achieves leading results in most cases, surpassing prior open-source and proprietary models. In particular, it supports key real-world embodied intelligence capabilities, including spatial understanding (e.g., affordance prediction, spatial referring, trajectory forecasting) and temporal decision-making (e.g., closed-loop interaction, multi-agent long-horizon planning, and real-time scene memory). This report details the model architecture, data construction, multi-stage training strategies, infrastructure, and practical applications.
<div align="center"> <img src="./assets/results.png" /> </div>

🗞️ News
- 2026-03-01: 🤗 The RoboBrain 2.5-4B model checkpoint has been released on Hugging Face.
- 2026-01-09: 🤗 We released the RoboBrain 2.5-8B checkpoints on Hugging Face: RoboBrain 2.5-8B-NV and RoboBrain 2.5-8B-MT. The two variants share the same architecture and training data and achieve similar performance, but were trained on different clusters: NV on an NVIDIA GPU cluster and MT on a Moore Threads GPU cluster.
- 2025-12-30: 🔥 We released Robo-Dopamine, a deep research on the Dense Temporal Value Estimation capability of RoboBrain 2.5.
- 2025-12-16: 🔥 We released RoboTracer, a deep research on Native 3D Spatial Reasoning for RoboBrain 2.5.
- 2025-09-29: 🤖 We released RoboBrain-X0-Preview, a unified cross-embodiment VLA model based on RoboBrain 2.0 (3B version), at CoRL 2025.
- 2025-09-18: 🔥 Reason-RFT (the core post-training strategy for RoboBrain 2.0) was accepted to NeurIPS 2025.
- 2025-07-23: 🤗 The RoboBrain 2.0-3B model checkpoint has also been released on Hugging Face.
- 2025-07-03: 🤗 The RoboBrain 2.0-32B model checkpoint has been released on Hugging Face.
- 2025-06-11: 💡 We optimized the inference pipeline for multi-task applications in RoboBrain 2.0. Please refer to Simple Inference for quick usage (general & embodied).
- 2025-06-07: 🎉 We highlight the training framework (FlagScale) developed by the BAAI Framework R&D team and the evaluation framework (FlagEvalMM) by the BAAI FlagEval team. Both are used for RoboBrain 2.0.
- 2025-06-06: 🤗 The RoboBrain 2.0-7B model checkpoint has been released on Hugging Face.
- 2025-06-06: 🔥 We're excited to announce the release of our more powerful RoboBrain 2.0.
- 2025-04-11: 🎉 RoboBrain 1.0 was selected for CVPR 2025's official Embodied AI Trends Commentary.
- 2025-02-27: 🔥 RoboBrain 1.0 was accepted to CVPR 2025.
📆 Todo
- [x] Release model checkpoint for RoboBrain 2.0-3B
- [x] Release model checkpoint for RoboBrain 2.0-7B
- [x] Release model checkpoint for RoboBrain 2.0-32B
- [x] Release quick inference example for RoboBrain 2.0
- [x] Release training and evaluation codes for RoboBrain 2.0
- [x] Release model checkpoint for RoboBrain 2.5-4B
- [x] Release model checkpoint for RoboBrain 2.5-8B
- [ ] Release model checkpoint for RoboBrain 2.5-32B
🚀 Key Highlights
1. Comprehensive Upgrade in ✨ Precise 3D Spatial Reasoning ✨
Compared to version 2.0, RoboBrain-2.5 achieves a leap in spatial perception and reasoning capabilities:
- From 2D to 3D: Upgraded from predicting coordinate points on 2D images to predicting coordinate points with depth information in 3D space (3D Spatial Referring).
- Relative to Absolute: Evolved from understanding relative spatial relationships to measuring absolute 3D spatial metric information (3D Spatial Measuring). The model can comprehend precise physical constraint instructions (e.g., "hovering 1-5 cm above").
- Point to Trace: Advanced from predicting a single target point for pick-and-place to predicting a series of key points that describe the complete manipulation process (3D Spatial Trace), naturally possessing spatial planning capabilities with 3D absolute metrics.
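To make the "point to trace" idea concrete, the sketch below models a manipulation trace as an ordered sequence of depth-aware keypoints in metric space, with a check for an absolute constraint such as "hovering 1-5 cm above" a surface. All names here (`Keypoint3D`, `satisfies_hover_constraint`) are illustrative assumptions, not RoboBrain 2.5's actual output schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Keypoint3D:
    """One waypoint of a manipulation trace, in metric units.
    NOTE: a hypothetical schema for illustration only."""
    x_m: float  # lateral position (meters)
    y_m: float  # forward position (meters)
    z_m: float  # height above the reference surface (meters)

# An ordered keypoint sequence describing a full motion,
# rather than a single 2D target pixel.
trace: List[Keypoint3D] = [
    Keypoint3D(0.10, 0.40, 0.120),  # approach above the object
    Keypoint3D(0.10, 0.40, 0.030),  # descend to hover height
    Keypoint3D(0.25, 0.40, 0.030),  # translate while hovering
]

def satisfies_hover_constraint(kp: Keypoint3D,
                               lo_cm: float = 1.0,
                               hi_cm: float = 5.0) -> bool:
    """Check an absolute metric instruction such as 'hover 1-5 cm above'."""
    height_cm = kp.z_m * 100.0
    return lo_cm <= height_cm <= hi_cm

# The last two waypoints respect the 1-5 cm hover band; the approach does not.
print([satisfies_hover_constraint(kp) for kp in trace])  # → [False, True, True]
```

Because the keypoints carry absolute metric depth rather than pixel coordinates, such constraints can be verified directly against the predicted trace.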
2. Breakthrough in ✨ Dense Temporal Value Estimation ✨
RoboBrain-2.5 makes significant progress in temporal modeling by constructing a General Reward Model (GRM):
- Dense Progress Prediction: Capable of multi-granularity task progress prediction across different tasks, viewpoints, and embodiments.
- Execution State Estimation: Understands task goals and estimates various states during execution (e.g., success, failure, error occurrence).
- Empowering VLA Reinforcement Learning: Provides real-time, dense feedback signals and rewards for VLA (Vision-Language-Action) reinforcement learning. With only one demonstration, it achieves a task success rate of 95%+ in complex, fine-grained manipulations.
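One common way to turn dense progress estimates into reinforcement-learning rewards is progress-delta shaping with a terminal success bonus. The sketch below shows that generic pattern; `dense_rewards` and its bonus scheme are assumptions for illustration, not RoboBrain 2.5's exact reward formulation.

```python
from typing import List

def dense_rewards(progress: List[float], success_bonus: float = 1.0) -> List[float]:
    """Convert per-step task-progress estimates (in [0, 1]) into dense rewards.
    Each step is rewarded by its progress gain; the final step earns a bonus
    if the estimator judges the task complete.
    NOTE: a generic reward-shaping sketch, not the model's exact scheme."""
    rewards = []
    prev = 0.0
    for i, p in enumerate(progress):
        r = p - prev                       # progress delta as the dense signal
        if i == len(progress) - 1 and p >= 1.0:
            r += success_bonus             # terminal bonus on estimated success
        rewards.append(r)
        prev = p
    return rewards

# Progress estimates from a value model over a 4-step rollout.
print([round(r, 2) for r in dense_rewards([0.2, 0.5, 0.8, 1.0])])
# → [0.2, 0.3, 0.3, 1.2]
```

A dense signal of this kind gives the policy feedback at every step of a long-horizon task, instead of a single sparse success/failure reward at the end.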
3. More Powerful Core Capabilities from previous version 2.0
RoboBrain 2.5 also retains the core capabilities of version 2.0: interactive reasoning with long-horizon planning and closed-loop feedback, spatial perception for precise point and bounding-box prediction from complex instructions, temporal perception for future trajectory estimation, and scene reasoning through real-time structured memory construction and update.
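As a rough intuition for "real-time structured memory construction and update," the sketch below keeps the latest observation of each named object and refreshes entries as newer observations arrive. The `SceneMemory` class and its fields are hypothetical, not the model's internal representation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class ObjectRecord:
    """Latest known state of one object in the scene.
    NOTE: a hypothetical structure for illustration only."""
    position: Tuple[float, float, float]  # metric coordinates (meters)
    last_seen: float                      # observation timestamp (seconds)

class SceneMemory:
    """Minimal structured scene memory: upsert observations, query by name."""

    def __init__(self) -> None:
        self.objects: Dict[str, ObjectRecord] = {}

    def update(self, name: str, position: Tuple[float, float, float], t: float) -> None:
        # Newer observations overwrite stale entries for the same object;
        # out-of-order (older) observations are ignored.
        rec = self.objects.get(name)
        if rec is None or t >= rec.last_seen:
            self.objects[name] = ObjectRecord(tuple(position), t)

    def where(self, name: str) -> Optional[Tuple[float, float, float]]:
        rec = self.objects.get(name)
        return rec.position if rec else None

memory = SceneMemory()
memory.update("mug", (0.3, 0.1, 0.02), t=1.0)
memory.update("mug", (0.5, 0.1, 0.02), t=2.0)  # object moved; entry refreshed
print(memory.where("mug"))  # → (0.5, 0.1, 0.02)
```

A closed-loop planner can query such a memory between steps so that plans stay consistent with the scene as it changes.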
<div align="center"> <img src="./assets/visualization.png" /> </div>

🤗 Model Zoo
| Models | Checkpoint | Description |
| :--- | :--- | :--- |
