<div align="center"> <img src="./assets/logo.jpg" width="400"/> </div>

[CVPR 25] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.

<p align="center"> ⭐️ <a href="https://superrobobrain.github.io/">Project</a> &nbsp;|&nbsp; 🤗 <a href="https://huggingface.co/BAAI/RoboBrain/">Hugging Face</a> &nbsp;|&nbsp; 🤖 <a href="https://www.modelscope.cn/models/BAAI/RoboBrain/files/">ModelScope</a> &nbsp;|&nbsp; 🌎 <a href="https://github.com/FlagOpen/ShareRobot">Dataset</a> &nbsp;|&nbsp; 📑 <a href="http://arxiv.org/abs/2502.21257">Paper</a> &nbsp;|&nbsp; 💬 <a href="./assets/wechat.png">WeChat</a> </p> <p align="center"> 🔥🔥 <a href="https://github.com/FlagOpen/RoboBrain2.0"><strong>RoboBrain 2.0</strong></a><strong>: A more powerful version of RoboBrain. See Better. Think Harder. Do Smarter.</strong> </p> <p align="center"> 🎯 <a href="https://github.com/FlagOpen/RoboOS">RoboOS</a>: An Efficient Open-Source Multi-Robot Coordination System for RoboBrain. </p> <p align="center"> 🎯 <a href="https://github.com/tanhuajie/Reason-RFT">Reason-RFT</a>: Exploring an Efficient RFT Paradigm to Enhance RoboBrain's Visual Reasoning Capabilities. </p>

🔥 Overview

Recent advancements in Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various multimodal contexts. However, their application in robotic scenarios, particularly for long-horizon manipulation tasks, reveals significant limitations. These limitations arise from the current MLLMs lacking three essential robotic brain capabilities: (1) Planning Capability, which involves decomposing complex manipulation instructions into manageable sub-tasks; (2) Affordance Perception, the ability to recognize and interpret the affordances of interactive objects; and (3) Trajectory Prediction, the foresight to anticipate the complete manipulation trajectory necessary for successful execution. To enhance the robotic brain's core capabilities from abstract to concrete, we introduce ShareRobot, a high-quality heterogeneous dataset that labels multi-dimensional information such as task planning, object affordance, and end-effector trajectory. ShareRobot's diversity and accuracy have been meticulously refined by three human annotators. Building on this dataset, we developed RoboBrain, an MLLM-based model that combines robotic and general multi-modal data, utilizes a multi-stage training strategy, and incorporates long videos and high-resolution images to improve its robotic manipulation capabilities. Extensive experiments demonstrate that RoboBrain achieves state-of-the-art performance across various robotic tasks, highlighting its potential to advance robotic brain capabilities.

<div align="center"> <img src="./assets/overview.png" /> </div>

🚀 Features

This repository supports:

  • Data Preparation: Please refer to Dataset Preparation for how to prepare the dataset.
  • Training for RoboBrain: Please refer to Training Section for the usage of training scripts.
  • HF/vLLM Inference: Please refer to the Inference Section; inference with vLLM is now supported.
  • Evaluation for RoboBrain: Please refer to Evaluation Section for how to prepare the benchmarks.
  • ShareRobot Generation: Please refer to ShareRobot for details.

🗞️ News

📆 Todo

  • [x] Release scripts for model training and inference.
  • [x] Release Planning checkpoint.
  • [x] Release Affordance checkpoint.
  • [x] Release ShareRobot dataset.
  • [x] Release Trajectory checkpoint.
  • [x] Release more powerful RoboBrain 2.0.

🤗 Models

  • Base Planning Model: The model was trained on general datasets in Stages 1–2 and on the Robotic Planning dataset in Stage 3, which is designed for Planning prediction.
  • A-LoRA for Affordance: Based on the Base Planning Model, Stage 4 involves LoRA-based training with our Affordance dataset to predict affordance.
  • T-LoRA for Trajectory: Based on the Base Planning Model, Stage 4 involves LoRA-based training with our Trajectory dataset to predict trajectory.
<div align="center"> <img src="./assets/training.png" /> </div>
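The A-LoRA and T-LoRA checkpoints above are low-rank adapters on top of the frozen Base Planning Model. As a toy illustration of the LoRA idea (hypothetical shapes; the real adapters are trained with the repo's Stage-4 scripts), the effective weight is the frozen base weight plus a scaled low-rank update:

```python
# LoRA freezes a base weight W and learns a low-rank update B @ A;
# the effective weight is W + (alpha / r) * (B @ A).
# Plain-Python sketch with tiny matrices; not the repo's implementation.

def matmul(X, Y):
    """Matrix multiply for small demo matrices (lists of lists)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), leaving the base W untouched."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight, rank-1 adapter (r=1), alpha=1
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape (2, r)
A = [[0.5, 0.5]]     # shape (r, 2)
W_eff = apply_lora(W, A, B, alpha=1.0, r=1)  # → [[1.5, 0.5], [1.0, 2.0]]
```

Because only A and B are trained, one base model can serve both the Affordance and Trajectory tasks by swapping adapters.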

| Models              | Checkpoint               | Description                                 |
|---------------------|--------------------------|---------------------------------------------|
| Planning Model      | 🤗 Planning CKPTs        | Used for Planning prediction in our paper   |
| Affordance (A-LoRA) | 🤗 Affordance CKPTs      | Used for Affordance prediction in our paper |
| Trajectory (T-LoRA) | 🤗 Trajectory CKPTs      | Used for Trajectory prediction in our paper |
| RoboBrain 2.0 7B    | 🤗 BAAI/RoboBrain2.0-7B  | 7B-parameter version of RoboBrain 2.0       |
| RoboBrain 2.0 32B   | 🤗 BAAI/RoboBrain2.0-32B | 32B-parameter version of RoboBrain 2.0      |
| RoboBrain 2.0 3B    | 🤗 BAAI/RoboBrain2.0-3B  | 3B-parameter version of RoboBrain 2.0       |

Note: Please refer to the RoboBrain 2.0 GitHub repository for the usage of RoboBrain 2.0.

🛠️ Setup

# clone repo.
git clone https://github.com/FlagOpen/RoboBrain.git
cd RoboBrain

# build conda env.
conda create -n robobrain python=3.10
conda activate robobrain
pip install -r requirements.txt

<a id="Training"> 🤖 Training</a>

1. Data Preparation

# Modify datasets for Stage 1, please refer to:
- yaml_path: scripts/train/yaml/stage_1_0.yaml

# Modify datasets for Stage 1.5, please refer to:
- yaml_path: scripts/train/yaml/stage_1_5.yaml

# Modify datasets for Stage 2_si, please refer to:
- yaml_path: scripts/train/yaml/stage_2_si.yaml

# Modify datasets for Stage 2_ov, please refer to:
- yaml_path: scripts/train/yaml/stage_2_ov.yaml

# Modify datasets for Stage 3_plan, please refer to:
- yaml_path: scripts/train/yaml/stage_3_planning.yaml

# Modify datasets for Stage 4_aff, please refer to:
- yaml_path: scripts/train/yaml/stage_4_affordance.yaml

# Modify datasets for Stage 4_traj, please refer to:
- yaml_path: scripts/train/yaml/stage_4_trajectory.yaml

Note: The sample format in each json file should be like:

{
    "id": "xxxx",
    "image": [
        "image1.png",
        "image2.png"
    ],
    "conversations": [
        {
            "from": "human",
            "value": "<image>\n<image>\nAre there numerous dials near the bottom left of the tv?"
        },
        {
            "from": "gpt",
            "value": "Yes. The sun casts shadows ... a serene, clear sky."
        }
    ]
}
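A minimal sketch of building and sanity-checking one sample in the format above (filenames and text are placeholders; the training json is a list of such samples):

```python
import json

# One training sample in the format shown above (placeholder content).
sample = {
    "id": "0001",
    "image": ["image1.png", "image2.png"],
    "conversations": [
        {"from": "human",
         "value": "<image>\n<image>\nAre there numerous dials near the bottom left of the tv?"},
        {"from": "gpt", "value": "Yes."},
    ],
}

def check_sample(s):
    """Verify that <image> placeholders match the number of images."""
    n_images = len(s.get("image", []))
    n_tags = sum(turn["value"].count("<image>")
                 for turn in s["conversations"] if turn["from"] == "human")
    assert n_tags == n_images, f"{n_tags} <image> tags but {n_images} images"
    return s

# Serialize a one-sample dataset; the real json files hold many samples.
json_text = json.dumps([check_sample(sample)], indent=4)
```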

2. Training

# Training on Stage 1:
bash scripts/train/stage_1_0_pretrain.sh

# Training on Stage 1.5:
bash scripts/train/stage_1_5_direct_finetune.sh

# Training on Stage 2_si:
bash scripts/train/stage_2_0_resume_finetune_si.sh

# Training on Stage 2_ov:
bash scripts/train/stage_2_0_resume_finetune_ov.sh

# Training on Stage 3_plan:
bash scripts/train/stage_3_0_resume_finetune_robo.sh

# Training on Stage 4_aff:
bash scripts/train/stage_4_0_resume_finetune_lora_a.sh

# Training on Stage 4_traj:
bash scripts/train/stage_4_0_resume_finetune_lora_t.sh

Note: Please change the environment variables (e.g. DATA_PATH, IMAGE_FOLDER, PREV_STAGE_CHECKPOINT) in the script to your own.

3. Convert original weights to HF weights

# Planning Model
python model/llava_utils/convert_robobrain_to_hf.py --model_dir /path/to/original/checkpoint/ --dump_path /path/to/output/

# A-LoRA & T-LoRA
python model/llava_utils/convert_lora_weights_to_hf.py --model_dir /path/to/original/checkpoint/ --dump_path /path/to/output/
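Checkpoint converters like these typically rewrite state-dict tensor names into the Hugging Face naming scheme. A toy sketch of that key-remapping step, with hypothetical key names (the real mapping lives in model/llava_utils/):

```python
# Hypothetical prefix-rename rules; the actual RoboBrain mapping differs.
RENAME_RULES = [
    ("model.vision_tower.", "vision_tower."),
    ("model.mm_projector.", "multi_modal_projector."),
]

def convert_keys(state_dict):
    """Return a new dict with key prefixes rewritten; tensors untouched."""
    out = {}
    for key, tensor in state_dict.items():
        for old, new in RENAME_RULES:
            if key.startswith(old):
                key = new + key[len(old):]
                break
        out[key] = tensor
    return out

# Tiny stand-in for a checkpoint (lists in place of real tensors).
ckpt = {"model.mm_projector.linear.weight": [0.1, 0.2]}
hf_ckpt = convert_keys(ckpt)
```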

(Option) 4. Compress Model

Using FlagScale, the model can be compressed to W8A16, reducing model size by more than 40% and accelerating inference by up to 50%. Note that the generated results after compression may differ slightly from those before compression.

git clone https://github.com/FlagOpen/FlagScale.git
cd FlagScale

python run.py --config-path examples/llava_onevision/conf --config-name config_c
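W8A16 means weights are stored as int8 while activations stay at 16-bit precision. A pure-Python sketch of symmetric per-row int8 weight quantization (illustrative only; no claim about FlagScale's actual kernels):

```python
def quantize_w8(row):
    """Symmetric per-row int8 quantization: returns (int8 values, scale)."""
    scale = max(abs(x) for x in row) / 127.0 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale

def dequantize(q, scale):
    """Recover approximate weights at matmul time."""
    return [v * scale for v in q]

row = [0.02, -1.27, 0.5, 0.0]
q, s = quantize_w8(row)
restored = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(row, restored))
```

Storing one int8 plus a shared scale per row is what yields the roughly 40% size reduction relative to 16-bit weights, at the cost of the small rounding error measured by `max_err`.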