MolmoAct
Official Repository for MolmoAct
Updates
- [2025/12/5] 🔥 Tips on zero-shot evaluation of allenai/MolmoAct-7B-D-0812 on a Franka setup with MolmoAct mid-training have been added at 5.4 Real-world.
- [2025/11/30] 🔥 Code for steering experiment of MolmoAct in SimplerEnv has been released at 5.3 Steer-SimplerEnv.
- [2025/10/24] 🔥 Code for fine-tuning and data processing have been released! Everything is fully open-source.
- [2025/08/30] 🔥 Code for replicating MolmoAct's training pipeline has been released
- [2025/08/15] 🔥 Code for MolmoAct Evaluation on SimplerEnv has been released at allenai/SimplerEnv
- [2025/08/12] 🔥 Datasets used for our pre-training and mid-training have been released
- [2025/08/12] 🔥 Models have been released
Table of Contents
- Overview
- Release Notes
  - 2.1 Datasets
  - 2.2 Models
- Installation
- Training
  - 4.1 Train Your Own MolmoAct
    - 4.1.1 Data Processing
    - 4.1.2 Fine-tuning (Post-training)
    - 4.1.3 Merge LoRA
    - 4.1.4 Inference
    - 4.1.5 Visualization
  - 4.2 Training Replication
    - 4.2.1 Pre-training
    - 4.2.2 Mid-training
    - 4.2.3 Post-training (LIBERO)
- Evaluation
  - 5.1 SimplerEnv
  - 5.2 LIBERO
  - 5.3 Steer-SimplerEnv
  - 5.4 Real-world
- License and Use
- Model and Hardware Safety
- Citation
- Contacts
1. Overview
MolmoAct is the repository for training and using Ai2's open-source Action Reasoning Model, which can reason in space.
2. Release Notes
2.1 Datasets
| Data | Description | Dataset Path |
|------|-------------|--------------|
| MolmoAct Dataset | MolmoAct dataset in LeRobot format. All contents were collected in-house by Ai2. | https://huggingface.co/datasets/allenai/MolmoAct-Dataset |
| MolmoAct Pre-training Mixture | Data mixture for MolmoAct pre-training. Contains a subset of OXE formulated as Action Reasoning data, auxiliary robot data, and web data. | https://huggingface.co/datasets/allenai/MolmoAct-Pretraining-Mixture |
| MolmoAct Mid-training Mixture | Data mixture for MolmoAct mid-training. Contains MolmoAct Dataset formulated as Action Reasoning data. | https://huggingface.co/datasets/allenai/MolmoAct-Midtraining-Mixture |
2.2 Models
| Model | Use Case | Description | Checkpoint Path |
|-------|----------|-------------|-----------------|
| MolmoAct-7B-D | Fine-tuning | Best/demo MolmoAct; adapt to real robots by fine-tuning on your datasets. | https://huggingface.co/allenai/MolmoAct-7B-D-0812 |
| MolmoAct-7B-O | Fine-tuning | Most open MolmoAct; adapt to real robots by fine-tuning on your datasets. | https://huggingface.co/allenai/MolmoAct-7B-O-0812 |
| MolmoAct-7B-D-Pretrain | Inference | Checkpoint to replicate zero-shot results on SimplerEnv (Google Robot). | https://huggingface.co/allenai/MolmoAct-7B-D-Pretrain-0812 |
| MolmoAct-7B-D-Pretrain-RT-1 | Inference | Checkpoint to replicate RT-1 fine-tuned results on SimplerEnv (Google Robot). | https://huggingface.co/allenai/MolmoAct-7B-D-Pretrain-RT-1-0812 |
3. Installation
We provide a Dockerfile for building the Docker image in which we ran all of our training experiments. We strongly recommend building the same image yourself and running training inside it.
If you would rather set up the environment manually, first install Python 3.11, then install PyTorch following the instructions for your operating system.
In either case, go to your working folder and run:
git clone https://github.com/allenai/molmoact.git
cd molmoact
pip install -e .[all]
4. Training
We provide instructions on both how to train your own MolmoAct and how to replicate all of our training stages:
4.1 Train Your Own MolmoAct
4.1.1 Data Processing
Installation for Data Processing
Command
git clone https://github.com/DepthAnything/Depth-Anything-V2.git
cd Depth-Anything-V2 &&
pip install -r requirements.txt &&
pip uninstall -y opencv-python opencv-python-headless opencv-contrib-python &&
pip install opencv-python-headless --no-cache-dir &&
pip install lerobot==0.3.3
Download Depth Anything V2 Checkpoint
Command
wget https://huggingface.co/allenai/MolmoAct-7B-D-0812/resolve/main/depth_anything_v2_vitb.pth
mv <path/to/depth_anything_v2_vitb.pth> <path/to/Depth-Anything-V2/checkpoints>
Download MolmoAct VQVAE Checkpoint
Command
wget https://huggingface.co/allenai/MolmoAct-7B-D-0812/resolve/main/vae-final.pt
To preprocess a dataset in the conventional LeRobot format into Action Reasoning Data, first run the preprocessing command:
Command
export DEPTH_CHECKPOINT_DIR="<path/to/Depth-Anything-V2/checkpoints>"
export VQVAE_MODEL_PATH="<path/to/vqvae.pt>"
python preprocess/action_reasoning_data.py \
--dataset-path <lerobot/repo_id> \
--output-path <path/to/processed_dataset> \
--depth-encoder vitb \
--line-length 5 \
--process-actions \
--action-bins 256 \
--action-chunk-size 8
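The `--action-bins` and `--action-chunk-size` flags discretize continuous action values into 256 uniform bins and group them into chunks of 8 steps. A minimal sketch of this kind of uniform binning (the exact value ranges and normalization used in `preprocess/action_reasoning_data.py` may differ):

```python
import numpy as np

def discretize_actions(actions, num_bins=256, low=-1.0, high=1.0):
    """Map continuous action values in [low, high] to integer bin ids in [0, num_bins - 1]."""
    actions = np.clip(np.asarray(actions, dtype=np.float64), low, high)
    # Normalize to [0, 1], then scale to bin indices
    return ((actions - low) / (high - low) * (num_bins - 1)).round().astype(int)

def chunk_actions(ids, chunk_size=8):
    """Group per-step action ids into fixed-size chunks, dropping any trailing remainder."""
    n = (len(ids) // chunk_size) * chunk_size
    return [ids[i:i + chunk_size] for i in range(0, n, chunk_size)]

print(discretize_actions([-1.0, 0.0, 1.0]).tolist())  # [0, 128, 255]
```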
4.1.2 Fine-tuning (Post-training)
Note that after you finish the data processing above, you should have a folder /path/to/processed_dataset containing all the data along with dataset_statistics.json. Then, replace finetune:/path/to/processed_dataset with the actual path in launch_scripts/train_multitask_model.py. To run the training, the following script is provided, which should work well on 8 A100/H100 GPUs. Adjust the global batch size to your GPU setup to avoid OOM.
WANDB_API_KEY=<your_wandb_api_key> torchrun \
--nnodes=1 --nproc-per-node=8 \
--node_rank="${RANK}" --master_addr="${ADDR}" --master_port="${PORT}" \
launch_scripts/train_multitask_model.py \
robot-finetune allenai/MolmoAct-7B-D-0812 \
--wandb.name=<name> --wandb.entity=<entity> --wandb.project=<project> \
--norm_stats_path /path/to/dataset_statistics.json \
--save_folder=checkpoints/<exp_name> \
--save_overwrite \
--duration 10000 \
--ft_embedding all \
--depth_tokens \
--global_batch_size 16 \
--lr_connector 5e-4 \
--lr_vit 5e-4 \
--lr_llm 5e-4 \
--save_interval 2000 \
--save_num_checkpoints_to_keep 5 \
--max_images 2 \
--lora_enable --lora_rank 32 --lora_alpha 16 --lora_dropout 0.0 \
--img_aug
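When adjusting `--global_batch_size`, keep it divisible by the number of GPUs (times any gradient-accumulation steps), since each GPU sees global batch divided by world size per step. A quick helper to sanity-check a configuration before launching (hypothetical helper, not part of the repo):

```python
def per_gpu_batch(global_batch_size, num_gpus, grad_accum_steps=1):
    """Return the micro-batch size each GPU sees per forward pass."""
    denom = num_gpus * grad_accum_steps
    if global_batch_size % denom != 0:
        raise ValueError(f"global batch {global_batch_size} not divisible by {denom}")
    return global_batch_size // denom

# The script above: global batch 16 on 8 GPUs -> 2 samples per GPU per step
print(per_gpu_batch(16, 8))  # 2
```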
Note that during fine-tuning, we disable high-resolution crops by default, downsizing all training images to at most 378x378, since none of our training stages enables this feature. For more details on these flags, please refer to section 4.2 Training Replication.
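The downsizing described above amounts to scaling the longer image side down to at most 378 pixels while preserving aspect ratio; a minimal sketch of that shape computation (the actual resize logic in the training code may differ):

```python
def downsized_shape(width, height, max_side=378):
    """Compute the target size so the longer side fits within max_side."""
    if max(width, height) <= max_side:
        return width, height  # already small enough, keep as-is
    scale = max_side / max(width, height)
    return round(width * scale), round(height * scale)

print(downsized_shape(756, 378))  # (378, 189)
```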
4.1.3 Merge LoRA
If you performed LoRA fine-tuning instead of full-parameter fine-tuning (which is what we did for most of our post-training experiments), you need to merge the adapters into the original model weights. When training with LoRA, our checkpointer will save sharded checkpoints and LoRA adapters (named with stepXXX and `s
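Conceptually, merging folds each adapter back into its base weight as W' = W + (alpha / rank) * B @ A. A numpy sketch of that update (the repo's merge script additionally handles sharded checkpoints, which this omits; the tiny rank-4 shapes here are for illustration only, whereas the fine-tuning command above uses rank 32 / alpha 16):

```python
import numpy as np

def merge_lora(W, A, B, rank, alpha):
    """Fold a LoRA adapter into the base weight: W' = W + (alpha / rank) * B @ A."""
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))  # base weight (out_dim x in_dim)
B = rng.normal(size=(16, 4))   # low-rank factor B (out_dim x rank)
A = rng.normal(size=(4, 16))   # low-rank factor A (rank x in_dim)

merged = merge_lora(W, A, B, rank=4, alpha=16)
print(merged.shape)  # (16, 16)
```

After merging, the adapter matrices can be discarded and the model served like any full checkpoint.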
