MolmoBot
Code and website for "MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation".
Getting started
MolmoBot policies demonstrate strong sim-to-real transfer to a wide variety of novel scenes, objects, and camera viewpoints. Try it out for yourself on your DROID platform with MolmoBot-DROID!
MolmoBot-DROID uses only the wrist camera and one exo camera. Don't worry about camera placement: MolmoBot policies are robust to arbitrary camera viewpoints!
Trying it out in simulation
See here to try out MolmoBot interactively! Modify the scene and task to test policy behavior.
Set up and run MolmoBot-DROID
- Set up MolmoBot-DROID by following the installation instructions.
- See these instructions for details on setting up and running the policy on your DROID! Any existing DROID or polymetis setup will work with little extra effort.
Briefly, after starting the polymetis robot and gripper servers:
```shell
# In one terminal
cd MolmoBot/MolmoBot
source .venv/bin/activate
PYTHONPATH=. python launch_scripts/serve_molmo.py --hf-repo allenai/MolmoBot-DROID --action-type joint_pos

# In another terminal
cd MolmoBot/robot_eval
conda activate molmobot
python scripts/droid/run_policy.py robot.robot_host=<nuc_ip> robot.cameras.wrist_camera.id=<wrist_id> robot.cameras.exo_camera_1.id=<exo_id> task="put the red mug in the black bowl"
```
Using MolmoBot Data
To use MolmoBot-Data for training experiments, you will need to download it from Hugging Face using bulk_download.py.
Data postprocessing
Before using any dataset implementations in this repo, you will need to run a postprocessing script. It filters out corrupted trajectories and can optionally check that certain objects are visible from a given camera. Example usage:

```shell
python validate_trajectories.py RBY1OpenDataGenConfig/part0/train --check-visibility head_camera door_handle
python validate_trajectories.py RBY1PickAndPlaceDataGenConfig/part0/train --check-visibility head_camera pickup_obj --check-visibility head_camera place_receptacle
python validate_trajectories.py FrankaPickAndPlaceOmniCamConfig/part0/train --check-visibility droid_shoulder_light_randomization pickup_obj --check-visibility droid_shoulder_light_randomization place_receptacle
```
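The script's internals aren't shown here, but the filtering it describes (dropping corrupted trajectories and optionally requiring an object to be visible from a camera) can be sketched as follows. This is a hypothetical illustration, not the repo's implementation; the `seg_masks` layout and all function names are assumptions:

```python
import numpy as np

def object_visible(traj, camera, obj, min_pixels=1):
    # Hypothetical check: the object's segmentation mask for this camera
    # must contain at least min_pixels pixels in every frame.
    masks = traj["seg_masks"][camera][obj]  # assumed (T, H, W) boolean array
    return bool((masks.sum(axis=(1, 2)) >= min_pixels).all())

def filter_trajectories(trajs, checks):
    # Keep trajectories that have all required data and pass every
    # (camera, object) visibility check; drop anything incomplete.
    kept = []
    for traj in trajs:
        try:
            if all(object_visible(traj, cam, obj) for cam, obj in checks):
                kept.append(traj)
        except KeyError:
            # Missing camera/object data is treated as a corrupted trajectory.
            continue
    return kept
```

Each `--check-visibility CAMERA OBJECT` flag in the commands above corresponds to one `(camera, object)` pair in `checks`.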
Data statistics
Before training (and after data postprocessing), you should also calculate aggregate statistics with calculate_stats.py. Example usage:
```shell
python calculate_stats.py FrankaPickAndPlaceOmniCamConfig/part0/train --keys actions obs/agent/qpos
python calculate_stats.py RBY1OpenDataGenConfig/part0/train --keys actions obs/agent/qpos
python calculate_stats.py RBY1PickAndPlaceDataGenConfig/part0/train --keys actions obs/agent/qpos
```
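calculate_stats.py's exact output format isn't documented here, but aggregate statistics for keys like `actions` are conventionally per-dimension reductions over every step of every trajectory (used to normalize data before training). A minimal sketch of that convention, with hypothetical key names:

```python
import numpy as np

def calculate_stats(trajectories, key):
    # Concatenate the requested key (e.g. "actions") across all trajectories
    # along the time axis, then reduce to per-dimension statistics.
    stacked = np.concatenate([traj[key] for traj in trajectories], axis=0)
    return {
        "mean": stacked.mean(axis=0),
        "std": stacked.std(axis=0),
        "min": stacked.min(axis=0),
        "max": stacked.max(axis=0),
    }
```

These statistics are typically saved alongside the dataset so that training and inference apply the same normalization.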
BibTeX
```bibtex
@misc{deshpande2026molmobot,
  title={MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation},
  author={Abhay Deshpande and Maya Guru and Rose Hendrix and Snehal Jauhri and Ainaz Eftekhar and Rohun Tripathi and Max Argus and Jordi Salvador and Haoquan Fang and Matthew Wallingford and Wilbert Pumacay and Yejin Kim and Quinn Pfeifer and Ying-Chun Lee and Piper Wolters and Omar Rayyan and Mingtong Zhang and Jiafei Duan and Karen Farley and Winson Han and Eli Vanderbilt and Dieter Fox and Ali Farhadi and Georgia Chalvatzaki and Dhruv Shah and Ranjay Krishna},
  year={2026},
  eprint={2603.16861},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.16861},
}
```