OpenHelix

OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation

Generate Convert Improve

Install / Use

/learn @OpenHelix-Team/OpenHelix

About this skill

Quality Score

0/100

README

🚀 OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation

OpenHelix Team: Can Cui*, Pengxiang Ding*, Wenxuan Song, Shuanghao Bai, Xinyang Tong, Zirui Ge, Runze Suo, and others.

This is our re-implementation of Helix.

We will provide long-term maintenance for this repository.

If you have any questions, please contact us via email!

🗞️ News

[2025/04] Initial release of OpenHelix codebase! 🎉
[2025/05] We released our paper on arXiv. 📄
[2025/05] We released the checkpoints of OpenHelix on Hugging Face 🤗.
[2025/06] We evaluated OpenHelix on CALVIN ABC-D (EP_LEN=360), which is a mainstream setting, and found that OpenHelix achieves SOTA performance among dual-system VLA models. A more powerful version of OpenHelix is on the way！ — Stay tuned!

📌 TODO list

[x] Release checkpoints for reproduction (Scheduled Release Date: Mid-May, 2025)
[ ] Update the model until all effects on the robotic arm are satisfied. (Long-term maintenance)
[ ] Deploying on real robots.
[ ] Deploying on humanoid robots.
[ ] Realizing collaboration between humanoid robots.

🛠️ Installation

Create a conda environment with the following commands:

# Initiate conda env
conda update conda
conda create -n openhelix python=3.8 -y
conda activate openhelix

# Install CALVIN locally
git clone --recurse-submodules https://github.com/mees/calvin.git
export CALVIN_ROOT=$(pwd)/calvin
cd calvin
cd calvin_env; git checkout main
cd ..
pip install setuptools==57.5.0
./install.sh; cd ..

# Clone OpenHelix repo and install
git clone git@github.com:OpenHelix-robot/OpenHelix.git
cd OpenHelix
pip install -e .

# Install diffuser
pip install diffusers["torch"]

# Install DGL (https://www.dgl.ai/pages/start.html)
pip install  dgl -f https://data.dgl.ai/wheels/torch-2.2/cu118/repo.html

# Install FlashAttention (https://github.com/Dao-AILab/flash-attention#installation-and-features)
pip install packaging
pip install ninja
pip install flash-attn==2.5.9.post1 --no-build-isolation

📦 Data Preparation

Prepare data on CALVIN

Download the play demonstrations from Calvin repo.

> cd calvin/dataset
> sh download_data.sh ABC

Package the demonstrations for training

> python data_preprocessing/package_calvin.py --split training
> python data_preprocessing/package_calvin.py --split validation

Expected directory layout

./calvin/dataset/task_ABC_D
                     |------- training/
                     |------- validation/

./data/calvin/packaged_ABC_D
                     |------- training/
                     |            |------- A+0/
                     |            |          |------- ann_1.dat
                     |            |          |------- ...
                     |            |
                     |            |------- B+0/
                     |            |------- C+0/
                     |
                     |------- validation/
                                  |------- D+0/

🗂️ (Optional) Encode Language Instructions

We provide scripts for encoding language instructions with a CLIP Text Encoder on CALVIN.
Alternatively, you can directly download pre-encoded instructions from here.

# Encode validation instructions
python data_preprocessing/preprocess_calvin_instructions.py \
  --output instructions/calvin_task_ABC_D/validation.pkl \
  --model_max_length 16 \
  --annotation_path ./calvin/dataset/task_ABC_D/validation/lang_annotations/auto_lang_ann.npy

# Encode training instructions
python data_preprocessing/preprocess_calvin_instructions.py \
  --output instructions/calvin_task_ABC_D/training.pkl \
  --model_max_length 16 \
  --annotation_path ./calvin/dataset/task_ABC_D/training/lang_annotations/auto_lang_ann.npy

📍 Checkpoints

We uploaded the model weights on Hugging Face.

| MLLM(PT) + Policy(P) | MLLM(PT) + Aux + Policy(P) | |----------------------|-----------------------------| | Weights | Weights |

Notably, you only need to merge the safetensors in the direcctory, e.g. "prompt_tuning_aux/llava_ckpt_safetensors", into a single pytorch_model.bin file. Here is the code:

import torch
from safetensors.torch import load_file
import os

shard_folder = "/openhelix/prompt_tuning_aux/llava_ckpt_safetensors"
output_file = "/openhelix/prompt_tuning_aux/pytorch_model.bin"

shard_files = sorted([
    os.path.join(shard_folder, f)
    for f in os.listdir(shard_folder)
    if f.endswith(".safetensors")
])

merged_state_dict = {}


for shard_file in shard_files:
    shard_dict = load_file(shard_file)
    merged_state_dict.update(shard_dict)
    print(f"Loaded {shard_file} with {len(shard_dict)} tensors")


torch.save(merged_state_dict, output_file)
print(f"\nMerged model saved as: {output_file}")

This will generate a file named pytorch_model.bin. Copy the path of this file, along with the path to the policy.pth file in the download directory from huggingface, into the "test_trajectory_lcb_pt_act_simple_asy10.sh" script as shown below:

torchrun --nproc_per_node $ngpus --master_port $RANDOM \
    online_evaluation_calvin/evaluate_policy_lcb_pt_act_simple_asy10.py \
    --calvin_dataset_path /calvin/task_ABC_D \
    --calvin_model_path /3d_diffuser_actor/calvin/calvin_models \
    --text_encoder clip \
    --text_max_length 16 \
    --tasks A B C D\
    --backbone $backbone \
    --gripper_loc_bounds $gripper_loc_bounds \
    --gripper_loc_bounds_buffer $gripper_buffer \
    --calvin_gripper_loc_bounds /calvin/task_ABC_D/validation/statistics.yaml \
    --embedding_dim $C \
    --action_dim 7 \
    --use_instruction 1 \
    --rotation_parametrization 6D \
    --diffusion_timesteps $diffusion_timesteps \
    --interpolation_length $interpolation_length \
    --num_history $num_history \
    --relative_action $relative_action \
    --fps_subsampling_factor $fps_subsampling_factor \
    --lang_enhanced $lang_enhanced \
    --save_video 0 \
    --base_log_dir train_logs/${main_dir}/${run_log_dir}/eval_logs_pt_1000_0324_sr1_task_latent_lcb_pt_auxin2stage_asy10/ \
    --quaternion_format $quaternion_format \
    --checkpoint /openhelix_huggingface/openhelix/prompt_tuning_aux/policy.pth \  #Here is the path of policy.pth !!!!!!!!!!!!
    --llm_ckpt /openhelix_huggingface/openhelix/prompt_tuning_aux  #Here is the path of pytorch_model.bin !!!!!!!!!!!!!!!!!!!!

The --checkpoint argument should be set to the path of policy.pth, and the --llm_ckpt argument should be set to the path of pytorch_model.bin.

The results on CALVIN ABC-D. MLLM (PT) denotes our proposed prompt tuning method for MLLM training. Policy(P) indicates loading from a pretrained policy model. Asy(10) represents inference with a 10-step time delay. AUX denotes the additionally introduced auxiliary tasks.

| Method | 1 | 2 | 3 | 4 | 5 | Avg. Len. ↑ | |----------------------------------------------------------|-------|-------|-------|-------|-------|--------------| | Only Policy | 92.2 | 78.7 | 63.9 | 51.2 | 41.2 | 3.27 | | MLLM (PT) + Policy(P) (EP_LEN=60) | 92.2 | 79.2 | 65.0 | 52.9 | 40.9 | 3.30 | | MLLM (PT) + AUX + Policy(P) + Asy(10) (EP_LEN=60) | 93.3 | 81.8 | 67.9 | 56.6 | 46.0 | 3.45 | | MLLM (PT) + Policy(P) (EP_LEN=360) | 96.3 | 87.3 | 77.5 | 66.5 | 55.5 | 3.83 | | MLLM (PT) + AUX + Policy(P) + Asy(10) (EP_LEN=360) | 97.1 | 91.4 | 82.8 | 72.6 | 64.1 | 4.08 | | Robodual | 94.4 | 82.7 | 72.1 | 62.4 | 54.4 | 3.66 | | UniVLA | 95.5 | 85.8 | 75.4 | 66.9 | 56.5 | 3.80 | | Seer | 94.4 | 87.2 | 79.9 | 72.2 | 64.3 | 3.98 | | GR-MG | 96.8 | 89.3 | 81.5 | 72.7 | 64.4 | 4.04 |

🎮 Getting Started

Train Openhelix on CALVIN:

> bash scripts/train_trajectory_lcb_pt_act_simple.sh

To evaluate pre-trained weights:

First, download the weights and place them under `train_logs

Related Skills

node-connect

354.2k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

112.2k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

354.2k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

354.2k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。