HinFlow
Official Code Implementation of Translating Flow to Policy via Hindsight Online Imitation
[ICLR 2026] Translating Flow to Policy via Hindsight Online Imitation
Yitian Zheng*, Zhangchen Ye*, Weijun Dong*, Shengjie Wang, Yuyang Liu, Chongjie Zhang, Chuan Wen<sup>✉</sup>, Yang Gao<sup>✉</sup>

Installation
git clone --recursive git@github.com:yzc0731/HinFlow.git
cd HinFlow
conda env create -f environment.yml
conda activate hinflow
pip install -e third_party/robosuite/
pip install -e third_party/robomimic/
pip install -e third_party/maniskill/
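The three editable installs above only work if the third_party folders are populated. The sketch below checks that before you run pip. It assumes these folders are git submodules, which the --recursive clone suggests but this README does not state. The helper name check_third_party is hypothetical and is not part of the repository.

```shell
# Hypothetical helper (not part of the HinFlow repository).
# Assumption: third_party/{robosuite,robomimic,maniskill} are git submodules
# that stay empty if the repository was cloned without --recursive.
check_third_party() {
    repo_root="${1:-.}"
    status=0
    for pkg in robosuite robomimic maniskill; do
        dir="${repo_root}/third_party/${pkg}"
        # `ls -A` prints nothing for an empty or missing directory.
        if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "empty or missing: ${dir}" >&2
            status=1
        fi
    done
    return "$status"
}
```

If the check fails and the folders are indeed submodules, running git submodule update --init --recursive from the repository root should populate them.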
Dataset
We provide the preprocessed dataset to reproduce the results in our paper. You can download it from Hugging Face Hub.
Alternatively, you can collect and preprocess the dataset yourself by following the instructions below.
Collect Dataset
For LIBERO tasks, you can download the raw LIBERO dataset by running download_libero_datasets, collect demonstrations with SpaceMouse teleoperation, or develop your own scripted policy. For more details, please refer to CREATE YOUR OWN DATASETS in the LIBERO Docs.
For ManiSkill tasks, please refer to ManiSkill Data Collection. Our method requires the control mode to be pd_ee_delta_pose and the observation to be rgb+segmentation.
Because the ManiSkill data format differs from LIBERO's, we provide a conversion script here.
Dataset Preprocessing
The dataset needs to be preprocessed with CoTracker:
python -m scripts.preprocess \
--source_hdf5=path/to/raw/data.hdf5 \
--target_dir=path/to/preprocessed/data.hdf5 \
--sampler=SegmentSampler \
--use_points=1 \
--sampler_cfg=path/to/preprocess/task.yaml \
--env_type=maniskill
Training
To replicate the results in our paper, use the following task names: libero_butter, libero_book, libero_chocolate, libero_microwave, maniskill_pokecube, maniskill_pullcubetool, and maniskill_placesphere.
The training of our method includes two stages:
Stage 1: High Level Planner
We provide the High Level Planner checkpoints needed to reproduce the results in our paper. You can download them from Hugging Face Hub, or train the planner yourself by following the instructions below.
First, split the datasets into training and validation sets.
python -m scripts.split_trainval --folder=data/planner_dataset/${task}
Then train the High Level Planner with this command:
python -m scripts.train_planner --task=${task}
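To run Stage 1 for all seven tasks listed in the Training section, the two commands above can be generated in a loop. This is a minimal sketch: the TASKS variable and the print_stage1_commands function are hypothetical names, not part of the repository. The function only prints the commands, so you can review them before running anything.

```shell
# Task names as listed in this README.
TASKS="libero_butter libero_book libero_chocolate libero_microwave maniskill_pokecube maniskill_pullcubetool maniskill_placesphere"

# Hypothetical wrapper (not part of the HinFlow repository): prints the
# split and planner-training commands for every task, one per line.
print_stage1_commands() {
    for task in $TASKS; do
        echo "python -m scripts.split_trainval --folder=data/planner_dataset/${task}"
        echo "python -m scripts.train_planner --task=${task}"
    done
}
```

Calling print_stage1_commands shows the full list. Piping its output to sh from the repository root would run the commands sequentially.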
Stage 2: Low Level Policy with Hindsight Online Imitation
Our policy can be trained with:
python -m scripts.train_hinflow_policy --task=${task} --gpu=${gpu_id} --planner=${planner_path}
Here planner_path is the path to the folder of the trained High Level Planner; it should contain model_best.ckpt and config.yaml.
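Checking the planner folder before launching a long training run can catch a wrong path early. The sketch below tests only for the two files named above. The function name check_planner_dir is hypothetical and is not part of the repository.

```shell
# Hypothetical helper (not part of the HinFlow repository): verify that a
# planner folder contains the two files this README says it should.
check_planner_dir() {
    planner_path="$1"
    for required in model_best.ckpt config.yaml; do
        if [ ! -f "${planner_path}/${required}" ]; then
            echo "missing: ${planner_path}/${required}" >&2
            return 1
        fi
    done
    return 0
}
```

A typical use would be to call check_planner_dir "${planner_path}" and start scripts.train_hinflow_policy only if it succeeds.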
Baseline
To replicate the results in our paper, we provide three modes: bc, atm_grid, and atm_seg. The planner used in the atm_grid and atm_seg baselines is the same as in our method. When training and evaluating bc, --planner is required as a placeholder but is not used.
Before training the baseline, process the dataset in data/policy_dataset/${task} using this script:
python -m scripts.label_points --task=${task} --mode=${mode}
Training scripts:
python -m scripts.train_baseline --task=${task} --planner=${planner_path} --mode=${mode}
Evaluation scripts:
python -m scripts.eval_baseline --task=${task} --exp-dir=path/to/your/exp/dir --planner=${planner_path} --mode=${mode}
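The three baseline steps share the same task, planner, and mode arguments, so they can be generated together. The sketch below validates the mode against the three names given above and prints the commands in order. The function names is_valid_mode and print_baseline_commands are hypothetical and are not part of the repository.

```shell
# Accepts only the mode names listed in this README.
is_valid_mode() {
    case "$1" in
        bc|atm_grid|atm_seg) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper (not part of the HinFlow repository): prints the
# label, train, and eval commands for one task, planner, mode, and exp dir.
print_baseline_commands() {
    task="$1"; planner_path="$2"; mode="$3"; exp_dir="$4"
    if ! is_valid_mode "$mode"; then
        echo "unknown mode: ${mode}" >&2
        return 1
    fi
    echo "python -m scripts.label_points --task=${task} --mode=${mode}"
    echo "python -m scripts.train_baseline --task=${task} --planner=${planner_path} --mode=${mode}"
    echo "python -m scripts.eval_baseline --task=${task} --exp-dir=${exp_dir} --planner=${planner_path} --mode=${mode}"
}
```

Because the function only prints, a mistyped mode is rejected before any dataset labeling or training starts.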
Acknowledgement
Thanks to these excellent open source projects:
Citation
If you find our codebase useful for your research, please cite our paper with this BibTeX entry:
@inproceedings{zheng2026translating,
title={Translating Flow to Policy via Hindsight Online Imitation},
author={Zheng, Yitian and Ye, Zhangchen and Dong, Weijun and Wang, Shengjie and Liu, Yuyang and Zhang, Chongjie and Wen, Chuan and Gao, Yang},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
