GAP
[ICLR 2026] When would Vision-Proprioception Policies Fail in Robotic Manipulation?
When would Vision-Proprioception Policies Fail in Robotic Manipulation?
Authors: Jingxian Lu*, Wenke Xia*, Yuxuan Wu, Zhiwu Lu, Di Hu✉
Accepted By: 2026 International Conference on Learning Representations (ICLR)
Resources: [Project Page], [Paper]
If you have any questions, please open an issue or send an email to jingxianlu1122@gmail.com.
Introduction
This is the PyTorch code of our paper: When would Vision-Proprioception Policies Fail in Robotic Manipulation?
In this work, we investigate when vision-proprioception policies would fail in robotic manipulation by conducting temporally controlled experiments. We found that during task sub-phases that robot's motion transitions, which require target localization, the vision modality of the vision-proprioception policy plays a limited role. Further analysis reveals that the policy naturally gravitates toward concise proprioceptive signals that offer faster loss reduction when training, thereby dominating the optimization and suppressing the learning of the visual modality during motion-transition phases.

Motivated by this, we propose the Gradient Adjustment with Phase-guidance (GAP) algorithm, which adaptively modulates the optimization of proprioception to enable dynamic collaboration within the vision-proprioception policy. Specifically, we leverage proprioception to capture robotic states and estimate the probability that each timestep in the trajectory belongs to a motion-transition phase. During policy learning, we apply a fine-grained adjustment that reduces the magnitude of proprioception's gradient based on the estimated probabilities, leading to robust and generalizable vision-proprioception policies.
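The core idea of the gradient adjustment can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: it assumes the policy exposes a separate proprioception encoder, and that `phase_prob` holds the estimated per-timestep probabilities of belonging to a motion-transition phase.

```python
import torch
import torch.nn as nn

def scale_proprio_gradients(proprio_encoder: nn.Module, phase_prob: torch.Tensor):
    """Down-scale the proprioception encoder's gradients by the batch's mean
    motion-transition probability (hypothetical helper, for illustration).

    phase_prob: tensor of values in [0, 1]; higher means the timestep is
    more likely a motion-transition phase, so proprioception is suppressed more.
    """
    scale = 1.0 - phase_prob.mean()
    for p in proprio_encoder.parameters():
        if p.grad is not None:
            p.grad.mul_(scale)

# Usage: call after loss.backward() and before optimizer.step().
encoder = nn.Linear(7, 32)  # stand-in proprioception encoder
x = torch.randn(4, 7)
loss = encoder(x).pow(2).mean()
loss.backward()
scale_proprio_gradients(encoder, torch.tensor([0.9, 0.8, 0.95, 0.85]))
```

With a high transition probability, the proprioceptive gradients shrink sharply, leaving the visual pathway to drive learning during those phases.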
Setup
This code has been tested on Ubuntu 18.04 with PyTorch 2.1.0+cu121.
Create Environment
conda env create -f environment.yml
Configuration
- Update the corresponding paths in the `cfgs` directory.
- Copy `costdirection.py` to the `costs` folder in the ruptures library.
- Add the following import statement to the `__init__.py` of ruptures to enable the custom cost function:

  from .costdirection import CostDirection
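The custom cost is used to segment trajectories into phases from proprioceptive signals. As a rough, self-contained illustration of the idea (not the `CostDirection` implementation itself, which ships with this repo), one can score motion-transition timesteps by how sharply the joint-velocity direction changes:

```python
import numpy as np

def transition_scores(joint_pos: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Score each timestep by how sharply the motion direction changes.

    joint_pos: (T, D) array of proprioceptive joint positions.
    Returns a (T,) array in [0, 1]; values near 1 mark likely
    motion-transition phases (direction reversals), near 0 steady motion.
    """
    vel = np.diff(joint_pos, axis=0)                      # (T-1, D) velocities
    norms = np.linalg.norm(vel, axis=1) + eps
    cos = np.sum(vel[1:] * vel[:-1], axis=1) / (norms[1:] * norms[:-1])
    score = (1.0 - cos) / 2.0                             # cos in [-1,1] -> [0,1]
    return np.pad(score, (1, 1))                          # align to T timesteps

# A 1-D trajectory that moves up and then reverses at t=2:
traj = np.array([0.0, 0.5, 1.0, 0.5, 0.0])[:, None]
scores = transition_scores(traj)  # peaks at the reversal
```

In the actual pipeline, change-point detection via ruptures with the custom `CostDirection` cost plays this role; the sketch above only conveys the intuition.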
Training
Train the model using the following command:
python gap/gap.py task=assembly method=gap image=true proprio=true
You can configure the target task and input modalities through command-line arguments. The example above trains on the assembly task with both image and proprioception modalities.
Evaluation
Run inference on the trained model:
python gap/inference.py task=assembly method=gap image=true proprio=true
Citation
If you find this work useful, please cite our paper:
@misc{lu2026visionproprioceptionpoliciesfailrobotic,
title={When would Vision-Proprioception Policies Fail in Robotic Manipulation?},
author={Jingxian Lu and Wenke Xia and Yuxuan Wu and Zhiwu Lu and Di Hu},
year={2026},
eprint={2602.12032},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2602.12032},
}