GAP
[ICLR 2026] When would Vision-Proprioception Policies Fail in Robotic Manipulation?
When would Vision-Proprioception Policies Fail in Robotic Manipulation?
Authors: Jingxian Lu*, Wenke Xia*, Yuxuan Wu, Zhiwu Lu, Di Hu✉
Accepted By: 2026 International Conference on Learning Representations (ICLR)
Resources: [Project Page], [Paper]
If you have any questions, please open an issue or send an email to jingxianlu1122@gmail.com.
Introduction
This is the PyTorch code of our paper: When would Vision-Proprioception Policies Fail in Robotic Manipulation?
In this work, we investigate when vision-proprioception policies would fail in robotic manipulation by conducting temporally controlled experiments. We found that during task sub-phases that robot's motion transitions, which require target localization, the vision modality of the vision-proprioception policy plays a limited role. Further analysis reveals that the policy naturally gravitates toward concise proprioceptive signals that offer faster loss reduction when training, thereby dominating the optimization and suppressing the learning of the visual modality during motion-transition phases.

Motivated by this, we propose the Gradient Adjustment with Phase-guidance (GAP) algorithm, which adaptively modulates the optimization of proprioception to enable dynamic collaboration within the vision-proprioception policy. Specifically, we leverage proprioception to capture robotic states and estimate the probability that each timestep in the trajectory belongs to a motion-transition phase. During policy learning, we apply a fine-grained adjustment that reduces the magnitude of proprioception's gradient based on the estimated probabilities, leading to robust and generalizable vision-proprioception policies.
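The core idea of the gradient adjustment can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: it assumes the policy exposes a separate proprioception encoder, and that `phase_prob` holds the estimated per-timestep probabilities of belonging to a motion-transition phase.

```python
import torch
import torch.nn as nn

def scale_proprio_gradients(proprio_encoder: nn.Module, phase_prob: torch.Tensor):
    """Down-scale the proprioception encoder's gradients by the batch's mean
    motion-transition probability (hypothetical helper, for illustration).

    phase_prob: tensor of values in [0, 1]; higher means the timestep is
    more likely a motion-transition phase, so proprioception is suppressed more.
    """
    scale = 1.0 - phase_prob.mean()
    for p in proprio_encoder.parameters():
        if p.grad is not None:
            p.grad.mul_(scale)

# Usage: call after loss.backward() and before optimizer.step().
encoder = nn.Linear(7, 32)  # stand-in proprioception encoder
x = torch.randn(4, 7)
loss = encoder(x).pow(2).mean()
loss.backward()
scale_proprio_gradients(encoder, torch.tensor([0.9, 0.8, 0.95, 0.85]))
```

With a high transition probability, the proprioceptive gradients shrink sharply, leaving the visual pathway to drive learning during those phases.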
Setup
This code has been tested on Ubuntu 18.04 with PyTorch 2.1.0+cu121.
Create Environment
conda env create -f environment.yml
Configuration
- Update the corresponding paths in the `cfgs` directory.
- Copy `costdirection.py` to the `costs` folder in the ruptures library.
- Add the following import statement to the `__init__.py` of ruptures to enable the custom cost function:

  from .costdirection import CostDirection
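The custom cost is used to segment trajectories into phases from proprioceptive signals. As a rough, self-contained illustration of the idea (not the `CostDirection` implementation itself, which ships with this repo), one can score motion-transition timesteps by how sharply the joint-velocity direction changes:

```python
import numpy as np

def transition_scores(joint_pos: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Score each timestep by how sharply the motion direction changes.

    joint_pos: (T, D) array of proprioceptive joint positions.
    Returns a (T,) array in [0, 1]; values near 1 mark likely
    motion-transition phases (direction reversals), near 0 steady motion.
    """
    vel = np.diff(joint_pos, axis=0)                      # (T-1, D) velocities
    norms = np.linalg.norm(vel, axis=1) + eps
    cos = np.sum(vel[1:] * vel[:-1], axis=1) / (norms[1:] * norms[:-1])
    score = (1.0 - cos) / 2.0                             # cos in [-1,1] -> [0,1]
    return np.pad(score, (1, 1))                          # align to T timesteps

# A 1-D trajectory that moves up and then reverses at t=2:
traj = np.array([0.0, 0.5, 1.0, 0.5, 0.0])[:, None]
scores = transition_scores(traj)  # peaks at the reversal
```

In the actual pipeline, change-point detection via ruptures with the custom `CostDirection` cost plays this role; the sketch above only conveys the intuition.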
Training
Train the model using the following command:
python gap/gap.py task=assembly method=gap image=true proprio=true
You can configure the target task and input modalities through command-line arguments. The example above trains on the assembly task with both image and proprioception modalities.
Evaluation
Run inference on the trained model:
python gap/inference.py task=assembly method=gap image=true proprio=true
Citation
If you find this work useful, please cite our paper:
@misc{lu2026visionproprioceptionpoliciesfailrobotic,
title={When would Vision-Proprioception Policies Fail in Robotic Manipulation?},
author={Jingxian Lu and Wenke Xia and Yuxuan Wu and Zhiwu Lu and Di Hu},
year={2026},
eprint={2602.12032},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2602.12032},
}