Seer
[ICLR 2025 Oral] Seer: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
<h3 align="center"> <a href="https://arxiv.org/pdf/2412.15109">arXiv</a> | <a href="https://nimolty.github.io/Seer/">Webpage</a> </h3>

https://github.com/user-attachments/assets/49036e84-c397-4589-9024-efb05b14efa0
<br><br>
:books: Table of Contents:
:fire: Highlights <a name="high"></a>
<img width="1000" alt="seer" src="assets/seer_method.jpg">

- :trophy: SOTA simulation performance: Seer achieves state-of-the-art performance on the CALVIN ABC-D and LIBERO-LONG simulation benchmarks.
- :muscle: Strong real-world performance: Seer demonstrates strong effectiveness and generalization across diverse real-world downstream tasks.
:door: Getting Started <a name="start"></a>
We provide step-by-step instructions for running Seer both in simulation and in real-world experiments. Follow the section below that matches your setup.
Simulation <a name="simulation"></a>
CALVIN ABC-D <a name="calvin abc-d"></a>
LIBERO-LONG <a name="libero long"></a>
Real-World <a name="real-world"></a>
Real-World (Quick Training with and without Pre-training) <a name="real-world-qs"></a>
For users who want to train Seer from scratch or fine-tune a pre-trained checkpoint, we provide complete instructions for environment setup, downstream-task data preparation, training, and deployment.
Real-World (Pre-training) <a name="real-world-fv"></a>
This section describes how Seer is pre-trained for real-world experiments, covering environment setup, dataset preparation, and training. Downstream-task processing and fine-tuning are covered in Real-World (Quick Training with and without Pre-training).
:pencil2: Checkpoints <a name="checkpoints"></a>
Relevant checkpoints are available on the website.

|Benchmark|Checkpoint|
|:------:|:------:|
|CALVIN ABC-D|Seer (Avg. Len.: 3.98) / Seer-Large (Avg. Len.: 4.30)|
|Real-World|Seer (DROID pre-trained)|
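The Avg. Len. values for CALVIN ABC-D come from CALVIN's long-horizon evaluation, in which the policy must complete a chain of five language-conditioned subtasks in order, and a chain stops at the first failed subtask. Avg. Len. is the mean number of subtasks completed in a row, so it ranges from 0 to 5. The sketch below shows how that summary can be computed, assuming you already have per-chain counts of consecutively completed subtasks. `calvin_metrics` is a hypothetical helper written for illustration; it is not part of the Seer codebase or the official CALVIN evaluation scripts.

```python
from typing import Dict, Sequence


def calvin_metrics(completed: Sequence[int], chain_len: int = 5) -> Dict[str, object]:
    """Summarize CALVIN-style long-horizon results (hypothetical helper).

    completed[i] is the number of subtasks chain i finished in a row
    (0..chain_len) before its first failure.
    """
    if not completed:
        raise ValueError("need at least one evaluation chain")
    for c in completed:
        if not 0 <= c <= chain_len:
            raise ValueError(f"completed count {c} outside [0, {chain_len}]")

    n = len(completed)
    # Success rate for "k subtasks in a row": fraction of chains that got at least k done.
    success_at_k = {
        k: sum(c >= k for c in completed) / n for k in range(1, chain_len + 1)
    }
    # Avg. Len. is the mean number of consecutively completed subtasks.
    # It also equals the sum of the per-k success rates above.
    avg_len = sum(completed) / n
    return {"success_at_k": success_at_k, "avg_len": avg_len}


if __name__ == "__main__":
    # Four toy chains: all 5 done, 3 done, 4 done, failed on the first subtask.
    result = calvin_metrics([5, 3, 4, 0])
    print(result["avg_len"])  # 3.0
```

Reporting the per-k success rates alongside Avg. Len. is useful because two policies with the same average can differ in whether they fail early or late in the chain.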
📆 TODO <a name="todos"></a>
- [x] Release real-world experiment code.
- [x] Release CALVIN ABC-D experiment code (Seer).
- [x] Release the evaluation code of Seer-Large on CALVIN ABC-D experiment.
- [x] Release the training code of Seer-Large on CALVIN ABC-D experiment.
- [x] Release LIBERO-LONG experiment code.
- [ ] Release simpleseer, lightweight code for quick training from scratch and deployment.
License <a name="license"></a>
All assets and code are under the Apache 2.0 license unless specified otherwise.
Citation <a name="citation"></a>
If you find the project helpful for your research, please consider citing our paper:
```bibtex
@article{tian2024predictive,
  title={Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation},
  author={Tian, Yang and Yang, Sizhe and Zeng, Jia and Wang, Ping and Lin, Dahua and Dong, Hao and Pang, Jiangmiao},
  journal={arXiv preprint arXiv:2412.15109},
  year={2024}
}
```
Acknowledgment <a name="acknowledgment"></a>
This project builds upon GR-1 and RoboFlamingo. We thank these teams for their open-source contributions.
