Seer

[ICLR 2025 Oral] Seer: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

</div> <h3 align="center"> <a href="https://arxiv.org/pdf/2412.15109">Arxiv</a> | <a href="https://nimolty.github.io/Seer/">Webpage</a> </h3>

https://github.com/user-attachments/assets/49036e84-c397-4589-9024-efb05b14efa0

:books: Table of Contents:

Highlights
Getting Started
- Simulation
- Real-World
Checkpoints
TODO List
License
Citation
Acknowledgment

:fire: Highlights <a name="high"></a>

:trophy: SOTA simulation performance: Seer achieves state-of-the-art performance on the simulation benchmarks CALVIN ABC-D and LIBERO-LONG.
:muscle: Impressive real-world performance: Seer demonstrates strong effectiveness and generalization across diverse real-world downstream tasks.

:door: Getting Started <a name="start"></a>

We provide step-by-step guidance for running Seer in simulation and in real-world experiments. Follow the instructions for your target setup.

Simulation <a name="simulation"></a>

CALVIN ABC-D <a name="calvin abc-d"></a>

LIBERO LONG <a name="libero long"></a>

Real-World<a name="real-world"></a>

Real-World (Quick Training w & w/o pre-training)<a name="real-world-qs"></a>

For users aiming to train Seer from scratch or fine-tune it, we provide comprehensive instructions for environment setup, downstream task data preparation, training, and deployment.

Real-World (Pre-training)<a name="real-world-fv"></a>

This section details the pre-training process of Seer in real-world experiments, including environment setup, dataset preparation, and training procedures. Downstream task processing and fine-tuning are covered in Real-World (Quick Training w & w/o pre-training).

:pencil2: Checkpoints <a name="checkpoints"></a>

📆 TODO <a name="todos"></a>

[x] Release real-world experiment code.
[x] Release CALVIN ABC-D experiment code (Seer).
[x] Release the evaluation code of Seer-Large on CALVIN ABC-D experiment.
[x] Release the training code of Seer-Large on CALVIN ABC-D experiment.
[x] Release LIBERO-LONG experiment code.
[ ] Release simpleseer, code for quick from-scratch training and deployment.

License <a name="license"></a>

All assets and code are under the Apache 2.0 license unless specified otherwise.

Citation <a name="citation"></a>

If you find the project helpful for your research, please consider citing our paper:

@article{tian2024predictive,
  title={Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation},
  author={Tian, Yang and Yang, Sizhe and Zeng, Jia and Wang, Ping and Lin, Dahua and Dong, Hao and Pang, Jiangmiao},
  journal={arXiv preprint arXiv:2412.15109},
  year={2024}
}

Acknowledgment <a name="acknowledgment"></a>

This project builds upon GR-1 and RoboFlamingo. We thank these teams for their open-source contributions.
