
<!-- omit in toc -->

VisionProTeleop

<div align="center"> <img width="340" src="assets/vptv2.png"> </div> <p align="center"> <a href="https://pypi.org/project/avp_stream/"> <img src="https://img.shields.io/pypi/v/avp_stream" alt="CI"> </a> <a href="https://pypi.org/project/avp_stream/"> <img src="https://img.shields.io/pypi/dm/avp_stream" alt="CI"> </a> <a href="https://opensource.org/licenses/MIT"> <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="CI"> </a> </p>

A complete ecosystem for using Apple Vision Pro in robotics research — from real-world teleoperation to simulation teleoperation to egocentric dataset recording. Stream hand/head tracking from Vision Pro, send video/audio/simulation back, and record everything to the cloud.

For a more detailed explanation, check out this short paper.

The recently updated App Store version of Tracking Streamer requires the Python library avp_stream at version 2.50.0 or later. The app will show a warning message on the VisionOS side if the Python library is outdated. You can upgrade the library by running pip install --upgrade avp_stream.
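If you want to check the installed version against this minimum in your own scripts, plain tuple comparison is enough. A minimal sketch (the helper names below are illustrative, not part of avp_stream):

```python
def parse_version(v: str) -> tuple:
    """Parse a dotted version string like '2.50.0' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed: str, minimum: str = "2.50.0") -> bool:
    """Return True if the installed version satisfies the minimum."""
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("2.51.1"))  # True
print(meets_minimum("2.4.9"))   # False (2.4 < 2.50 numerically)
```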


Overview

This project provides:

  1. Tracking Streamer: A VisionOS app that
    • streams hand/head tracking data to a Python client
    • receives stereo/mono video and audio streams from the Python client
    • presents simulation scenes (MuJoCo and Isaac Lab) and their updates with native AR rendering using RealityKit
    • records egocentric video with hand tracking from an arbitrary UVC camera connected to Vision Pro
    • (optionally) records every session to the user's personal cloud storage
  2. avp_stream: A Python library for
    • receiving tracking data from Vision Pro
    • streaming video/audio/simulation back to Vision Pro
  3. Tracking Manager: A companion iOS app for
    • managing and viewing recordings on the user's personal cloud storage
    • configuring settings for the VisionOS app
    • calibrating a camera mounted on Vision Pro
    • sharing recorded datasets with others
    • viewing publicly shared datasets

Together, they enable three major workflows for robotics research:

<table> <tr> <th colspan="2">Use Case</th> <th>Description</th> <th>Primary Tools</th> </tr> <tr> <td colspan="2"><a href="#use-case-1-real-world-teleoperation"><b>Real-World Teleoperation</b></a></td> <td>Control physical robots with hand tracking while viewing robot camera feeds</td> <td><code>avp_stream</code> + WebRTC streaming <code>configure_video()</code> (or direct USB connection) of Physical Camera</td> </tr> <tr> <td rowspan="2"><a href="#use-case-2-simulation-teleoperation"><b>Simulation Teleoperation</b></a></td> <td><i>2D Renderings</i></td> <td>Control simulated robots with hand tracking while viewing 2D renderings from simulation</td> <td><code>avp_stream</code> + WebRTC streaming of 2D simulation rendering <code>configure_video()</code> </td> </tr> <tr> <td><i>AR</i></td> <td>Control simulated robots with MuJoCo/Isaac Lab with scenes directly presented in AR</td> <td><code>avp_stream</code> + MuJoCo/Isaac Lab streaming <code>configure_mujoco()</code> <code>configure_isaac()</code></td> </tr> <tr> <td colspan="2"><a href="#use-case-3-egocentric-video-dataset-recording"><b>Egocentric Human Video Recording</b></a></td> <td>Record first-person manipulation videos with synchronized tracking</td> <td>UVC camera + Developer Strap</td> </tr> </table>

Installation

Installing is easy: install the apps from the App Store and the library from PyPI.

| Component | Installation |
|-----------|--------------|
| Tracking Streamer (VisionOS) | Install from App Store |
| Tracking Manager (iOS) | Install from App Store |
| avp_stream (Python) | pip install --upgrade avp_stream |

No other network configuration is required; everything should work out of the box after installation. An easy way to get onboarded is to go through the examples folder: all examples run without any extra configuration.

Note: Some examples demonstrate teleoperation within an Isaac Lab world. Since Isaac Lab is an extremely heavy dependency, I did not include it as a dependency of avp_stream. If you want to run the examples that use Isaac Lab as the simulation backend, install it according to the official installation guide.


External Network (Remote) Mode 🆕

Until now, Vision Pro and your Python client had to be on the same local network (e.g., the same WiFi) to communicate. With External Network Mode, introduced in the v2.5 release, you can establish a bidirectional connection from anywhere over the internet. This is extremely useful when your robot is in a lab (likely behind a school or company firewall) and you're working remotely from your home WiFi outside that network.

| Mode | Connection Method | Use Case |
|------|-------------------|----------|
| Local Network | IP address (e.g., "192.168.1.100") | Same WiFi/LAN |
| External Network | Room code (e.g., "ABC-1234") | Different networks, over the internet |

How It Works

External Network Mode uses WebRTC with TURN relay servers for NAT traversal:

  1. Vision Pro generates a room code and connects to a signaling server
  2. Python client connects using the same room code
  3. Signaling server facilitates the initial handshake (SDP offer/answer, ICE candidates)
  4. TURN servers relay media when direct peer-to-peer connection isn't possible
  5. Once connected, all streaming works the same as local mode
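Because the same ip argument accepts either an IP address or a room code, the client can tell the two modes apart purely by format. A sketch of such a heuristic (an illustration of the idea, not the library's actual logic):

```python
import re

# A room code looks like "ABC-1234": three letters, a dash, four digits.
ROOM_CODE_RE = re.compile(r"^[A-Z]{3}-\d{4}$")
# A bare IPv4 address: four dot-separated numeric groups.
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def connection_mode(target: str) -> str:
    """Classify the `ip` argument as local (IP address) or external (room code)."""
    if ROOM_CODE_RE.match(target):
        return "external"  # route through the signaling/TURN servers
    if IPV4_RE.match(target):
        return "local"     # connect directly on the LAN
    raise ValueError(f"Unrecognized target: {target!r}")

print(connection_mode("192.168.1.100"))  # local
print(connection_mode("ABC-1234"))       # external
```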

Usage

from avp_stream import VisionProStreamer

# Instead of IP address, use the room code shown on Vision Pro
s = VisionProStreamer(ip="ABC-1234")

# Everything else works exactly the same
s.configure_video(device="/dev/video0", format="v4l2", size="1280x720", fps=30)
s.start_webrtc()

while True:
    r = s.get_latest()
    # ...

Notes

  • Latency: Expect slightly higher latency compared to local network due to relay routing
  • Signaling/TURN server: For now, we provide a Cloudflare-hosted signaling and TURN server by default. If we detect extreme usage or abuse, we may introduce usage limits or a paid tier in the future.

Use Case 1: Real-World Teleoperation

Stream your robot's camera feed to Vision Pro while receiving hand/head tracking data for control. Perfect for teleoperating physical robots with visual feedback.

Video & Audio Streaming

from avp_stream import VisionProStreamer

avp_ip = "10.31.181.201"  # Vision Pro IP (shown in the app)
s = VisionProStreamer(ip=avp_ip)

# Configure video streaming from robot camera
s.configure_video(device="/dev/video0", format="v4l2", size="1280x720", fps=30)
s.start_webrtc()

while True:
    r = s.get_latest()
    # Use tracking data to control your robot
    head_pose = r['head']
    right_wrist = r['right_wrist']
    right_fingers = r['right_fingers']
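To turn the raw poses into robot commands, you typically extract positions from the transforms. A minimal sketch, assuming each pose is a 4x4 homogeneous matrix; the fingertip indices used here are illustrative, so check the library's documentation for the real joint layout:

```python
import numpy as np

def wrist_position(pose: np.ndarray) -> np.ndarray:
    """Extract the translation from a 4x4 homogeneous transform
    (assumes a 4x4 matrix; the actual array shape may differ)."""
    return pose[:3, 3]

def pinch_distance(fingers: np.ndarray, thumb_tip: int = 4, index_tip: int = 9) -> float:
    """Euclidean distance between two fingertip transforms, e.g. to drive a
    gripper open/close signal. The joint indices are illustrative values."""
    return float(np.linalg.norm(fingers[thumb_tip][:3, 3] - fingers[index_tip][:3, 3]))

# Example with an identity wrist pose and 25 identity finger transforms:
wrist = np.eye(4)
fingers = np.stack([np.eye(4)] * 25)
fingers[9][:3, 3] = [0.0, 0.0, 0.1]  # move the "index tip" 10 cm away
print(wrist_position(wrist))   # [0. 0. 0.]
print(pinch_distance(fingers)) # ~0.1
```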

Video Configuration Examples

Camera with overlay processing:

import cv2

def add_overlay(frame):
    # Draw a label on each outgoing frame before it is streamed
    return cv2.putText(frame, "Robot View", (50, 50),
                       cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

s = VisionProStreamer(ip=avp_ip)
s.register_frame_callback(add_overlay)
s.configure_video(device="/dev/video0", format="v4l2", size="640x480", fps=30)
s.start_webrtc()

Stereo camera (side-by-side 3D):

s = VisionProStreamer(ip=avp_ip)
s.configure_video(device="/dev/video0", format="v4l2", size="1920x1080", fps=30, stereo=True)
s.start_webrtc()
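Side-by-side stereo means the left and right eye views share a single frame. If your rig delivers two separate images, you can compose them yourself before streaming. A sketch assuming frames are HxWx3 numpy arrays (this composition step is your own code, not part of the library's API):

```python
import numpy as np

def side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Compose two equally-sized HxWx3 frames into one side-by-side stereo frame."""
    if left.shape != right.shape:
        raise ValueError("left/right frames must have the same shape")
    return np.hstack([left, right])

# Two 960-pixel-wide eye views combine into one 1920x1080 frame:
left = np.zeros((1080, 960, 3), dtype=np.uint8)
right = np.full((1080, 960, 3), 255, dtype=np.uint8)
frame = side_by_side(left, right)
print(frame.shape)  # (1080, 1920, 3)
```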

Synthetic video (generated frames):

s = VisionProStreamer(ip=avp_ip)
s.register_frame_callback(render_visualization)  # Your rendering function
s.configure_video(size="1280x720", fps=60)  # No device = synthetic mode
s.start_webrtc()
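The rendering function you register is user-supplied. A hypothetical implementation that generates a moving gradient, assuming the callback returns HxWx3 uint8 numpy arrays (the exact callback signature is an assumption):

```python
import numpy as np

FRAME_COUNT = 0

def render_visualization(frame=None):
    """Generate a 1280x720 RGB frame with a scrolling horizontal gradient.
    (The callback signature and return convention are assumptions.)"""
    global FRAME_COUNT
    FRAME_COUNT += 1
    # Shift the gradient a little each frame so the stream visibly animates
    ramp = ((np.linspace(0, 255, 1280) + FRAME_COUNT) % 256).astype(np.uint8)
    gray = np.tile(ramp, (720, 1))                 # (720, 1280)
    return np.repeat(gray[:, :, None], 3, axis=2)  # (720, 1280, 3)

out = render_visualization()
print(out.shape, out.dtype)  # (720, 1280, 3) uint8
```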

Audio Configuration Examples

With microphone input:

s = Vi