VisionProTeleop
<div align="center"> <img width="340" src="assets/vptv2.png"> </div>

<p align="center">
<a href="https://pypi.org/project/avp_stream/"><img src="https://img.shields.io/pypi/v/avp_stream" alt="PyPI version"></a>
<a href="https://pypi.org/project/avp_stream/"><img src="https://img.shields.io/pypi/dm/avp_stream" alt="PyPI downloads"></a>
<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
</p>

A complete ecosystem for using Apple Vision Pro in robotics research: from real-world teleoperation to simulation teleoperation to egocentric dataset recording. Stream hand/head tracking from Vision Pro, send video/audio/simulation back, and record everything to the cloud.
For a more detailed explanation, check out this short paper.
The recently updated App Store version of Tracking Streamer requires the Python library avp_stream at version 2.5.0 or newer. The VisionOS app will show a warning message if the Python library is outdated. You can upgrade the library by running pip install --upgrade avp_stream.
Table of Contents
- Overview
- Installations
- External Network (Remote) Mode 🆕
- Use Case 1: Real-World Teleoperation
- Use Case 2: Simulation Teleoperation
- Use Case 3: Egocentric Video Dataset Recording
- Recording & Cloud Storage
- App Settings & Customization
- API Reference
- Performance
- Examples
- Appendix
Overview
This project provides:
- Tracking Streamer: A VisionOS app that
- streams hand/head tracking data to a Python client
- receives stereo/mono video and audio streams from a Python client
- presents simulation scenes (MuJoCo and Isaac Lab) and their updates with native AR rendering using RealityKit
- records egocentric video with synchronized hand tracking from any UVC camera connected to Vision Pro
- (optionally) records every session to the user's personal cloud storage
- avp_stream: A Python library for
- receiving tracking data from Vision Pro
- streaming video/audio/simulation back to Vision Pro
- Tracking Manager: A companion iOS app for
- managing and viewing recordings in your personal cloud storage
- configuring settings for the VisionOS app
- calibrating a camera mounted on Vision Pro
- sharing recorded datasets with others
- viewing publicly shared datasets
Together, they enable three major workflows for robotics research:
<table>
<tr> <th colspan="2">Use Case</th> <th>Description</th> <th>Primary Tools</th> </tr>
<tr> <td colspan="2"><a href="#use-case-1-real-world-teleoperation"><b>Real-World Teleoperation</b></a></td> <td>Control physical robots with hand tracking while viewing robot camera feeds</td> <td><code>avp_stream</code> + WebRTC streaming of a physical camera via <code>configure_video()</code> (or a direct USB connection)</td> </tr>
<tr> <td rowspan="2"><a href="#use-case-2-simulation-teleoperation"><b>Simulation Teleoperation</b></a></td> <td><i>2D Renderings</i></td> <td>Control simulated robots with hand tracking while viewing 2D renderings from the simulation</td> <td><code>avp_stream</code> + WebRTC streaming of 2D simulation renderings via <code>configure_video()</code></td> </tr>
<tr> <td><i>AR</i></td> <td>Control simulated robots in MuJoCo/Isaac Lab with scenes presented directly in AR</td> <td><code>avp_stream</code> + MuJoCo/Isaac Lab streaming via <code>configure_mujoco()</code> / <code>configure_isaac()</code></td> </tr>
<tr> <td colspan="2"><a href="#use-case-3-egocentric-video-dataset-recording"><b>Egocentric Human Video Recording</b></a></td> <td>Record first-person manipulation videos with synchronized tracking</td> <td>UVC camera + Developer Strap</td> </tr>
</table>

Installations
Installing is easy: install it from the App Store and PyPI.
| Component | Installation |
|-----------|-------------|
| Tracking Streamer (VisionOS) | Install from App Store |
| Tracking Manager (iOS) | Install from App Store |
| avp_stream (Python) | pip install --upgrade avp_stream |
No other network configuration is required; everything should work out of the box after installation. An easy way to get onboarded is to go through the examples folder. All examples run without any extra setup.
Note: Some examples demonstrate teleoperation inside an Isaac Lab world. Since Isaac Lab is an extremely heavy dependency, it is not included as a dependency of avp_stream. If you want to run the examples that use Isaac Lab as the simulation backend, install it by following the official installation guide.
External Network (Remote) Mode 🆕
Previously, Vision Pro and your Python client had to be on the same local network (e.g., the same WiFi) to communicate. With External Network Mode, introduced in the v2.5 release, you can establish a bidirectional connection from anywhere over the internet. This is especially useful when your robot is in a lab (likely behind a school or company firewall) and you're working remotely from your home WiFi outside that network.
| Mode | Connection Method | Use Case |
|------|-------------------|----------|
| Local Network | IP address (e.g., "192.168.1.100") | Same WiFi/LAN |
| External Network | Room code (e.g., "ABC-1234") | Different networks, over internet |
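Since both modes share the same ip argument, the client presumably distinguishes the two by format. A minimal sketch of such a check; the exact detection logic inside avp_stream is an assumption, only the two address formats come from the table above:

```python
import re

# Local-network targets look like dotted-quad IPv4 addresses;
# external-network targets look like room codes such as "ABC-1234".
IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
ROOM_RE = re.compile(r"^[A-Z]{3}-\d{4}$")

def connection_mode(target: str) -> str:
    """Classify a connection target as 'local' (IP) or 'external' (room code)."""
    if IP_RE.match(target):
        return "local"
    if ROOM_RE.match(target):
        return "external"
    raise ValueError(f"unrecognized target: {target!r}")
```

Either way, the same VisionProStreamer(ip=...) call works; the library routes the connection accordingly.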
How It Works
External Network Mode uses WebRTC with TURN relay servers for NAT traversal:
- Vision Pro generates a room code and connects to a signaling server
- Python client connects using the same room code
- Signaling server facilitates the initial handshake (SDP offer/answer, ICE candidates)
- TURN servers relay media when direct peer-to-peer connection isn't possible
- Once connected, all streaming works the same as local mode
Usage
from avp_stream import VisionProStreamer

# Instead of an IP address, use the room code shown on Vision Pro
s = VisionProStreamer(ip="ABC-1234")

# Everything else works exactly the same
s.configure_video(device="/dev/video0", format="v4l2", size="1280x720", fps=30)
s.start_webrtc()

while True:
    r = s.get_latest()
    # ...
Notes
- Latency: Expect slightly higher latency compared to local network due to relay routing
- Signaling/TURN server: A Cloudflare-hosted signaling and TURN server is provided by default for now. If we detect extreme usage or abuse, we may introduce usage limits or require a paid tier in the future.
Use Case 1: Real-World Teleoperation
Stream your robot's camera feed to Vision Pro while receiving hand/head tracking data for control. Perfect for teleoperating physical robots with visual feedback.
Video & Audio Streaming
from avp_stream import VisionProStreamer

avp_ip = "10.31.181.201"  # Vision Pro IP (shown in the app)
s = VisionProStreamer(ip=avp_ip)

# Configure video streaming from the robot camera
s.configure_video(device="/dev/video0", format="v4l2", size="1280x720", fps=30)
s.start_webrtc()

while True:
    r = s.get_latest()
    # Use tracking data to control your robot
    head_pose = r['head']
    right_wrist = r['right_wrist']
    right_fingers = r['right_fingers']
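The tracking entries are homogeneous transforms. A sketch of turning a wrist pose into a position/rotation pair for a robot controller; the 4x4 shape (possibly with a leading batch axis) matches the library's documented format, but the concrete wrist value below is a made-up example:

```python
import numpy as np

def decompose_pose(T):
    """Split a 4x4 homogeneous transform into (position, rotation matrix)."""
    T = np.asarray(T).reshape(4, 4)  # drops a leading batch axis if present
    position = T[:3, 3]
    rotation = T[:3, :3]
    return position, rotation

# Hypothetical wrist pose: identity rotation, 30 cm in front of the origin.
wrist = np.eye(4)
wrist[:3, 3] = [0.0, 0.0, 0.3]
pos, rot = decompose_pose(wrist)
```

You would feed pos/rot (or a delta relative to a reset pose) into your robot's end-effector controller each loop iteration.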
Video Configuration Examples
Camera with overlay processing:
import cv2

def add_overlay(frame):
    return cv2.putText(frame, "Robot View", (50, 50),
                       cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

s = VisionProStreamer(ip=avp_ip)
s.register_frame_callback(add_overlay)
s.configure_video(device="/dev/video0", format="v4l2", size="640x480", fps=30)
s.start_webrtc()
Stereo camera (side-by-side 3D):
s = VisionProStreamer(ip=avp_ip)
s.configure_video(device="/dev/video0", format="v4l2", size="1920x1080", fps=30, stereo=True)
s.start_webrtc()
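With stereo=True, a single frame carries both eyes side by side, which is the standard side-by-side 3D convention; how the headset splits it on-device is not shown here. If your stereo rig delivers two separate images, a sketch of composing them into one frame:

```python
import numpy as np

def make_side_by_side(left, right):
    """Stack two equally sized HxWx3 frames into one Hx(2W)x3 side-by-side frame."""
    assert left.shape == right.shape, "both eyes must share a resolution"
    return np.concatenate([left, right], axis=1)

# Hypothetical eye images: two 960x1080 halves combine into a 1920x1080 frame,
# matching the size="1920x1080" configuration above.
left = np.zeros((1080, 960, 3), dtype=np.uint8)
right = np.full((1080, 960, 3), 255, dtype=np.uint8)
frame = make_side_by_side(left, right)
```

Such a function could be used inside a frame callback registered with register_frame_callback.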
Synthetic video (generated frames):
s = VisionProStreamer(ip=avp_ip)
s.register_frame_callback(render_visualization) # Your rendering function
s.configure_video(size="1280x720", fps=60) # No device = synthetic mode
s.start_webrtc()
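In synthetic mode every frame comes from your callback. A minimal sketch of such a rendering function; the HxWx3 uint8 RGB output format is an assumption about what the callback should return:

```python
import numpy as np

def render_visualization(_frame=None):
    """Generate a 1280x720 RGB frame with a left-to-right intensity gradient."""
    h, w = 720, 1280
    gradient = np.linspace(0, 255, w, dtype=np.uint8)  # one row, dark to bright
    frame = np.broadcast_to(gradient, (h, w)).copy()   # repeat down the height
    return np.stack([frame] * 3, axis=-1)              # grayscale -> RGB
```

In practice you would render a plot, a point cloud, or any visualization into the array instead of a gradient.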
Audio Configuration Examples
With microphone input:
s = VisionProStreamer(ip=avp_ip)
