QuestCameraKit
QuestCameraKit is a collection of template and reference projects demonstrating how to use Meta Quest’s new Passthrough Camera API (PCA) for advanced AR/VR vision, tracking, and shader effects.
Table of Contents
- PCA Samples
- Update Notes
- Getting Started with PCA
- Troubleshooting & Known Issues
- Community Contributions
- News
- Acknowledgements & Credits
- License
- Contact
PCA Samples
1. 🎨 Color Picker
- Purpose: Convert a 3D point in space to its corresponding 2D image pixel.
- Description: This sample shows the mapping between 3D space and 2D image coordinates using the Passthrough Camera API. We use MRUK's EnvironmentRaycastManager to determine a 3D point in our environment and map it to the corresponding location on our WebCamTexture. We then extract the pixel at that point to determine the color of a real-world object.
- Open the `ColorPicker` scene.
- Build the scene and run the APK on your headset.
- Aim the ray at a surface in your real space and press the A button or pinch your fingers; the cube changes to the color of your real environment at that point.
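The 3D-point-to-pixel mapping this sample performs follows standard pinhole-camera math: transform the world point into camera space, then project with the camera intrinsics. A minimal Python sketch of that math (the function name and the `(fx, fy, cx, cy)` intrinsics tuple are illustrative, not the PCA API):

```python
import numpy as np

def world_to_pixel(point_world, intrinsics, cam_pose):
    """Project a 3D world-space point into 2D pixel coordinates.

    intrinsics: (fx, fy, cx, cy) focal lengths and principal point in pixels.
    cam_pose: 4x4 camera-to-world matrix; inverted to get world-to-camera.
    """
    fx, fy, cx, cy = intrinsics
    world_to_cam = np.linalg.inv(cam_pose)
    p = world_to_cam @ np.array([*point_world, 1.0])
    x, y, z = p[:3]
    # Perspective divide, then shift by the principal point.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v
```

In the sample, the 3D point comes from an EnvironmentRaycastManager hit, and the resulting `(u, v)` indexes into the WebCamTexture pixels to read the color.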

2. 🍎 Object Detection with Unity Inference Engine
- Purpose: Convert 2D screen coordinates into their corresponding 3D points in space.
- Description: Use the Unity Inference Engine framework to run different ML models that detect and track objects. Learn how to convert detected image coordinates (e.g. bounding boxes) back into 3D points for dynamic interaction within your scenes. This sample also shows how to filter labels: for example, you can detect only humans and pets to create a safer play area for your VR game. The sample video below is filtered to monitor, person, and laptop. The sample runs at around `60 fps`.
- Open the `ObjectDetection` scene.
- Install Unity AI Inference (use `com.unity.ai.inference@2.3.0`).
- Select the labels you want to track. Leaving the list empty tracks all objects. <details> <summary>Show all available labels</summary> <table> <tr><td>person</td><td>bicycle</td><td>car</td><td>motorbike</td><td>aeroplane</td><td>bus</td><td>train</td><td>truck</td></tr> <tr><td>boat</td><td>traffic light</td><td>fire hydrant</td><td>stop sign</td><td>parking meter</td><td>bench</td><td>bird</td><td>cat</td></tr> <tr><td>dog</td><td>horse</td><td>sheep</td><td>cow</td><td>elephant</td><td>bear</td><td>zebra</td><td>giraffe</td></tr> <tr><td>backpack</td><td>umbrella</td><td>handbag</td><td>tie</td><td>suitcase</td><td>frisbee</td><td>skis</td><td>snowboard</td></tr> <tr><td>sports ball</td><td>kite</td><td>baseball bat</td><td>baseball glove</td><td>skateboard</td><td>surfboard</td><td>tennis racket</td><td>bottle</td></tr> <tr><td>wine glass</td><td>cup</td><td>fork</td><td>knife</td><td>spoon</td><td>bowl</td><td>banana</td><td>apple</td></tr> <tr><td>sandwich</td><td>orange</td><td>broccoli</td><td>carrot</td><td>hot dog</td><td>pizza</td><td>donut</td><td>cake</td></tr> <tr><td>chair</td><td>sofa</td><td>pottedplant</td><td>bed</td><td>diningtable</td><td>toilet</td><td>tvmonitor</td><td>laptop</td></tr> <tr><td>mouse</td><td>remote</td><td>keyboard</td><td>cell phone</td><td>microwave</td><td>oven</td><td>toaster</td><td>sink</td></tr> <tr><td>refrigerator</td><td>book</td><td>clock</td><td>vase</td><td>scissors</td><td>teddy bear</td><td>hair drier</td><td>toothbrush</td></tr> </table> </details>
- Build and deploy to Quest. Use the trigger to scan the environment; markers will appear for detections above the configured confidence threshold.
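Going from a 2D detection back to 3D is the inverse of the color-picker mapping: unproject the bounding-box center into a world-space ray, then raycast it against the environment (which MRUK's EnvironmentRaycastManager does in the sample). A Python sketch of the unprojection and label filtering, with illustrative function names and a simplified detection dict:

```python
import numpy as np

def pixel_to_ray(u, v, intrinsics, cam_pose):
    """Unproject a pixel (e.g. a detection's bounding-box center) to a world-space ray."""
    fx, fy, cx, cy = intrinsics
    # Direction in camera space: undo the principal-point shift and focal scaling.
    dir_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    dir_world = cam_pose[:3, :3] @ dir_cam        # rotate into world space
    origin = cam_pose[:3, 3]                      # camera position
    return origin, dir_world / np.linalg.norm(dir_world)

def filter_detections(detections, allowed_labels):
    """Keep only detections whose label is allowed; an empty set keeps everything,
    matching the sample's behavior when the label list is left empty."""
    if not allowed_labels:
        return detections
    return [d for d in detections if d["label"] in allowed_labels]
```

In Unity the returned ray would be handed to an environment raycast; the hit point is where the sample places its 3D marker.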

3. 📱 QR Code Tracking with ZXing
- Purpose: Detect and track QR codes in real time. Open webviews or log in to third-party services with ease.
- Description: Similarly to the object detection sample, this gets QR code coordinates and projects them into 3D space. Detect QR codes and call their URLs. You can select between a multiple and a single QR code mode. The sample runs at around `70 fps` for multiple QR codes and a stable `72 fps` for a single code. Users can choose between CenterOnly and PerCorner raycasting modes via an enum in the Inspector. This enables more accurate rotation tracking for use cases that require it (PerCorner), while preserving a faster fallback (CenterOnly).
- Open the `QRCodeTracking` scene.
- Ensure ZXing DLLs are present (the editor script auto-adds the `ZXING_ENABLED` define).
- Choose Single or Multiple detection mode and the raycast mode (CenterOnly vs PerCorner).
- Build to Quest, point the headset toward QR codes, and interact with the spawned markers.
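The PerCorner mode gains its rotation accuracy by raycasting all four QR corners and fitting a plane to the hits, instead of using a single center hit. A minimal sketch of that pose estimation under the assumption of four ordered corner hit points (the function name is illustrative):

```python
import numpy as np

def qr_pose_from_corners(corners):
    """Estimate a QR code's position and surface normal from four raycast
    hit points (PerCorner mode).

    corners: 4x3 array ordered top-left, top-right, bottom-right, bottom-left.
    """
    corners = np.asarray(corners, dtype=float)
    center = corners.mean(axis=0)          # position: centroid of the corners
    right = corners[1] - corners[0]        # top edge direction
    down = corners[3] - corners[0]         # left edge direction
    normal = np.cross(down, right)         # plane normal from the two edges
    return center, normal / np.linalg.norm(normal)
```

CenterOnly skips this and orients the marker from a single surface hit, which is cheaper but cannot recover in-plane rotation as reliably.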
4. 🪟 Shader Samples
- Purpose: Apply stereo passthrough camera-mapped shader effects to virtual surfaces.
- Description: The shader sample is now consolidated into one scene that uses left/right passthrough feeds and per-eye calibration data. Current materials and shaders included in this flow are `StereoPassthroughCameraMapping`, `StereoPassthroughFrostedGlass`, and `StereoPassthroughWavyPortal`.
- Open the `CameraMappingForShaders` scene.
- Make sure your passthrough setup is active and the scene has both left and right `PassthroughCameraAccess` components.
- Build to Quest and run. Interact with the sample objects using the stereo shader materials to test camera mapping, frosted glass, and wavy portal effects.
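At their core, these stereo shaders project each fragment's world position into the matching eye's passthrough camera and sample that frame at the resulting UV. A Python sketch of the per-eye UV computation the GPU performs (function name and parameter layout are illustrative, not the shader API):

```python
import numpy as np

def world_to_camera_uv(world_pos, view_matrix, intrinsics, resolution):
    """Compute normalized UVs for sampling a passthrough frame at a world
    position; done once per eye with that eye's view matrix and calibration."""
    fx, fy, cx, cy = intrinsics
    w, h = resolution
    p = view_matrix @ np.array([*world_pos, 1.0])  # into camera space
    if p[2] <= 0.0:
        return None  # behind the camera: nothing to sample
    # Pinhole projection to pixels, then normalize to [0, 1] texture space.
    u = (fx * p[0] / p[2] + cx) / w
    v = (fy * p[1] / p[2] + cy) / h
    return u, v
```

Effects like frosted glass or the wavy portal then perturb these UVs before sampling, which is why the scene needs both left and right camera components: each eye must sample its own feed for correct stereo depth.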

5. 🧠 OpenAI vision model
- Purpose: Ask OpenAI's vision model (or any other multimodal LLM) about the context of your current scene.
- Description: We use the OpenAI Speech-to-Text API to create a command. We then send this command together with a screenshot to the vision model. Lastly, we get the response back and use the Text-to-Speech API to turn the response text into an audio file in Unity and speak the response. The user can select different speakers, models, and speeds. For the command we can add additional instructions for the model, as well as select an image, image & text, or text-only mode. The whole loop takes anywhere from 2 to 6 seconds, depending on the internet connection.
- Open the ImageLLM scene.
- Create an OpenAI API key and enter it on the OpenAI Manager prefab.
- Select your desired model and optionally give the LLM additional instructions.
- Ensure your Quest headset is connected to a fast/stable network.
- Build the scene and run the APK on your headset.
- Use the voice input (controller or hand gesture) to issue commands; the headset captures a PCA frame and plays back the LLM response via TTS.
[!NOTE] File uploads are currently limited to 25 MB and the following input formats are supported: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`.
You can send commands and receive results in any of these languages:
<details> <summary>Show all supported languages</summary> <table> <tr> <td>Afrikaans</td> <td>Arabic</td> <td>Armenian</td> <td>Azerbaijani</td> <td>Belarusian</td> <td>Bosnian</td> <td>Bulgarian</td> <td>Catalan</td> <td>Chinese</td> </tr> <tr> <td>Croatian</td> <td>Czech</td> <td>Danish</td> <td>Dutch</td> <td>English</td> <td>Estonian</td> <td>Finnish</td> <td>French</td> <td>Galician</td> </tr> <tr> <td>German</td> <td>Greek</td> <td>Hebrew</td> <td>Hindi</td> <td>Hungarian</td> <td>Icelandic</td> <td>Indonesian</td> <td>Italian</td> <td>Japanese</td> </tr> <tr> <td>Kannada</td> <td>Kazakh</td> <td>Korean</td> <td>Latvian</td> <td>Lithuanian</td> <td>Macedonian</td> <td>Malay</td> <td>Marathi</td> <td>Maori</td> </tr> <tr> <td>Nepali</td> <td>Norwegian</td> <td>Persian</td> <td>Polish</td> <td>Portuguese</td> <td>Romanian</td> <td>Russian</td> <td>Serbian</td> <td>Slovak</td> </tr> <tr> <td>Slovenian</td> <td>Spanish</td> <td>Swahili</td> <td>Swedish</td> <td>Tagalog</td> <td>Tamil</td> <td>Thai</td> <td>Turkish</td> <td>Ukrainian</td> </tr> <tr> <td>Urdu</td> <td>Vietnamese</td> <td>Welsh</td> <td></td> <td></td> <td></td> <td></td> <td></td> <td></td> </tr> </table> </details>

https://github.com/user-attachments/assets/a4cfbfc2-0306-40dc-a9a3-cdccffa7afea
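The speech-to-vision-to-speech loop hinges on assembling the vision request correctly: optional system instructions, the transcribed command, and the PCA screenshot encoded as a base64 data URL. A Python sketch of that payload assembly (the helper name is illustrative; the message shape follows OpenAI's chat format for image inputs):

```python
import base64

def build_vision_request(command_text, image_bytes=None, extra_instructions=""):
    """Assemble chat messages for the vision call: optional system
    instructions, the transcribed voice command, and an optional screenshot."""
    content = [{"type": "text", "text": command_text}]
    if image_bytes is not None:
        # Images are sent inline as a base64-encoded data URL.
        b64 = base64.b64encode(image_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    messages = []
    if extra_instructions:
        messages.append({"role": "system", "content": extra_instructions})
    messages.append({"role": "user", "content": content})
    return messages
```

With the OpenAI SDK, the full loop would be: transcribe the recorded audio via the transcriptions endpoint, pass these messages to a chat completion with a vision-capable model, then feed the reply text to the speech endpoint for playback in Unity. Omitting `image_bytes` gives the sample's text-only mode.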
6. 🎥 WebRTC video streaming
- Purpose: Stream the passthrough camera feed over WebRTC to another client, using WebSockets for signaling.
- Description: This sampl