VisionClaw
Real-time AI assistant for Meta Ray-Ban smart glasses -- voice + vision + agentic actions via Gemini Live and OpenClaw
Install / Use
/learn @Intent-Lab/VisionClawREADME
VisionClaw

A real-time AI assistant for Meta Ray-Ban smart glasses. See what you see, hear what you say, and take actions on your behalf -- all through voice.

Built on Meta Wearables DAT SDK (iOS) / DAT Android SDK (Android) + Gemini Live API + OpenClaw (optional).
Supported platforms: iOS (iPhone) and Android (Pixel, Samsung, etc.)
What It Does
Put on your glasses, tap the AI button, and talk:
- "What am I looking at?" -- Gemini sees through your glasses camera and describes the scene
- "Add milk to my shopping list" -- delegates to OpenClaw, which adds it via your connected apps
- "Send a message to John saying I'll be late" -- routes through OpenClaw to WhatsApp/Telegram/iMessage
- "Search for the best coffee shops nearby" -- web search via OpenClaw, results spoken back
The glasses camera streams at ~1fps to Gemini for visual context, while audio flows bidirectionally in real-time.
How It Works

Meta Ray-Ban Glasses (or phone camera)
|
| video frames + mic audio
v
iOS / Android App (this project)
|
| JPEG frames (~1fps) + PCM audio (16kHz)
v
Gemini Live API (WebSocket)
|
|-- Audio response (PCM 24kHz) --> App --> Speaker
|-- Tool calls (execute) -------> App --> OpenClaw Gateway
| |
| v
| 56+ skills: web search,
| messaging, smart home,
| notes, reminders, etc.
| |
|<---- Tool response (text) <----- App <-------+
|
v
Gemini speaks the result
Key pieces:
- Gemini Live -- real-time voice + vision AI over WebSocket (native audio, not STT-first)
- OpenClaw (optional) -- local gateway that gives Gemini access to 56+ tools and all your connected apps
- Phone mode -- test the full pipeline using your phone camera instead of glasses
- WebRTC streaming -- share your glasses POV live to a browser viewer
Quick Start (iOS)
1. Clone and open
git clone https://github.com/sseanliu/VisionClaw.git
cd VisionClaw/samples/CameraAccess
open CameraAccess.xcodeproj
2. Add your secrets
Copy the example file and fill in your values:
cp CameraAccess/Secrets.swift.example CameraAccess/Secrets.swift
Edit Secrets.swift with your Gemini API key (required) and optional OpenClaw/WebRTC config.
3. Build and run
Select your iPhone as the target device and hit Run (Cmd+R).
4. Try it out
Without glasses (iPhone mode):
- Tap "Start on iPhone" -- uses your iPhone's back camera
- Tap the AI button to start a Gemini Live session
- Talk to the AI -- it can see through your iPhone camera
With Meta Ray-Ban glasses:
First, enable Developer Mode in the Meta AI app:
- Open the Meta AI app on your iPhone
- Go to Settings (gear icon, bottom left)
- Tap App Info
- Tap the App version number 5 times -- this unlocks Developer Mode
- Go back to Settings -- you'll now see a Developer Mode toggle. Turn it on.

Then in VisionClaw:
- Tap "Start Streaming" in the app
- Tap the AI button for voice + vision conversation
Quick Start (Android)
1. Clone and open
git clone https://github.com/sseanliu/VisionClaw.git
Open samples/CameraAccessAndroid/ in Android Studio.
2. Configure GitHub Packages (DAT SDK)
The Meta DAT Android SDK is distributed via GitHub Packages. You need a GitHub Personal Access Token with read:packages scope.
- Go to GitHub > Settings > Developer Settings > Personal Access Tokens and create a classic token with
read:packagesscope - In
samples/CameraAccessAndroid/local.properties, add:
github_token=YOUR_GITHUB_TOKEN
Tip: If you have the
ghCLI installed, you can rungh auth tokento get a valid token. Make sure it hasread:packagesscope -- if not, rungh auth refresh -s read:packages.Note: GitHub Packages requires authentication even for public repositories. The 401 error means your token is missing or invalid.
3. Add your secrets
cd samples/CameraAccessAndroid/app/src/main/java/com/meta/wearable/dat/externalsampleapps/cameraaccess/
cp Secrets.kt.example Secrets.kt
Edit Secrets.kt with your Gemini API key (required) and optional OpenClaw/WebRTC config.
4. Build and run
- Let Gradle sync in Android Studio (it will download the DAT SDK from GitHub Packages)
- Select your Android phone as the target device
- Click Run (Shift+F10)
Wireless debugging: You can also install via ADB wirelessly. Enable Wireless debugging in your phone's Developer Options, then pair with
adb pair <ip>:<port>.
5. Try it out
Without glasses (Phone mode):
- Tap "Start on Phone" -- uses your phone's back camera
- Tap the AI button (sparkle icon) to start a Gemini Live session
- Talk to the AI -- it can see through your phone camera
With Meta Ray-Ban glasses:
Enable Developer Mode in the Meta AI app (same steps as iOS above), then:
- Tap "Start Streaming" in the app
- Tap the AI button for voice + vision conversation
Setup: OpenClaw (Optional)
OpenClaw gives Gemini the ability to take real-world actions: send messages, search the web, manage lists, control smart home devices, and more. Without it, Gemini is voice + vision only.
1. Install and configure OpenClaw
Follow the OpenClaw setup guide. Make sure the gateway is enabled:
In ~/.openclaw/openclaw.json:
{
"gateway": {
"port": 18789,
"bind": "lan",
"auth": {
"mode": "token",
"token": "your-gateway-token-here"
},
"http": {
"endpoints": {
"chatCompletions": { "enabled": true }
}
}
}
}
Key settings:
bind: "lan"-- exposes the gateway on your local network so your phone can reach itchatCompletions.enabled: true-- enables the/v1/chat/completionsendpoint (off by default)auth.token-- the token your app will use to authenticate
2. Configure the app
iOS -- In Secrets.swift:
static let openClawHost = "http://Your-Mac.local"
static let openClawPort = 18789
static let openClawGatewayToken = "your-gateway-token-here"
Android -- In Secrets.kt:
const val openClawHost = "http://Your-Mac.local"
const val openClawPort = 18789
const val openClawGatewayToken = "your-gateway-token-here"
To find your Mac's Bonjour hostname: System Settings > General > Sharing -- it's shown at the top (e.g., Johns-MacBook-Pro.local).
Both iOS and Android also have an in-app Settings screen where you can change these values at runtime without editing source code.
3. Start the gateway
openclaw gateway restart
Verify it's running:
curl http://localhost:18789/health
Now when you talk to the AI, it can execute tasks through OpenClaw.
Architecture
Key Files (iOS)
All source code is in samples/CameraAccess/CameraAccess/:
| File | Purpose |
|------|---------|
| Gemini/GeminiConfig.swift | API keys, model config, system prompt |
| Gemini/GeminiLiveService.swift | WebSocket client for Gemini Live API |
| Gemini/AudioManager.swift | Mic capture (PCM 16kHz) + audio playback (PCM 24kHz) |
| Gemini/GeminiSessionViewModel.swift | Session lifecycle, tool call wiring, transcript state |
| OpenClaw/ToolCallModels.swift | Tool declarations, data types |
| OpenClaw/OpenClawBridge.swift | HTTP client for OpenClaw gateway |
| OpenClaw/ToolCallRouter.swift | Routes Gemini tool calls to OpenClaw |
| iPhone/IPhoneCameraManager.swift | AVCaptureSession wrapper for iPhone camera mode |
| WebRTC/WebRTCClient.swift | WebRTC peer connection + SDP negotiation |
| WebRTC/SignalingClient.swift | WebSocket signaling for WebRTC rooms |
Key Files (Android)
All source code is in samples/CameraAccessAndroid/app/src/main/java/.../cameraaccess/:
| File | Purpose |
|------|---------|
| gemini/GeminiConfig.kt | API keys, model config, system prompt |
| gemini/GeminiLiveService.kt | OkHttp WebSocket client for Gemini Live API |
| gemini/AudioManager.kt | AudioRecord (16kHz) + AudioTrack (24kHz) |
| gemini/GeminiSessionViewModel.kt | Session lifecycle, tool call wiring, UI state |
| openclaw/ToolCallModels.kt | Tool declarations, data classes |
| openclaw/OpenClawBridge.kt | OkHttp HTTP client for OpenClaw gateway |
| openclaw/ToolCallRouter.kt | Routes Gemini tool calls to OpenClaw |
| phone/PhoneCameraManager.kt | CameraX wrapper for phone camera mode |
| webrtc/WebRTCClient.kt | WebRTC peer connection (stream-webrtc-android) |
| webrtc/SignalingClient.kt | OkHttp WebSocket signaling for WebRTC rooms |
| settings/SettingsManager.kt | SharedPreferences with Secrets.kt fallback |
Audio Pipeline
- Input: Phone mic -> AudioManager (PCM Int16, 16kHz mono, 100ms chunks) -> Gemini WebSocket
- Output: Gemini WebSocket -> AudioManager playback queue -> Phone speaker
- iOS iPhone mode: Uses
.voiceChataudio session for echo cancellation + mic gating during AI speech - iOS Glasses mode: Uses
.videoChataudio session (mic is on glasses, speaker is on phone -- no echo) - Android: Uses
VOICE_COMMUNICATIONaudio source for
Related Skills
node-connect
339.5kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.9kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
339.5kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.9kCommit, push, and open a PR
Security Score
Audited on Mar 29, 2026
