🍵 Matcha
An agent-native voice-and-vision framework. Turn any audio/visual device -- earbuds, smart glasses, pendants, phones -- into an always-on AI companion that can perceive, understand, and act on your behalf.
Built by Intentlabs.
Supported platforms: iOS (iPhone) and Android
The Problem
Today's voice AI apps (ChatGPT Voice, Gemini Live, Sesame) are conversational but not agentic. They can talk to you, but they cannot act for you. When they try to do complex tasks (search, multi-step workflows, API calls), they go silent for 10-30 seconds -- broken UX.
Meanwhile, agent frameworks (OpenClaw, Manus, Claude Code) can execute complex tasks but have no real-time voice interface.
No consumer product today combines real-time voice conversation with general-purpose agent execution. Matcha fills this gap.
Core Architecture: Dual-Agent System
Matcha separates real-time voice interaction from asynchronous task execution, allowing both to run simultaneously without blocking each other.
+-----------------------------+
| MATCHA CORE |
| |
User ---- Audio ------> | +---------------------+ |
Device Stream | | VOICE AGENT | |
(glasses, | | (synchronous) | |
earbuds, | | | |
pendant, | | Real-time voice | |
phone) | | conversation. | |
<-- Audio --- | | Always responsive. | |
Response | | Never blocked. | |
| +----------+-----------+ |
| | |
| delegates tasks |
| | |
| +----------v-----------+ |
| | ACTION AGENT | |
| | (asynchronous) | |
User ---- Video ------> | | | |
Device Frames | | Web search, API | |
(camera (~1fps) | | calls, messaging, | |
on | | smart home, etc. | |
glasses, | | | |
phone) | | Reports results | |
| | back to Voice | |
| | Agent when ready. | |
| +----------------------+ |
| |
+-----------------------------+
Voice Agent -- maintains real-time bidirectional audio with the user. Sub-second latency. Never blocked by tasks. Powered by Gemini Live API or OpenAI Realtime API.
Action Agent -- receives task delegations from Voice Agent. Executes complex, multi-step tasks in the background via either E2B cloud sandboxes (Claude Agent SDK) or OpenClaw (56+ skills: web search, messaging, smart home, notes, reminders, etc.). Reports results back to Voice Agent when ready.
Example flow:
- User: "Find me the best ramen places in SF that are open late"
- Voice Agent: "Sure, let me search for late-night ramen spots."
- Action Agent begins web search in background
- User: "Oh also, I want somewhere with vegetarian options"
- Voice Agent: "Got it, I'll filter for vegetarian-friendly places too."
- Action Agent returns results
- Voice Agent speaks the answer conversationally
The user is never left in silence. The agent is never limited to shallow answers.
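The non-blocking delegation above can be sketched in a few lines. This is an illustrative sketch, not Matcha's actual API: the class and method names (VoiceAgent, ActionAgent, delegate, drain) are hypothetical. The point is that delegation returns immediately, so the voice loop keeps speaking while the task runs, and the result is spoken when it arrives.

```typescript
// Hypothetical sketch of the dual-agent pattern. The voice agent
// delegates a task and stays responsive; the action agent reports
// back asynchronously when the task finishes.

type TaskResult = { taskId: string; summary: string };

class ActionAgent {
  // Runs a task in the background and resolves with a result summary.
  async run(taskId: string, task: () => Promise<string>): Promise<TaskResult> {
    const summary = await task();
    return { taskId, summary };
  }
}

class VoiceAgent {
  private pending: Promise<TaskResult>[] = [];
  constructor(
    private action: ActionAgent,
    private speak: (s: string) => void,
  ) {}

  // Delegate WITHOUT awaiting: the voice loop is never blocked.
  delegate(taskId: string, task: () => Promise<string>): void {
    const p = this.action.run(taskId, task).then((r) => {
      this.speak(`Here's what I found: ${r.summary}`);
      return r;
    });
    this.pending.push(p);
    this.speak("Sure, let me look into that.");
  }

  // Flush all delegated tasks (used here only to end the demo cleanly).
  async drain(): Promise<void> {
    await Promise.all(this.pending);
  }
}

async function demo(): Promise<string[]> {
  const spoken: string[] = [];
  const voice = new VoiceAgent(new ActionAgent(), (s) => spoken.push(s));
  voice.delegate("ramen-search", async () => "3 late-night ramen spots in SF");
  // The user keeps talking while the search runs in the background.
  spoken.push("(conversation continues; voice agent still responsive)");
  await voice.drain();
  return spoken;
}
```

Note the ordering: the acknowledgment is spoken first, conversation continues, and the task result is voiced last, once ready.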
Supported Hardware
Matcha is device-agnostic. It connects to any audio I/O device (with optional video input):

| Device | Audio In | Audio Out | Video In | Status |
|--------|----------|-----------|----------|--------|
| Phone (built-in) | Mic | Speaker | Camera | Working |
| AirPods / earbuds | Mic | Speaker | -- | Working |
| Meta Ray-Ban glasses | Mic | Speaker | Camera (via DAT SDK) | Working |
| Any Bluetooth audio | Mic | Speaker | -- | Working |
| Sesame glasses | Mic | Speaker | Camera | Planned |
| Apple glasses | Mic | Speaker | Camera | Planned |
| Pendant devices | Mic | Speaker | Camera | Planned |
Supported Voice Models
Matcha is model-agnostic:
| Provider | Model | Status |
|----------|-------|--------|
| Google | Gemini 2.0 Flash (Live API) | Working |
| OpenAI | GPT-4o Realtime API | Planned |
Quick Start (iOS)
1. Clone and open
git clone https://github.com/Intent-Lab/matcha.git
cd matcha/samples/CameraAccess
open CameraAccess.xcodeproj
2. Add your secrets
cp CameraAccess/Secrets.swift.example CameraAccess/Secrets.swift
Edit Secrets.swift with your Gemini API key (required) and optional E2B/OpenClaw/WebRTC config.
3. Build and run
Select your iPhone as the target device and hit Run (Cmd+R).
4. Try it out
Without glasses (iPhone mode):
- Tap "Start on iPhone" -- uses your iPhone's back camera
- Tap the AI button to start a voice session
- Talk to the AI -- it can see through your iPhone camera and execute tasks
With Meta Ray-Ban glasses:
First, enable Developer Mode in the Meta AI app:
- Open the Meta AI app on your iPhone
- Go to Settings (gear icon, bottom left)
- Tap App Info
- Tap the App version number 5 times -- this unlocks Developer Mode
- Go back to Settings -- you'll now see a Developer Mode toggle. Turn it on.
Then in the app:
- Tap "Start Streaming"
- Tap the AI button for voice + vision conversation
Quick Start (Android)
1. Clone and open
git clone https://github.com/Intent-Lab/matcha.git
Open samples/CameraAccessAndroid/ in Android Studio.
2. Configure GitHub Packages (DAT SDK)
The Meta DAT Android SDK is distributed via GitHub Packages. You need a GitHub Personal Access Token with read:packages scope.
- Go to GitHub > Settings > Developer Settings > Personal Access Tokens and create a classic token with the read:packages scope
- In samples/CameraAccessAndroid/local.properties, add:
github_token=YOUR_GITHUB_TOKEN
3. Add your secrets
cd samples/CameraAccessAndroid/app/src/main/java/com/meta/wearable/dat/externalsampleapps/cameraaccess/
cp Secrets.kt.example Secrets.kt
Edit Secrets.kt with your Gemini API key (required) and optional E2B/OpenClaw/WebRTC config.
4. Build and run
- Let Gradle sync in Android Studio
- Select your Android phone as the target device
- Click Run (Shift+F10)
5. Try it out
Without glasses (Phone mode):
- Tap "Start on Phone" -- uses your phone's back camera
- Tap the AI button to start a voice session
- Talk to the AI -- it can see through your phone camera and execute tasks
With Meta Ray-Ban glasses:
Enable Developer Mode in the Meta AI app (same steps as iOS above), then:
- Tap "Start Streaming" in the app
- Tap the AI button for voice + vision conversation
Agent Backends
Matcha supports two agent backends for task execution. You can switch between them at runtime in the in-app Settings > Agent Backend picker.
| Backend | Description | Best for |
|---------|-------------|----------|
| E2B | Cloud-hosted sandbox (E2B + Claude Agent SDK). Deploy the agent/ directory to Vercel. Supports streaming tool progress. | Production, multi-user |
| OpenClaw | Local Mac gateway with 56+ skills. Runs on your local network. | Development, personal use |
Without either backend configured, the AI is voice + vision only (no task execution).
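One way to picture the runtime-switchable backend is a shared interface with two interchangeable implementations. This is a hypothetical sketch, not Matcha's actual types: the AgentBackend interface and pickBackend function are illustrative only.

```typescript
// Hypothetical sketch: both backends expose the same interface, so the
// voice agent doesn't care which one executes its delegated tasks.

interface AgentBackend {
  name: string;
  execute(task: string): Promise<string>;
}

const e2bBackend: AgentBackend = {
  name: "E2B",
  // In the real app this would call the Vercel-deployed agent/ service.
  execute: async (task) => `e2b:${task}`,
};

const openClawBackend: AgentBackend = {
  name: "OpenClaw",
  // In the real app this would hit the local gateway over HTTP.
  execute: async (task) => `openclaw:${task}`,
};

// The Settings > Agent Backend picker just swaps the active object.
function pickBackend(setting: "E2B" | "OpenClaw" | "None"): AgentBackend | null {
  if (setting === "E2B") return e2bBackend;
  if (setting === "OpenClaw") return openClawBackend;
  return null; // no backend: voice + vision only, no task execution
}
```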
Setup: E2B Agent (Optional)
The E2B backend runs a Claude Agent SDK sandbox in the cloud. It supports real-time streaming of tool execution progress (which tools are running, their results, etc.).
1. Deploy the agent
Deploy the agent/ directory to Vercel:
cd agent
vercel deploy
2. Configure the app
iOS -- In Secrets.swift:
static let agentBaseURL = "https://your-deployment.vercel.app"
static let agentToken = "your-shared-secret-token"
Android -- In Secrets.kt:
const val agentBaseURL = "https://your-deployment.vercel.app"
const val agentToken = "your-shared-secret-token"
3. Select the backend
Open Settings in the app and set Agent Backend to E2B.
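For reference, the two secrets map onto an ordinary authenticated HTTP request. This sketch assumes bearer-token auth with agentToken; the route /api/task is a placeholder, not a documented Matcha endpoint, so check the agent/ source for the real path.

```typescript
// Hypothetical sketch: how agentBaseURL and agentToken would combine
// into a request to the deployed agent. The /api/task path is assumed.

function buildAgentRequest(baseURL: string, token: string, task: string) {
  return {
    url: `${baseURL}/api/task`,
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ task }),
    },
  };
}

// Usage: const { url, init } = buildAgentRequest(baseURL, token, "find ramen");
//        const res = await fetch(url, init);
```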
Setup: OpenClaw (Optional)
OpenClaw gives Matcha the ability to take real-world actions: send messages, search the web, manage lists, control smart home devices, and more.
1. Install and configure OpenClaw
Follow the OpenClaw setup guide. Make sure the gateway is enabled:
In ~/.openclaw/openclaw.json:
{
  "gateway": {
    "port": 18789,
    "bind": "lan",
    "auth": {
      "mode": "token",
      "token": "your-gateway-token-here"
    },
    "http": {
      "endpoints": {
        "chatCompletions": { "enabled": true }
      }
    }
  }
}
2. Configure the app
iOS -- In Secrets.swift:
static let openClawHost = "http://Your-Mac.local"
static let openClawPort = 18789
static let openClawGatewayToken = "your-gateway-token-here"
Android -- In Secrets.kt:
const val openClawHost = "http://Your-Mac.local"
const val openClawPort = 18789
const val openClawGatewayToken = "your-gateway-token-here"
3. Select the backend
Open Settings in the app and set Agent Backend to OpenClaw. You can use the Test Connection button to verify connectivity.
4. Start the gateway
openclaw gateway restart
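The host, port, and gateway token from Secrets combine the same way for the gateway. This sketch assumes the chatCompletions endpoint lives at the OpenAI-style path /v1/chat/completions; that path is inferred from the config key, not confirmed by this README, so verify it against the OpenClaw docs.

```typescript
// Hypothetical sketch: building a request to the local OpenClaw gateway
// from the three Secrets values. The endpoint path is an assumption.

function buildGatewayRequest(host: string, port: number, token: string) {
  return {
    url: `${host}:${port}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        messages: [{ role: "user", content: "ping" }],
      }),
    },
  };
}

// Usage: const { url, init } = buildGatewayRequest("http://Your-Mac.local", 18789, token);
//        const res = await fetch(url, init); // should return 200 if the gateway is up
```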
Architecture
Project Structure (iOS)
samples/CameraAccess/