PhoneAgent
An AI agent that can get things done across iPhone apps.
Install / Use
/learn @rounak/PhoneAgentREADME
PhoneAgent
PhoneAgent is an experimental mobile automation project with two operating modes:
- In-app iPhone agent (SwiftUI app + XCTest runner + OpenAI Responses API)
- External bridge to let Codex/OpenClaw control iOS/Android devices
The bridge supports both:
- iOS (XCTest-hosted actions against simulator or physical iPhone)
- Android (adb + UiAutomator + input/screencap actions against emulator or device)
Demo
- Self contained iOS app
- OpenClaw controlling an iPhone
- OpenClaw controlling an Android phone
- Codex controlling an iPhone
What This Repo Includes
- iOS app UI: API key entry, prompt input, microphone support, settings, always-on wake-word mode
- iOS test-hosted RPC server: newline-delimited JSON-RPC on port
45678 - Android RPC server: localhost JSON-RPC bridge backed by
adbcommands - Helper scripts:
- iOS bridge launcher + physical-device localhost forwarding
- Android bridge launcher (with adb auto-discovery)
- generic RPC CLI (
rpc.py) for both platforms
Capabilities
Shared RPC action surface (iOS + Android)
get_treeget_screen_imageget_contextset_api_keyopen_apptaptap_elemententer_textscrollswipestop
iOS-only RPC method
submit_prompt
submit_prompt powers the in-app iPhone agent loop.
In-app iPhone agent features
- OpenAI API key stored in Keychain
- Prompt submission from keyboard or microphone
- Optional always-on mode with custom wake word
- Notification completion + quick-reply follow-up loop
Requirements
macOS host
- Xcode (for iOS app/UITest bridge)
- Python 3
- Android SDK tools (
adb) for Android bridge
Devices
- iOS simulator or physical iPhone (Developer setup)
- Android emulator or physical Android device (USB debugging or wireless debugging)
Quick Start (In-App iPhone Agent)
- Open
/Users/rounak/Developer/PhoneAgent-cli/PhoneAgent.xcodeprojin Xcode. - Run the
PhoneAgentscheme on an iPhone/simulator. - Enter OpenAI API key when prompted.
- Submit tasks via keyboard or microphone.
Quick Start (AI Agents)
For Codex/OpenClaw usage, use the skill docs:
Send RPC calls
# iOS bundle identifier example
./.agents/skills/phoneagent/scripts/rpc.py open-app com.apple.Preferences
# Android package example
./.agents/skills/phoneagent/scripts/rpc.py open-app com.android.settings
# Fetch tree
./.agents/skills/phoneagent/scripts/rpc.py get-tree
# Capture screenshot (writes PNG under /tmp/phoneagent-artifacts)
./.agents/skills/phoneagent/scripts/rpc.py get-screen-image --print-metadata
The CLI supports --host and --port if you need non-default endpoint settings.
RPC Notes
- Transport: newline-delimited JSON-RPC objects
- Endpoint:
127.0.0.1:45678by default open_apprequest parameter isbundle_identifier:- iOS: pass bundle identifier (e.g.
com.apple.Preferences) - Android: pass package name (e.g.
com.android.settings)
- iOS: pass bundle identifier (e.g.
tap_element/enter_textuse coordinate rectangles in format{{x, y}, {w, h}}
Common App Identifiers
iOS
- Settings:
com.apple.Preferences - Camera:
com.apple.camera - Photos:
com.apple.mobileslideshow - Messages:
com.apple.MobileSMS - Home Screen:
com.apple.springboard
Wireless Android (No USB)
# Pair (from Wireless debugging screen)
adb pair <PHONE_IP:PAIRING_PORT>
# Connect (from Wireless debugging screen)
adb connect <PHONE_IP:ADB_PORT>
# Verify
adb devices -l
Then start Android bridge with that network serial:
./.agents/skills/phoneagent/scripts/start_android_rpc_bridge_local.sh --serial <PHONE_IP:ADB_PORT>
Security Model
- RPC bridge is localhost-oriented (
127.0.0.1) - iOS physical-device workflow uses localhost forwarding
- Android bridge executes only through selected
adbserial - In-app API key is stored in iOS Keychain
Repository Pointers
- iOS app entry:
PhoneAgent/PhoneAgentApp.swift - iOS app UI/state:
PhoneAgent/ContentView.swift,PhoneAgent/PromptView.swift,PhoneAgent/SettingsView.swift - iOS bridge server:
PhoneAgentUITests/SimulatorRPCServer.swift,PhoneAgentUITests/PhoneAgent.swift - RPC CLI:
.agents/skills/phoneagent/scripts/rpc.py - iOS bridge launcher:
.agents/skills/phoneagent/scripts/start_rpc_bridge_local.sh - Android bridge launcher:
.agents/skills/phoneagent/scripts/start_android_rpc_bridge_local.sh - Android bridge server:
.agents/skills/phoneagent/scripts/android_rpc_bridge.py
Known Limitations
- Android bridge does not yet implement
submit_promptagent loop - UI tree snapshots can be noisy/stale during animations
- Keyboard/text reliability can vary by app and platform
- Long-running tasks may require explicit polling/retries
Disclaimer
- Experimental software
- Personal project
- App contents may be sent to OpenAI API when using agent flow
- Model/tool actions can be incorrect; verify important operations
Related Skills
node-connect
343.1kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
90.0kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
343.1kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
343.1kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
