droidclaw
an ai agent that controls your android phone. give it a goal in plain english — it figures out what to tap, type, and swipe.
Download Android APK (v0.5.3) | Dashboard | Discord
i wanted to turn my old android devices into ai agents. after a few hours reverse engineering accessibility trees and playing with tailscale... it worked.
think of it this way — a few years back, we could automate android with predefined flows. now imagine that automation layer has an llm brain. it can read any screen, understand what's happening, decide what to do, and execute. you don't need apis. you don't need to build integrations. just install your favourite apps and tell the agent what you want done.
one of the coolest things it can do right now is delegate incoming requests to chatgpt, gemini, or google search on the device... and bring the result back. no api keys for those services needed — it just uses the apps like a human would.
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"
--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)
--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)
--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)
--- step 4/30 ---
action: enter (389ms)
--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
how it works
the core idea is dead simple — a perception → reasoning → action loop that repeats until the goal is done (or it runs out of steps).
┌─────────────────────────────────────────┐
│ your goal │
│ "send good morning to mom on whatsapp"│
└────────────────┬────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ │
│ ┌──────────────┐ │
│ │ 1. perceive │ │
│ └──────┬───────┘ │
│ │ │
│ dump accessibility tree via adb │
│ parse xml → interactive ui elements │
│ diff with previous screen (detect changes) │
│ optionally capture screenshot │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ 2. reason │ │
│ └──────┬───────┘ │
│ │ │
│ send screen state + goal + history to llm │
│ llm returns { think, plan, action } │
│ "i see the search icon at (890, 156). │
│ i should tap it." │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ 3. act │ │
│ └──────┬───────┘ │
│ │ │
│ execute via adb: tap, type, swipe, etc. │
│ feed result back to llm on next step │
│ check if goal is done │
│ │ │
│ ▼ │
│ done? ─────── yes ──→ exit │
│ │ │
│ no │
│ │ │
│ └─────── loop back to perceive │
│ │
└─────────────────────────────────────────────────┘
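the loop above can be sketched in a few lines of typescript. everything here is illustrative — the names (`perceive`, `reason`, `act`, `runAgent`) and the stubbed behavior are assumptions for the sketch, not droidclaw's actual internals:

```typescript
// minimal sketch of the perceive → reason → act loop.
// names and stubs are illustrative, not droidclaw's real code.

type Action = { kind: "tap" | "type" | "swipe" | "launch" | "done"; arg?: string };
type StepResult = { think: string; action: Action };

// stubbed perception: droidclaw dumps the accessibility tree via adb here
function perceive(): string {
  return "<hierarchy>...</hierarchy>";
}

// stubbed reasoning: droidclaw sends screen state + goal + history to the llm here
function reason(screen: string, goal: string, history: StepResult[]): StepResult {
  // pretend the llm decides the goal is reached after two actions
  if (history.length >= 2) return { think: "goal reached", action: { kind: "done" } };
  return { think: "tapping next element", action: { kind: "tap", arg: "(890, 156)" } };
}

// stubbed actuation: droidclaw shells out to adb (input tap/text/swipe) here
function act(action: Action): boolean {
  return true; // success/failure gets fed back to the llm on the next step
}

function runAgent(goal: string, maxSteps = 30): StepResult[] {
  const history: StepResult[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const screen = perceive();
    const result = reason(screen, goal, history);
    history.push(result);
    if (result.action.kind === "done") break; // goal done → exit
    act(result.action); // otherwise loop back to perceive
  }
  return history;
}

const trace = runAgent('open youtube and search for "lofi hip hop"');
console.log(trace.length); // 3 steps with these stubs: tap, tap, done
```

the real loop differs in the obvious ways (the llm call is async, actions go through adb, the screen diff feeds the prompt), but the control flow is exactly this shape.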
what makes it not fall apart
llms controlling uis sounds fragile. and it is, if you don't handle the failure modes. here's what droidclaw does:
- stuck loop detection — if the screen doesn't change for 3 steps, recovery hints get injected into the prompt. context-aware hints based on what type of action is failing (tap vs swipe vs wait).
- repetition tracking — a sliding window of recent actions catches retry loops even across screen changes. if the agent taps the same coordinates 3+ times, it gets told to stop and try something else.
- drift detection — if the agent spams navigation actions (swipe, back, wait) without interacting with anything, it gets nudged to take direct action.
- vision fallback — when the accessibility tree is empty (webviews, flutter apps, games), a screenshot gets sent to the llm instead, with coordinate-based tap suggestions.
- action feedback — every action result (success/failure + message) gets fed back to the llm on the next step. the agent knows whether its last move worked.
- multi-turn memory — conversation history is maintained across steps so the llm has context about what it already tried.
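the repetition tracking idea is simple enough to show. this is a hedged sketch — the class name, window size, and threshold are assumptions, not droidclaw's actual values:

```typescript
// illustrative sliding-window repetition tracker: flags when the agent
// taps the same coordinates 3+ times within the last few actions.

type Tap = { x: number; y: number };

class RepetitionTracker {
  private window: Tap[] = [];
  // windowSize and threshold are assumed values for this sketch
  constructor(private windowSize = 5, private threshold = 3) {}

  // record a tap; returns true when the same coordinates have repeated
  // enough to suggest a retry loop (even across screen changes)
  record(tap: Tap): boolean {
    this.window.push(tap);
    if (this.window.length > this.windowSize) this.window.shift();
    const repeats = this.window.filter(t => t.x === tap.x && t.y === tap.y).length;
    return repeats >= this.threshold;
  }
}

const tracker = new RepetitionTracker();
tracker.record({ x: 890, y: 156 }); // false
tracker.record({ x: 890, y: 156 }); // false
const stuck = tracker.record({ x: 890, y: 156 });
console.log(stuck); // true → time to inject a recovery hint into the prompt
```

when the tracker fires, the fix is prompt-side: tell the llm its last approach isn't working and to try something else.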
setup
quick install
curl -fsSL https://droidclaw.ai/install.sh | sh
this installs bun and adb if missing, clones the repo, and sets up .env.
manual install
prerequisites:
- bun (required — node/npm won't work. droidclaw uses bun-specific apis like `Bun.spawnSync` and native `.env` loading)
- adb (android debug bridge — comes with android sdk platform tools)
- an android phone with usb debugging enabled
- an llm provider api key (or ollama for fully local)
# install adb
# macos:
brew install android-platform-tools
# linux:
sudo apt install android-tools-adb
# windows:
# download from https://developer.android.com/tools/releases/platform-tools
# install bun
curl -fsSL https://bun.sh/install | bash
# clone and setup
git clone https://github.com/unitedbyai/droidclaw.git
cd droidclaw
bun install
cp .env.example .env
configure your llm
edit .env and pick a provider. fastest way to start is groq (free tier):
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
or run fully local with ollama (no api key, no internet needed):
ollama pull llama3.2
# then in .env:
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.2
connect your phone
- go to settings → about phone → tap "build number" 7 times to enable developer options
- go to settings → developer options → enable "usb debugging"
- plug in via usb and tap "allow" on the phone when prompted
adb devices # should show your device
run it
bun run src/kernel.ts
# type your goal and press enter
three ways to use it
droidclaw has three modes, each for a different use case:
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ interactive mode workflows flows │
│ ───────────────── ───────────────── ───────────────── │
│ │
│ type a goal and chain goals fixed sequences │
│ the agent figures across multiple of taps and types. │
│ it out on the fly. apps with ai. no llm, instant. │
│ │
│ $ bun run --workflow --flow │
│ src/kernel.ts file.json file.yaml │
│ │
│ best for: best for: best for: │
│ one-off tasks, multi-app tasks, things you do │
│ exploration, recurring routines, exactly the same │
│ quick commands morning briefings way every time │
│ │
│ uses llm: yes uses llm: yes uses llm: no │
│ │
└─────────────────────────────────────────────────────────────────────┘
interactive mode
just type what you want:
bun run src/kernel.ts
# enter your goal: open settings and turn on dark mode
workflows (ai-powered, multi-app)
workflows are json files describing a sequence of sub-goals. each step can optionally switch to a different app. the llm decides how to navigate, what to tap, what to type.
bun run src/kernel.ts --workflow examples/workflows/research/weather-to-whatsapp.json
{
"name": "weather to whatsapp",
"steps": [
{
"app": "com.google.android.googlequicksearchbox",
"goal": "search for chennai weather today"
},
{
"goal": "share the result to whatsapp contact Sanju"
}
]
}
you can inject specific data into steps using formData:
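for example, something like this — the field names (`contact`, `message`) and placeholder syntax here are illustrative guesses, so check the files under examples/workflows/ for the real schema:

```json
{
  "name": "morning message",
  "steps": [
    {
      "app": "com.whatsapp",
      "goal": "send {{message}} to contact {{contact}}",
      "formData": {
        "contact": "Sanju",
        "message": "good morning!"
      }
    }
  ]
}
```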
