Vibium
Browser automation for AI agents and humans
Vibium gives AI agents a browser. Install the vibium skill and your agent can navigate pages, fill forms, click buttons, and take screenshots — all through simple CLI commands. Also available as an MCP server and as JS/TS, Python, and Java client libraries.
New here? Get started in JavaScript, Python, or Java — zero to hello world in 5 minutes.
Why Vibium?
- AI-native. Install as a skill — your agent learns the full browser automation toolkit instantly.
- Zero config. One install, browser downloads automatically, visible by default.
- Standards-based. Built on WebDriver BiDi, not proprietary protocols controlled by large corporations.
- Lightweight. Single ~10MB binary. No runtime dependencies.
- Flexible. Use as a CLI skill, MCP server, or JS/Python/Java library.
Agent Setup
npm install -g vibium
npx skills add https://github.com/VibiumDev/vibium --skill vibe-check
The first command installs Vibium and the vibium binary, and downloads Chrome. The second installs the skill to {project}/.agents/skills/vibium.
skills is the open agent skills CLI, a package manager for AI agent skills. No global install is needed; npx runs it directly.
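To confirm that both setup steps took effect, a small check like the following may help. It is only a sketch: the helper names `skill_dir` and `setup_report` are hypothetical, and it assumes the binary is exposed on `PATH` as `vibium` and that the skill lands in the `.agents/skills/vibium` directory named above.

```python
import shutil
from pathlib import Path


def skill_dir(project: Path) -> Path:
    """Return the directory the skill is expected to be installed into."""
    return project / ".agents" / "skills" / "vibium"


def setup_report(project: Path) -> dict:
    """Report whether the binary and the skill directory can be found.

    Assumption: the global npm install exposes an executable named `vibium`.
    """
    return {
        "binary_on_path": shutil.which("vibium") is not None,
        "skill_installed": skill_dir(project).is_dir(),
    }
```

Calling `setup_report(Path.cwd())` from the project root returns two booleans, one per setup step.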
CLI Quick Reference
# Map & interact (the core workflow)
vibium go https://var.parts # navigate to URL
vibium map # map interactive elements → @e1, @e2, ...
vibium click @e1 # click using ref
vibium diff map # see what changed
# Find elements (semantic — no CSS needed)
vibium find text "Sign In" # find by visible text
vibium find label "Email" # find by form label
vibium find placeholder "Search" # find by placeholder
vibium find role button # find by ARIA role
# Read & capture
vibium text # get all page text
vibium screenshot -o page.png # capture screenshot
vibium screenshot --annotate -o a.png # annotated with element labels
vibium pdf -o page.pdf # save page as PDF
vibium eval "document.title" # run JavaScript
# Wait for things
vibium wait ".modal" # wait for element to appear
vibium wait url "/dashboard" # wait for URL change
vibium wait text "Success" # wait for text on page
# Record sessions
vibium record start # record with screenshots
vibium record stop # stop and save to record.zip
# Forms & input
vibium fill @e2 "hello@example.com" # fill input using ref
vibium select @e3 "US" # pick dropdown option
vibium check @e4 # check a checkbox
vibium press Enter # press a key
Full command list: SKILL.md
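A script or agent harness can drive the same commands by shelling out to the binary. The sketch below is a minimal illustration, assuming `vibium` is on `PATH`. The helper names `vibium_argv`, `run_vibium`, and `core_workflow` are hypothetical, and because the output format of each command is not described here, stdout is returned as raw text.

```python
import subprocess


def vibium_argv(*args: str) -> list[str]:
    """Build the argument vector for one vibium CLI call."""
    return ["vibium", *args]


def run_vibium(*args: str, timeout: float = 60.0) -> str:
    """Run one vibium command and return its stdout as text.

    Assumes the vibium binary is on PATH. Raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        vibium_argv(*args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout


def core_workflow(url: str, ref: str) -> list[list[str]]:
    """Return the commands of the map-and-interact loop, in order."""
    return [
        vibium_argv("go", url),      # navigate to URL
        vibium_argv("map"),          # map interactive elements to refs
        vibium_argv("click", ref),   # click using a ref from the map
        vibium_argv("diff", "map"),  # see what changed
    ]
```

For example, `run_vibium("go", "https://var.parts")` followed by `run_vibium("map")` mirrors the first two lines of the core workflow.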
Alternative: MCP server (for structured tool use instead of CLI):
claude mcp add vibium -- npx -y vibium mcp # Claude Code
gemini mcp add vibium npx -y vibium mcp # Gemini CLI
See MCP setup guide for options and troubleshooting.
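Some MCP clients are configured through a JSON file rather than an `mcp add` command. As a sketch only, the same server entry could be generated as below, assuming your client reads the common `mcpServers` layout (for example a project-level `.mcp.json`). The function name `vibium_mcp_config` is hypothetical; check the MCP setup guide for what your client actually expects.

```python
import json


def vibium_mcp_config() -> dict:
    """Build a stdio server entry that launches `npx -y vibium mcp`.

    Assumption: the client uses the widely used `mcpServers` JSON layout.
    """
    return {
        "mcpServers": {
            "vibium": {
                "command": "npx",
                "args": ["-y", "vibium", "mcp"],
            }
        }
    }


# Print the JSON so it can be pasted into a client config file.
print(json.dumps(vibium_mcp_config(), indent=2))
```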
Language APIs
npm install vibium # JavaScript/TypeScript
pip install vibium # Python
Java (Gradle):
implementation 'com.vibium:vibium:26.3.18'
Java (Maven):
<dependency>
<groupId>com.vibium</groupId>
<artifactId>vibium</artifactId>
<version>26.3.18</version>
</dependency>
Installing any of these packages also installs the Vibium binary and downloads Chrome automatically. No manual browser setup is required.
JS/TS Client
Async API:
import { browser } from 'vibium'
import fs from 'fs/promises'
const bro = await browser.start()
const vibe = await bro.page()
await vibe.go('https://example.com')
const png = await vibe.screenshot()
await fs.writeFile('screenshot.png', png)
const link = await vibe.find('a')
await link.click()
await bro.stop()
Sync API:
const { browser } = require('vibium/sync')
const fs = require('fs')
const bro = browser.start()
const vibe = bro.page()
vibe.go('https://example.com')
const png = vibe.screenshot()
fs.writeFileSync('screenshot.png', png)
const link = vibe.find('a')
link.click()
bro.stop()
Python Client
# Async
from vibium.async_api import browser
# Sync (default)
from vibium import browser
Async API:
import asyncio
from vibium.async_api import browser
async def main():
    bro = await browser.start()
    vibe = await bro.page()
    await vibe.go("https://example.com")
    png = await vibe.screenshot()
    with open("screenshot.png", "wb") as f:
        f.write(png)
    link = await vibe.find("a")
    await link.click()
    await bro.stop()

asyncio.run(main())
Sync API:
from vibium import browser
bro = browser.start()
vibe = bro.page()
vibe.go("https://example.com")
png = vibe.screenshot()
with open("screenshot.png", "wb") as f:
    f.write(png)
link = vibe.find("a")
link.click()
bro.stop()
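In the examples above, `bro.stop()` runs only if every earlier step succeeds. One way to make cleanup unconditional is a small context manager, sketched here with the sync API. `managed_browser` and `capture` are hypothetical helpers, not part of the library; they rely only on the `browser.start()`, `page()`, `go()`, `screenshot()`, and `stop()` calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(start):
    """Start a browser via `start()` and always call `.stop()` on exit."""
    bro = start()
    try:
        yield bro
    finally:
        # Runs on normal exit and when the body raises.
        bro.stop()


def capture(url: str, path: str) -> None:
    """Save a screenshot of `url` to `path`, stopping the browser afterwards."""
    # Imported lazily so this module loads even where vibium is not installed.
    from vibium import browser

    with managed_browser(browser.start) as bro:
        vibe = bro.page()
        vibe.go(url)
        with open(path, "wb") as f:
            f.write(vibe.screenshot())
```

With this in place, `capture("https://example.com", "screenshot.png")` stops the browser even if navigation or the screenshot fails.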
Java Client
import java.nio.file.Files;
import java.nio.file.Path;

var bro = Vibium.start();
var vibe = bro.page();
vibe.go("https://example.com");
var png = vibe.screenshot();
Files.write(Path.of("screenshot.png"), png);
var link = vibe.find("a");
link.click();
bro.stop();
Architecture
┌──────────────────────────────────────┐
│             LLM / Agent              │
│  (Claude Code, Codex, Gemini, etc.)  │
└──────────────────────────────────────┘
         ▲                 ▲
         │ CLI (Bash)      │ MCP (stdio)
         ▼                 ▼
┌───────────────────────────────────┐
│           Vibium binary           │
│                                   │
│  ┌──────────────┐ ┌────────────┐  │
│  │ CLI Commands │ │ MCP Server │  │
│  └─────┬────────┘ └──────┬─────┘  │        ┌──────────────────┐
│        └───────▲─────────┘        │        │                  │
│                │                  │        │                  │
│         ┌──────▼───────┐          │  BiDi  │  Chrome Browser  │
│         │  BiDi Proxy  │          │◄──────►│                  │
│         └──────────────┘          │        │                  │
└───────────────────────────────────┘        └──────────────────┘
                 ▲
                 │ WebSocket BiDi :9515
                 ▼
┌──────────────────────────────────────┐
│           Client Libraries           │
│       (js/ts | python | java)        │
│                                      │
│  ┌─────────────────┐ ┌────────────┐  │
│  │ Async API       │ │ Sync API   │  │
│  │ await vibe.go() │ │ vibe.go()  │  │
│  └─────────────────┘ └────────────┘  │
└──────────────────────────────────────┘
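The client libraries reach the proxy over a WebSocket that carries WebDriver BiDi messages. As a rough illustration of what travels on that link, the sketch below serializes a BiDi command in the envelope defined by the WebDriver BiDi specification (`id`, `method`, `params`). The helper names are hypothetical, and whether the proxy on port 9515 accepts hand-written frames like this is an assumption; in practice the client libraries handle it for you.

```python
import itertools
import json

# Each BiDi command carries a unique integer id so replies can be matched.
_ids = itertools.count(1)


def bidi_command(method: str, params: dict) -> str:
    """Serialize one WebDriver BiDi command as a JSON text frame."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def navigate(context: str, url: str) -> str:
    """Build a `browsingContext.navigate` command.

    `context` is a browsing context id, which a real session would obtain
    from the browser first (for example via `browsingContext.getTree`).
    """
    return bidi_command(
        "browsingContext.navigate",
        {"context": context, "url": url, "wait": "complete"},
    )
```

Here `navigate("ctx-1", "https://example.com")` yields a JSON string whose `method` is `browsingContext.navigate`; the `ctx-1` id is a placeholder.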
Platform Support
| Platform | Architecture | Status |
|----------|--------------|--------|
| Linux | x64 | ✅ Supported |
| macOS | x64 (Intel) | ✅ Supported |
| macOS | arm64 (Apple Silicon) | ✅ Supported |
| Windows | x64 | ✅ Supported |
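A script can check the current machine against this table before installing. The sketch below is one way to do that with the standard `platform` module. The name `is_listed` and the mapping tables are illustrative assumptions, and a `False` result only means the combination is not listed above, not that it is known to fail.

```python
import platform

# Platform and architecture pairs listed in the support table.
SUPPORTED = {
    ("linux", "x64"),
    ("macos", "x64"),
    ("macos", "arm64"),
    ("windows", "x64"),
}

# Map values reported by the `platform` module onto the table's names.
_OS_NAMES = {"Linux": "linux", "Darwin": "macos", "Windows": "windows"}
_ARCH_NAMES = {"x86_64": "x64", "amd64": "x64", "arm64": "arm64", "aarch64": "arm64"}


def is_listed(system=None, machine=None) -> bool:
    """Return True if the OS and CPU architecture appear in the table.

    Defaults to the current machine when no arguments are given.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    key = (_OS_NAMES.get(system), _ARCH_NAMES.get(machine.lower()))
    return key in SUPPORTED
```

Calling `is_listed()` with no arguments checks the machine the script is running on.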
Contributing
See CONTRIBUTING.md for development setup and guidelines.
Roadmap
V1 focuses on the core loop: browser control via CLI, MCP, and client libraries.
See ROADMAP.md for planned features:
- Cortex (memory/navigation layer)
- Retina (recording extension)
- Video recording
- AI-powered locators
License
Apache 2.0
