
OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA): AI-First Process Automation with Large Language Models (LLMs), Large Action Models (LAMs), Large Multimodal Models (LMMs), and Visual Language Models (VLMs).


OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)


OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.

Join us on Discord | Documentation | OpenAdapt.ai


Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

| Package | Description | Repository |
|---------|-------------|------------|
| openadapt | Meta-package with unified CLI | This repo |
| openadapt-capture | Event recording and storage | openadapt-capture |
| openadapt-ml | ML engine, training, inference | openadapt-ml |
| openadapt-evals | Benchmark evaluation | openadapt-evals |
| openadapt-viewer | HTML visualization | openadapt-viewer |
| openadapt-grounding | UI element localization | openadapt-grounding |
| openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval |
| openadapt-privacy | PII/PHI scrubbing | openadapt-privacy |
| openadapt-wright | Dev automation | openadapt-wright |
| openadapt-herald | Social media from git history | openadapt-herald |
| openadapt-crier | Telegram approval bot | openadapt-crier |
| openadapt-consilium | Multi-model consensus | openadapt-consilium |
| openadapt-desktop | Desktop GUI application | openadapt-desktop |
| openadapt-tray | System tray app | openadapt-tray |
| openadapt-agent | Production execution engine | openadapt-agent |
| openadapt-telemetry | Error tracking | openadapt-telemetry |


Installation

Install what you need:

```bash
pip install openadapt              # Minimal CLI only
pip install openadapt[capture]     # GUI capture/recording
pip install openadapt[ml]          # ML training and inference
pip install openadapt[evals]       # Benchmark evaluation
pip install openadapt[privacy]     # PII/PHI scrubbing
pip install openadapt[all]         # Everything
```

Requirements: Python 3.10+
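Whether a given optional component is available can be checked at runtime. A minimal sketch, assuming the import names mirror the PyPI package names with underscores (the actual module names may differ):

```python
import importlib.util

# Return the subset of candidate module names that are importable in the
# current environment. The default names are assumptions for illustration.
def installed_components(candidates=("openadapt_capture", "openadapt_ml", "openadapt_evals")):
    return [name for name in candidates if importlib.util.find_spec(name) is not None]
```

`importlib.util.find_spec` returns `None` for a missing top-level module, so this probes availability without importing anything.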


Quick Start

1. Record a demonstration

```bash
openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop
```

2. Train a model

```bash
openadapt train start --capture my-task --model qwen3vl-2b
```

3. Evaluate

```bash
openadapt eval run --checkpoint training_output/model.pt --benchmark waa
```

4. View recordings

```bash
openadapt capture view my-task
```

Ecosystem

Core Platform Components

| Package | Description | Repository |
|---------|-------------|------------|
| openadapt | Meta-package with unified CLI | This repo |
| openadapt-capture | Event recording and storage | openadapt-capture |
| openadapt-ml | ML engine, training, inference | openadapt-ml |
| openadapt-evals | Benchmark evaluation | openadapt-evals |
| openadapt-viewer | HTML visualization | openadapt-viewer |
| openadapt-grounding | UI element localization | openadapt-grounding |
| openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval |
| openadapt-privacy | PII/PHI scrubbing | openadapt-privacy |

Applications and Tools

| Package | Description | Repository |
|---------|-------------|------------|
| openadapt-desktop | Desktop GUI application | openadapt-desktop |
| openadapt-tray | System tray app | openadapt-tray |
| openadapt-agent | Production execution engine | openadapt-agent |
| openadapt-wright | Dev automation | openadapt-wright |
| openadapt-herald | Social media from git history | openadapt-herald |
| openadapt-crier | Telegram approval bot | openadapt-crier |
| openadapt-consilium | Multi-model consensus | openadapt-consilium |
| openadapt-telemetry | Error tracking | openadapt-telemetry |


CLI Reference

```text
openadapt capture start --name <name>     Start recording
openadapt capture stop                    Stop recording
openadapt capture list                    List captures
openadapt capture view <name>             Open capture viewer

openadapt train start --capture <name>    Train model on capture
openadapt train status                    Check training progress
openadapt train stop                      Stop training

openadapt eval run --checkpoint <path>    Evaluate trained model
openadapt eval run --agent api-claude     Evaluate API agent
openadapt eval mock --tasks 10            Run mock evaluation

openadapt serve --port 8080               Start dashboard server
openadapt version                         Show installed versions
openadapt doctor                          Check system requirements
```

How It Works

See the full Architecture Evolution for detailed documentation.

Three-Phase Pipeline

OpenAdapt follows a streamlined Demonstrate → Learn → Execute pipeline:

1. DEMONSTRATE (Observation Collection)

  • Capture: Record user actions and screenshots with openadapt-capture
  • Privacy: Scrub PII/PHI from recordings with openadapt-privacy
  • Store: Build a searchable demonstration library
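A demonstration is essentially a named sequence of timestamped action events with screenshots. A hypothetical schema for illustration only; the real openadapt-capture storage format is defined by that package:

```python
from dataclasses import dataclass, field

# Hypothetical capture schema: one event per user action, each tied to a
# screenshot, grouped into a named demonstration.
@dataclass
class CaptureEvent:
    timestamp: float
    action: str                # e.g. "click", "type", "scroll"
    screenshot_path: str
    metadata: dict = field(default_factory=dict)

@dataclass
class Demonstration:
    name: str
    events: list = field(default_factory=list)

    def add(self, event: CaptureEvent) -> None:
        self.events.append(event)
```

A library of such `Demonstration` objects is what the Learn phase indexes and retrieves over.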

2. LEARN (Policy Acquisition)

  • Retrieval Path: Embed demonstrations, index them, and enable semantic search
  • Training Path: Load demonstrations and fine-tune Vision-Language Models (VLMs)
  • Abstraction: Progress from literal replay to template-based automation
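The retrieval path above can be sketched in a few lines: embed each demonstration, then rank by similarity to the task query. This toy version uses bag-of-words vectors and cosine similarity purely to show the shape of the pipeline; openadapt-retrieval uses multimodal embeddings:

```python
import math

# Toy embedding: bag-of-words term counts.
def embed(text):
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Return the demonstration most similar to the query.
def retrieve(query, demos):
    q = embed(query)
    return max(demos, key=lambda d: cosine(q, embed(d)))
```

The retrieved demonstration is then injected into the agent's context at inference time.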

3. EXECUTE (Agent Deployment)

  • Observe: Take screenshots and gather accessibility information
  • Policy: Use demonstration context to decide actions via VLMs (Claude, GPT-4o, Qwen3-VL)
  • Ground: Map intentions to specific UI coordinates with openadapt-grounding
  • Act: Execute validated actions with safety gates
  • Evaluate: Measure success with openadapt-evals and feed results back for improvement
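The Execute steps above form a loop that can be sketched as follows. The four callables are stand-ins for the capture, VLM policy, and grounding integrations, not the package's actual API:

```python
# Sketch of the Execute loop: observe the screen, ask the policy for an
# intent, ground the intent to coordinates, act, and repeat until the
# policy signals completion or the step budget is exhausted.
def run_agent(observe, policy, ground, act, max_steps=10):
    history = []
    for _ in range(max_steps):
        observation = observe()
        intent = policy(observation, history)
        if intent == "done":
            break
        x, y = ground(observation, intent)
        act(intent, x, y)
        history.append(intent)
    return history
```

The separation of `policy` (what to do) from `ground` (where to do it) mirrors the Policy/Grounding split described under Key Concepts.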

Core Approach: Trajectory-Conditioned Disambiguation

Zero-shot VLMs fail on GUI tasks not due to lack of capability, but due to ambiguity in UI affordances. OpenAdapt resolves this by conditioning agents on human demonstrations — "show, don't tell."

| | No Retrieval | With Retrieval |
|---|---|---|
| No Fine-tuning | 46.7% (zero-shot baseline) | 100% first-action (n=45, shared entry point) |
| Fine-tuning | Standard SFT (baseline) | Demo-conditioned FT (planned) |

The bottom-right cell is OpenAdapt's unique value: training models to use demonstrations they haven't seen before, combining retrieval with fine-tuning for maximum accuracy. Phase 2 (retrieval-only prompting) is validated; Phase 3 (demo-conditioned fine-tuning) is in progress.

Validated result: On a controlled macOS benchmark (45 System Settings tasks sharing a common navigation entry point), demo-conditioned prompting improved first-action accuracy from 46.7% to 100%. A length-matched control improved accuracy by only 11.1 percentage points, confirming the benefit is semantic rather than an effect of prompt length. See the research thesis for methodology and the publication roadmap for limitations.
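Demo-conditioned prompting amounts to prepending the retrieved demonstration's action trace to the task prompt so the VLM can disambiguate UI affordances. A minimal sketch of the prompt assembly (the wording is illustrative, not the package's actual template):

```python
# Build a demo-conditioned prompt: the retrieved human action trace is
# shown before the task, "show, don't tell" style.
def build_prompt(task, demo_actions):
    demo = "\n".join(f"  {i + 1}. {a}" for i, a in enumerate(demo_actions))
    return (
        "A human completed a similar task with these steps:\n"
        f"{demo}\n"
        f"Now complete this task: {task}\n"
        "Respond with the next action."
    )
```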

Industry validation: OpenCUA (NeurIPS 2025 Spotlight, XLANG Lab) reused OpenAdapt's macOS accessibility capture code in their AgentNetTool, but uses demos only for model training — not runtime conditioning. No open-source CUA framework currently does demo-conditioned inference, which remains OpenAdapt's architectural differentiator.

Key Concepts

  • Policy/Grounding Separation: The Policy decides what to do; the Grounding layer maps that intent to specific UI coordinates (via openadapt-grounding).
