
OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA): AI-First Process Automation with Large Language Models (LLMs), Large Action Models (LAMs), Large Multimodal Models (LMMs), and Visual Language Models (VLMs).


OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)


OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.

Join us on Discord | Documentation | OpenAdapt.ai


Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

| Package | Description | Repository |
|---------|-------------|------------|
| openadapt | Meta-package with unified CLI | This repo |
| openadapt-capture | Event recording and storage | openadapt-capture |
| openadapt-ml | ML engine, training, inference | openadapt-ml |
| openadapt-evals | Benchmark evaluation | openadapt-evals |
| openadapt-viewer | HTML visualization | openadapt-viewer |
| openadapt-grounding | UI element localization | openadapt-grounding |
| openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval |
| openadapt-privacy | PII/PHI scrubbing | openadapt-privacy |
| openadapt-wright | Dev automation | openadapt-wright |
| openadapt-herald | Social media from git history | openadapt-herald |
| openadapt-crier | Telegram approval bot | openadapt-crier |
| openadapt-consilium | Multi-model consensus | openadapt-consilium |
| openadapt-desktop | Desktop GUI application | openadapt-desktop |
| openadapt-tray | System tray app | openadapt-tray |
| openadapt-agent | Production execution engine | openadapt-agent |
| openadapt-telemetry | Error tracking | openadapt-telemetry |


Installation

Install what you need:

```bash
pip install openadapt              # Minimal CLI only
pip install openadapt[capture]     # GUI capture/recording
pip install openadapt[ml]          # ML training and inference
pip install openadapt[evals]       # Benchmark evaluation
pip install openadapt[privacy]     # PII/PHI scrubbing
pip install openadapt[all]         # Everything
```

Requirements: Python 3.10+
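Whether a given optional component is available can be checked at runtime. A minimal sketch, assuming the import names mirror the PyPI package names with underscores (the actual module names may differ):

```python
import importlib.util

# Return the subset of candidate module names that are importable in the
# current environment. The default names are assumptions for illustration.
def installed_components(candidates=("openadapt_capture", "openadapt_ml", "openadapt_evals")):
    return [name for name in candidates if importlib.util.find_spec(name) is not None]
```

`importlib.util.find_spec` returns `None` for a missing top-level module, so this probes availability without importing anything.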


Quick Start

1. Record a demonstration

```bash
openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop
```

2. Train a model

```bash
openadapt train start --capture my-task --model qwen3vl-2b
```

3. Evaluate

```bash
openadapt eval run --checkpoint training_output/model.pt --benchmark waa
```

4. View recordings

```bash
openadapt capture view my-task
```

Ecosystem

Core Platform Components

| Package | Description | Repository |
|---------|-------------|------------|
| openadapt | Meta-package with unified CLI | This repo |
| openadapt-capture | Event recording and storage | openadapt-capture |
| openadapt-ml | ML engine, training, inference | openadapt-ml |
| openadapt-evals | Benchmark evaluation | openadapt-evals |
| openadapt-viewer | HTML visualization | openadapt-viewer |
| openadapt-grounding | UI element localization | openadapt-grounding |
| openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval |
| openadapt-privacy | PII/PHI scrubbing | openadapt-privacy |

Applications and Tools

| Package | Description | Repository |
|---------|-------------|------------|
| openadapt-desktop | Desktop GUI application | openadapt-desktop |
| openadapt-tray | System tray app | openadapt-tray |
| openadapt-agent | Production execution engine | openadapt-agent |
| openadapt-wright | Dev automation | openadapt-wright |
| openadapt-herald | Social media from git history | openadapt-herald |
| openadapt-crier | Telegram approval bot | openadapt-crier |
| openadapt-consilium | Multi-model consensus | openadapt-consilium |
| openadapt-telemetry | Error tracking | openadapt-telemetry |


CLI Reference

```text
openadapt capture start --name <name>     Start recording
openadapt capture stop                    Stop recording
openadapt capture list                    List captures
openadapt capture view <name>             Open capture viewer

openadapt train start --capture <name>    Train model on capture
openadapt train status                    Check training progress
openadapt train stop                      Stop training

openadapt eval run --checkpoint <path>    Evaluate trained model
openadapt eval run --agent api-claude     Evaluate API agent
openadapt eval mock --tasks 10            Run mock evaluation

openadapt serve --port 8080               Start dashboard server
openadapt version                         Show installed versions
openadapt doctor                          Check system requirements
```

How It Works

See the full Architecture Evolution for detailed documentation.

Three-Phase Pipeline

OpenAdapt follows a streamlined Demonstrate → Learn → Execute pipeline:

1. DEMONSTRATE (Observation Collection)

  • Capture: Record user actions and screenshots with openadapt-capture
  • Privacy: Scrub PII/PHI from recordings with openadapt-privacy
  • Store: Build a searchable demonstration library
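A demonstration is essentially a named sequence of timestamped action events with screenshots. A hypothetical schema for illustration only; the real openadapt-capture storage format is defined by that package:

```python
from dataclasses import dataclass, field

# Hypothetical capture schema: one event per user action, each tied to a
# screenshot, grouped into a named demonstration.
@dataclass
class CaptureEvent:
    timestamp: float
    action: str                # e.g. "click", "type", "scroll"
    screenshot_path: str
    metadata: dict = field(default_factory=dict)

@dataclass
class Demonstration:
    name: str
    events: list = field(default_factory=list)

    def add(self, event: CaptureEvent) -> None:
        self.events.append(event)
```

A library of such `Demonstration` objects is what the Learn phase indexes and retrieves over.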

2. LEARN (Policy Acquisition)

  • Retrieval Path: Embed demonstrations, index them, and enable semantic search
  • Training Path: Load demonstrations and fine-tune Vision-Language Models (VLMs)
  • Abstraction: Progress from literal replay to template-based automation
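The retrieval path above can be sketched in a few lines: embed each demonstration, then rank by similarity to the task query. This toy version uses bag-of-words vectors and cosine similarity purely to show the shape of the pipeline; openadapt-retrieval uses multimodal embeddings:

```python
import math

# Toy embedding: bag-of-words term counts.
def embed(text):
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Return the demonstration most similar to the query.
def retrieve(query, demos):
    q = embed(query)
    return max(demos, key=lambda d: cosine(q, embed(d)))
```

The retrieved demonstration is then injected into the agent's context at inference time.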

3. EXECUTE (Agent Deployment)

  • Observe: Take screenshots and gather accessibility information
  • Policy: Use demonstration context to decide actions via VLMs (Claude, GPT-4o, Qwen3-VL)
  • Ground: Map intentions to specific UI coordinates with openadapt-grounding
  • Act: Execute validated actions with safety gates
  • Evaluate: Measure success with openadapt-evals and feed results back for improvement
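The Execute steps above form a loop that can be sketched as follows. The four callables are stand-ins for the capture, VLM policy, and grounding integrations, not the package's actual API:

```python
# Sketch of the Execute loop: observe the screen, ask the policy for an
# intent, ground the intent to coordinates, act, and repeat until the
# policy signals completion or the step budget is exhausted.
def run_agent(observe, policy, ground, act, max_steps=10):
    history = []
    for _ in range(max_steps):
        observation = observe()
        intent = policy(observation, history)
        if intent == "done":
            break
        x, y = ground(observation, intent)
        act(intent, x, y)
        history.append(intent)
    return history
```

The separation of `policy` (what to do) from `ground` (where to do it) mirrors the Policy/Grounding split described under Key Concepts.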

Core Approach: Trajectory-Conditioned Disambiguation

Zero-shot VLMs fail on GUI tasks not due to lack of capability, but due to ambiguity in UI affordances. OpenAdapt resolves this by conditioning agents on human demonstrations — "show, don't tell."

| | No Retrieval | With Retrieval |
|---|---|---|
| No Fine-tuning | 46.7% (zero-shot baseline) | 100% first-action (n=45, shared entry point) |
| Fine-tuning | Standard SFT (baseline) | Demo-conditioned FT (planned) |

The bottom-right cell is OpenAdapt's unique value: training models to use demonstrations they haven't seen before, combining retrieval with fine-tuning for maximum accuracy. Phase 2 (retrieval-only prompting) is validated; Phase 3 (demo-conditioned fine-tuning) is in progress.

Validated result: On a controlled macOS benchmark (45 System Settings tasks sharing a common navigation entry point), demo-conditioned prompting improved first-action accuracy from 46.7% to 100%. A length-matched control improved accuracy by only 11.1 percentage points, confirming the benefit is semantic rather than an effect of prompt length. See the research thesis for methodology and the publication roadmap for limitations.
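Demo-conditioned prompting amounts to prepending the retrieved demonstration's action trace to the task prompt so the VLM can disambiguate UI affordances. A minimal sketch of the prompt assembly (the wording is illustrative, not the package's actual template):

```python
# Build a demo-conditioned prompt: the retrieved human action trace is
# shown before the task, "show, don't tell" style.
def build_prompt(task, demo_actions):
    demo = "\n".join(f"  {i + 1}. {a}" for i, a in enumerate(demo_actions))
    return (
        "A human completed a similar task with these steps:\n"
        f"{demo}\n"
        f"Now complete this task: {task}\n"
        "Respond with the next action."
    )
```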

Industry validation: OpenCUA (NeurIPS 2025 Spotlight, XLANG Lab) reused OpenAdapt's macOS accessibility capture code in their AgentNetTool, but uses demos only for model training — not runtime conditioning. No open-source CUA framework currently does demo-conditioned inference, which remains OpenAdapt's architectural differentiator.

Key Concepts

  • Policy/Grounding Separation: The Policy decides what to do; the Grounding layer maps that intent to specific UI coordinates (via openadapt-grounding).
