Acu

A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools.

ACU - Awesome Agents for Computer Use


An AI Agent for Computer Use is an autonomous program that can reason about tasks, plan sequences of actions, and act within the domain of a computer or mobile device in the form of clicks, keystrokes, other computer events, command-line operations and internal/external API calls. These agents combine perception, decision-making, and control capabilities to interact with digital interfaces and accomplish user-specified goals independently.
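The perceive, decide, act cycle described above can be sketched as a small loop. This is a minimal illustration only, not code from any project listed here: `FakeDesktop`, `Action`, `rule_based_policy`, and `run_agent` are hypothetical names, the environment is a toy with one text field and one button, and a hand-written rule stands in for the model that a real agent would query.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    """One control event. The kinds used here are illustrative assumptions."""
    kind: str          # "type", "click", or "done"
    target: str = ""   # which UI element the action addresses
    text: str = ""     # payload for "type" actions


@dataclass
class FakeDesktop:
    """Toy environment (hypothetical): one text field and one submit button."""
    field_text: str = ""
    submitted: bool = False
    log: List[Action] = field(default_factory=list)

    def observe(self) -> dict:
        # Perception: a real agent would read a screenshot, DOM, or accessibility tree.
        return {"field_text": self.field_text, "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        # Control: apply the event to the interface and record it.
        self.log.append(action)
        if action.kind == "type" and action.target == "field":
            self.field_text += action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def rule_based_policy(goal: str, observation: dict) -> Action:
    """Decision-making stand-in: a fixed rule instead of an LLM or VLM."""
    if observation["submitted"]:
        return Action("done")
    if observation["field_text"] != goal:
        return Action("type", "field", goal)
    return Action("click", "submit")


def run_agent(goal: str, env: FakeDesktop,
              policy: Callable[[str, dict], Action], max_steps: int = 10) -> bool:
    """Loop observe -> decide -> act until the policy reports completion."""
    for _ in range(max_steps):
        observation = env.observe()
        action = policy(goal, observation)
        if action.kind == "done":
            return True
        env.execute(action)
    return False  # step budget exhausted without finishing


if __name__ == "__main__":
    desktop = FakeDesktop()
    finished = run_agent("hello", desktop, rule_based_policy)
    print(finished, [a.kind for a in desktop.log])
```

The step budget (`max_steps`) is one simple way to keep an autonomous loop from running indefinitely; the systems listed below differ mainly in how they implement the observation and policy parts.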

Articles

Papers

<details open> <summary><b>Surveys</b></summary>

Surveys

AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants (Jan. 2025)
- Comprehensive review establishing taxonomy of computer control agents (CCAs) from environment, interaction, and agent perspectives, analyzing 86 CCAs and 33 datasets
GUI Agents: A Survey (Dec. 2024)
- General survey of GUI agents
Large Language Model-Brained GUI Agents: A Survey (Nov. 2024)
- Focus on LLM-based approaches
- Website
GUI Agents with Foundation Models: A Comprehensive Survey (Nov. 2024)
- Comprehensive overview of foundation model-based GUI agents

<br/> </details> <details open> <summary><b>Frameworks & Models</b></summary>

Frameworks & Models

Reinforcement Learning for Long-Horizon Interactive LLM Agents (Feb. 2025)
- Novel RL approach (LOOP) for training interactive digital agents (IDAs) directly in target environments
- 32B parameter agent outperforms OpenAI o1 by 9 percentage points on AppWorld
Large Action Models: From Inception to Implementation (Dec. 2024)
- Comprehensive framework for developing LAMs that can perform real-world actions beyond language generation
- Details key stages including data collection, model training, environment integration, grounding and evaluation
Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation (Dec. 2024)
- Novel reward-guided navigation approach
SpiritSight Agent: Advanced GUI Agent with One Look (Dec. 2024)
- Single-shot GUI interaction approach
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs (Dec. 2024)
- Novel approach for automatic GUI functionality annotation
Simulate Before Act: Model-Based Planning for Web Agents (Dec. 2024)
- Novel model-based planning approach using LLM world models
Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents (Dec. 2024)
- Novel autonomous skill discovery framework for web agents
- Code
Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents (Dec. 2024)
- Novel framework for contextualizing web pages to enhance LLM agent decision making
Digi-Q: Transforming VLMs to Device-Control Agents via Value-Based Offline RL (Dec. 2024)
- Novel value-based offline RL approach for training VLM device-control agents
Magentic-One (Nov. 2024)
- Multi-agent system with orchestrator-led coordination
- Strong performance on GAIA, WebArena, and AssistantBench
Agent Workflow Memory (Sep. 2024)
- Novel workflow memory framework for agents
- Code
The Impact of Element Ordering on LM Agent Performance (Sep. 2024)
- Novel study on element ordering's impact on agent performance
- Code
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents (Aug. 2024)
- Novel reasoning and learning framework
- Website
OpenWebAgent: An Open Toolkit to Enable Web Agents on Large Language Models (Aug. 2024)
- Open platform for web-based agent deployment
- Code
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems (Jul. 2024)
- Hierarchical architecture with flexible DOM distillation
- Novel denoising method for web navigation
Apple Intelligence Foundation Language Models (Jul. 2024)
- Vision-Language Model with Private Cloud Compute
- Novel foundation model architecture
Tree Search for Language Model Agents (Jul. 2024)
- Multi-step reasoning and planning with best-first tree search
- Novel approach for LLM-based agents
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning (Jun. 2024)
- Novel reinforcement learning approach
- Code
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration (Jun. 2024)
- Multi-agent collaboration for mobile device operation
- Code
Octopus Series: On-device Language Models for Computer Control (Apr. 2024)
- v4: Graph of language models with functional tokens integration (Apr. 2024)
- v3: Sub-billion parameter multimodal model for edge devices (Apr. 2024)
- v2: Super agent for Android and iOS (Apr. 2024)
- v1: Function calling of software APIs (Apr. 2024)
- Website
- Code
AutoWebGLM: Bootstrap and Reinforce a Large Language Model-based Web Navigating Agent (Apr. 2024)
- Novel approach for real-world web navigation and bilingual benchmark
- Code
Cradle: Empowering Foundation Agents towards General Computer Control (Mar. 2024)
- Focus on general computer control using Red Dead Redemption II as a case study
- Code
Android in the Zoo: Chain-of-Action-Thought for GUI Agents (Mar. 2024)
- Novel Chain-of-Action-Thought framework for Android interaction
- Code
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (Feb. 2024)
- Vision-language model for computer control
- Code
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement (Feb. 2024)
- Vision-Language Model for PC interaction
- Code
UFO: A UI-Focused Agent for Windows OS Interaction (Feb. 2024)
- Specialized for Windows OS interaction
- Code
CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation (Feb. 2024)
- Novel comprehensive environment perception (CEP) approach for exhaustive GUI perception
- Introduces conditional action prediction (CAP) for reliable action response
Intention-in-Interaction (IN3): Tell Me More! (Feb. 2024)
- Novel benchmark for evaluating user intention understanding in agent designs
- Introduces model experts for robust user-agent interaction
[Dual-view visual co

