
<div align="center"><a name="readme-top"></a> <img src="./imgs/logo.png" width="120" height="120" alt="autoMate logo"> <h1>autoMate</h1> <p><b>🤖 AI-Powered Local Automation Tool | Let Your Computer Work for You</b></p>

中文 | 日本語

"Automate the tedious, give time back to life"

https://github.com/user-attachments/assets/bf27f8bd-136b-402e-bc7d-994b99bcc368

</div>

Special note: autoMate is still in an early stage of rapid iteration, and we are continuously exploring and integrating the latest technology. Deeper design thinking, technical-stack discussions, the challenges we hit and how we solved them, and my ongoing research notes on AI+RPA are shared and discussed primarily in my Knowledge Planet community, "AI Tongmu and His Noble Friends".

If you're interested in the technical details behind autoMate, its development direction, or broader AI automation topics, scan the QR code below to join the discussion and watch autoMate grow with us!

<div align="center"> <figure> <a href="[Your Knowledge Planet Link]" target="_blank" rel="noopener noreferrer"><img src="./imgs/knowledge.png" width="150" height="150" alt="Knowledge Planet QR Code"></a> </figure> </div>

💫 Redefining Your Relationship with Computers

Unlike traditional RPA tools that are cumbersome to use, autoMate leverages the power of large language models to complete complex automation processes simply by describing tasks in natural language. Say goodbye to repetitive work and focus on what truly creates value!

Let automation create more possibilities for your life.

💡 Project Introduction

autoMate is a revolutionary AI+RPA automation tool built on OmniParser that can:

  • 📊 Understand your requirements and automatically plan tasks
  • 🔍 Intelligently comprehend screen content, simulating human vision and operations
  • 🧠 Make autonomous decisions, judging and taking actions based on task requirements
  • 💻 Support local deployment, protecting your data security and privacy

✨ Features

  • 🔮 No-Code Automation - Describe tasks in natural language, no programming knowledge required
  • 🖥️ Full Interface Control - Support operations on any visual interface, not limited to specific software
  • 🌐 Universal LLM Support - Works with OpenAI, Azure, OpenRouter, Groq, Ollama, DeepSeek and any OpenAI-compatible API
  • 🔌 MCP Server - Deploy as an MCP tool and call it from Claude Desktop, Cursor, Windsurf and more
  • 🚅 Simplified Installation - One-click deployment

🚀 Quick Start

📥 Direct Usage

You can download a prebuilt executable directly from the GitHub Releases page.

📦 Installation

We strongly recommend installing Miniconda first and using it to manage the Python environment. Plenty of tutorials are available online, or you can ask an AI assistant for help. Then set up the environment with the following commands:

# Clone the project
git clone https://github.com/yuruotong1/autoMate.git
cd autoMate
# Create a python 3.12 environment
conda create -n automate python=3.12
# Activate environment
conda activate automate
# Install dependencies
python install.py

After installation, you can start the application using the command line:

python main.py

Then open http://localhost:7888/ in your browser to configure your API key and basic settings.

🔔 Note

autoMate supports any OpenAI-compatible API. Just set the Base URL, API Key, and Model in Settings:

| Provider | Base URL | Example Models |
| --- | --- | --- |
| OpenAI | https://api.openai.com/v1 | gpt-4o, gpt-4.1, o3 |
| Azure OpenAI | your Azure endpoint | gpt-4o |
| OpenRouter | https://openrouter.ai/api/v1 | claude-3.7-sonnet, gemini-2.5-pro, etc. |
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner |
| Groq | https://api.groq.com/openai/v1 | llama-3.3-70b-versatile |
| Ollama (local) | http://localhost:11434/v1 | qwen2.5-vl, gemma3-tools:27b |
| yeka (CN proxy) | https://api.2233.ai/v1 | gpt-4o, o1 |

Recommended: Use a multimodal model (vision support) for best results — e.g. gpt-4o, claude-3.7-sonnet via OpenRouter, or qwen2.5-vl locally via Ollama.
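Since the settings above only need a Base URL, API Key, and Model, "OpenAI-compatible" concretely means every provider in the table accepts the same `POST <base_url>/chat/completions` payload. A minimal stdlib sketch of that request shape (the helper name `build_chat_request` is illustrative, not part of autoMate):

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request for any OpenAI-compatible endpoint.

    Only the Base URL (and key/model) changes between providers; the
    payload shape stays identical.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The same helper targets OpenAI, OpenRouter, or a local Ollama server:
req = build_chat_request("http://localhost:11434/v1", "ollama", "qwen2.5-vl", "Hello")
print(req.full_url)  # http://localhost:11434/v1/chat/completions
```

Swapping providers is only a matter of changing the three settings; no request-building code changes.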

🔌 MCP Server

autoMate can be deployed as an MCP (Model Context Protocol) server, letting AI clients like Claude Desktop, Cursor, or Windsurf call it as a tool to control your local desktop.

Setup

1. Install dependencies

pip install -r requirements.txt

2. Add to your MCP client config

For Claude Desktop, edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "automate": {
      "command": "python",
      "args": ["/absolute/path/to/autoMate/mcp_server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "OPENAI_MODEL": "gpt-4o"
      }
    }
  }
}

Restart Claude Desktop — you'll see two new tools: run_task and screenshot.

3. Use it

In Claude Desktop, just say:

"Use automate to open Chrome and search for the latest AI news"

Claude will call run_task and autoMate will control the desktop for you.

Available MCP Tools

| Tool | Description |
| --- | --- |
| run_task | Execute a desktop automation task described in natural language |
| screenshot | Capture the screen (or a region) and return it as a base64-encoded PNG |
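Under the hood, MCP transports JSON-RPC 2.0, so a client invoking one of these tools sends an ordinary `tools/call` request. A minimal sketch of that message (the `"task"` argument name is an assumption for illustration, not autoMate's documented schema):

```python
import json

def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request (MCP frames messages as JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# What a client like Claude Desktop sends when you ask it to use automate:
msg = tools_call(1, "run_task", {"task": "open Chrome and search for the latest AI news"})
print(msg)
```

In practice the MCP client library handles this framing for you; the sketch only shows what travels over the wire.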

📝 FAQ

What models are supported?

autoMate now supports any OpenAI-compatible API. The underlying architecture uses a 3-tier fallback (structured output → JSON mode → plain text extraction) to work across different providers.
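The three tiers can be sketched as a single parsing function that degrades gracefully (a minimal illustration of the idea only; the function and field names are hypothetical, not autoMate's actual internals):

```python
import json
import re

def parse_action(raw: str) -> dict:
    """Recover a structured action from a model reply, degrading tier by tier.

    Tier 1: the reply is already a clean JSON object (structured output).
    Tier 2: the reply wraps JSON in prose or a code fence (JSON mode).
    Tier 3: fall back to extracting a bare "action" token from plain text.
    """
    # Tier 1: strict JSON
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Tier 2: first {...} block embedded in surrounding text
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    # Tier 3: plain-text extraction
    match = re.search(r"action\s*[:=]\s*(\w+)", raw, re.IGNORECASE)
    if match:
        return {"action": match.group(1)}
    raise ValueError("could not parse model reply")

print(parse_action('{"action": "click", "x": 10}'))   # {'action': 'click', 'x': 10}
print(parse_action("I will act. action: scroll"))     # {'action': 'scroll'}
```

The point of the cascade is that providers which lack structured-output support still work; the parser just lands on a lower tier.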

Recommended: use a multimodal model with vision capability (the agent needs to see the screen). OpenAI gpt-4o, Claude via OpenRouter, and qwen2.5-vl via Ollama are all tested and working.

Why is my execution speed slow?

If your computer doesn't have a dedicated NVIDIA GPU, execution will be slower because the agent frequently runs OCR for visual annotation, which is GPU-intensive. We are actively optimizing this. We recommend an NVIDIA GPU with at least 4 GB of VRAM, and your installed torch build must match your CUDA version:

  1. Run pip list to check your torch version;
  2. Check the CUDA versions it supports on the official PyTorch website;
  3. Uninstall the installed torch and torchvision;
  4. Copy the official installation command and reinstall a torch build that matches your CUDA version.

For example, if your CUDA version is 12.4, install torch with the following commands:

pip3 uninstall -y torch torchvision
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

🤝 Join Us

Every excellent open-source project embodies collective wisdom. The growth of autoMate is inseparable from your participation and contribution. Whether it's fixing bugs, adding features, or improving documentation, your every contribution will help thousands of people break free from repetitive work.

Join us in creating a more intelligent future.

<a href="https://github.com/yuruotong1/autoMate/graphs/contributors"> <img src="https://contrib.rocks/image?repo=yuruotong1/autoMate" /> </a>
<div align="center"> ⭐ Every Star is an encouragement to the creators and an opportunity for more people to discover and benefit from autoMate ⭐ Your support today is our motivation for tomorrow's progress </div>
