GenericAgent

AI-powered PC agent loop for desktop automation and intelligent task execution

<div align="center"> <img src="assets/images/bar.jpg" width="880"/> </div> <p align="center"> <a href="#english">English</a> | <a href="#chinese">中文</a> </p>

<a name="english"></a>

🌟 Overview

GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is just ~3,300 lines of code. Through 7 atomic tools + a 92-line Agent Loop, it grants any LLM system-level control over a local computer — covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).

Its design philosophy: don't preload skills — evolve them.

Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a skill for direct reuse later. The longer you use it, the more skills accumulate, forming a skill tree that belongs entirely to you, grown from 3,300 lines of seed code.

🤖 Self-Bootstrap Proof — Everything in this repository, from installing Git and running git init to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.

📋 Core Features

  • Self-Evolving: Automatically crystallizes each task into a skill. Capabilities grow with every use, forming your personal skill tree.
  • Minimal Architecture: ~3,300 lines of core code. Agent Loop is just 92 lines. No complex dependencies, zero deployment overhead.
  • Strong Execution: Injects into a real browser (preserving login sessions). 7 atomic tools take direct control of the system.
  • High Compatibility: Supports Claude / Gemini / Kimi and other major models. Cross-platform.

🧬 Self-Evolution Mechanism

This mechanism is the main thing that sets GenericAgent apart from other agent frameworks: the agent does not ship with a fixed skill set, it grows one from the tasks it completes.

[New Task] --> [Autonomous Exploration] (install deps, write scripts, debug & verify) -->
[Crystallize Execution Path into skill] --> [Write to Memory Layer] --> [Direct Recall on Next Similar Task]

| What you say | What the agent does the first time | Every time after |
|---|---|---|
| "Read my WeChat messages" | Install deps → reverse DB → write read script → save skill | one-line invoke |
| "Monitor stocks and alert me" | Install mootdx → build selection flow → configure cron → save skill | one-line start |
| "Send this file via Gmail" | Configure OAuth → write send script → save skill | ready to use |

After a few weeks, your agent instance will have a skill tree no one else in the world has — all grown from 3,300 lines of seed code.
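The explore-once, recall-afterwards flow above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: `SkillStore`, `explore`, and `handle` are hypothetical names, and the memory layer is reduced to an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SkillStore:
    """Hypothetical stand-in for the memory layer: task type -> saved skill."""
    skills: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def recall(self, task_type: str) -> Optional[Callable[[], str]]:
        return self.skills.get(task_type)

    def crystallize(self, task_type: str, skill: Callable[[], str]) -> None:
        self.skills[task_type] = skill


def explore(task_type: str, log: List[str]) -> Callable[[], str]:
    """Placeholder for autonomous exploration (install deps, write scripts, debug)."""
    log.append(f"explored:{task_type}")
    return lambda: f"ran saved skill for {task_type}"


def handle(task_type: str, store: SkillStore, log: List[str]) -> str:
    skill = store.recall(task_type)
    if skill is None:
        # First encounter: explore, then save the working path as a skill.
        skill = explore(task_type, log)
        store.crystallize(task_type, skill)
    else:
        # Similar task seen before: reuse the saved skill directly.
        log.append(f"recalled:{task_type}")
    return skill()
```

Calling `handle("send_gmail", store, log)` twice on the same store explores on the first call and recalls on the second, which mirrors the "first time" and "every time after" columns of the table.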

🎯 Demo Showcase

| 🧋 Food Delivery Order | 📈 Quantitative Stock Screening |
|:---:|:---:|
| <img src="assets/demo/order_tea.gif" width="100%" alt="Order Tea"> | <img src="assets/demo/selectstock.gif" width="100%" alt="Stock Selection"> |
| "Order me a milk tea": navigates the delivery app, selects items, and completes checkout automatically. | "Find GEM stocks with EXPMA golden cross, turnover > 5%": screens stocks with quantitative conditions. |

| 🌐 Autonomous Web Exploration | 💰 Expense Tracking | 💬 Batch Messaging |
|:---:|:---:|:---:|
| <img src="assets/demo/autonomous_explore.png" width="100%" alt="Web Exploration"> | <img src="assets/demo/alipay_expense.png" width="100%" alt="Alipay Expense"> | <img src="assets/demo/wechat_batch.png" width="100%" alt="WeChat Batch"> |
| Autonomously browses and periodically summarizes web content. | "Find expenses over ¥2K in the last 3 months": drives Alipay via ADB. | Sends bulk WeChat messages, fully driving the WeChat client. |

📅 Latest News


🚀 Quick Start

Method 1: Standard Installation

# 1. Clone the repo
git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent

# 2. Install minimal dependencies
pip install streamlit pywebview

# 3. Configure API Key
cp mykey_template.py mykey.py
# Edit mykey.py and fill in your LLM API Key

# 4. Launch
python launch.pyw

Method 2: Windows Portable Version (Recommended for beginners)

Download portable version (19MB, unzip and run)

Full guide: WELCOME_NEW_USER.md

Method 3: Android (Termux)

cd /sdcard/ga
python agentmain.py

🤖 Bot Interfaces (Optional)

QQ Bot

Uses qq-botpy WebSocket long connection — no public webhook required:

pip install qq-botpy

Add to mykey.py:

qq_app_id = "YOUR_APP_ID"
qq_app_secret = "YOUR_APP_SECRET"
qq_allowed_users = ["YOUR_USER_OPENID"]  # or ['*'] for public access

Then launch:

python frontends/qqapp.py
# or launch together with the desktop floating window
python launch.pyw --qq

Create a bot at the QQ Open Platform to get the AppID and AppSecret. After the first message arrives, the user's openid is logged to temp/qqapp.log.

Lark (Feishu)

pip install lark-oapi

Add to mykey.py:

fs_app_id = "cli_xxx"
fs_app_secret = "xxx"
fs_allowed_users = ["ou_xxx"]  # or ['*']

Then launch:

python frontends/fsapp.py          # or python launch.pyw --feishu

  • Inbound support: text, rich text posts, images, files, audio, media, interactive cards / share cards
  • Outbound support: streaming progress cards, image replies, file / media replies
  • Vision model: images are sent as true multimodal input to OpenAI Vision-compatible backends on the first turn

Full setup: assets/SETUP_FEISHU.md

WeCom (Enterprise WeChat)

pip install wecom_aibot_sdk

Add to mykey.py:

wecom_bot_id = "your_bot_id"
wecom_secret = "your_bot_secret"
wecom_allowed_users = ["your_user_id"]
wecom_welcome_message = "Hello, I'm online."

Then launch:

python frontends/wecomapp.py       # or python launch.pyw --wecom

DingTalk

pip install dingtalk-stream

Add to mykey.py:

dingtalk_client_id = "your_app_key"
dingtalk_client_secret = "your_app_secret"
dingtalk_allowed_users = ["your_staff_id"]  # or ['*']

Then launch:

python frontends/dingtalkapp.py    # or python launch.pyw --dingtalk

Telegram Bot

# mykey.py
tg_bot_token = 'YOUR_BOT_TOKEN'
tg_allowed_users = [YOUR_USER_ID]

Then launch:

python frontends/tgapp.py
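Every bot frontend above is gated by an `*_allowed_users` list, with `['*']` meaning public access. The check might look roughly like the sketch below. It is an assumption about the semantics, not the frontends' actual code: `is_allowed` is a hypothetical helper, and comparing IDs as strings is a choice made here so that numeric Telegram IDs and string openids go through the same path.

```python
from typing import Iterable, Union

UserId = Union[str, int]


def is_allowed(user_id: UserId, allowed_users: Iterable[UserId]) -> bool:
    """Return True if user_id may talk to the bot.

    Assumptions: a '*' entry anywhere in the list opens the bot to everyone,
    and IDs are compared as strings so 12345 and "12345" match.
    """
    allowed = {str(u) for u in allowed_users}
    return "*" in allowed or str(user_id) in allowed
```

An empty list rejects everyone under this sketch, which is the safer default for an agent that has system-level control of the host machine.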

📊 Comparison with Similar Tools

| Feature | GenericAgent | OpenClaw | Claude Code |
|------|:---:|:---:|:---:|
| Codebase | ~3,300 lines | ~530,000 lines | Open-sourced (large) |
| Deployment | pip install + API Key | Multi-service orchestration | CLI + subscription |
| Browser Control | Real browser (session preserved) | Sandbox / headless browser | Via MCP plugin |
| OS Control | Mouse/kbd, vision, ADB | Multi-agent delegation | File + terminal |
| Self-Evolution | Autonomous skill growth | Plugin ecosystem | Stateless between sessions |
| Out of the Box | 10 .py files + 5 skills | Hundreds of modules | Rich CLI toolset |

🧠 How It Works

GenericAgent accomplishes complex tasks through Layered Memory × Minimal Toolset × Autonomous Execution Loop, continuously accumulating experience during execution.

1️⃣ Layered Memory System

Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time.

  • L0 — Meta Rules: Core behavioral rules and system constraints of the agent
  • L2 — Global Facts: Stable knowledge accumulated over long-term operation
  • L3 — Task Skills: Workflows for completing specific task types
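One way to picture the layers listed above is as a small data structure that assembles prompt context from most general to most specific. The sketch below is illustrative only: `LayeredMemory`, its field names, and `build_context` are hypothetical, and the real on-disk format is not described in this README.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LayeredMemory:
    meta_rules: List[str] = field(default_factory=list)              # L0
    global_facts: Dict[str, str] = field(default_factory=dict)       # L2
    task_skills: Dict[str, List[str]] = field(default_factory=dict)  # L3

    def build_context(self, task_type: str) -> str:
        """Assemble context: rules first, then facts, then the matching skill steps."""
        lines = [f"[L0] {rule}" for rule in self.meta_rules]
        lines += [f"[L2] {key}: {value}" for key, value in self.global_facts.items()]
        for step in self.task_skills.get(task_type, []):
            lines.append(f"[L3] {step}")
        return "\n".join(lines)
```

A task type with no saved skill simply yields no L3 lines, so the agent falls back to rules and facts alone.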

2️⃣ Autonomous Execution Loop

Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop

The entire core loop is just 92 lines of code (agent_loop.py).
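The perceive, reason, execute, remember cycle can be sketched as follows. This is not the contents of agent_loop.py: `run_loop`, the `reason` callback, and the `(tool_name, argument)` decision shape are assumptions chosen to keep the example self-contained, with a scripted function standing in for the LLM.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A decision is (tool_name, argument), or None when the task is finished.
Decision = Optional[Tuple[str, str]]


def run_loop(
    task: str,
    reason: Callable[[str, List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    memory: List[str],
    max_steps: int = 10,
) -> List[str]:
    observations: List[str] = []
    for _ in range(max_steps):
        decision = reason(task, observations)  # reason over the perceived state
        if decision is None:
            break
        tool_name, arg = decision
        result = tools[tool_name](arg)         # execute the chosen tool
        observations.append(result)            # perceive the new state
        memory.append(f"{tool_name}({arg}) -> {result}")  # write experience
    return observations


def scripted_reason(task: str, observations: List[str]) -> Decision:
    """Stand-in for the LLM: follows a fixed two-step plan, then stops."""
    plan = [("echo", "hello"), ("echo", "world")]
    return plan[len(observations)] if len(observations) < len(plan) else None
```

The `max_steps` cap is a design choice of this sketch, bounding the loop when the reasoner never signals completion.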

3️⃣ Minimal Toolset

GenericAgent provides only 7 atomic tools, forming the foundational capabilities for interacting with the outside world.

| Tool | Function |
|------|------|
| code_run | Execute arbitrary code |
| file_read | Read files |
| file_write | Write files |
| file_patch | Patch / modify files |
| web_scan | Perceive web content |
| web_execute_js | Control browser behavior |
| ask_user | Human-in-the-loop confirmation |

Additionally, 2 memory management tools (update_working_checkpoint, start_long_term_update) allow the agent to persist context and accumulate experience across sessions.
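To make the idea of atomic tools concrete, here is a minimal sketch of name-based dispatch covering the three file tools. The tool names come from the table, but the signatures, return strings, and the `dispatch` helper are assumptions made for illustration; the project's real implementations may differ.

```python
from pathlib import Path
from typing import Callable, Dict


def file_read(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def file_write(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars"


def file_patch(path: str, old: str, new: str) -> str:
    text = file_read(path)
    # Require a unique match so a patch never edits the wrong spot silently.
    if text.count(old) != 1:
        raise ValueError("patch target must occur exactly once")
    file_write(path, text.replace(old, new))
    return "patched"


# Hypothetical registry: tool name -> callable.
TOOLS: Dict[str, Callable[..., str]] = {
    "file_read": file_read,
    "file_write": file_write,
    "file_patch": file_patch,
}


def dispatch(name: str, **kwargs) -> str:
    """Look up a tool by name and call it with keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

Keeping each tool a plain function behind a dictionary is what lets a small loop drive all of them through one code path.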

4️⃣ Capability Extension Mechanism

GenericAgent can create new tools at runtime.

Via code_run, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime — crystallizing temporary abilities into permanent tools.
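As a rough illustration of turning a temporary ability into a permanent tool, the sketch below runs generated Python in a child interpreter and then saves the working script under a name. Only the standard library is used. The `code_run` shown here is a simplified stand-in for the tool of the same name, and `save_as_tool` and the `tools_dir` layout are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def code_run(source: str, timeout: float = 30.0) -> str:
    """Run Python source in a child interpreter and return its stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout


def save_as_tool(name: str, source: str, tools_dir: Path) -> Path:
    """Persist a working script so later tasks can rerun it by name."""
    tools_dir.mkdir(parents=True, exist_ok=True)
    path = tools_dir / f"{name}.py"
    path.write_text(source, encoding="utf-8")
    return path
```

A script that ran cleanly once through `code_run` can be written out with `save_as_tool` and executed again later by reading the file back, without repeating the exploration that produced it.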

<div align="center"> <img src="assets/images/workflow.jpg" alt="GenericAgent Workflow" width="400"/> <br><em>GenericAgent Workflow Diagram</em> </div>

⭐ Support

If this project helped you, please consider leaving a Star! 🙏

You're also welcome to join our GenericAgent Community Group for discussion, feedback, and co-building 👏

<div align="center"> <img src="assets/images/wechat_group.jpg" width="280"/> </div>

📄 License

MIT License — see LICENSE


<a name="chinese"></a>

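🌟 Project Overview

GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is only ~3,300 lines of code. With 7 atomic tools and a 92-line Agent Loop, it gives any LLM system-level control over a local computer, covering the browser, terminal, filesystem, keyboard and mouse input, screen vision, and mobile devices.

Its design philosophy: don't preset skills; gain capabilities through evolution.

Each time GenericAgent solves a new task, it automatically solidifies the execution path into a Skill that later tasks can call directly. The longer it is used, the more skills accumulate, forming a dedicated skill tree that belongs entirely to you and grows from 3,300 lines of seed code.

🤖 Self-Bootstrap Proof: Everything in this repository, from installing Git and running git init to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal at any point.

📋 Core Features

  • Self-Evolving: every task automatically leaves behind a Skill; capabilities keep growing with use, forming a dedicated skill tree
  • Minimal Architecture: ~3,300 lines of core code, an Agent Loop of only 92 lines, no complex dependencies, zero deployment burden
  • Strong Execution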