NestOS

NestOS is a full-stack GUI automation agent. It's not just "chat-driven". Think of it as a digital teammate with a real screen.

It runs on a server with a full Linux desktop (GUI), so it can click, type, and handle browser tasks like a human, while still doing system-level ops work in the background.

Read this in other languages: English | 简体中文

How It Works

You send a task in chat.
The engine executes it inside a remote desktop environment.
If it hits blockers (captcha, 2FA, etc.), it pauses and asks for human help.
After you step in, it resumes automatically and reports results in real time.
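The four steps above can be read as a small state machine. The sketch below is only an illustration of that lifecycle: the state names, the `Task` class, and the extra `FAILED` state are assumptions, not part of NestOS or OpenClaw.

```python
from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"                        # task received from chat
    RUNNING = "running"                      # executing in the remote desktop
    WAITING_FOR_HUMAN = "waiting_for_human"  # paused on captcha, 2FA, etc.
    DONE = "done"
    FAILED = "failed"                        # assumption: not described in the README


# Allowed transitions, read off the steps above.
TRANSITIONS = {
    TaskState.QUEUED: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.WAITING_FOR_HUMAN, TaskState.DONE, TaskState.FAILED},
    TaskState.WAITING_FOR_HUMAN: {TaskState.RUNNING, TaskState.FAILED},
    TaskState.DONE: set(),
    TaskState.FAILED: set(),
}


class Task:
    """Hypothetical task record that only permits the transitions above."""

    def __init__(self, description):
        self.description = description
        self.state = TaskState.QUEUED
        self.history = [TaskState.QUEUED]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)
```

A task that hits a captcha would move through `RUNNING`, `WAITING_FOR_HUMAN`, back to `RUNNING`, and then `DONE`.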

Core Capabilities (Already Working)

1. Human-in-the-loop verification (Captcha Wall)

Most automation breaks on sliders, Cloudflare checks, or 2FA. NestOS handles these with the following flow:

OpenClaw tries auto-login first.
If verification is too hard, it pauses and sends a screenshot.
You jump in via XRDP and solve it once.
OpenClaw detects success and continues the task.

Because a human can take over when needed, logins and interactions that would otherwise stall on verification can still complete.
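The pause-and-resume flow above could be sketched roughly as follows. This is not OpenClaw's implementation: `try_auto_login`, `is_logged_in`, and `request_human_help` are hypothetical callables you would supply, and the polling interval and timeout are arbitrary defaults.

```python
import time


def login_with_human_fallback(try_auto_login, is_logged_in, request_human_help,
                              poll_seconds=5.0, timeout_seconds=600.0,
                              sleep=time.sleep, clock=time.monotonic):
    """Return 'auto', 'human', or 'timeout' depending on how the login ended."""
    # Step 1: try the automatic login first.
    if try_auto_login():
        return "auto"

    # Step 2: verification was too hard, so ask a human
    # (for example by sending a screenshot to the chat channel).
    request_human_help()

    # Steps 3-4: poll until the human has solved it via remote desktop,
    # or give up after timeout_seconds.
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        if is_logged_in():
            return "human"
        sleep(poll_seconds)
    return "timeout"
```

Passing `sleep` and `clock` as parameters is a design choice that keeps the loop testable without real waiting.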

2. End-to-end project deployment from the web

NestOS is not limited to editing local files. It can build and ship projects directly:

Browse GitHub and read project docs.
Pull code with git clone.
Detect stack and install deps (npm install, pip install, docker compose up, etc.).
Configure ports and reverse proxy, then return a live URL.
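As a rough illustration of the "detect stack and install deps" step, the sketch below maps marker files in a cloned repo to the commands named above. The marker file names, the priority given to Compose files, and the `-d` flag are assumptions; how NestOS actually decides is not documented here.

```python
# Compose files take priority (assumption: containers bundle their own deps).
COMPOSE_FILES = ("docker-compose.yml", "docker-compose.yaml",
                 "compose.yml", "compose.yaml")

# Marker file -> install command, using the commands listed above.
LANGUAGE_MARKERS = [
    ("package.json", ["npm", "install"]),
    ("requirements.txt", ["pip", "install", "-r", "requirements.txt"]),
]


def plan_install(repo_files):
    """Return the list of commands to run for a repo's top-level file names."""
    names = set(repo_files)
    if any(f in names for f in COMPOSE_FILES):
        # -d (detached) is an added assumption so the command returns.
        return [["docker", "compose", "up", "-d"]]
    return [cmd for marker, cmd in LANGUAGE_MARKERS if marker in names]
```

For a real checkout you could pass `[p.name for p in pathlib.Path(repo_dir).iterdir()]` and hand each command to `subprocess.run(cmd, cwd=repo_dir, check=True)`.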

3. Visual operations instead of blind CLI guessing

You can watch the whole process through XRDP and see exactly what it's doing on desktop Chrome.

4. Rich content reading (Powerful but token-heavy)

It can read content across the web, including sites without RSS, at the cost of higher token usage.

Tech Stack (Full-Stack Ready)

Infrastructure: Linux (Ubuntu) + 1Panel + Docker/Compose
Multi-language runtime: PHP / Python / Node.js / Go / Java
Browser execution layer: OpenClaw Browser (Chrome)
GUI layer: XRDP + XFCE4
Control channels: Web / Discord / Telegram / Feishu / ... (any channel that can integrate with OpenClaw is supported, such as WhatsApp)
AI memory layer: qmd index + semantic retrieval + local logs

Problems It Solves

Captcha wall: combines AI execution with human assist when needed.
Environment silos: not tied to one language or framework.
Blind ops: XRDP gives you a clear visual execution path.
Not just a Mac mini story anymore: a Linux server can deliver equally strong (or stronger) GUI automation, and core workflows need no extra skills beyond NestOS.

Installation

Prerequisites

OS: Ubuntu 22.04 x64 (currently the only supported version; compatibility on other systems is unknown)
Recommended spec: 4 vCPU / 8 GB RAM or higher, with decent bandwidth
Recommendation: install this skill in a fresh environment when possible; if that is not an option, make a full backup first to guard against environment conflicts or crashes
Note: the current version still has some rough edges and bugs

Step 1: Install OpenClaw

curl -fsSL https://openclaw.ai/install.sh | bash

Step 2: Install the NestOS skill via chat

Once your OpenClaw API is configured, send the message below in chat. You can name your local timezone in it before installing; otherwise bootstrap_NestOS.sh defaults to Asia/Shanghai:

Go to github.com/bigchx/nestos, download nestos-bootstrap.skill, and install the skill by following the prompts. Before running installation, confirm timezone: if I do not specify one, keep default Asia/Shanghai; if I provide one (for example, America/Los_Angeles), install with that timezone.
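The timezone rule in that prompt (keep Asia/Shanghai unless another zone is given) can be written down as a small helper. This is a sketch only: `resolve_timezone` is a hypothetical function, not part of the bootstrap script, and the validation relies on the IANA timezone database available to Python on your machine.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

DEFAULT_TIMEZONE = "Asia/Shanghai"  # default stated in this README


def resolve_timezone(requested=None):
    """Return the timezone name to install with, or raise ValueError."""
    # No timezone specified: keep the default.
    if requested is None or not requested.strip():
        return DEFAULT_TIMEZONE

    name = requested.strip()
    try:
        ZoneInfo(name)  # raises if the name is not a known IANA zone
    except (ZoneInfoNotFoundError, ValueError, OSError) as exc:
        raise ValueError(f"unknown timezone: {name!r}") from exc
    return name
```

Checking the name before installation catches typos such as `America/LosAngeles` early, instead of after the server has been configured.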

Step 3: Track installation progress

Progress callbacks are still incomplete (cron reporting is not fully polished).
During install, expect to ask for progress updates manually.
Reference timing: on a 4 vCPU / 8 GB RAM server (4C8G) with 6 Mbps bandwidth, install usually takes more than 6 minutes.

Step 4: Confirm installation

The server will reboot once after install; a temporary disconnect is expected.
Visit http://<your-ip>:9000. If the page loads normally (for example Hello, nestos!), core setup is done.
If the page does not open, first confirm that the ports listed in the returned config are allowed through your firewall or security group.
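If you would rather script the check than open a browser, something like the following works from any machine with Python. It is a minimal sketch: `page_is_up` is a hypothetical helper, and the expected text is only the example greeting mentioned above, which your install may not match exactly.

```python
import urllib.error
import urllib.request


def page_is_up(url, expected_text=None, timeout=5.0):
    """Return True if url answers HTTP 200 (and contains expected_text, if given)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            ok = resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
    return ok and (expected_text is None or expected_text in body)
```

For example, `page_is_up("http://<your-ip>:9000")` with your server's address filled in; a `False` result right after install may just mean the reboot has not finished yet.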

Step 5: First-time initialization (important)

In OpenClaw chat, ask it to return generated config details, then back them up and rotate sensitive defaults.
Open/allow the corresponding ports listed in that returned config.
Log in to XRDP once.
Open desktop Chrome and make sure it launches correctly.
Import the preloaded plugin from Downloads and verify it works.
Exit remote desktop.
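To verify that the ports from the returned config are actually reachable, a plain TCP probe is usually enough. The helper names below are hypothetical, and the port list in the usage note is an assumption: 9000 is the port used in Step 4 and 3389 is the usual XRDP default, but the returned config is the authority.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered by a firewall, or timed out.
        return False


def closed_ports(host, ports, timeout=3.0):
    """Return the subset of ports that could not be reached."""
    return [p for p in ports if not port_is_open(host, p, timeout)]
```

Calling `closed_ports("<your-ip>", [9000, 3389])` from your own machine returns an empty list when everything is reachable; any port it returns is the one to open in your firewall or security group.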

Daily Use

After first-time initialization, you can mostly just send chat instructions to update websites or run browser tasks.

Known Use Cases

Social media ops: log in, schedule posts, monitor comments, and triage inboxes (with manual takeover when verification appears).
Prompt-to-site updates: use natural language to edit pages, replace content, tweak components, and run pre-publish checks.
RSS and web aggregation: collect RSS and non-RSS sources, then generate unified digests.
GitHub direct deployment: read repo docs, clone code, install deps, start services, and return a live URL.
Competitive and market monitoring: track keyword pages on a schedule and report content diffs.
Routine ops automation: batch back-office logins, config checks, screenshot archiving, and alerting.
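For the monitoring use case, "report content diffs" boils down to comparing two text snapshots of the same page. A minimal sketch with Python's standard `difflib` is shown below; `content_diff` is a hypothetical helper, and how NestOS captures or stores snapshots is not specified in this README.

```python
import difflib


def content_diff(old_text, new_text, label="page"):
    """Return unified-diff lines between two snapshots; empty list if unchanged."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{label} (previous)",
        tofile=f"{label} (current)",
        lineterm="",
    ))
```

An empty result means nothing changed, so a scheduled job can stay silent and only report when the list is non-empty.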

TODO / Known Issues

[ ] Better install progress visibility and active status callbacks
[ ] Resource usage hints for heavy multi-tab sessions
[ ] More advanced cross-platform automation scenarios

Acknowledgements

This project stands on top of great open-source work. Big thanks to the maintainers and contributors.

If NestOS helps you, please consider giving it a Star. If you hit issues, open an Issue or send a PR. If this project genuinely helped you, feel free to visit bigchx.com and support our original 3D model designs (mostly published on MakerWorld).
