Guaardvark

Version 2.5.2 · guaardvark.com

The self-hosted AI workstation. Autonomous agents that see your screen and control your apps. A three-tier neural routing engine. Parallel agent swarms across isolated git worktrees. Video generation, image upscaling to 4K/8K, RAG over your documents, voice interface, and a 57-tool execution engine — all running locally on your hardware. Your machine. Your data. Your rules.

<p align="center"> <img src="docs/screenshots/guaardvark-demo.gif" alt="Guaardvark Demo" width="100%"> </p>


```bash
git clone https://github.com/guaardvark/guaardvark.git && cd guaardvark && ./start.sh
```

One command. Installs everything. Starts all services. Done.

AI-Generated Film — Made Entirely with Guaardvark

Every frame generated on a single desktop GPU. No cloud. No stock footage. No API keys.

Gotham Rising — AI-Generated Short Film


What Makes This Different

AgentBrain — Three-Tier Neural Routing

Every message is routed through a three-tier decision engine that picks the fastest path to the right answer. Reflexes fire in under 100 milliseconds with zero LLM calls. Instinct handles single-shot requests in one LLM call. Deliberation spins up a full ReACT reasoning loop when the problem demands it.

*(Screenshots: Agent Control · Agent Tools)*

| Tier | Name | Latency | LLM Calls | When It Fires |
|------|------|---------|-----------|---------------|
| 1 | Reflex | <100ms | 0 | Greetings, farewells, media controls — pattern-matched, no inference |
| 2 | Instinct | 1–3s | 1 | Single-shot questions, web searches, image generation, vision tasks |
| 3 | Deliberation | 5–30s | 3–10 | Multi-step research, analysis chains, complex agent tasks |

  • Automatic escalation — Tier 2 can signal complexity and hand off to Tier 3 mid-response
  • BrainState singleton — pre-computes tool schemas, model capabilities, system prompts, and reflex tables at startup so routing adds zero overhead
  • Warm-up — background thread loads the active model into VRAM before the first request arrives
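The tiered dispatch above can be sketched in a few lines. This is a minimal illustration, not Guaardvark's actual router: the pattern table, the complexity hints, and the action names are all hypothetical stand-ins.

```python
import re

# Hypothetical reflex table: regex -> canned reply, matched with no inference.
REFLEX_PATTERNS = {
    r"^(hi|hello|hey)\b": "Hello! How can I help?",
    r"^(bye|goodbye)\b": "Goodbye!",
}

# Hypothetical cues that a request needs a multi-step reasoning loop.
COMPLEX_HINTS = ("research", "compare", "analyze", "step by step")

def route(message: str) -> tuple[int, str]:
    """Return (tier, action) for an incoming message."""
    text = message.strip().lower()
    # Tier 1 — Reflex: pure pattern match, zero LLM calls.
    for pattern, reply in REFLEX_PATTERNS.items():
        if re.match(pattern, text):
            return 1, reply
    # Tier 3 — Deliberation: multi-step cues escalate to a ReACT loop.
    if any(hint in text for hint in COMPLEX_HINTS):
        return 3, "react_loop"
    # Tier 2 — Instinct: default single-shot LLM call.
    return 2, "single_llm_call"
```

In the real system, Tier 2 can also escalate mid-response: the single-shot call would signal complexity and hand the message off to the Tier 3 loop.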

Autonomous Screen Agents

Guaardvark agents control a real virtual desktop (Xvfb + openbox at 1280x720). They see the screen through vision models, move the mouse, click buttons, type text, navigate browsers, and verify their own actions.

  • Unified vision brain — Gemma4 sees the screen and decides the next action in a single inference call. Qwen3-VL handles coordinate estimation. Both calibrated per-model with tracked scale factors.
  • Closed-loop servo targeting — three-attempt adaptive strategy: ballistic move → single correction with crosshair overlay → full corrections with zoom-cropped analysis around the cursor
  • 45+ deterministic recipes — browser navigation, tabs, scroll, search, find, zoom, copy/paste — all execute instantly from a JSON recipe library, bypassing the vision loop entirely
  • Obstacle detection — handles popups, permission dialogs, and notification bars with automatic thinking model escalation
  • Self-QA sweep — agent navigates every page of its own UI and reports what's working and what's broken
  • Live agent monitor — real-time SEE/THINK/ACT transcript of every decision the agent makes
  • Integrated screen viewer — draggable, resizable VNC viewer on any page with popup window mode
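The three-attempt servo strategy can be sketched as a move-verify loop. This is an illustrative outline only; `locate_target`, `move_mouse`, and the pixel tolerance are hypothetical stand-ins for the vision model and input driver.

```python
def servo_click(target, locate_target, move_mouse, tolerance=8.0):
    """Move toward `target` with up to three vision-guided attempts,
    escalating the strategy after each miss."""
    strategies = ["ballistic", "crosshair_overlay", "zoom_crop"]
    pos = None
    for attempt, strategy in enumerate(strategies, start=1):
        x, y = locate_target(target, strategy)   # vision model's estimate
        move_mouse(x, y)
        pos = (x, y)
        truth = locate_target(target, "verify")  # re-check after the move
        error = ((truth[0] - x) ** 2 + (truth[1] - y) ** 2) ** 0.5
        if error <= tolerance:
            return pos, attempt                  # close enough to click
    return pos, len(strategies)                  # best effort after 3 tries
```

Each escalation feeds the vision model a richer view: attempt two overlays a crosshair at the current cursor, attempt three analyzes a zoom-crop around it.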

Supported Vision Models

| Model | Role | Coordinate System | Notes |
|-------|------|-------------------|-------|
| Gemma4 (e4b) | Sees + decides | 1024x1024 normalized, box_2d [y1,x1,y2,x2] | Unified brain — vision and reasoning in one call |
| Qwen3-VL (2b) | Coordinate estimation | 1024px internal width | Default servo eyes, fast and accurate on dark UIs |
| Qwen3-VL (4b/8b) | Escalation eyes | 1024px internal width | Automatic escalation after 3 consecutive failures |
| Moondream | Fallback eyes | 1024px internal width | For text-only models that need external vision |
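Mapping a `box_2d` in 1024-normalized space onto the 1280x720 virtual desktop is a straight rescale of the box center. A minimal sketch, assuming the `[y1, x1, y2, x2]` ordering given above; the per-model scale-factor calibration mentioned earlier is not modeled here.

```python
def box_center_to_screen(box_2d, screen_w=1280, screen_h=720, norm=1024):
    """Map a [y1, x1, y2, x2] box in 0..1024 normalized coordinates
    to a pixel click point on the virtual desktop."""
    y1, x1, y2, x2 = box_2d
    cx = (x1 + x2) / 2 / norm * screen_w   # rescale x to screen width
    cy = (y1 + y2) / 2 / norm * screen_h   # rescale y to screen height
    return round(cx), round(cy)
```

A full-screen box `[0, 0, 1024, 1024]` maps to the desktop center, `(640, 360)`.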

Swarm Orchestrator — Parallel Agent Execution

Launch multiple AI coding agents in parallel, each working in an isolated git worktree on its own branch. Results merge back with dependency-ordered conflict detection, optional test validation, and full cost tracking.

  • Two backends — Claude Code (cloud, cost-tracked at $0.015/$0.075 per 1K tokens) and Cline/OpenClaw (fully local via Ollama, zero cost)
  • Flight Mode — fully offline operation. Auto-detects network state, falls back to local models, serializes file conflicts automatically. No prompts, no internet required.
  • Git worktree isolation — each task gets its own branch and working directory. All worktrees share the .git directory (lightweight). Automatically excluded from git status.
  • Dependency-aware merging — topological sort ensures foundational changes land first. Dry-run conflict detection before real merge. Test suite validation before integration.
  • Built-in templates — REST API scaffold, refactor-and-extract, test coverage expansion, Flight Mode demo
  • Up to 20 concurrent agents — configurable limit with automatic slot management
  • Live dashboard — real-time status, per-task logs, cost breakdown, elapsed time, disk usage
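Dependency-ordered merging reduces to a topological sort of the task graph. A minimal sketch using the standard library; the task names and dependency mapping are illustrative, not the orchestrator's real schema.

```python
from graphlib import TopologicalSorter

def merge_order(deps: dict[str, set[str]]) -> list[str]:
    """Order tasks so every task merges after its dependencies,
    letting foundational changes land first."""
    return list(TopologicalSorter(deps).static_order())
```

For example, with `tests` depending on `api` and `api` depending on `models`, the `models` branch merges first, so later branches rebase onto a tree that already contains their prerequisites.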

Video Generation Pipeline

State-of-the-art video generation running entirely on your GPU. No cloud APIs, no per-minute billing, no content restrictions.

*(Screenshots: Video Generation · Plugin System)*

| Model | Type | Max Duration | Native Resolution | VRAM |
|-------|------|--------------|-------------------|------|
| Wan 2.2 (14B MoE) | Text-to-Video | 5s (81 frames @ 16fps) | 832x480 | 11GB |
| CogVideoX-5B | Text-to-Video | 6s (49 frames @ 8fps) | 720x480 | 16GB |
| CogVideoX-2B | Text-to-Video | 6s (49 frames @ 8fps) | 720x480 | 12GB |
| CogVideoX-5B I2V | Image-to-Video | 6s (49 frames @ 8fps) | 720x480 | 16GB |
| SVD XT | Text-to-Video | 3.5s (25 frames @ 7fps) | 512x512 | <8GB |

  • Resolution options — 512px, 576px, 720px, 1280px, 1920px (1080p), and custom dimensions (multiples of 8)
  • Quality tiers — Fast (10 steps), Standard (30), High (40), Maximum (50)
  • Frame interpolation — 1x raw, 2x doubled FPS, 2x + upscale for cinema-quality output
  • Prompt enhancement — Cinematic, Realistic, Artistic, Anime, or raw
  • Low VRAM mode — automatically reduces resolution, frames, and inference steps for 8–12GB GPUs
  • Batch processing — queue multiple videos from a prompt list, processed by Celery workers
  • ComfyUI integration — one-click launch to the node editor for custom workflows
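Two small back-of-envelope helpers for the numbers above: clip duration follows directly from frame count and fps (81 frames at 16fps is about 5 seconds), and custom dimensions must land on multiples of 8. These are illustrative utilities, not part of Guaardvark's pipeline.

```python
def clip_duration(frames: int, fps: int) -> float:
    """Seconds of video produced by a frame count at a given fps."""
    return frames / fps

def snap_to_multiple_of_8(size: int) -> int:
    """Round a requested dimension to the nearest valid multiple of 8."""
    return max(8, round(size / 8) * 8)
```

So a CogVideoX run at 49 frames and 8fps yields just over 6 seconds, and a requested width of 501px would be snapped to 504px before generation.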

GPU Image Upscaling — 4K and 8K Output

Upscale images and video frames to 4K (3840px) or 8K (7680px) resolution using GPU-accelerated super-resolution models.

| Model | Scale | Size | Best For |
|-------|-------|------|----------|
| HAT-L SRx4 | 4x | 159 MB | Maximum quality restoration |
| RealESRGAN x4plus | 4x | 64 MB | General-purpose, photorealistic |
| RealESRGAN x2plus | 2x | 64 MB | Mild upscaling |
| RealESRGAN x4plus (Anime) | 4x | 17 MB | Anime and stylized content |
| realesr-animevideov3 | 4x | 6 MB | Video-optimized anime |
| 4x-UltraSharp | 4x | 67 MB | Enhanced sharpness |
| 4x NMKD-Superscale | 4x | 67 MB | Advanced super-scaling |
| 4x Foolhardy Remacri | 4x | 67 MB | Texture-focused upscaling |

  • Two-pass mode — run the model twice for maximum quality
  • Precision control — FP16 (standard GPUs), BF16 (Ampere+), torch.compile for up to 3x speedup
  • Video upscaling — frame-by-frame processing with progress tracking for MP4, MKV, AVI, MOV, WebM
  • Watch folder — optional auto-processing of new files dropped into a directory
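Because each pass multiplies resolution by the model's fixed scale, reaching a 4K or 8K target is a question of how many sequential passes are needed. A minimal sketch of that arithmetic; it is illustrative and does not model memory limits or tiling.

```python
import math

def passes_needed(src_width: int, target_width: int, scale: int = 4) -> int:
    """Number of sequential Nx upscale passes to reach target_width."""
    if src_width >= target_width:
        return 0
    ratio = target_width / src_width
    # Each pass multiplies width by `scale`, so solve scale**n >= ratio.
    return math.ceil(math.log(ratio, scale))
```

A 960px frame reaches 4K (3840px) in one 4x pass; reaching 8K (7680px) from the same source needs a second pass, which is where the two-pass mode comes in.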

RAG That Actually Works

Chat grounded in your documents. Upload files, build a knowledge base, and ask questions. The AI reads and understands your content — not just keyword matching.

*(Screenshots: Chat with RAG · Document Manager)*

  • Hybrid retrieval — BM25 keyword + vector semantic search combined
  • Smart chunking — code files get AST-informed chunking, prose gets semantic splitting
  • Multiple embedding models — switch between lightweight (300M) and high-quality (4B+) via UI
  • RAG Autoresearch — autonomous optimization loop that experiments with parameters, keeps improvements, reverts regressions
  • Entity extraction — automatic entity and relationship indexing
  • Per-project isolation — each project has its own knowledge base and chat context
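Combining BM25 and vector rankings can be done with reciprocal rank fusion, a common scheme for hybrid retrieval. This is a sketch of that general technique; the README does not state which fusion method Guaardvark actually uses.

```python
def fuse(bm25_ranking: list[str], vector_ranking: list[str], k: int = 60) -> list[str]:
    """Merge two ranked doc-id lists into one hybrid ranking via
    reciprocal rank fusion: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both retrievers beats one ranked first by only one of them, which is what makes the hybrid more robust than either signal alone.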

Self-Improving AI

The system runs its own test suite, identifies failures, dispatches an AI agent to read the code and fix the bugs, verifies the fix, and broadcasts the learning to other instances. No human in the loop.

  • Three modes — Scheduled (every 6 hours), Reactive (triggered by repeated 500 errors), Directed (manual tasks)
  • Guardian review — Uncle Claude (Anthropic API) reviews code changes for safety before applying, with risk levels and halt directives
  • **Veri
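The cycle described above can be sketched as a single function. Every step here is a stub standing in for the real test runner, fix agent, and guardian review; the names and return values are hypothetical.

```python
def self_heal_cycle(run_tests, dispatch_fix_agent, guardian_review, apply_fix):
    """Run tests, fix failures via an agent, gate on review, verify."""
    failures = run_tests()
    if not failures:
        return "healthy"
    patch = dispatch_fix_agent(failures)   # agent reads code, writes a fix
    if not guardian_review(patch):         # safety gate before applying
        return "halted"
    apply_fix(patch)
    # Re-run the suite to verify the fix before broadcasting the learning.
    return "fixed" if not run_tests() else "needs_human"
```

The guardian review step is where Uncle Claude would reject a risky patch, turning the outcome into a halt instead of an apply.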
