Guaardvark

Version 2.5.2 · guaardvark.com

The self-hosted AI workstation. Autonomous agents that see your screen and control your apps. A three-tier neural routing engine. Parallel agent swarms across isolated git worktrees. Video generation, image upscaling to 4K/8K, RAG over your documents, voice interface, and a 57-tool execution engine — all running locally on your hardware. Your machine. Your data. Your rules.

<p align="center"> <img src="docs/screenshots/guaardvark-demo.gif" alt="Guaardvark Demo" width="100%"> </p>


```bash
git clone https://github.com/guaardvark/guaardvark.git && cd guaardvark && ./start.sh
```

One command. Installs everything. Starts all services. Done.

AI-Generated Film — Made Entirely with Guaardvark

Every frame generated on a single desktop GPU. No cloud. No stock footage. No API keys.

Gotham Rising — AI-Generated Short Film


What Makes This Different

AgentBrain — Three-Tier Neural Routing

Every message is routed through a three-tier decision engine that picks the fastest path to the right answer. Reflexes fire in under 100 milliseconds with zero LLM calls. Instinct handles single-shot requests in one LLM call. Deliberation spins up a full ReACT reasoning loop when the problem demands it.

*(Screenshots: Agent Control · Agent Tools)*

| Tier | Name | Latency | LLM Calls | When It Fires |
|------|------|---------|-----------|---------------|
| 1 | Reflex | <100ms | 0 | Greetings, farewells, media controls — pattern-matched, no inference |
| 2 | Instinct | 1–3s | 1 | Single-shot questions, web searches, image generation, vision tasks |
| 3 | Deliberation | 5–30s | 3–10 | Multi-step research, analysis chains, complex agent tasks |

  • Automatic escalation — Tier 2 can signal complexity and hand off to Tier 3 mid-response
  • BrainState singleton — pre-computes tool schemas, model capabilities, system prompts, and reflex tables at startup so routing adds zero overhead
  • Warm-up — background thread loads the active model into VRAM before the first request arrives
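The tiered dispatch above can be sketched in a few lines. This is a minimal illustration, not Guaardvark's actual router: the pattern table, the complexity hints, and the action names are all hypothetical stand-ins.

```python
import re

# Hypothetical reflex table: regex -> canned reply, matched with no inference.
REFLEX_PATTERNS = {
    r"^(hi|hello|hey)\b": "Hello! How can I help?",
    r"^(bye|goodbye)\b": "Goodbye!",
}

# Hypothetical cues that a request needs a multi-step reasoning loop.
COMPLEX_HINTS = ("research", "compare", "analyze", "step by step")

def route(message: str) -> tuple[int, str]:
    """Return (tier, action) for an incoming message."""
    text = message.strip().lower()
    # Tier 1 — Reflex: pure pattern match, zero LLM calls.
    for pattern, reply in REFLEX_PATTERNS.items():
        if re.match(pattern, text):
            return 1, reply
    # Tier 3 — Deliberation: multi-step cues escalate to a ReACT loop.
    if any(hint in text for hint in COMPLEX_HINTS):
        return 3, "react_loop"
    # Tier 2 — Instinct: default single-shot LLM call.
    return 2, "single_llm_call"
```

In the real system, Tier 2 can also escalate mid-response: the single-shot call would signal complexity and hand the message off to the Tier 3 loop.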

Autonomous Screen Agents

Guaardvark agents control a real virtual desktop (Xvfb + openbox at 1280x720). They see the screen through vision models, move the mouse, click buttons, type text, navigate browsers, and verify their own actions.

  • Unified vision brain — Gemma4 sees the screen and decides the next action in a single inference call. Qwen3-VL handles coordinate estimation. Both calibrated per-model with tracked scale factors.
  • Closed-loop servo targeting — three-attempt adaptive strategy: ballistic move → single correction with crosshair overlay → full corrections with zoom-cropped analysis around the cursor
  • 45+ deterministic recipes — browser navigation, tabs, scroll, search, find, zoom, copy/paste — all execute instantly from a JSON recipe library, bypassing the vision loop entirely
  • Obstacle detection — handles popups, permission dialogs, and notification bars with automatic thinking model escalation
  • Self-QA sweep — agent navigates every page of its own UI and reports what's working and what's broken
  • Live agent monitor — real-time SEE/THINK/ACT transcript of every decision the agent makes
  • Integrated screen viewer — draggable, resizable VNC viewer on any page with popup window mode
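The three-attempt servo strategy can be sketched as a move-verify loop. This is an illustrative outline only; `locate_target`, `move_mouse`, and the pixel tolerance are hypothetical stand-ins for the vision model and input driver.

```python
def servo_click(target, locate_target, move_mouse, tolerance=8.0):
    """Move toward `target` with up to three vision-guided attempts,
    escalating the strategy after each miss."""
    strategies = ["ballistic", "crosshair_overlay", "zoom_crop"]
    pos = None
    for attempt, strategy in enumerate(strategies, start=1):
        x, y = locate_target(target, strategy)   # vision model's estimate
        move_mouse(x, y)
        pos = (x, y)
        truth = locate_target(target, "verify")  # re-check after the move
        error = ((truth[0] - x) ** 2 + (truth[1] - y) ** 2) ** 0.5
        if error <= tolerance:
            return pos, attempt                  # close enough to click
    return pos, len(strategies)                  # best effort after 3 tries
```

Each escalation feeds the vision model a richer view: attempt two overlays a crosshair at the current cursor, attempt three analyzes a zoom-crop around it.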

Supported Vision Models

| Model | Role | Coordinate System | Notes |
|-------|------|-------------------|-------|
| Gemma4 (e4b) | Sees + decides | 1024x1024 normalized, box_2d [y1,x1,y2,x2] | Unified brain — vision and reasoning in one call |
| Qwen3-VL (2b) | Coordinate estimation | 1024px internal width | Default servo eyes, fast and accurate on dark UIs |
| Qwen3-VL (4b/8b) | Escalation eyes | 1024px internal width | Automatic escalation after 3 consecutive failures |
| Moondream | Fallback eyes | 1024px internal width | For text-only models that need external vision |
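Mapping a `box_2d` in 1024-normalized space onto the 1280x720 virtual desktop is a straight rescale of the box center. A minimal sketch, assuming the `[y1, x1, y2, x2]` ordering given above; the per-model scale-factor calibration mentioned earlier is not modeled here.

```python
def box_center_to_screen(box_2d, screen_w=1280, screen_h=720, norm=1024):
    """Map a [y1, x1, y2, x2] box in 0..1024 normalized coordinates
    to a pixel click point on the virtual desktop."""
    y1, x1, y2, x2 = box_2d
    cx = (x1 + x2) / 2 / norm * screen_w   # rescale x to screen width
    cy = (y1 + y2) / 2 / norm * screen_h   # rescale y to screen height
    return round(cx), round(cy)
```

A full-screen box `[0, 0, 1024, 1024]` maps to the desktop center, `(640, 360)`.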

Swarm Orchestrator — Parallel Agent Execution

Launch multiple AI coding agents in parallel, each working in an isolated git worktree on its own branch. Results merge back with dependency-ordered conflict detection, optional test validation, and full cost tracking.

  • Two backends — Claude Code (cloud, cost-tracked at $0.015/$0.075 per 1K tokens) and Cline/OpenClaw (fully local via Ollama, zero cost)
  • Flight Mode — fully offline operation. Auto-detects network state, falls back to local models, serializes file conflicts automatically. No prompts, no internet required.
  • Git worktree isolation — each task gets its own branch and working directory. All worktrees share the .git directory (lightweight). Automatically excluded from git status.
  • Dependency-aware merging — topological sort ensures foundational changes land first. Dry-run conflict detection before real merge. Test suite validation before integration.
  • Built-in templates — REST API scaffold, refactor-and-extract, test coverage expansion, Flight Mode demo
  • Up to 20 concurrent agents — configurable limit with automatic slot management
  • Live dashboard — real-time status, per-task logs, cost breakdown, elapsed time, disk usage
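Dependency-ordered merging reduces to a topological sort of the task graph. A minimal sketch using the standard library; the task names and dependency mapping are illustrative, not the orchestrator's real schema.

```python
from graphlib import TopologicalSorter

def merge_order(deps: dict[str, set[str]]) -> list[str]:
    """Order tasks so every task merges after its dependencies,
    letting foundational changes land first."""
    return list(TopologicalSorter(deps).static_order())
```

For example, with `tests` depending on `api` and `api` depending on `models`, the `models` branch merges first, so later branches rebase onto a tree that already contains their prerequisites.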

Video Generation Pipeline

State-of-the-art video generation running entirely on your GPU. No cloud APIs, no per-minute billing, no content restrictions.

*(Screenshots: Video Generation · Plugin System)*

| Model | Type | Max Duration | Native Resolution | VRAM |
|-------|------|--------------|-------------------|------|
| Wan 2.2 (14B MoE) | Text-to-Video | 5s (81 frames @ 16fps) | 832x480 | 11GB |
| CogVideoX-5B | Text-to-Video | 6s (49 frames @ 8fps) | 720x480 | 16GB |
| CogVideoX-2B | Text-to-Video | 6s (49 frames @ 8fps) | 720x480 | 12GB |
| CogVideoX-5B I2V | Image-to-Video | 6s (49 frames @ 8fps) | 720x480 | 16GB |
| SVD XT | Text-to-Video | 3.5s (25 frames @ 7fps) | 512x512 | <8GB |

  • Resolution options — 512px, 576px, 720px, 1280px, 1920px (1080p), and custom dimensions (multiples of 8)
  • Quality tiers — Fast (10 steps), Standard (30), High (40), Maximum (50)
  • Frame interpolation — 1x raw, 2x doubled FPS, 2x + upscale for cinema-quality output
  • Prompt enhancement — Cinematic, Realistic, Artistic, Anime, or raw
  • Low VRAM mode — automatically reduces resolution, frames, and inference steps for 8–12GB GPUs
  • Batch processing — queue multiple videos from a prompt list, processed by Celery workers
  • ComfyUI integration — one-click launch to the node editor for custom workflows
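Two small back-of-envelope helpers for the numbers above: clip duration follows directly from frame count and fps (81 frames at 16fps is about 5 seconds), and custom dimensions must land on multiples of 8. These are illustrative utilities, not part of Guaardvark's pipeline.

```python
def clip_duration(frames: int, fps: int) -> float:
    """Seconds of video produced by a frame count at a given fps."""
    return frames / fps

def snap_to_multiple_of_8(size: int) -> int:
    """Round a requested dimension to the nearest valid multiple of 8."""
    return max(8, round(size / 8) * 8)
```

So a CogVideoX run at 49 frames and 8fps yields just over 6 seconds, and a requested width of 501px would be snapped to 504px before generation.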

GPU Image Upscaling — 4K and 8K Output

Upscale images and video frames to 4K (3840px) or 8K (7680px) resolution using GPU-accelerated super-resolution models.

| Model | Scale | Size | Best For |
|-------|-------|------|----------|
| HAT-L SRx4 | 4x | 159 MB | Maximum quality restoration |
| RealESRGAN x4plus | 4x | 64 MB | General-purpose, photorealistic |
| RealESRGAN x2plus | 2x | 64 MB | Mild upscaling |
| RealESRGAN x4plus (Anime) | 4x | 17 MB | Anime and stylized content |
| realesr-animevideov3 | 4x | 6 MB | Video-optimized anime |
| 4x-UltraSharp | 4x | 67 MB | Enhanced sharpness |
| 4x NMKD-Superscale | 4x | 67 MB | Advanced super-scaling |
| 4x Foolhardy Remacri | 4x | 67 MB | Texture-focused upscaling |

  • Two-pass mode — run the model twice for maximum quality
  • Precision control — FP16 (standard GPUs), BF16 (Ampere+), torch.compile for up to 3x speedup
  • Video upscaling — frame-by-frame processing with progress tracking for MP4, MKV, AVI, MOV, WebM
  • Watch folder — optional auto-processing of new files dropped into a directory
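Because each pass multiplies resolution by the model's fixed scale, reaching a 4K or 8K target is a question of how many sequential passes are needed. A minimal sketch of that arithmetic; it is illustrative and does not model memory limits or tiling.

```python
import math

def passes_needed(src_width: int, target_width: int, scale: int = 4) -> int:
    """Number of sequential Nx upscale passes to reach target_width."""
    if src_width >= target_width:
        return 0
    ratio = target_width / src_width
    # Each pass multiplies width by `scale`, so solve scale**n >= ratio.
    return math.ceil(math.log(ratio, scale))
```

A 960px frame reaches 4K (3840px) in one 4x pass; reaching 8K (7680px) from the same source needs a second pass, which is where the two-pass mode comes in.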

RAG That Actually Works

Chat grounded in your documents. Upload files, build a knowledge base, and ask questions. The AI reads and understands your content — not just keyword matching.

*(Screenshots: Chat with RAG · Document Manager)*

  • Hybrid retrieval — BM25 keyword + vector semantic search combined
  • Smart chunking — code files get AST-informed chunking, prose gets semantic splitting
  • Multiple embedding models — switch between lightweight (300M) and high-quality (4B+) via UI
  • RAG Autoresearch — autonomous optimization loop that experiments with parameters, keeps improvements, reverts regressions
  • Entity extraction — automatic entity and relationship indexing
  • Per-project isolation — each project has its own knowledge base and chat context
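Combining BM25 and vector rankings can be done with reciprocal rank fusion, a common scheme for hybrid retrieval. This is a sketch of that general technique; the README does not state which fusion method Guaardvark actually uses.

```python
def fuse(bm25_ranking: list[str], vector_ranking: list[str], k: int = 60) -> list[str]:
    """Merge two ranked doc-id lists into one hybrid ranking via
    reciprocal rank fusion: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both retrievers beats one ranked first by only one of them, which is what makes the hybrid more robust than either signal alone.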

Self-Improving AI

The system runs its own test suite, identifies failures, dispatches an AI agent to read the code and fix the bugs, verifies the fix, and broadcasts the learning to other instances. No human in the loop.

  • Three modes — Scheduled (every 6 hours), Reactive (triggered by repeated 500 errors), Directed (manual tasks)
  • Guardian review — Uncle Claude (Anthropic API) reviews code changes for safety before applying, with risk levels and halt directives
  • **Veri
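The cycle described above can be sketched as a single function. Every step here is a stub standing in for the real test runner, fix agent, and guardian review; the names and return values are hypothetical.

```python
def self_heal_cycle(run_tests, dispatch_fix_agent, guardian_review, apply_fix):
    """Run tests, fix failures via an agent, gate on review, verify."""
    failures = run_tests()
    if not failures:
        return "healthy"
    patch = dispatch_fix_agent(failures)   # agent reads code, writes a fix
    if not guardian_review(patch):         # safety gate before applying
        return "halted"
    apply_fix(patch)
    # Re-run the suite to verify the fix before broadcasting the learning.
    return "fixed" if not run_tests() else "needs_human"
```

The guardian review step is where Uncle Claude would reject a risky patch, turning the outcome into a halt instead of an apply.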
