G3: experiments in goose
Install / Use: /learn @dhanji/G3README
g3 - AI Coding Agent
g3 is an AI coding agent designed to help you complete tasks by writing code and executing commands. Built in Rust, it provides a flexible architecture for interacting with various Large Language Model (LLM) providers, along with powerful code generation and task automation capabilities.
Architecture Overview
g3 follows a modular architecture organized as a Rust workspace with multiple crates, each responsible for specific functionality:
Core Components
g3-core
The heart of the agent system, containing:
- Agent Engine: Main orchestration logic for handling conversations, tool execution, and task management
- Context Window Management: Intelligent tracking of token usage with context thinning (50-80%) and auto-compaction at 80% capacity
- Tool System: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output
- Streaming Response Parser: Real-time parsing of LLM responses with tool call detection and execution
- Task Execution: Support for single and iterative task execution with automatic retry logic
g3-providers
Abstraction layer for LLM providers:
- Provider Interface: Common trait-based API for different LLM backends
- Multiple Provider Support:
- Anthropic (Claude models)
- Databricks (DBRX and other models)
- Local/embedded models via llama.cpp with Metal acceleration on macOS
- OAuth Authentication: Built-in OAuth flow support for secure provider authentication
- Provider Registry: Dynamic provider management and selection
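The trait-based interface and registry described above could be sketched roughly as follows. This is illustrative only; the names (`Provider`, `complete`, `registry`) and signatures are assumptions, not g3's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical common interface every LLM backend implements.
trait Provider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in backend used here purely for illustration.
struct EchoProvider;

impl Provider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// A registry maps provider names to boxed trait objects, which is what
/// enables dynamic selection and hot-swapping at runtime.
fn registry() -> HashMap<String, Box<dyn Provider>> {
    let mut providers: HashMap<String, Box<dyn Provider>> = HashMap::new();
    providers.insert("echo".to_string(), Box::new(EchoProvider));
    providers
}
```

Because callers only see `Box<dyn Provider>`, swapping Anthropic for a local llama.cpp model is a registry lookup, not a code change.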
g3-config
Configuration management system:
- Environment-based configuration
- Provider credentials and settings
- Model selection and parameters
- Runtime configuration options
g3-execution
Task execution framework:
- Task planning and decomposition
- Execution strategies (sequential, parallel)
- Error handling and retry mechanisms
- Progress tracking and reporting
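The sequential/parallel split above can be sketched as a strategy enum. This is a minimal illustration, not g3's actual execution code; the names and the closure-based task type are assumptions.

```rust
use std::thread;

/// Hypothetical strategy selector for running a plan's subtasks.
enum Strategy {
    Sequential,
    Parallel,
}

/// Runs each subtask under the chosen strategy, collecting
/// outputs in task order either way.
fn run_tasks(strategy: Strategy, tasks: Vec<fn() -> String>) -> Vec<String> {
    match strategy {
        // In order, one at a time.
        Strategy::Sequential => tasks.into_iter().map(|t| t()).collect(),
        // Spawn all, then join in order so results stay deterministic.
        Strategy::Parallel => {
            let handles: Vec<_> = tasks.into_iter().map(thread::spawn).collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        }
    }
}
```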
g3-computer-control
Computer control capabilities:
- Mouse and keyboard automation
- UI element inspection and interaction
- Screenshot capture and window management
- OCR text extraction via Tesseract
g3-cli
Command-line interface:
- Interactive terminal interface
- Task submission and monitoring
- Configuration management commands
- Session management
Error Handling & Resilience
g3 includes robust error handling with automatic retry logic:
- Recoverable Error Detection: Automatically identifies recoverable errors (rate limits, network issues, server errors, timeouts)
- Exponential Backoff with Jitter: Implements intelligent retry delays to avoid overwhelming services
- Detailed Error Logging: Captures comprehensive error context including stack traces, request/response data, and session information
- Error Persistence: Saves detailed error logs to `.g3/errors/` for post-mortem analysis
- Graceful Degradation: Non-recoverable errors are logged with full context before terminating
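The classification and backoff policy described above might look roughly like this. It is a sketch under assumptions: the base delay, cap, and the deterministic stand-in for a random jitter source are all illustrative, not g3's actual values.

```rust
use std::time::Duration;

/// Rate limits (429), server errors (5xx), and timeouts are treated
/// as recoverable; everything else terminates after logging.
fn is_recoverable(status: Option<u16>, timed_out: bool) -> bool {
    timed_out || matches!(status, Some(429) | Some(500..=599))
}

/// Exponential backoff with "equal jitter": half the delay is fixed,
/// half is randomized, so retries spread out instead of stampeding.
fn retry_delay(attempt: u32, jitter_seed: u64) -> Duration {
    let base_ms: u64 = 500; // assumed base delay
    let cap_ms: u64 = 30_000; // assumed cap
    // Exponential growth: 500 ms, 1 s, 2 s, 4 s, ... capped at 30 s.
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms);
    // A deterministic LCG stands in for a real RNG here.
    let jitter = jitter_seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407)
        % exp;
    Duration::from_millis(exp / 2 + jitter / 2)
}
```

The resulting delay always lands in `[exp/2, exp)`, so even worst-case jitter never exceeds the cap.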
Tool Call Duplicate Detection
g3 includes intelligent duplicate detection to prevent the LLM from accidentally calling the same tool twice in a row:
- Sequential Duplicate Prevention: Only immediately sequential identical tool calls are blocked
- Text Separation Allowed: If there's any text between tool calls, they're not considered duplicates
- Session-Wide Reuse: Tools can be called multiple times throughout a session - only back-to-back duplicates are prevented
This catches cases where the LLM "stutters" and outputs the same tool call twice, while still allowing legitimate re-use of tools.
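The three rules above reduce to a small piece of state: the last tool call seen, plus a flag for whether any text arrived since. The sketch below is illustrative (the `ToolCall`/`DuplicateGuard` names are invented), not g3's actual implementation.

```rust
#[derive(Clone, PartialEq)]
struct ToolCall {
    name: String,
    args: String,
}

struct DuplicateGuard {
    last_call: Option<ToolCall>,
    text_since_last: bool,
}

impl DuplicateGuard {
    fn new() -> Self {
        Self { last_call: None, text_since_last: true }
    }

    /// Any streamed text between tool calls resets the duplicate check.
    fn on_text(&mut self) {
        self.text_since_last = true;
    }

    /// Returns true if the call should run, false if it's a
    /// back-to-back stutter.
    fn on_tool_call(&mut self, call: &ToolCall) -> bool {
        let is_stutter = !self.text_since_last && self.last_call.as_ref() == Some(call);
        self.last_call = Some(call.clone());
        self.text_since_last = false;
        !is_stutter
    }
}
```

Note that only the immediately preceding call is compared, so session-wide reuse is unaffected.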
Timing Footer
After each response, g3 displays a timing footer showing elapsed time, time to first token, token usage (from the LLM, not estimated), and current context window usage percentage. The token and context info is displayed dimmed for a clean interface.
Key Features
Intelligent Context Management
- Automatic context window monitoring with percentage-based tracking
- Smart auto-compaction when approaching token limits
- Context thinning at 50%, 60%, 70%, 80% thresholds - automatically replaces large tool results with file references
- Conversation history preservation through summaries
- Dynamic token allocation for different providers (4k to 200k+ tokens)
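The threshold policy above can be sketched as a pure decision function: thinning fires each time usage crosses a new 50/60/70% threshold, and compaction takes over at 80%. This is an assumed shape for illustration, not g3's actual code.

```rust
#[derive(Debug, PartialEq)]
enum ContextAction {
    None,
    Thin,
    Compact,
}

/// `last_thinned_at` is the highest threshold already handled, so each
/// threshold triggers thinning at most once as usage climbs.
fn context_action(used_tokens: usize, window_tokens: usize, last_thinned_at: usize) -> ContextAction {
    let pct = used_tokens * 100 / window_tokens;
    if pct >= 80 {
        // At 80% capacity, auto-compact the conversation into a summary.
        ContextAction::Compact
    } else if [50, 60, 70].iter().any(|&t| pct >= t && last_thinned_at < t) {
        // Crossing a new threshold: replace large tool results
        // with file references.
        ContextAction::Thin
    } else {
        ContextAction::None
    }
}
```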
Interactive Control Commands
g3's interactive CLI includes control commands for manual context management:
- /compact: Manually trigger compaction to compact conversation history
- /thinnify: Manually trigger context thinning to replace large tool results with file references
- /skinnify: Manually trigger full context thinning (like /thinnify but processes the entire context window, not just the first third)
- /readme: Reload README.md and AGENTS.md from disk without restarting
- /stats: Show detailed context and performance statistics
- /help: Display all available control commands
These commands give you fine-grained control over context management, allowing you to proactively optimize token usage and refresh project documentation. See Control Commands Documentation for detailed usage.
Tool Ecosystem
- File Operations: Read, write, and edit files with line-range precision
- Shell Integration: Execute system commands with output capture
- Code Generation: Structured code generation with syntax awareness
- TODO Management: Read and write TODO lists with markdown checkbox format
- Computer Control (Experimental): Automate desktop applications
- Mouse and keyboard control
- UI element inspection
- Screenshot capture and window management
- Window listing and identification
- Code Search: Embedded tree-sitter for syntax-aware code search (Rust, Python, JavaScript, TypeScript, Go, Java, C, C++) - see Code Search Guide
- Final Output: Formatted result presentation
Agent Skills
g3 supports the Agent Skills specification - an open format for portable skill packages that give the agent new capabilities.
Skill Locations (in priority order, later overrides earlier):
- Embedded skills (compiled into binary)
- Global: ~/.g3/skills/
- Extra paths from config
- Workspace: .g3/skills/
- Repo: skills/ (highest priority, checked into git)
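The "later overrides earlier" rule amounts to scanning locations lowest-priority first and keying skills by name, so higher-priority locations shadow earlier ones. A minimal sketch (illustrative, not g3's code):

```rust
use std::collections::HashMap;

/// Maps each skill name to the location that wins for it.
/// `locations` is ordered lowest priority first.
fn resolve_skills(locations: &[(&str, &[&str])]) -> HashMap<String, String> {
    let mut skills = HashMap::new();
    for &(location, names) in locations {
        for name in names {
            // Later inserts overwrite earlier ones, so a repo-level
            // skill shadows an embedded one with the same name.
            skills.insert(name.to_string(), location.to_string());
        }
    }
    skills
}
```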
SKILL.md Format:
```markdown
---
name: pdf-processing         # Required: 1-64 chars, lowercase + hyphens
description: Extract text... # Required: 1-1024 chars, when to use
license: Apache-2.0          # Optional
compatibility: Requires git  # Optional: environment requirements
---
# PDF Processing

Detailed instructions for the agent...
```
Configuration (in g3.toml):
```toml
[skills]
enabled = true                     # Default: true
extra_paths = ["/path/to/skills"]  # Additional skill directories
```
At startup, g3 scans skill directories and injects a summary into the system prompt. When the agent needs a skill, it reads the full SKILL.md using the read_file tool.
Each skill adds ~50-100 tokens to context (name + description + path). Skills can include:
- scripts/ - Executable code (Python, Bash, etc.)
- references/ - Additional documentation
- assets/ - Templates, data files
Embedded Skills: Core skills like research are compiled into the binary, ensuring they work anywhere without external files. Embedded scripts are automatically extracted to .g3/bin/ on first use.
Built-in Research Skill: Perform asynchronous web research via background_process("research", ".g3/bin/g3-research 'your query'"). Results are saved to .g3/research/<id>/report.md.
See Skills Guide for detailed documentation.
Provider Flexibility
- Support for multiple LLM providers through a unified interface
- Hot-swappable providers without code changes
- Provider-specific optimizations and feature support
- Local model support for offline operation
Embedded Models (Local LLMs)
g3 supports local models via llama.cpp with Metal acceleration on macOS. Here's a performance comparison for agentic tasks (multi-step tool-calling workflows):
Test case: Comic book repacking - extract CBR/CBZ archives, reorder files preserving page and issue order, repack into single archive. Requires correct sequencing, file handling, and no race conditions.
Cloud Models (Baseline)
| Model | Agentic Score | Notes |
|-------|---------------|-------|
| Claude Opus 4.5 | ⭐⭐⭐⭐⭐ | Flawless execution |
| Gemini 3 Pro | ⭐⭐⭐⭐⭐ | Flawless, fast execution |
| Claude Sonnet 4.5 | ⭐⭐⭐⭐ | Good, occasional issues |
| Claude 4 family | ⭐⭐⭐ | Gets there eventually, needs manual checking |
Local Models
| Model | Size | Speed | Agentic Score | Notes |
|-------|------|-------|---------------|-------|
| ~~Qwen3-32B~~ (Dense) | 18 GB | Slow | ❌ | Good reasoning, but flails on execution and crashes |
| Qwen3-14B | 8.4 GB | Medium | ⭐⭐ | Understands tasks but makes implementation errors |
| GLM-4 9B | 5.7 GB | Fast | ⭐⭐ | Works with adapter (strips code fences) |
| Qwen3-4B | 2.3 GB | Very Fast | ❌ | Generates malformed tool calls - not for agentic use |
| ~~Qwen3-30B-A3B~~ (MoE) | 17 GB | Very Fast | ❌ | Avoid - loops infinitely on tool calls |
Key findings:
- Dense models (Qwen3-32B, Qwen3-14B) handle agentic loops correctly
- MoE models (Qwen3-30B-A3B) are fast but don't know when to stop tool-calling
- Metal GPU works well with dense models on Apple Silicon
- Even the best local models (32B) lag significantly behind Claude Opus 4.5 on complex tasks
- Local models are best for simpler agentic tasks or when offline/privacy is required
Configuration example:
```toml
[providers.embedded.qwen3-big]
model_path = "~/.g3/models/Qwen_Qwen3-32B-Q4_K_M.gguf"
model_type = "qwen"
context_length = 40960
gpu_layers = 99  # Full GPU offload on Apple Silicon
```
Task Automation
- Single-shot task execution for quick tasks