# Voicetest
Test harness for voice agents. Import from Retell, VAPI, Bland, LiveKit. Run autonomous simulations. Evaluate with LLM judges.
A generic test harness for voice agent workflows. Test agents from Retell, VAPI, LiveKit, Bland, Telnyx, and custom sources using a unified execution and evaluation model.
## Installation

```bash
uv tool install voicetest
```

Or add to a project (use `uv run voicetest` to run):

```bash
uv add voicetest
```

Or with pip:

```bash
pip install voicetest
```

## Quick Start

Try voicetest with a sample healthcare receptionist agent and tests:

```bash
# Set up an API key (free, no credit card at https://console.groq.com)
export GROQ_API_KEY=gsk_...

# Load demo and start interactive shell
voicetest demo

# Or load demo and start web UI
voicetest demo --serve
```

Tip: If you have Claude Code installed, you can skip API key setup entirely and use `claudecode/sonnet` as your model. See Claude Code Passthrough for details.

The demo includes a healthcare receptionist agent with 8 test cases covering appointment scheduling, identity verification, and more.

## Interactive Shell

```bash
# Launch interactive shell (default)
uv run voicetest

# In the shell:
> agent tests/fixtures/retell/sample_config.json
> tests tests/fixtures/retell/sample_tests.json
> set agent_model ollama_chat/qwen2.5:0.5b
> run
```
## CLI Commands

```bash
# List available importers
voicetest importers

# Run tests against an agent definition
voicetest run --agent agent.json --tests tests.json --all

# Export agent to different formats
voicetest export --agent agent.json --format mermaid        # Diagram
voicetest export --agent agent.json --format livekit        # Python code
voicetest export --agent agent.json --format retell-llm     # Retell LLM JSON
voicetest export --agent agent.json --format retell-cf      # Retell Conversation Flow JSON
voicetest export --agent agent.json --format vapi-assistant # VAPI Assistant JSON
voicetest export --agent agent.json --format vapi-squad     # VAPI Squad JSON
voicetest export --agent agent.json --format bland          # Bland AI JSON
voicetest export --agent agent.json --format telnyx         # Telnyx AI JSON
voicetest export --agent agent.json --format voicetest      # Voicetest JSON (.vt.json)

# Launch full TUI
voicetest tui --agent agent.json --tests tests.json

# Start REST API server with Web UI
voicetest serve

# Start infrastructure (LiveKit, Whisper, Kokoro) + backend for live calls
voicetest up

# Stop infrastructure services
voicetest down
```
## Core Concepts

### Agent Graphs

An agent is represented as an `AgentGraph`: a directed graph of nodes connected by transitions. Each node has a prompt, a type, and outgoing edges that control conversation flow. The graph has a single `entry_node_id` where every conversation starts.
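The graph model can be pictured with a minimal sketch. These dataclasses are illustrative only: apart from `entry_node_id` and the node properties named in the text (prompt, type, transitions), the class and attribute names are assumptions, not voicetest's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    target: str      # id of the destination node
    condition: str   # LLM prompt match, equation, or "always"

@dataclass
class Node:
    id: str
    type: str        # "conversation", "logic", or "extract"
    prompt: str = ""
    transitions: list = field(default_factory=list)

@dataclass
class AgentGraph:
    entry_node_id: str                          # every conversation starts here
    nodes: dict = field(default_factory=dict)   # node id -> Node
```

A two-node flow under this sketch would be an `AgentGraph` whose entry node carries one `always` transition to the next node.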
### Node Types
| Type | LLM Call | Speech | Routing |
| ---------------- | ---------------- | ------ | -------------------------------------------------------------------------- |
| Conversation | Yes | Yes | LLM picks a transition via prompt match, or falls back to an always edge |
| Logic | No | No | Evaluates equations top-to-bottom; first match wins |
| Extract | Yes (extraction) | No | LLM extracts variables from the conversation, then equations route |
Any node type can also be a global node — reachable from any conversation node without explicit edges. See Global Nodes below.
Conversation nodes are the standard building block — they generate a spoken response and use LLM judgment (or an always edge) to choose the next node.
Logic nodes (also called branch nodes) have no prompt and produce no speech. All their transitions use equation or always conditions, evaluated deterministically without an LLM call.
Extract nodes combine LLM extraction with deterministic routing. They define variables_to_extract (each with a name, description, type, and optional choices). The engine calls the LLM once to extract all variables from the conversation history, stores them as dynamic variables, then evaluates equation transitions using the extracted values.
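As a hypothetical illustration, an extract node might be defined like this. Only `variables_to_extract` and its `name`/`description`/`type`/`choices` fields are named above; the surrounding keys (`id`, `transitions`, `condition`, `target`) are assumed for the sketch and may not match voicetest's actual schema.

```json
{
  "id": "collect_plan",
  "type": "extract",
  "variables_to_extract": [
    {
      "name": "plan_tier",
      "description": "The subscription tier the caller mentions",
      "type": "string",
      "choices": ["free", "pro", "enterprise"]
    }
  ],
  "transitions": [
    {"condition": "plan_tier == \"enterprise\"", "target": "priority_support"},
    {"condition": "always", "target": "standard_support"}
  ]
}
```

After the single extraction call, `plan_tier` becomes a dynamic variable and the equation transitions route on its value.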
### Global Nodes
Global nodes are reachable from any conversation node in the flow without requiring explicit edges from every source. They are a Retell Conversation Flow concept supported in the IR.
Each global node has a `global_node_setting` containing:

- `condition` — An LLM prompt that triggers entry (e.g., "Caller wants to cancel")
- `go_back_conditions` — LLM-prompted conditions that return to the originating node
The engine appends global node conditions to every conversation node's transition options. The LLM sees both local transitions and global entry conditions, and picks the best match. When a global node is entered, the engine tracks the originating node. Go-back conditions target the originator, effectively resuming the previous conversation with transcript context intact.
**Stacking:** Global nodes can trigger other global nodes. The engine maintains an originator stack; each go-back pops one level.

**Zero global nodes:** Flows without global nodes behave exactly as plain flows, and the `format_transitions` signature is backward-compatible.
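The originator stack can be sketched in a few lines of Python. This is an illustrative model of the behavior described above, not voicetest's engine code.

```python
class GlobalNodeTracker:
    """Illustrative sketch: track where to resume after global-node detours."""

    def __init__(self, entry_node_id):
        self.current = entry_node_id
        self.originators = []  # stack of nodes to return to on go-back

    def enter_global(self, global_node_id):
        # Remember the originating node, then jump to the global node.
        self.originators.append(self.current)
        self.current = global_node_id

    def go_back(self):
        # Pop one level: resume the conversation at the originating node.
        self.current = self.originators.pop()
        return self.current
```

Entering a second global node from the first pushes another originator, so two go-backs return first to the first global node, then to the original conversation node.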
### Dynamic Variables
Prompts can reference dynamic variables using `{{variable_name}}` syntax. Variables come from two sources:

- **Test case `dynamic_variables`**: Set before the conversation starts (e.g., `{{caller_name}}`, `{{account_id}}`)
- **Extract node output**: Populated during the conversation when an extract node fires

Expansion order: snippet references `{%name%}` are resolved first, then `{{variable}}` placeholders are substituted into the result. Unknown variables are left as-is.
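A minimal sketch of that two-pass expansion, assuming the `{%name%}` and `{{variable}}` syntax described above (not voicetest's actual implementation):

```python
import re

def expand(template, snippets, variables):
    """Pass 1: resolve {%name%} snippet refs. Pass 2: fill {{var}} placeholders.
    Unknown variables are left as-is."""
    def sub_snippet(m):
        return snippets.get(m.group(1), m.group(0))
    text = re.sub(r"\{%\s*(\w+)\s*%\}", sub_snippet, template)

    def sub_var(m):
        name = m.group(1)
        return str(variables[name]) if name in variables else m.group(0)
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub_var, text)
```

Because snippets are expanded first, `{{var}}` placeholders inside a snippet body are substituted in the second pass along with the rest of the prompt.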
### Equations
Equation conditions on transitions support these operators:
| Operator | Example | Notes |
| ----------------- | -------------------------- | ------------------------------------------------- |
| == | status == "active" | String equality |
| != | tier != "free" | String inequality |
| > >= < <= | age >= 18 | Numeric coercion; non-numeric values return false |
| contains | notes contains "urgent" | Substring match |
| not_contains | reply not_contains "err" | Substring absence |
| exists | email exists | Variable is set |
| not_exist | phone not_exist | Variable is absent |
Multiple clauses combine with logical_operator: "and" (default, all must match) or "or" (any must match).
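The operator semantics above can be sketched as a small evaluator. This is an illustrative model that assumes clauses arrive as (variable, operator, operand) triples; voicetest's internal representation may differ.

```python
def coerce_number(value):
    """Interpret a value as a float; return None if it is non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def eval_clause(variables, name, op, operand):
    """Evaluate one equation clause against the dynamic-variable dict."""
    present = name in variables
    value = variables.get(name)
    if op == "exists":
        return present
    if op == "not_exist":
        return not present
    if op == "==":
        return str(value) == str(operand)
    if op == "!=":
        return str(value) != str(operand)
    if op in (">", ">=", "<", "<="):
        left, right = coerce_number(value), coerce_number(operand)
        if left is None or right is None:
            return False  # non-numeric values never satisfy comparisons
        return {">": left > right, ">=": left >= right,
                "<": left < right, "<=": left <= right}[op]
    if op == "contains":
        return operand in str(value or "")
    if op == "not_contains":
        return operand not in str(value or "")
    raise ValueError(f"unknown operator: {op}")

def eval_condition(variables, clauses, logical_operator="and"):
    """Combine clauses with "and" (default, all must match) or "or" (any)."""
    results = (eval_clause(variables, *c) for c in clauses)
    return any(results) if logical_operator == "or" else all(results)
```

For example, with `{"age": "17", "notes": "urgent"}`, the clause `age >= 18` is false after numeric coercion, but combining it with `notes contains "urgent"` under `"or"` makes the condition true.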
### Test Cases
Test cases define simulated conversations to run against an agent:

```json
[
  {
    "name": "Customer billing inquiry",
    "user_prompt": "## Identity\nYour name is Jane.\n\n## Goal\nGet help with a charge on your bill.",
    "metrics": ["Agent greeted the customer and addressed the billing concern"],
    "dynamic_variables": {"caller_name": "Jane", "account_id": "12345"},
    "tool_mocks": [],
    "type": "simulation"
  }
]
```
- `type: "simulation"` — The engine simulates both agent and user, running a full multi-turn conversation
- `metrics` — LLM judges evaluate each metric against the transcript and produce a 0–1 score
- `dynamic_variables` — Key-value pairs injected into `{{var}}` placeholders before the conversation starts
## CLI Reference

### Testing

```bash
# Run tests against an agent definition
voicetest run --agent agent.json --tests tests.json --all

# Chat with an agent interactively
voicetest chat -a agent.json --model openai/gpt-4o --var name=Jane --var account=12345

# Evaluate a transcript against metrics (no simulation)
voicetest evaluate -t transcript.json -m "Agent was polite" -m "Agent resolved the issue"

# Diagnose test failures and suggest fixes
voicetest diagnose -a agent.json -t tests.json
voicetest diagnose -a agent.json -t tests.json --auto-fix --save fixed_agent.json

# Decompose an agent into sub-agents
voicetest decompose -a agent.json -o output/ [--num-agents N] [--model ID]
```
### Agent Management
| Command | Description |
| --- | --- |
| `voicetest agent list` | List agents in the database |
| `voicetest agent create -a agent.json --name "My Agent"` | Create an agent from a definition file |
| `voicetest agent get <agent-id>` | Get agent details |
| `voicetest agent update <agent-id> --name "Renamed" --model openai/gpt-4o` | Update an agent's fields |