# Voicetest
Test harness for voice agents. Import from Retell, VAPI, Bland, LiveKit. Run autonomous simulations. Evaluate with LLM judges.
A generic test harness for voice agent workflows. Test agents from Retell, VAPI, LiveKit, Bland, Telnyx, and custom sources using a unified execution and evaluation model.
## Installation

```bash
uv tool install voicetest
```

Or add to a project (use `uv run voicetest` to run):

```bash
uv add voicetest
```

Or with pip:

```bash
pip install voicetest
```

## Quick Start

Try voicetest with a sample healthcare receptionist agent and tests:

```bash
# Set up an API key (free, no credit card at https://console.groq.com)
export GROQ_API_KEY=gsk_...

# Load demo and start interactive shell
voicetest demo

# Or load demo and start web UI
voicetest demo --serve
```

Tip: If you have Claude Code installed, you can skip API key setup entirely and use `claudecode/sonnet` as your model. See Claude Code Passthrough for details.

The demo includes a healthcare receptionist agent with 8 test cases covering appointment scheduling, identity verification, and more.

## Interactive Shell

```bash
# Launch interactive shell (default)
uv run voicetest

# In the shell:
> agent tests/fixtures/retell/sample_config.json
> tests tests/fixtures/retell/sample_tests.json
> set agent_model ollama_chat/qwen2.5:0.5b
> run
```
## CLI Commands

```bash
# List available importers
voicetest importers

# Run tests against an agent definition
voicetest run --agent agent.json --tests tests.json --all

# Export agent to different formats
voicetest export --agent agent.json --format mermaid        # Diagram
voicetest export --agent agent.json --format livekit        # Python code
voicetest export --agent agent.json --format retell-llm     # Retell LLM JSON
voicetest export --agent agent.json --format retell-cf      # Retell Conversation Flow JSON
voicetest export --agent agent.json --format vapi-assistant # VAPI Assistant JSON
voicetest export --agent agent.json --format vapi-squad     # VAPI Squad JSON
voicetest export --agent agent.json --format bland          # Bland AI JSON
voicetest export --agent agent.json --format telnyx         # Telnyx AI JSON
voicetest export --agent agent.json --format voicetest      # Voicetest JSON (.vt.json)

# Launch full TUI
voicetest tui --agent agent.json --tests tests.json

# Start REST API server with Web UI
voicetest serve

# Start infrastructure (LiveKit, Whisper, Kokoro) + backend for live calls
voicetest up

# Stop infrastructure services
voicetest down
```
## Core Concepts

### Agent Graphs

An agent is represented as an `AgentGraph`: a directed graph of nodes connected by transitions. Each node has a prompt, a type, and outgoing edges that control conversation flow. The graph has a single `entry_node_id` where every conversation starts.
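The graph model can be pictured with a minimal sketch. These dataclasses are illustrative only: apart from `entry_node_id` and the node properties named in the text (prompt, type, transitions), the class and attribute names are assumptions, not voicetest's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    target: str      # id of the destination node
    condition: str   # LLM prompt match, equation, or "always"

@dataclass
class Node:
    id: str
    type: str        # "conversation", "logic", or "extract"
    prompt: str = ""
    transitions: list = field(default_factory=list)

@dataclass
class AgentGraph:
    entry_node_id: str                          # every conversation starts here
    nodes: dict = field(default_factory=dict)   # node id -> Node
```

A two-node flow under this sketch would be an `AgentGraph` whose entry node carries one `always` transition to the next node.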
### Node Types
| Type | LLM Call | Speech | Routing |
| ---------------- | ---------------- | ------ | -------------------------------------------------------------------------- |
| Conversation | Yes | Yes | LLM picks a transition via prompt match, or falls back to an always edge |
| Logic | No | No | Evaluates equations top-to-bottom; first match wins |
| Extract | Yes (extraction) | No | LLM extracts variables from the conversation, then equations route |
Any node type can also be a global node — reachable from any conversation node without explicit edges. See Global Nodes below.
Conversation nodes are the standard building block — they generate a spoken response and use LLM judgment (or an always edge) to choose the next node.
Logic nodes (also called branch nodes) have no prompt and produce no speech. All their transitions use equation or always conditions, evaluated deterministically without an LLM call.
Extract nodes combine LLM extraction with deterministic routing. They define variables_to_extract (each with a name, description, type, and optional choices). The engine calls the LLM once to extract all variables from the conversation history, stores them as dynamic variables, then evaluates equation transitions using the extracted values.
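As a hypothetical illustration, an extract node might be defined like this. Only `variables_to_extract` and its `name`/`description`/`type`/`choices` fields are named above; the surrounding keys (`id`, `transitions`, `condition`, `target`) are assumed for the sketch and may not match voicetest's actual schema.

```json
{
  "id": "collect_plan",
  "type": "extract",
  "variables_to_extract": [
    {
      "name": "plan_tier",
      "description": "The subscription tier the caller mentions",
      "type": "string",
      "choices": ["free", "pro", "enterprise"]
    }
  ],
  "transitions": [
    {"condition": "plan_tier == \"enterprise\"", "target": "priority_support"},
    {"condition": "always", "target": "standard_support"}
  ]
}
```

After the single extraction call, `plan_tier` becomes a dynamic variable and the equation transitions route on its value.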
### Global Nodes
Global nodes are reachable from any conversation node in the flow without requiring explicit edges from every source. They are a Retell Conversation Flow concept supported in the IR.
Each global node has a `global_node_setting` containing:

- `condition` — An LLM prompt that triggers entry (e.g., "Caller wants to cancel")
- `go_back_conditions` — LLM-prompted conditions that return to the originating node
The engine appends global node conditions to every conversation node's transition options. The LLM sees both local transitions and global entry conditions, and picks the best match. When a global node is entered, the engine tracks the originating node. Go-back conditions target the originator, effectively resuming the previous conversation with transcript context intact.
**Stacking:** Global nodes can trigger other global nodes. The engine maintains an originator stack; each go-back pops one level.

**Zero global nodes:** Flows without global nodes behave exactly as plain flows, and the `format_transitions` signature is backward-compatible.
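The originator stack can be sketched in a few lines of Python. This is an illustrative model of the behavior described above, not voicetest's engine code.

```python
class GlobalNodeTracker:
    """Illustrative sketch: track where to resume after global-node detours."""

    def __init__(self, entry_node_id):
        self.current = entry_node_id
        self.originators = []  # stack of nodes to return to on go-back

    def enter_global(self, global_node_id):
        # Remember the originating node, then jump to the global node.
        self.originators.append(self.current)
        self.current = global_node_id

    def go_back(self):
        # Pop one level: resume the conversation at the originating node.
        self.current = self.originators.pop()
        return self.current
```

Entering a second global node from the first pushes another originator, so two go-backs return first to the first global node, then to the original conversation node.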
### Dynamic Variables
Prompts can reference dynamic variables using `{{variable_name}}` syntax. Variables come from two sources:

- **Test case `dynamic_variables`**: Set before the conversation starts (e.g., `{{caller_name}}`, `{{account_id}}`)
- **Extract node output**: Populated during the conversation when an extract node fires

Expansion order: snippet references `{%name%}` are resolved first, then `{{variable}}` placeholders are substituted into the result. Unknown variables are left as-is.
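A minimal sketch of that two-pass expansion, assuming the `{%name%}` and `{{variable}}` syntax described above (not voicetest's actual implementation):

```python
import re

def expand(template, snippets, variables):
    """Pass 1: resolve {%name%} snippet refs. Pass 2: fill {{var}} placeholders.
    Unknown variables are left as-is."""
    def sub_snippet(m):
        return snippets.get(m.group(1), m.group(0))
    text = re.sub(r"\{%\s*(\w+)\s*%\}", sub_snippet, template)

    def sub_var(m):
        name = m.group(1)
        return str(variables[name]) if name in variables else m.group(0)
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub_var, text)
```

Because snippets are expanded first, `{{var}}` placeholders inside a snippet body are substituted in the second pass along with the rest of the prompt.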
### Equations
Equation conditions on transitions support these operators:
| Operator | Example | Notes |
| ----------------- | -------------------------- | ------------------------------------------------- |
| == | status == "active" | String equality |
| != | tier != "free" | String inequality |
| > >= < <= | age >= 18 | Numeric coercion; non-numeric values return false |
| contains | notes contains "urgent" | Substring match |
| not_contains | reply not_contains "err" | Substring absence |
| exists | email exists | Variable is set |
| not_exist | phone not_exist | Variable is absent |
Multiple clauses combine with logical_operator: "and" (default, all must match) or "or" (any must match).
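The operator semantics above can be sketched as a small evaluator. This is an illustrative model that assumes clauses arrive as (variable, operator, operand) triples; voicetest's internal representation may differ.

```python
def coerce_number(value):
    """Interpret a value as a float; return None if it is non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def eval_clause(variables, name, op, operand):
    """Evaluate one equation clause against the dynamic-variable dict."""
    present = name in variables
    value = variables.get(name)
    if op == "exists":
        return present
    if op == "not_exist":
        return not present
    if op == "==":
        return str(value) == str(operand)
    if op == "!=":
        return str(value) != str(operand)
    if op in (">", ">=", "<", "<="):
        left, right = coerce_number(value), coerce_number(operand)
        if left is None or right is None:
            return False  # non-numeric values never satisfy comparisons
        return {">": left > right, ">=": left >= right,
                "<": left < right, "<=": left <= right}[op]
    if op == "contains":
        return operand in str(value or "")
    if op == "not_contains":
        return operand not in str(value or "")
    raise ValueError(f"unknown operator: {op}")

def eval_condition(variables, clauses, logical_operator="and"):
    """Combine clauses with "and" (default, all must match) or "or" (any)."""
    results = (eval_clause(variables, *c) for c in clauses)
    return any(results) if logical_operator == "or" else all(results)
```

For example, with `{"age": "17", "notes": "urgent"}`, the clause `age >= 18` is false after numeric coercion, but combining it with `notes contains "urgent"` under `"or"` makes the condition true.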
### Test Cases
Test cases define simulated conversations to run against an agent:

```json
[
  {
    "name": "Customer billing inquiry",
    "user_prompt": "## Identity\nYour name is Jane.\n\n## Goal\nGet help with a charge on your bill.",
    "metrics": ["Agent greeted the customer and addressed the billing concern"],
    "dynamic_variables": {"caller_name": "Jane", "account_id": "12345"},
    "tool_mocks": [],
    "type": "simulation"
  }
]
```
- `type: "simulation"` — The engine simulates both agent and user, running a full multi-turn conversation
- `metrics` — LLM judges evaluate each metric against the transcript and produce a 0–1 score
- `dynamic_variables` — Key-value pairs injected into `{{var}}` placeholders before the conversation starts
## CLI Reference

### Testing

```bash
# Run tests against an agent definition
voicetest run --agent agent.json --tests tests.json --all

# Chat with an agent interactively
voicetest chat -a agent.json --model openai/gpt-4o --var name=Jane --var account=12345

# Evaluate a transcript against metrics (no simulation)
voicetest evaluate -t transcript.json -m "Agent was polite" -m "Agent resolved the issue"

# Diagnose test failures and suggest fixes
voicetest diagnose -a agent.json -t tests.json
voicetest diagnose -a agent.json -t tests.json --auto-fix --save fixed_agent.json

# Decompose an agent into sub-agents
voicetest decompose -a agent.json -o output/ [--num-agents N] [--model ID]
```
### Agent Management
| Command | Description |
| --- | --- |
| `voicetest agent list` | List agents in the database |
| `voicetest agent create -a agent.json --name "My Agent"` | Create an agent from a definition file |
| `voicetest agent get <agent-id>` | Get agent details |
| `voicetest agent update <agent-id> --name "Renamed" --model openai/gpt-4o` | Update an agent's fields |