DTS
🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test against diverse user personas, score with multi-judge consensus, and discover winning conversation paths that single-shot LLMs miss.
Install / Use
/learn @MVPandey/DTSREADME
Dialogue Tree Search (DTS)
An LLM-powered tree search engine for multi-turn conversation optimization.
DTS explores conversation strategies in parallel, simulates diverse user reactions, scores trajectories with multi-judge consensus, and prunes underperformers—finding optimal dialogue paths that single-shot LLM responses miss.
Real-time tree exploration with strategy scoring, conversation playback, and detailed evaluation breakdowns
Table of Contents
- Why DTS?
- How It Works
- System Architecture
- Prerequisites & API Keys
- Installation
- Quick Start
- Configuration
- Deep Research Integration
- API Reference
- Frontend Visualizer
- Project Structure
- Token Usage & Cost Management
- Troubleshooting
- License
Why DTS?
Standard LLMs generate responses one turn at a time, optimizing locally without considering long-term conversation outcomes. This leads to:
- Myopic responses that sound good but lead to dead ends
- Single-path thinking that misses better strategic approaches
- Fragile strategies that fail when users respond unexpectedly
DTS solves this by treating conversation as a tree search problem:
- Explore multiple strategies in parallel (not just one response)
- Simulate diverse user reactions (skeptical, enthusiastic, confused, etc.)
- Score complete trajectories against your goal
- Prune bad paths early to focus computation on promising directions
The result: dialogue strategies that are robust, goal-oriented, and tested against varied user behaviors.
How It Works
The Algorithm
DTS implements a parallel beam search with the following loop:
For each round:
1. Generate N diverse conversation strategies
2. For each strategy, simulate K user intent variants
3. Roll out multi-turn conversations for each branch
4. Score all trajectories with 3 independent judges
5. Prune branches below threshold (median vote)
6. Backpropagate scores up the tree
7. Repeat with surviving branches
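The loop above can be sketched in a few lines of self-contained Python. The stubbed functions (`generate_strategies`, `simulate_user_intents`, `judge_scores`) stand in for the LLM-backed steps and are illustrative names, not the project's actual API:

```python
import statistics

def generate_strategies(n):
    # Stand-in for LLM strategy generation (step 1).
    return [f"strategy_{i}" for i in range(n)]

def simulate_user_intents(strategy, k):
    # Stand-in for intent forking + rollout (steps 2-3).
    return [(strategy, f"intent_{j}") for j in range(k)]

def judge_scores(branch):
    # Deterministic stand-in for 3 independent LLM judges (step 4).
    base = 5.0 + (sum(map(ord, str(branch))) % 50) / 10.0
    return [base, base + 0.3, base - 0.2]

def search_round(n_strategies=6, intents_per_branch=2, threshold=6.5):
    surviving = []
    for strategy in generate_strategies(n_strategies):
        for branch in simulate_user_intents(strategy, intents_per_branch):
            score = statistics.median(judge_scores(branch))  # median vote
            if score >= threshold:                           # step 5: prune
                surviving.append((branch, score))
    return sorted(surviving, key=lambda b: b[1], reverse=True)

best = search_round()  # survivors, best first, feed the next round
```

Real DTS runs these branches concurrently and backpropagates scores up the tree; this sketch only shows the expand→score→prune skeleton of a single round.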
Parallel Beam Search
Unlike traditional single-path generation, DTS maintains multiple conversation branches simultaneously:
```mermaid
graph TD
    subgraph "Round 1"
        Root["User Message"] --> S1["Strategy: Empathetic"]
        Root --> S2["Strategy: Direct"]
        Root --> S3["Strategy: Socratic"]
    end
    subgraph "Round 2"
        S1 --> S1I1["Intent: Cooperative"]
        S1 --> S1I2["Intent: Skeptical"]
        S2 --> S2I1["Intent: Cooperative"]
        S2 --> S2I2["Intent: Resistant"]
    end
    subgraph Scoring
        S1I1 --> J1((Judge 1))
        S1I1 --> J2((Judge 2))
        S1I1 --> J3((Judge 3))
        J1 & J2 & J3 --> M{Median Vote}
    end
    M -->|Score ≥ 6.5| Keep[Keep Branch]
    M -->|Score < 6.5| Prune[Prune Branch]
```
Branches are color-coded by score: green (passing), yellow (borderline), red (pruned)
Key parameters:
- init_branches: Number of initial strategies (default: 6)
- turns_per_branch: Conversation depth per branch (default: 5)
- max_concurrency: Parallel LLM calls (default: 16)
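As a sketch, these knobs can be gathered into a single config object. The field names mirror the README's parameters, but the dataclass itself (and the `user_intents_per_branch` default) is illustrative, not the project's constructor:

```python
from dataclasses import dataclass

@dataclass
class SearchConfig:
    """Illustrative search configuration; defaults taken from the README."""
    init_branches: int = 6             # initial strategies per round
    turns_per_branch: int = 5          # conversation depth per branch
    max_concurrency: int = 16          # parallel LLM calls
    user_intents_per_branch: int = 2   # K intent forks per strategy (value assumed)
    user_variability: bool = False     # fixed persona unless enabled

cfg = SearchConfig(init_branches=8, user_variability=True)
```

Higher `init_branches` widens the search at the cost of more LLM calls; `max_concurrency` caps how many of those calls run in parallel.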
User Intent Forking (Optional)
Most dialogue systems assume a single "happy path" user response. DTS can stress-test strategies against diverse user personas when enabled.
User Variability Mode:
- user_variability=False (default): Uses a fixed "healthily critical + engaged" persona for consistent, realistic testing
- user_variability=True: Generates diverse user intents for robustness testing across user types
When variability is enabled, possible user personas include:
| Emotional Tone | Cognitive Stance | Example Behavior |
|:---------------|:-----------------|:-----------------|
| engaged | accepting | Cooperative, follows suggestions |
| skeptical | questioning | Asks for evidence, challenges claims |
| confused | exploring | Needs clarification, misunderstands |
| resistant | challenging | Pushes back, disagrees |
| anxious | withdrawing | Hesitant, wants to end conversation |
Each strategy can fork into K intent variants (configurable via user_intents_per_branch), creating branches that prove robustness across user types.
UserIntent structure:
```python
UserIntent(
    id="skeptical_questioner",
    label="Skeptical Questioner",
    description="Demands evidence before accepting claims",
    emotional_tone="skeptical",      # How user feels
    cognitive_stance="questioning",  # How user thinks
)
```
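Forking itself is just a cross product: each strategy is paired with its first K intents to produce K branches. A minimal sketch, using a simplified `UserIntent` stand-in rather than the project's class:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class UserIntent:
    id: str
    emotional_tone: str
    cognitive_stance: str

# A few personas from the table above (descriptions omitted for brevity).
INTENTS = [
    UserIntent("cooperative", "engaged", "accepting"),
    UserIntent("skeptical_questioner", "skeptical", "questioning"),
    UserIntent("resistant", "resistant", "challenging"),
]

def fork(strategies, intents, k=2):
    # Pair every strategy with its first k intent variants.
    return [(s, i) for s, i in product(strategies, intents[:k])]

branches = fork(["empathetic", "direct"], INTENTS)  # 2 strategies x 2 intents
```

Each resulting (strategy, intent) pair becomes one branch to roll out and score, so cost grows linearly in `user_intents_per_branch`.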
Multi-Judge Scoring
Each trajectory is evaluated by 3 independent LLM judges. Scores are aggregated via median voting (robust to outlier judges):
```
Judge 1: 7.2 ─┐
Judge 2: 6.8 ─┼─► Median: 7.2 ─► Pass (≥ 6.5)
Judge 3: 8.1 ─┘
```
Why 3 judges?
- Single judge = high variance, easily gamed
- Median of 3 = robust to one outlier
- Majority vote determines pass/fail (2 of 3 must pass)
Scoring criteria (each 0-1, summed to 0-10):
- Goal achievement
- User need addressed
- Forward progress
- Clarity & coherence
- Appropriate tone
- Information accuracy
- Handling objections
- Building rapport
- Conversation flow
- Strategic effectiveness
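The scheme above can be sketched end-to-end: each judge sums ten 0-1 criteria into a 0-10 score, the branch score is the median of the three judges, and pass/fail is a 2-of-3 majority vote against the 6.5 threshold. The helper names are illustrative, not the project's API:

```python
from statistics import median

THRESHOLD = 6.5

def judge_score(criteria):
    # criteria: ten values in [0, 1], summed to a 0-10 score.
    assert len(criteria) == 10 and all(0.0 <= c <= 1.0 for c in criteria)
    return sum(criteria)

def aggregate(judge_scores):
    # Median is robust to one outlier judge; 2 of 3 must pass.
    passes = sum(s >= THRESHOLD for s in judge_scores)
    return median(judge_scores), passes >= 2

score, passed = aggregate([7.2, 6.8, 8.1])
# median 7.2, all three judges above threshold -> (7.2, True)
```

A single wild judge (say 2.0 alongside 7.0 and 7.4) moves the median only to 7.0 and cannot flip the majority on its own, which is the point of using three judges instead of one.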
High-Scoring Branch (9.2/10) vs. Pruned Branch (4.1/10)
Left: a successful trajectory with detailed strengths. Right: a pruned branch showing its weaknesses and why it failed.
Scoring Modes: Comparative vs Absolute

DTS supports two evaluation modes:
| Mode | How It Works | Best For |
|:-----|:-------------|:---------|
| Comparative | Sibling branches force-ranked against each other | Sharp discrimination, finding the single best path |
| Absolute | Each branch scored independently (0-10) | Early pruning, filtering obviously bad paths |
Comparative mode (default):
Input: [Strategy A, Strategy B, Strategy C] (siblings)
Output: A=7.5, B=6.0, C=4.5 (forced ranking with 1.5-point gaps)
Absolute mode:
Input: Strategy A (evaluated alone)
Output: 3 judges → [7.2, 6.8, 8.1] → Median: 7.2
Use scoring_mode="comparative" when you need the best single answer.
Use scoring_mode="absolute" when filtering many branches quickly.
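Comparative mode's forced ranking can be sketched as assigning scores downward from a top anchor in fixed 1.5-point gaps, matching the A/B/C example above. The function and its defaults are illustrative assumptions, not the project's exact scoring rule:

```python
def force_rank(ranked_ids, top=7.5, gap=1.5):
    # ranked_ids: sibling branches ordered best-first by the judge.
    # Fixed gaps guarantee separation, so siblings can't all cluster
    # around the same score the way absolute scoring allows.
    return {branch_id: top - gap * i for i, branch_id in enumerate(ranked_ids)}

scores = force_rank(["A", "B", "C"])
# -> {"A": 7.5, "B": 6.0, "C": 4.5}
```

The trade-off: forced gaps sharpen discrimination between siblings but discard information about absolute quality, which is why absolute mode remains better for threshold-based early pruning.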
System Architecture
```mermaid
sequenceDiagram
    participant User
    participant FE as Frontend (HTML/JS)
    participant API as FastAPI WebSocket
    participant ENG as DTS Engine
    participant LLM as OpenRouter/OpenAI
    participant RES as Firecrawl + Tavily
    User->>FE: Configure & Start Search
    FE->>API: WebSocket Connect
    API->>ENG: Initialize DTSEngine
    opt Deep Research Enabled
        ENG->>RES: Research Query
        RES-->>ENG: Domain Context
    end
    loop For Each Round
        ENG->>LLM: Generate Strategies
        LLM-->>ENG: N Strategies
        loop For Each Branch
            ENG->>LLM: Generate User Intents
            ENG->>LLM: Simulate Conversation
            ENG->>LLM: Judge Trajectory (3x)
        end
        ENG->>API: Emit Events (node_added, scored, pruned)
        API-->>FE: Stream Updates
        FE->>User: Update Visualization
    end
    ENG->>API: Complete with Best Path
    FE->>User: Show Results
```
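On the client side, the streamed `node_added`, `scored`, and `pruned` events can be folded into a local mirror of the tree roughly like this. The JSON payload shape shown here is an assumption for illustration, not the project's exact event schema:

```python
import json

def apply_event(tree, event):
    """Apply one streamed search event to a local tree mirror."""
    kind, node_id = event["type"], event["node_id"]
    if kind == "node_added":
        tree[node_id] = {"parent": event.get("parent_id"),
                         "score": None, "pruned": False}
    elif kind == "scored":
        tree[node_id]["score"] = event["score"]
    elif kind == "pruned":
        tree[node_id]["pruned"] = True
    return tree

tree = {}
for raw in (
    '{"type": "node_added", "node_id": "s1", "parent_id": "root"}',
    '{"type": "scored", "node_id": "s1", "score": 7.2}',
):
    apply_event(tree, json.loads(raw))
```

The frontend visualizer does essentially this on every WebSocket message, recoloring nodes as scores and prune decisions arrive.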
Component Overview
| Component | Location | Purpose |
|:----------|:---------|:--------|
| DTSEngine | backend/core/dts/engine.py | Main orchestrator, runs expand→score→prune loop |
| StrategyGenerator | backend/core/dts/components/generator.py | Creates strategies and user intents |
| ConversationSimulator | backend/core/dts/components/simulator.py | Runs multi-turn dialogue rollouts |
| TrajectoryEvaluator | backend/core/dts/components/evaluator.py | Multi-judge scoring with median aggregation |
| DeepResearcher | backend/core/dts/components/researcher.py | GPT-Researcher integration for context |
| DialogueTree | backend/core/dts/tree.py | Tree data structure with backpropagation |
| LLM Client | backend/llm/client.py | Provider-agnostic OpenAI-compatible wrapper |
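As a sketch of the backpropagation step that `DialogueTree` performs, a leaf's score can be walked up through its ancestors. This version keeps a running best-child maximum at each node; the project's actual aggregation rule (max vs. mean) isn't specified in this README:

```python
def backpropagate(tree, node_id, score):
    # tree: {node_id: {"parent": parent_id_or_None, "best": float}}
    # Walk from the scored node up to the root, updating each ancestor.
    while node_id is not None:
        node = tree[node_id]
        node["best"] = max(node["best"], score)
        node_id = node["parent"]

tree = {
    "root": {"parent": None, "best": 0.0},
    "s1":   {"parent": "root", "best": 0.0},
    "s1i1": {"parent": "s1", "best": 0.0},
}
backpropagate(tree, "s1i1", 7.2)
```

After backpropagation, interior nodes carry the best score found anywhere beneath them, which is what lets later rounds expand only the most promising strategies.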
Prerequisites & API Keys
Required Credentials
| Service | Environment Variable | Required | Purpose |
|:--------|:--------------------|:---------|:--------|
| LLM Provider | OPENROUTER_API_KEY | Yes | Strategy generation, simulation, and judging |
| Web Scraping | FIRECRAWL_API_KEY | For Deep Research | Scrapes web pages for research context |
| Web Search | TAVILY_API_KEY | For Deep Research | Searches the web for relevant sources |
Getting API Keys
- OpenRouter (recommended): openrouter.ai/keys
- Works with 100+ models (GPT-4, Claude, Gemini, open-source)
- Pay-per-token, no subscriptions
- Set OPENROUTER_API_KEY in your environment
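For example, the keys from the table above can be exported as environment variables before launching the backend (values are placeholders; a `.env` file works equally well if the project loads one):

```shell
# Required for all runs
export OPENROUTER_API_KEY="sk-or-your-key-here"

# Only needed when Deep Research is enabled
export FIRECRAWL_API_KEY="fc-your-key-here"
export TAVILY_API_KEY="tvly-your-key-here"
```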