Dialogue Tree Search (DTS)


An LLM-powered tree search engine for multi-turn conversation optimization.

DTS explores conversation strategies in parallel, simulates diverse user reactions, scores trajectories with multi-judge consensus, and prunes underperformers—finding optimal dialogue paths that single-shot LLM responses miss.

*DTS Visualizer: real-time tree exploration with strategy scoring, conversation playback, and detailed evaluation breakdowns.*


Why DTS?

Standard LLMs generate responses one turn at a time, optimizing locally without considering long-term conversation outcomes. This leads to:

  • Myopic responses that sound good but lead to dead ends
  • Single-path thinking that misses better strategic approaches
  • Fragile strategies that fail when users respond unexpectedly

DTS solves this by treating conversation as a tree search problem:

  1. Explore multiple strategies in parallel (not just one response)
  2. Simulate diverse user reactions (skeptical, enthusiastic, confused, etc.)
  3. Score complete trajectories against your goal
  4. Prune bad paths early to focus computation on promising directions

The result: dialogue strategies that are robust, goal-oriented, and tested against varied user behaviors.


How It Works

The Algorithm

DTS implements a parallel beam search with the following loop:

```text
For each round:
    1. Generate N diverse conversation strategies
    2. For each strategy, simulate K user intent variants
    3. Roll out multi-turn conversations for each branch
    4. Score all trajectories with 3 independent judges
    5. Prune branches below threshold (median vote)
    6. Backpropagate scores up the tree
    7. Repeat with surviving branches
```
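The round loop above can be sketched in plain Python. Everything here is an illustrative stand-in, not the actual DTS API: `generate_strategies`, `simulate_rollout`, and `judge` stub out the LLM calls, and the 6.5 threshold comes from the pruning rule described below.

```python
import random
from statistics import median

random.seed(0)  # deterministic stand-in for LLM variability

PRUNE_THRESHOLD = 6.5

def generate_strategies(n):
    # Stand-in for LLM strategy generation.
    return [f"strategy_{i}" for i in range(n)]

def simulate_rollout(strategy, intent):
    # Stand-in for a multi-turn conversation rollout.
    return f"{strategy}/{intent}"

def judge(trajectory):
    # Stand-in for one LLM judge returning a 0-10 score.
    return random.uniform(4.0, 9.0)

def run_round(strategies, intents_per_branch=2):
    """One expand -> score -> prune round of the beam search."""
    survivors = []
    for strategy in strategies:
        for k in range(intents_per_branch):
            trajectory = simulate_rollout(strategy, f"intent_{k}")
            # Three judges, aggregated by median vote.
            score = median(judge(trajectory) for _ in range(3))
            if score >= PRUNE_THRESHOLD:
                survivors.append((strategy, trajectory, score))
    return survivors

beam = generate_strategies(6)
survivors = run_round(beam)
print(len(beam), "strategies explored,", len(survivors), "branches kept")
```

In the real engine each stub is an async LLM call, which is where the `max_concurrency` parameter below comes in.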

Parallel Beam Search

Unlike traditional single-path generation, DTS maintains multiple conversation branches simultaneously:

```mermaid
graph TD
    subgraph Round 1
        Root[User Message] --> S1[Strategy: Empathetic]
        Root --> S2[Strategy: Direct]
        Root --> S3[Strategy: Socratic]
    end

    subgraph Round 2
        S1 --> S1I1[Intent: Cooperative]
        S1 --> S1I2[Intent: Skeptical]
        S2 --> S2I1[Intent: Cooperative]
        S2 --> S2I2[Intent: Resistant]
    end

    subgraph Scoring
        S1I1 --> J1((Judge 1))
        S1I1 --> J2((Judge 2))
        S1I1 --> J3((Judge 3))
        J1 & J2 & J3 --> M{Median Vote}
    end

    M -->|Score ≥ 6.5| Keep[Keep Branch]
    M -->|Score < 6.5| Prune[Prune Branch]
```

*Tree visualization: branches are color-coded by score, with green (passing), yellow (borderline), and red (pruned).*

Key parameters:

  • init_branches: Number of initial strategies (default: 6)
  • turns_per_branch: Conversation depth per branch (default: 5)
  • max_concurrency: Parallel LLM calls (default: 16)
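These knobs could be collected in a config object along the following lines. This is a hypothetical sketch: the three field names and defaults come from the list above, but the `SearchConfig` class and its helper method are invented for illustration (check backend/core/dts/engine.py for the real signature):

```python
from dataclasses import dataclass

@dataclass
class SearchConfig:
    init_branches: int = 6     # number of initial strategies
    turns_per_branch: int = 5  # conversation depth per branch
    max_concurrency: int = 16  # parallel LLM calls

    def initial_rollouts(self, user_intents_per_branch: int = 2) -> int:
        # Rough cost estimate for round 1: strategies x intent forks.
        return self.init_branches * user_intents_per_branch

cfg = SearchConfig()
print(cfg.initial_rollouts())  # 6 strategies x 2 intents = 12 rollouts
```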

User Intent Forking (Optional)

Most dialogue systems assume a single "happy path" user response. DTS can stress-test strategies against diverse user personas when enabled.

User Variability Mode:

  • user_variability=False (default): Uses a fixed "healthily critical + engaged" persona for consistent, realistic testing
  • user_variability=True: Generates diverse user intents for robustness testing across user types

When variability is enabled, possible user personas include:

| Emotional Tone | Cognitive Stance | Example Behavior |
|:---------------|:-----------------|:-----------------|
| engaged | accepting | Cooperative, follows suggestions |
| skeptical | questioning | Asks for evidence, challenges claims |
| confused | exploring | Needs clarification, misunderstands |
| resistant | challenging | Pushes back, disagrees |
| anxious | withdrawing | Hesitant, wants to end conversation |

Each strategy can fork into K intent variants (configurable via user_intents_per_branch), creating branches that prove robustness across user types.

UserIntent structure:

```python
UserIntent(
    id="skeptical_questioner",
    label="Skeptical Questioner",
    description="Demands evidence before accepting claims",
    emotional_tone="skeptical",      # How user feels
    cognitive_stance="questioning",  # How user thinks
)
```

Multi-Judge Scoring

Each trajectory is evaluated by 3 independent LLM judges. Scores are aggregated via median voting (robust to outlier judges):

```text
Judge 1: 7.2  ─┐
Judge 2: 6.8  ─┼─► Median: 7.2  ─► Pass (≥ 6.5)
Judge 3: 8.1  ─┘
```

Why 3 judges?

  • Single judge = high variance, easily gamed
  • Median of 3 = robust to one outlier
  • Majority vote determines pass/fail (2 of 3 must pass)
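Both aggregation rules are easy to state precisely. A self-contained sketch, with the 6.5 pass mark taken from the pruning threshold above:

```python
from statistics import median

PASS_MARK = 6.5

def aggregate(scores):
    """Median of 3 judge scores for the value; majority vote for pass/fail."""
    passing = sum(s >= PASS_MARK for s in scores)
    return median(scores), passing >= 2  # 2 of 3 judges must pass

print(aggregate([7.2, 6.8, 8.1]))  # (7.2, True)

# One outlier judge can neither sink nor save a branch:
print(aggregate([7.2, 2.0, 8.1]))  # (7.2, True)
```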

Scoring criteria (each 0-1, summed to 0-10):

  • Goal achievement
  • User need addressed
  • Forward progress
  • Clarity & coherence
  • Appropriate tone
  • Information accuracy
  • Handling objections
  • Building rapport
  • Conversation flow
  • Strategic effectiveness
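Since each criterion contributes 0-1, a judge's total is simply the sum of the ten ratings. A minimal sketch (the snake_case criterion names and the uniform weighting are assumptions for illustration):

```python
CRITERIA = [
    "goal_achievement", "user_need_addressed", "forward_progress",
    "clarity_coherence", "appropriate_tone", "information_accuracy",
    "handling_objections", "building_rapport", "conversation_flow",
    "strategic_effectiveness",
]

def total_score(ratings: dict) -> float:
    """Sum ten 0-1 criterion ratings into a 0-10 trajectory score."""
    assert set(ratings) == set(CRITERIA), "all ten criteria must be rated"
    assert all(0.0 <= v <= 1.0 for v in ratings.values())
    return sum(ratings.values())

# A uniformly strong trajectory lands near the 9.2 example below.
print(round(total_score({c: 0.92 for c in CRITERIA}), 1))  # 9.2
```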
**High-Scoring Branch (9.2/10)** and **Pruned Branch (4.1/10)** (side-by-side screenshots)

Left: A successful trajectory with detailed strengths. Right: A pruned branch showing weaknesses and why it failed.

Scoring Modes: Comparative vs Absolute

Scoring Mode Selection

DTS supports two evaluation modes:

| Mode | How It Works | Best For |
|:-----|:-------------|:---------|
| Comparative | Sibling branches force-ranked against each other | Sharp discrimination, finding the single best path |
| Absolute | Each branch scored independently (0-10) | Early pruning, filtering obviously bad paths |

Comparative mode (default):

```text
Input: [Strategy A, Strategy B, Strategy C] (siblings)
Output: A=7.5, B=6.0, C=4.5 (forced ranking with 1.5-point gaps)
```

Absolute mode:

```text
Input: Strategy A (evaluated alone)
Output: 3 judges → [7.2, 6.8, 8.1] → Median: 7.2
```

Use scoring_mode="comparative" when you need the best single answer. Use scoring_mode="absolute" when filtering many branches quickly.
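The comparative mode's fixed 1.5-point gaps can be reproduced mechanically: rank the siblings by judge preference, then step down from a top score. Whether the real evaluator anchors at 7.5 is an assumption; it merely matches the example above.

```python
def force_rank(branches, top=7.5, gap=1.5):
    """Assign scores to sibling branches by rank.

    `branches` maps branch name to a raw judge preference (higher is better);
    the best branch gets `top`, and each subsequent rank drops by `gap`.
    """
    ranked = sorted(branches, key=branches.get, reverse=True)
    return {name: top - i * gap for i, name in enumerate(ranked)}

print(force_rank({"A": 0.9, "B": 0.6, "C": 0.2}))
# {'A': 7.5, 'B': 6.0, 'C': 4.5}
```

Forcing a fixed gap guarantees separation between siblings even when the judges find them nearly equivalent, which is what gives comparative mode its sharper discrimination.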


System Architecture

```mermaid
sequenceDiagram
    participant User
    participant FE as Frontend (HTML/JS)
    participant API as FastAPI WebSocket
    participant ENG as DTS Engine
    participant LLM as OpenRouter/OpenAI
    participant RES as Firecrawl + Tavily

    User->>FE: Configure & Start Search
    FE->>API: WebSocket Connect
    API->>ENG: Initialize DTSEngine

    opt Deep Research Enabled
        ENG->>RES: Research Query
        RES-->>ENG: Domain Context
    end

    loop For Each Round
        ENG->>LLM: Generate Strategies
        LLM-->>ENG: N Strategies

        loop For Each Branch
            ENG->>LLM: Generate User Intents
            ENG->>LLM: Simulate Conversation
            ENG->>LLM: Judge Trajectory (3x)
        end

        ENG->>API: Emit Events (node_added, scored, pruned)
        API-->>FE: Stream Updates
        FE->>User: Update Visualization
    end

    ENG->>API: Complete with Best Path
    FE->>User: Show Results
```
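On the client side, consuming the streamed events amounts to a small dispatcher. A sketch in Python rather than the JS frontend, with the event names taken from the diagram (node_added, scored, pruned) but the JSON payload fields invented for illustration:

```python
import json

def handle_event(tree: dict, message: str) -> dict:
    """Apply one streamed engine event to a local mirror of the tree."""
    event = json.loads(message)
    kind, node_id = event["type"], event["node_id"]
    if kind == "node_added":
        tree[node_id] = {"score": None, "pruned": False}
    elif kind == "scored":
        tree[node_id]["score"] = event["score"]
    elif kind == "pruned":
        tree[node_id]["pruned"] = True
    return tree

tree = {}
for msg in (
    '{"type": "node_added", "node_id": "s1"}',
    '{"type": "scored", "node_id": "s1", "score": 7.2}',
):
    handle_event(tree, msg)
print(tree)  # {'s1': {'score': 7.2, 'pruned': False}}
```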

Component Overview

| Component | Location | Purpose |
|:----------|:---------|:--------|
| DTSEngine | backend/core/dts/engine.py | Main orchestrator, runs expand→score→prune loop |
| StrategyGenerator | backend/core/dts/components/generator.py | Creates strategies and user intents |
| ConversationSimulator | backend/core/dts/components/simulator.py | Runs multi-turn dialogue rollouts |
| TrajectoryEvaluator | backend/core/dts/components/evaluator.py | Multi-judge scoring with median aggregation |
| DeepResearcher | backend/core/dts/components/researcher.py | GPT-Researcher integration for context |
| DialogueTree | backend/core/dts/tree.py | Tree data structure with backpropagation |
| LLM Client | backend/llm/client.py | Provider-agnostic OpenAI-compatible wrapper |


Prerequisites & API Keys

Required Credentials

| Service | Environment Variable | Required | Purpose |
|:--------|:---------------------|:---------|:--------|
| LLM Provider | OPENROUTER_API_KEY | Yes | Strategy generation, simulation, and judging |
| Web Scraping | FIRECRAWL_API_KEY | For Deep Research | Scrapes web pages for research context |
| Web Search | TAVILY_API_KEY | For Deep Research | Searches the web for relevant sources |

Getting API Keys

  1. OpenRouter (recommended): openrouter.ai/keys
    • Works with 100+ models (GPT-4, Claude, Gemini, open-source)
    • Pay-per-token, no subscriptions
    • Set
