Dialogue Tree Search (DTS)


An LLM-powered tree search engine for multi-turn conversation optimization.

DTS explores conversation strategies in parallel, simulates diverse user reactions, scores trajectories with multi-judge consensus, and prunes underperformers—finding optimal dialogue paths that single-shot LLM responses miss.

*DTS Visualizer: real-time tree exploration with strategy scoring, conversation playback, and detailed evaluation breakdowns.*


Why DTS?

Standard LLMs generate responses one turn at a time, optimizing locally without considering long-term conversation outcomes. This leads to:

  • Myopic responses that sound good but lead to dead ends
  • Single-path thinking that misses better strategic approaches
  • Fragile strategies that fail when users respond unexpectedly

DTS solves this by treating conversation as a tree search problem:

  1. Explore multiple strategies in parallel (not just one response)
  2. Simulate diverse user reactions (skeptical, enthusiastic, confused, etc.)
  3. Score complete trajectories against your goal
  4. Prune bad paths early to focus computation on promising directions

The result: dialogue strategies that are robust, goal-oriented, and tested against varied user behaviors.


How It Works

The Algorithm

DTS implements a parallel beam search with the following loop:

```text
For each round:
    1. Generate N diverse conversation strategies
    2. For each strategy, simulate K user intent variants
    3. Roll out multi-turn conversations for each branch
    4. Score all trajectories with 3 independent judges
    5. Prune branches below threshold (median vote)
    6. Backpropagate scores up the tree
    7. Repeat with surviving branches
```
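The round loop above can be sketched in plain Python. Everything here is an illustrative stand-in, not the actual DTS API: `generate_strategies`, `simulate_rollout`, and `judge` stub out the LLM calls, and the 6.5 threshold comes from the pruning rule described below.

```python
import random
from statistics import median

random.seed(0)  # deterministic stand-in for LLM variability

PRUNE_THRESHOLD = 6.5

def generate_strategies(n):
    # Stand-in for LLM strategy generation.
    return [f"strategy_{i}" for i in range(n)]

def simulate_rollout(strategy, intent):
    # Stand-in for a multi-turn conversation rollout.
    return f"{strategy}/{intent}"

def judge(trajectory):
    # Stand-in for one LLM judge returning a 0-10 score.
    return random.uniform(4.0, 9.0)

def run_round(strategies, intents_per_branch=2):
    """One expand -> score -> prune round of the beam search."""
    survivors = []
    for strategy in strategies:
        for k in range(intents_per_branch):
            trajectory = simulate_rollout(strategy, f"intent_{k}")
            # Three judges, aggregated by median vote.
            score = median(judge(trajectory) for _ in range(3))
            if score >= PRUNE_THRESHOLD:
                survivors.append((strategy, trajectory, score))
    return survivors

beam = generate_strategies(6)
survivors = run_round(beam)
print(len(beam), "strategies explored,", len(survivors), "branches kept")
```

In the real engine each stub is an async LLM call, which is where the `max_concurrency` parameter below comes in.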

Parallel Beam Search

Unlike traditional single-path generation, DTS maintains multiple conversation branches simultaneously:

```mermaid
graph TD
    subgraph Round 1
        Root[User Message] --> S1[Strategy: Empathetic]
        Root --> S2[Strategy: Direct]
        Root --> S3[Strategy: Socratic]
    end

    subgraph Round 2
        S1 --> S1I1[Intent: Cooperative]
        S1 --> S1I2[Intent: Skeptical]
        S2 --> S2I1[Intent: Cooperative]
        S2 --> S2I2[Intent: Resistant]
    end

    subgraph Scoring
        S1I1 --> J1((Judge 1))
        S1I1 --> J2((Judge 2))
        S1I1 --> J3((Judge 3))
        J1 & J2 & J3 --> M{Median Vote}
    end

    M -->|Score ≥ 6.5| Keep[Keep Branch]
    M -->|Score < 6.5| Prune[Prune Branch]
```

*Tree visualization: branches are color-coded by score, with green (passing), yellow (borderline), and red (pruned).*

Key parameters:

  • init_branches: Number of initial strategies (default: 6)
  • turns_per_branch: Conversation depth per branch (default: 5)
  • max_concurrency: Parallel LLM calls (default: 16)
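These knobs could be collected in a config object along the following lines. This is a hypothetical sketch: the three field names and defaults come from the list above, but the `SearchConfig` class and its helper method are invented for illustration (check backend/core/dts/engine.py for the real signature):

```python
from dataclasses import dataclass

@dataclass
class SearchConfig:
    init_branches: int = 6     # number of initial strategies
    turns_per_branch: int = 5  # conversation depth per branch
    max_concurrency: int = 16  # parallel LLM calls

    def initial_rollouts(self, user_intents_per_branch: int = 2) -> int:
        # Rough cost estimate for round 1: strategies x intent forks.
        return self.init_branches * user_intents_per_branch

cfg = SearchConfig()
print(cfg.initial_rollouts())  # 6 strategies x 2 intents = 12 rollouts
```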

User Intent Forking (Optional)

Most dialogue systems assume a single "happy path" user response. DTS can stress-test strategies against diverse user personas when enabled.

User Variability Mode:

  • user_variability=False (default): Uses a fixed "healthily critical + engaged" persona for consistent, realistic testing
  • user_variability=True: Generates diverse user intents for robustness testing across user types

When variability is enabled, possible user personas include:

| Emotional Tone | Cognitive Stance | Example Behavior |
|:---------------|:-----------------|:-----------------|
| engaged | accepting | Cooperative, follows suggestions |
| skeptical | questioning | Asks for evidence, challenges claims |
| confused | exploring | Needs clarification, misunderstands |
| resistant | challenging | Pushes back, disagrees |
| anxious | withdrawing | Hesitant, wants to end conversation |

Each strategy can fork into K intent variants (configurable via user_intents_per_branch), creating branches that prove robustness across user types.

UserIntent structure:

```python
UserIntent(
    id="skeptical_questioner",
    label="Skeptical Questioner",
    description="Demands evidence before accepting claims",
    emotional_tone="skeptical",      # How user feels
    cognitive_stance="questioning",  # How user thinks
)
```

Multi-Judge Scoring

Each trajectory is evaluated by 3 independent LLM judges. Scores are aggregated via median voting (robust to outlier judges):

```text
Judge 1: 7.2  ─┐
Judge 2: 6.8  ─┼─► Median: 7.2  ─► Pass (≥ 6.5)
Judge 3: 8.1  ─┘
```

Why 3 judges?

  • Single judge = high variance, easily gamed
  • Median of 3 = robust to one outlier
  • Majority vote determines pass/fail (2 of 3 must pass)
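Both aggregation rules are easy to state precisely. A self-contained sketch, with the 6.5 pass mark taken from the pruning threshold above:

```python
from statistics import median

PASS_MARK = 6.5

def aggregate(scores):
    """Median of 3 judge scores for the value; majority vote for pass/fail."""
    passing = sum(s >= PASS_MARK for s in scores)
    return median(scores), passing >= 2  # 2 of 3 judges must pass

print(aggregate([7.2, 6.8, 8.1]))  # (7.2, True)

# One outlier judge can neither sink nor save a branch:
print(aggregate([7.2, 2.0, 8.1]))  # (7.2, True)
```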

Scoring criteria (each 0-1, summed to 0-10):

  • Goal achievement
  • User need addressed
  • Forward progress
  • Clarity & coherence
  • Appropriate tone
  • Information accuracy
  • Handling objections
  • Building rapport
  • Conversation flow
  • Strategic effectiveness
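Since each criterion contributes 0-1, a judge's total is simply the sum of the ten ratings. A minimal sketch (the snake_case criterion names and the uniform weighting are assumptions for illustration):

```python
CRITERIA = [
    "goal_achievement", "user_need_addressed", "forward_progress",
    "clarity_coherence", "appropriate_tone", "information_accuracy",
    "handling_objections", "building_rapport", "conversation_flow",
    "strategic_effectiveness",
]

def total_score(ratings: dict) -> float:
    """Sum ten 0-1 criterion ratings into a 0-10 trajectory score."""
    assert set(ratings) == set(CRITERIA), "all ten criteria must be rated"
    assert all(0.0 <= v <= 1.0 for v in ratings.values())
    return sum(ratings.values())

# A uniformly strong trajectory lands near the 9.2 example below.
print(round(total_score({c: 0.92 for c in CRITERIA}), 1))  # 9.2
```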
**High-Scoring Branch (9.2/10)** and **Pruned Branch (4.1/10)** (side-by-side screenshots)

Left: A successful trajectory with detailed strengths. Right: A pruned branch showing weaknesses and why it failed.

Scoring Modes: Comparative vs Absolute

Scoring Mode Selection

DTS supports two evaluation modes:

| Mode | How It Works | Best For |
|:-----|:-------------|:---------|
| Comparative | Sibling branches force-ranked against each other | Sharp discrimination, finding the single best path |
| Absolute | Each branch scored independently (0-10) | Early pruning, filtering obviously bad paths |

Comparative mode (default):

```text
Input: [Strategy A, Strategy B, Strategy C] (siblings)
Output: A=7.5, B=6.0, C=4.5 (forced ranking with 1.5-point gaps)
```

Absolute mode:

```text
Input: Strategy A (evaluated alone)
Output: 3 judges → [7.2, 6.8, 8.1] → Median: 7.2
```

Use scoring_mode="comparative" when you need the best single answer. Use scoring_mode="absolute" when filtering many branches quickly.
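The comparative mode's fixed 1.5-point gaps can be reproduced mechanically: rank the siblings by judge preference, then step down from a top score. Whether the real evaluator anchors at 7.5 is an assumption; it merely matches the example above.

```python
def force_rank(branches, top=7.5, gap=1.5):
    """Assign scores to sibling branches by rank.

    `branches` maps branch name to a raw judge preference (higher is better);
    the best branch gets `top`, and each subsequent rank drops by `gap`.
    """
    ranked = sorted(branches, key=branches.get, reverse=True)
    return {name: top - i * gap for i, name in enumerate(ranked)}

print(force_rank({"A": 0.9, "B": 0.6, "C": 0.2}))
# {'A': 7.5, 'B': 6.0, 'C': 4.5}
```

Forcing a fixed gap guarantees separation between siblings even when the judges find them nearly equivalent, which is what gives comparative mode its sharper discrimination.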


System Architecture

```mermaid
sequenceDiagram
    participant User
    participant FE as Frontend (HTML/JS)
    participant API as FastAPI WebSocket
    participant ENG as DTS Engine
    participant LLM as OpenRouter/OpenAI
    participant RES as Firecrawl + Tavily

    User->>FE: Configure & Start Search
    FE->>API: WebSocket Connect
    API->>ENG: Initialize DTSEngine

    opt Deep Research Enabled
        ENG->>RES: Research Query
        RES-->>ENG: Domain Context
    end

    loop For Each Round
        ENG->>LLM: Generate Strategies
        LLM-->>ENG: N Strategies

        loop For Each Branch
            ENG->>LLM: Generate User Intents
            ENG->>LLM: Simulate Conversation
            ENG->>LLM: Judge Trajectory (3x)
        end

        ENG->>API: Emit Events (node_added, scored, pruned)
        API-->>FE: Stream Updates
        FE->>User: Update Visualization
    end

    ENG->>API: Complete with Best Path
    FE->>User: Show Results
```
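On the client side, consuming the streamed events amounts to a small dispatcher. A sketch in Python rather than the JS frontend, with the event names taken from the diagram (node_added, scored, pruned) but the JSON payload fields invented for illustration:

```python
import json

def handle_event(tree: dict, message: str) -> dict:
    """Apply one streamed engine event to a local mirror of the tree."""
    event = json.loads(message)
    kind, node_id = event["type"], event["node_id"]
    if kind == "node_added":
        tree[node_id] = {"score": None, "pruned": False}
    elif kind == "scored":
        tree[node_id]["score"] = event["score"]
    elif kind == "pruned":
        tree[node_id]["pruned"] = True
    return tree

tree = {}
for msg in (
    '{"type": "node_added", "node_id": "s1"}',
    '{"type": "scored", "node_id": "s1", "score": 7.2}',
):
    handle_event(tree, msg)
print(tree)  # {'s1': {'score': 7.2, 'pruned': False}}
```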

Component Overview

| Component | Location | Purpose |
|:----------|:---------|:--------|
| DTSEngine | backend/core/dts/engine.py | Main orchestrator, runs expand→score→prune loop |
| StrategyGenerator | backend/core/dts/components/generator.py | Creates strategies and user intents |
| ConversationSimulator | backend/core/dts/components/simulator.py | Runs multi-turn dialogue rollouts |
| TrajectoryEvaluator | backend/core/dts/components/evaluator.py | Multi-judge scoring with median aggregation |
| DeepResearcher | backend/core/dts/components/researcher.py | GPT-Researcher integration for context |
| DialogueTree | backend/core/dts/tree.py | Tree data structure with backpropagation |
| LLM Client | backend/llm/client.py | Provider-agnostic OpenAI-compatible wrapper |


Prerequisites & API Keys

Required Credentials

| Service | Environment Variable | Required | Purpose |
|:--------|:---------------------|:---------|:--------|
| LLM Provider | OPENROUTER_API_KEY | Yes | Strategy generation, simulation, and judging |
| Web Scraping | FIRECRAWL_API_KEY | For Deep Research | Scrapes web pages for research context |
| Web Search | TAVILY_API_KEY | For Deep Research | Searches the web for relevant sources |

Getting API Keys

  1. OpenRouter (recommended): openrouter.ai/keys
    • Works with 100+ models (GPT-4, Claude, Gemini, open-source)
    • Pay-per-token, no subscriptions
    • Set
