X-Talk

<img width="460" height="249" alt="xtalk-logo-new" src="https://github.com/user-attachments/assets/4e252ce8-7450-4335-b86a-4b9b26200792" />

Live Demo arXiv Python License


⚠️ X-Talk is in active prototyping. Interfaces and functionality are subject to change, though we will try to keep interfaces stable.

X-Talk is an open-source full-duplex cascaded spoken dialogue system framework featuring:

  • Low-Latency, Interruptible, Human-Like Speech Interaction
    • The speech pipeline is optimized for impressively low latency
    • Users can naturally interrupt the system mid-response
    • Paralinguistic information (e.g. environment noise, emotion) is encoded in parallel to support in-depth understanding and empathy
  • 🧪 Researcher Friendly
    • New models and related logic can be added in a single Python script and seamlessly integrated with the default pipeline.
  • 🧩 Super Lightweight
    • The framework backend is pure Python; nothing to build or install beyond pip install.
  • 🏭 Production Ready
    • Concurrency is ensured through an asynchronous backend
    • The WebSocket-based implementation supports deployment from web browsers to edge devices.

📚 Contents

<a id="demo"></a>

🎬 Demo

Online Demo

Demo Link

This demo runs on an RTX 4090 cluster with 8-bit quantized SenseVoice as the speech recognizer, IndexTTS 1.5 as the speech generator, and 4-bit quantized Qwen3-30B-A3B as the language model. Though intelligence is limited by the relatively small language model, the demo illustrates the low latency achievable.

Demo Videos

<table class="center"> <tr> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/e7946357-cd83-493c-8967-354cf87b2acb" muted="false"></video> </td> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/ca45c463-6738-4b5c-8305-71fce4ab490e" muted="false"></video> </td> </tr> <tr> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/8c0f489a-6af6-4711-a28c-7a48740f666c" muted="false"></video> </td> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/d8fc4d15-edfb-4476-a9d3-983a1ce9be0e" muted="false"></video> </td> </tr> <tr> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/7ea4dc44-d43c-45ca-8788-2032b3a387d8" muted="false"></video> </td> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/9f296d5e-a752-435e-91a2-a9f1a71f9fac" muted="false"></video> </td> </tr> <tr> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/2b44f2f1-93c4-47b8-99e0-830338cdba02" muted="false"></video> </td> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/c4cd4c1b-c4fd-493b-8cb2-347c48ac5809" muted="false"></video> </td> </tr> <tr> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/d33ca5ef-c722-45a6-93df-2fdb7ffcc729" muted="false"></video> </td> <td width=50% style="border: none"> <video controls autoplay loop src="https://github.com/user-attachments/assets/09370641-7a26-4f93-9c98-dee887612fda" muted="false"></video> </td> </tr> </table>

The tour-guiding demos use Qwen3-Next-80B-A3B-Instruct as the language model; the other eight demos match the online demo setting. Larger language models are more capable, at the cost of higher latency.

<a id="installation"></a>

🛠️ Installation

pip install git+https://github.com/xcc-zach/xtalk.git@main

<a id="quickstart"></a>

🚀 Quickstart

We will use APIs from AliCloud to demonstrate the basic capability of X-Talk.

First, install dependencies for AliCloud and server script:

pip install "xtalk[ali] @ git+https://github.com/xcc-zach/xtalk.git@main"
pip install jinja2 python-multipart 'uvicorn[standard]'

Then, obtain an API key from the AliCloud Bailian Platform. We will be using AliCloud's free-tier service.

Online services may be unstable and have high latency. We recommend locally deployed models for a better user experience; see the server config tutorial and supported models for details.

After that, create a JSON config specifying the models to use, and fill in <API_KEY> with the key you obtained:

{
    "asr": {
        "type": "Qwen3ASRFlashRealtime",
        "params": {
            "api_key": "<API_KEY>"
        }
    },
    "llm_agent": {
        "type": "DefaultAgent",
        "params": {
            "model": {
                "api_key": "<API_KEY>",
                "model": "qwen-plus-2025-12-01",
                "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1"
            }
        }
    },
    "tts": {
        "type": "CosyVoice",
        "params": {
            "api_key": "<API_KEY>"
        }
    }
}

If Qwen3ASRFlashRealtime is not working properly, you can instead use "asr": "SenseVoiceSmallLocal", which is a ~1 GB local model. You can also try the local speech generation model IndexTTS (setup tutorial):

"tts": {
    "type": "IndexTTS",
    "params": {
        "port": 6006
    }
},

If you want all models deployed locally, see here.
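For reference, here is a combined config sketch that pairs the local recognizer and IndexTTS with the AliCloud language model from earlier. The `"asr"` string shorthand follows the note above; verify field names against your X-Talk version:

```json
{
    "asr": "SenseVoiceSmallLocal",
    "llm_agent": {
        "type": "DefaultAgent",
        "params": {
            "model": {
                "api_key": "<API_KEY>",
                "model": "qwen-plus-2025-12-01",
                "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1"
            }
        }
    },
    "tts": {
        "type": "IndexTTS",
        "params": {
            "port": 6006
        }
    }
}
```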

The next step is to compose the startup script. Since the demo also needs the frontend page and scripts wired in, a ready-made startup script is provided at examples/sample_app/configurable_server.py. Simply start the server with the config file (replace <PATH_TO_CONFIG>.json with the path to the config we just created) and a custom port:

git clone https://github.com/xcc-zach/xtalk.git
cd xtalk
python examples/sample_app/configurable_server.py --port 7635 --config <PATH_TO_CONFIG>.json

Finally, our demo is ready at http://localhost:7635. View it in the browser!

<a id="tutorial"></a>

📖 Tutorial

Start the Server

[!NOTE] See examples/sample_app/configurable_server.py, frontend/src, examples/sample_app/templates and examples/sample_app/static for details.

X-Talk keeps most models and execution on the server side; the client is responsible for interacting with the microphone, transmitting audio and WebSocket messages, and handling lightweight operations like voice activity detection (VAD).
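Since VAD runs client-side, a minimal sketch of what an energy-based detector can look like is shown below. This is illustrative only, not X-Talk's actual implementation:

```javascript
// Illustrative energy-based voice activity detector (NOT X-Talk's actual code).
// Returns true when the RMS energy of one frame of PCM samples
// (floats in [-1, 1], as delivered by the Web Audio API) exceeds a threshold.
function isSpeech(samples, threshold = 0.01) {
    let sumSquares = 0;
    for (const s of samples) sumSquares += s * s;
    const rms = Math.sqrt(sumSquares / samples.length);
    return rms > threshold;
}
```

Real detectors typically add hangover smoothing across frames so that brief pauses do not cut off the user's turn.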

On the client side, you can start with the snippet in examples/sample_app/static/js/index.js and track where convo is used to see how the client API works:

async function loadXtalk() {
    try {
        return await import("../../xtalk/index.js"); // Try local import first, this is dev only
    } catch (e) {
        return await import("https://unpkg.com/xtalk-client@latest/dist/index.js"); // Use unpkg CDN for production
    }
}

const { createConversation } = await loadXtalk();


function getWebSocketURL() {
    // Derive ws:// or wss:// from the page protocol and point at the ./ws endpoint.
    const proto = location.protocol === "https:" ? "wss:" : "ws:";
    const wsPath = new URL("./ws", window.location.href);
    wsPath.protocol = proto;
    wsPath.host = window.location.host;
    return wsPath;
}

const convo = createConversation(getWebSocketURL());

We recently published the client API as a separate package, xtalk-client. You can therefore import it directly from https://unpkg.com/xtalk-client@latest/dist/index.js without hosting the client code yourself, as shown above. We plan to keep improving the client-side API.

On the server side, the core logic is to connect an X-Talk instance to a WebSocket endpoint of a FastAPI instance:

from fastapi import FastAPI, WebSocket

from xtalk import Xtalk

app = FastAPI(title="Xtalk Server")
xtalk_instance = Xtalk.from_config("path/to/config.json")

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    # Hand the accepted socket to X-Talk, which drives the dialogue loop.
    await xtalk_instance.connect(websocket)

Then you can check examples/sample_app/configurable_server.py for how to mount client-side scripts and pages.

Text Embedding

[!NOTE] See examples/sample_app/configurable_server.py and `frontend/src/js/index.j
