QuiLLMan: Voice Chat with Moshi
A complete voice chat app powered by a speech-to-speech language model and bidirectional streaming.
On the backend is Kyutai Lab's Moshi model, which will continuously listen, plan, and respond to the user. It uses the Mimi streaming encoder/decoder model to maintain an unbroken stream of audio in and out, and a speech-text foundation model to determine when and how to respond.
Thanks to bidirectional websocket streaming and use of the Opus audio codec for compressing audio across the network, response times on good internet can be nearly instantaneous, closely matching the cadence of human speech.
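To get a rough sense of why Opus compression matters for latency, here is a back-of-envelope bandwidth comparison. The numbers are illustrative assumptions (24 kHz mono 16-bit PCM input, a typical ~24 kbps Opus speech bitrate), not measurements from this app:

```python
# Illustrative assumptions: 24 kHz mono 16-bit PCM vs. a common
# Opus bitrate for speech. Not measured values from this app.
SAMPLE_RATE_HZ = 24_000
BITS_PER_SAMPLE = 16
CHANNELS = 1

raw_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS  # uncompressed PCM rate
opus_bps = 24_000  # typical Opus speech bitrate (assumption)

print(raw_bps)            # 384000 bits/s uncompressed
print(raw_bps / opus_bps) # 16.0: roughly 16x less data on the wire
```

Sending an order of magnitude less audio data per second keeps each websocket frame small, which is what lets round-trip times track the cadence of conversation.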
You can find the demo live here.
This repo is meant to serve as a starting point for your own language model-based apps, as well as a playground for experimentation. Contributions are welcome and encouraged!
[Note: this code is provided for illustration only; please remember to check the license before using any model for commercial purposes.]
File structure
- React frontend (`src/frontend/`), served by `src/app.py`
- Moshi websocket server (`src/moshi.py`)
Developing locally
Requirements
- `modal` installed in your current Python virtual environment (`pip install modal`)
- A Modal account (`modal setup`)
- A Modal token set up in your environment (`modal token new`)
Developing the inference module
The Moshi server is a Modal class that loads the models and maintains streaming state, with a FastAPI http server exposing a websocket interface over the internet.
To run a development server for the Moshi module, run this command from the root of the repo:

```shell
modal serve -m src.moshi
```
In the terminal output, you'll find a URL for creating a websocket connection.
While the modal serve process is running, changes to any of the project files will be automatically applied. Ctrl+C will stop the app.
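The serve output prints an `https://` URL, while websocket clients connect over `wss://`. A tiny helper like the following can derive one from the other; note the `/ws` route here is a hypothetical default for illustration, so check the actual FastAPI routes in `src/moshi.py` for the real endpoint path:

```python
def ws_url(dev_url: str, path: str = "/ws") -> str:
    """Convert the https URL printed by `modal serve` into a websocket URL.

    The `/ws` path is an illustrative assumption; consult the routes
    defined in src/moshi.py for the real endpoint.
    """
    base = dev_url.replace("https://", "wss://").rstrip("/")
    return base + path

print(ws_url("https://example--moshi-dev.modal.run"))
# wss://example--moshi-dev.modal.run/ws
```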
Testing the websocket connection
From a separate terminal, we can test the websocket connection directly from the command line with the `tests/moshi_client.py` client.
It requires non-standard dependencies, which can be installed with:

```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements/requirements-dev.txt
```
With dependencies installed, run the terminal client with:

```shell
python tests/moshi_client.py
```
And begin speaking! Be sure to have your microphone and speakers enabled.
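Under the hood, a streaming client like this typically exchanges framed binary messages: a one-byte tag identifying the payload kind (audio vs. text), followed by the payload itself. The tag values below are assumptions for demonstration only; see `tests/moshi_client.py` for the actual protocol:

```python
# Illustrative framing helpers. The tag values (1 = audio, 2 = text)
# are assumptions for demonstration -- consult tests/moshi_client.py
# for the protocol the app really uses.
TAGS = {1: "audio", 2: "text"}

def pack(tag: int, payload: bytes) -> bytes:
    """Prefix a payload with its one-byte kind tag."""
    return bytes([tag]) + payload

def unpack(message: bytes) -> tuple[str, bytes]:
    """Split a framed message into (kind, payload)."""
    kind = TAGS.get(message[0], "unknown")
    return kind, message[1:]

kind, payload = unpack(pack(2, b"hello"))
print(kind, payload)  # text b'hello'
```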
Developing the http server and frontend
The http server at `src/app.py` is a second FastAPI app that serves the frontend as static files.
A development server can be run with:

```shell
modal serve -m src.app
```
Since src/app.py imports the src/moshi.py module, this also starts the Moshi websocket server.
In the terminal output, you'll find a URL that you can visit to use your app.
While the modal serve process is running, changes to any of the project files will be automatically applied. Ctrl+C will stop the app.
Note that for frontend changes, the browser cache may need to be cleared.
Deploying to Modal
Once you're happy with your changes, deploy your app:

```shell
modal deploy -m src.app
```
This will deploy both the frontend server and the Moshi websocket server.
Note that leaving the app deployed on Modal doesn't cost you anything! Modal apps are serverless and scale to 0 when not in use.