# Scribble
Transcription server and library written in Rust

Scribble is a fast, lightweight transcription engine written in Rust, with a built-in Whisper backend and a backend trait for custom implementations.

Scribble will demux/decode audio or video containers (MP4, MP3, WAV, FLAC, OGG, WebM, MKV, etc.), downmix to mono, and resample to 16 kHz — no preprocessing required.
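Conceptually, the normalization step boils down to two small transforms. The sketch below is illustrative only (it is not Scribble's actual implementation, which decodes real containers): a stereo-to-mono downmix by channel averaging, and a naive linear-interpolation resampler.

```rust
// Illustrative sketch of audio normalization: downmix interleaved
// stereo to mono, then resample to 16 kHz by linear interpolation.
// NOT Scribble's actual decoder pipeline.

/// Average interleaved stereo samples into a mono signal.
fn downmix_to_mono(stereo: &[f32]) -> Vec<f32> {
    stereo
        .chunks(2)
        .map(|c| c.iter().sum::<f32>() / c.len() as f32)
        .collect()
}

/// Naive linear-interpolation resampler from `src_rate` to `dst_rate`.
fn resample(samples: &[f32], src_rate: u32, dst_rate: u32) -> Vec<f32> {
    let out_len = (samples.len() as u64 * dst_rate as u64 / src_rate as u64) as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * src_rate as f64 / dst_rate as f64;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = samples[idx];
            let b = *samples.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac // interpolate between neighboring samples
        })
        .collect()
}

fn main() {
    // 48 interleaved stereo frames of dummy audio at 48 kHz.
    let stereo: Vec<f32> = (0..96).map(|i| (i % 2) as f32).collect();
    let mono = downmix_to_mono(&stereo);
    let out = resample(&mono, 48_000, 16_000);
    println!("{} mono samples in, {} samples out", mono.len(), out.len());
}
```

A production resampler would use a windowed-sinc or polyphase filter rather than linear interpolation, but the shape of the transformation is the same.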
## Demo

<img src="https://github.com/itsmontoya/scribble/blob/main/demo/demo.gif?raw=true" />

## Project goals
- Provide a clean, idiomatic Rust API for audio transcription
- Support multiple output formats (JSON, VTT, plain text, etc.)
- Work equally well as a CLI tool or embedded library
- Be streaming-first: designed to support incremental, chunk-based transcription pipelines (live audio, long-running streams, and low-latency workflows)
- Enable composable pipelines: VAD → transcription → encoding, with clear extension points for streaming and real-time use cases
- Keep the core simple, explicit, and easy to extend
Scribble is built with streaming and real-time transcription in mind, even when operating on static files today.
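The VAD → transcription → encoding shape described above can be sketched as a chain of stages. All names below are hypothetical illustrations of the idea, not Scribble's actual traits or types:

```rust
// Hypothetical sketch of a composable pipeline: each stage consumes
// the previous stage's output. Illustrative names only.

/// A pipeline stage transforms one chunk of data into another.
trait Stage<I, O> {
    fn process(&mut self, input: I) -> O;
}

/// Gate that passes audio through only when it looks like speech
/// (crude energy threshold, standing in for a real VAD model).
struct Vad {
    threshold: f32,
}

impl Stage<Vec<f32>, Option<Vec<f32>>> for Vad {
    fn process(&mut self, samples: Vec<f32>) -> Option<Vec<f32>> {
        let energy =
            samples.iter().map(|s| s * s).sum::<f32>() / samples.len().max(1) as f32;
        (energy > self.threshold).then_some(samples)
    }
}

/// Stand-in transcriber that just reports the chunk length.
struct Transcriber;

impl Stage<Vec<f32>, String> for Transcriber {
    fn process(&mut self, samples: Vec<f32>) -> String {
        format!("[{} samples transcribed]", samples.len())
    }
}

fn main() {
    let mut vad = Vad { threshold: 0.01 };
    let mut asr = Transcriber;
    let chunk = vec![0.5_f32; 160]; // a loud 10 ms chunk at 16 kHz
    if let Some(speech) = vad.process(chunk) {
        println!("{}", asr.process(speech));
    }
}
```

Because each stage only knows its input and output types, new stages (encoders, chunkers, stream sources) can be slotted in without touching the others.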
## Installation

### Rust toolchain

Scribble targets stable Rust (tracked via `rust-toolchain.toml`).
Clone the repository and build all binaries:

```bash
./scripts/build-all.sh
```

Or build a single binary into a target directory:

```bash
./scripts/build.sh scribble-cli ./dist
```
This will produce the following binaries:

- `scribble-cli` — transcribe audio/video (decodes + normalizes to mono 16 kHz)
- `scribble-server` — HTTP server for transcription
- `model-downloader` — download Whisper and VAD models
### GPU acceleration (feature flags)

Scribble exposes whisper-rs GPU backend features as Cargo features. Enable the backend you want in `Cargo.toml` or via `--features`:

```toml
[dependencies]
scribble = { version = "0.5", features = ["cuda"] }
```

```bash
cargo run --features "bin-scribble-cli,cuda" --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --input ./input.wav
```
Available GPU feature flags:

- `cuda` (NVIDIA CUDA)
- `metal` (Apple Metal)
- `hipblas` (AMD ROCm)
- `vulkan` (Vulkan)
- `coreml` (Apple CoreML)
These are passthrough features; you still need the corresponding system dependencies installed for your platform. See the whisper-rs documentation for backend setup details.
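In `Cargo.toml` terms, a passthrough feature simply forwards to the dependency's feature of the same name. The fragment below is an illustrative sketch of that wiring, not a copy of Scribble's manifest:

```toml
[features]
# Each GPU flag re-exports the matching whisper-rs feature
# (illustrative; see the crate's Cargo.toml for the real list).
cuda = ["whisper-rs/cuda"]
metal = ["whisper-rs/metal"]
```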
## model-downloader

`model-downloader` is a small helper CLI for downloading known-good Whisper and Whisper-VAD models into a local directory.

### List available models

```bash
cargo run --features bin-model-downloader --bin model-downloader -- --list
```
Example output:

```text
Whisper models:
- tiny
- base.en
- large-v3-turbo
- large-v3-turbo-q8_0
...

VAD models:
- silero-v5.1.2
- silero-v6.2.0
```
### Download a model

```bash
cargo run --features bin-model-downloader --bin model-downloader -- --name large-v3-turbo
```

By default, models are downloaded into `./models`.
### Download into a custom directory

```bash
cargo run --features bin-model-downloader --bin model-downloader -- \
  --name silero-v6.2.0 \
  --dir /opt/scribble/models
```
Downloads are performed safely:

- written to a `*.part` temp file
- fsynced
- atomically renamed into place
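This write-fsync-rename sequence is a standard pattern for making partially downloaded files invisible to readers. A minimal sketch (illustrative only, not model-downloader's actual code):

```rust
// Sketch of an atomic file write: stage the bytes in a sibling
// `.part` file, fsync, then rename into place so readers never
// observe a partially written model. Illustrative only.
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

fn atomic_write(dest: &Path, bytes: &[u8]) -> std::io::Result<()> {
    // Sibling temp file, e.g. `model.bin` -> `model.part`.
    let part = dest.with_extension("part");
    let mut f = File::create(&part)?;
    f.write_all(bytes)?;
    f.sync_all()?; // flush data to disk before the rename
    fs::rename(&part, dest)?; // atomic on POSIX within one filesystem
    Ok(())
}

fn main() -> std::io::Result<()> {
    let dest = std::env::temp_dir().join("model.bin");
    atomic_write(&dest, b"fake model bytes")?;
    println!("wrote {}", dest.display());
    Ok(())
}
```

If the process dies mid-download, only the `.part` file is left behind; the destination path either holds the old complete file or the new complete file, never a truncated one.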
## scribble-cli

`scribble-cli` is the main transcription CLI. It accepts audio or video containers and normalizes them internally to the mono 16 kHz input Whisper requires. Provide:

- an input media path (e.g. MP4, MP3, WAV, FLAC, OGG, WebM, MKV), or `-` to stream from stdin
- a Whisper model
- a Whisper-VAD model (used when `--enable-vad` is set)
### Basic transcription (VTT output)

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.mp4
```
Output is written to stdout in WebVTT format by default.
### Stream a live URL into scribble-cli (via ffmpeg)

If you have a live audio stream URL (MP3/AAC/etc.), you can decode it to Whisper-friendly WAV and pipe it into scribble-cli via stdin:

```bash
ffmpeg -re -loglevel error -nostats \
  -i "https://stream.example.com/live.mp3?session-id=REDACTED" \
  -f wav -ac 1 -ar 16000 - \
  | scribble-cli \
      --model ./models/ggml-tiny.bin \
      --vad-model ./models/ggml-silero-v6.2.0.bin \
      --enable-vad \
      --input -
```
### Stream a Twitch channel into scribble-cli (via streamlink + ffmpeg)

If you have streamlink installed, you can pull a Twitch stream to stdout and feed it through ffmpeg:

```bash
streamlink --stdout https://www.twitch.tv/dougdoug best \
  | ffmpeg -hide_banner -loglevel error -i pipe:0 -vn -ac 1 -ar 16000 -f wav pipe:1 \
  | scribble-cli \
      --model ./models/ggml-tiny.bin \
      --vad-model ./models/ggml-silero-v6.2.0.bin \
      --enable-vad \
      --input -
```
## scribble-server

`scribble-server` is a long-running HTTP server that loads models once at startup and accepts transcription requests over HTTP.
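Loading once and sharing across requests is the usual server pattern for expensive models. A std-only sketch of the idea, with hypothetical stand-in types (not Scribble's actual server code):

```rust
// Sketch of the load-once, share-everywhere pattern: an expensive
// model is wrapped in Arc and cloned cheaply into each request
// handler thread. Stand-in types, illustrative only.
use std::sync::Arc;
use std::thread;

/// Stand-in for a loaded Whisper model (expensive to construct).
struct Model {
    name: String,
}

impl Model {
    fn transcribe(&self, req_id: usize) -> String {
        format!("req {} handled by {}", req_id, self.name)
    }
}

fn main() {
    // Load once at startup...
    let model = Arc::new(Model { name: "ggml-large-v3-turbo".into() });

    // ...then share the same instance across concurrent requests.
    let handles: Vec<_> = (0..4)
        .map(|id| {
            let m = Arc::clone(&model);
            thread::spawn(move || m.transcribe(id))
        })
        .collect();

    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```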
Start the server
cargo run --features bin-scribble-server --bin scribble-server -- \
--model ./models/ggml-large-v3-turbo.bin \
--vad-model ./models/ggml-silero-v6.2.0.bin \
--host 127.0.0.1 \
--port 8080
### Transcribe via HTTP

```bash
curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=vtt" \
  > transcript.vtt
```
For JSON output:

```bash
curl -sS --data-binary @./input.wav \
  "http://127.0.0.1:8080/transcribe?output=json" \
  > transcript.json
```
Example using all query params:

```bash
curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=json&output_type=json&model_key=ggml-large-v3-turbo.bin&enable_vad=true&translate_to_english=true&language=en" \
  > transcript.json
```
### Prometheus metrics

scribble-server exposes Prometheus metrics at `GET /metrics`.

```bash
curl -sS "http://127.0.0.1:8080/metrics"
```
Key metrics:

- `scribble_http_requests_total` (labels: `status`)
- `scribble_http_request_duration_seconds` (labels: `status`)
- `scribble_http_in_flight_requests`
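The `/metrics` endpoint serves these in the Prometheus text exposition format: one `# TYPE` line per metric, then one line per label combination. A tiny sketch of rendering a labeled counter (illustrative, not the server's code):

```rust
// Minimal sketch of the Prometheus text exposition format for a
// labeled counter, as served at /metrics. Illustrative only.
fn render_counter(name: &str, label: (&str, &str), value: u64) -> String {
    format!(
        "# TYPE {name} counter\n{name}{{{}=\"{}\"}} {value}\n",
        label.0, label.1
    )
}

fn main() {
    print!(
        "{}",
        render_counter("scribble_http_requests_total", ("status", "200"), 42)
    );
}
```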
## Logging

All binaries emit structured JSON logs to stderr.

- Default level: `error`
- Override with `SCRIBBLE_LOG` (e.g. `SCRIBBLE_LOG=info`)
### JSON output

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type json
```
### Enable voice activity detection (VAD)

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --enable-vad \
  --input ./input.wav
```
When VAD is enabled:
- non-speech regions are suppressed
- if no speech is detected, no output is produced
### Specify language explicitly

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --language en
```
If `--language` is omitted, Whisper auto-detects the language.
### Write output to a file

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type vtt \
  > transcript.vtt
```
## Library usage

Scribble is also designed to be embedded as a library. High-level usage looks like:
```rust
use scribble::{Opts, OutputType, Scribble};
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut scribble = Scribble::new(
        ["./models/ggml-large-v3-turbo.bin"],
        "./models/ggml-silero-v6.2.0.bin",
    )?;

    let mut input = File::open("audio.wav")?;
    let mut output = Vec::new();

    let opts = Opts {
        model_key: None,
        enable_translate_to_english: false,
        enable_voice_activity_detection: true,
        language: None,
        output_type: OutputType::Json,
        incremental_min_window_seconds: 1,
    };

    scribble.transcribe(&mut input, &mut output, &opts)?;

    let json = String::from_utf8(output)?;
    println!("{json}");
    Ok(())
}
```
## Goals
- [X] Make VAD streaming-capable
- [X] Support streaming and incremental transcription
- [X] Select the primary audio track in multi-track video containers
- [X] Implement a web server
- [X] Add Prometheus metrics endpoint
- [X] Add structured logs (tracing)
- [X] Expand test coverage to 80%+
## Coverage

This project uses cargo-llvm-cov for coverage locally and in CI.

One-time setup:

```bash
rustup component add llvm-tools-preview
cargo install cargo-llvm-cov
```
Run coverage locally:

```bash
# Print a summary to stdout
cargo llvm-cov --features bin-scribble-cli,bin-model-downloader,bin-scribble-server --all-targets

# Generate an HTML report (writes to ./target/llvm-cov/html)
cargo llvm-cov --features bin-scribble-cli,bin-model-downloader,bin-scribble-server --all-targets --html
```
