# Scribble
Transcription server and library written in Rust

Scribble is a fast, lightweight transcription engine written in Rust, with a built-in Whisper backend and a backend trait for custom implementations.

Scribble will demux/decode audio or video containers (MP4, MP3, WAV, FLAC, OGG, WebM, MKV, etc.), downmix to mono, and resample to 16 kHz — no preprocessing required.
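Conceptually, the normalization step boils down to two small transforms. The sketch below is illustrative only (it is not Scribble's actual implementation, which decodes real containers): a stereo-to-mono downmix by channel averaging, and a naive linear-interpolation resampler.

```rust
// Illustrative sketch of audio normalization: downmix interleaved
// stereo to mono, then resample to 16 kHz by linear interpolation.
// NOT Scribble's actual decoder pipeline.

/// Average interleaved stereo samples into a mono signal.
fn downmix_to_mono(stereo: &[f32]) -> Vec<f32> {
    stereo
        .chunks(2)
        .map(|c| c.iter().sum::<f32>() / c.len() as f32)
        .collect()
}

/// Naive linear-interpolation resampler from `src_rate` to `dst_rate`.
fn resample(samples: &[f32], src_rate: u32, dst_rate: u32) -> Vec<f32> {
    let out_len = (samples.len() as u64 * dst_rate as u64 / src_rate as u64) as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * src_rate as f64 / dst_rate as f64;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = samples[idx];
            let b = *samples.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac // interpolate between neighboring samples
        })
        .collect()
}

fn main() {
    // 48 interleaved stereo frames of dummy audio at 48 kHz.
    let stereo: Vec<f32> = (0..96).map(|i| (i % 2) as f32).collect();
    let mono = downmix_to_mono(&stereo);
    let out = resample(&mono, 48_000, 16_000);
    println!("{} mono samples in, {} samples out", mono.len(), out.len());
}
```

A production resampler would use a windowed-sinc or polyphase filter rather than linear interpolation, but the shape of the transformation is the same.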
## Demo

<img src="https://github.com/itsmontoya/scribble/blob/main/demo/demo.gif?raw=true" />

## Project goals
- Provide a clean, idiomatic Rust API for audio transcription
- Support multiple output formats (JSON, VTT, plain text, etc.)
- Work equally well as a CLI tool or embedded library
- Be streaming-first: designed to support incremental, chunk-based transcription pipelines (live audio, long-running streams, and low-latency workflows)
- Enable composable pipelines: VAD → transcription → encoding, with clear extension points for streaming and real-time use cases
- Keep the core simple, explicit, and easy to extend
Scribble is built with streaming and real-time transcription in mind, even when operating on static files today.
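The VAD → transcription → encoding shape described above can be sketched as a chain of stages. All names below are hypothetical illustrations of the idea, not Scribble's actual traits or types:

```rust
// Hypothetical sketch of a composable pipeline: each stage consumes
// the previous stage's output. Illustrative names only.

/// A pipeline stage transforms one chunk of data into another.
trait Stage<I, O> {
    fn process(&mut self, input: I) -> O;
}

/// Gate that passes audio through only when it looks like speech
/// (crude energy threshold, standing in for a real VAD model).
struct Vad {
    threshold: f32,
}

impl Stage<Vec<f32>, Option<Vec<f32>>> for Vad {
    fn process(&mut self, samples: Vec<f32>) -> Option<Vec<f32>> {
        let energy =
            samples.iter().map(|s| s * s).sum::<f32>() / samples.len().max(1) as f32;
        (energy > self.threshold).then_some(samples)
    }
}

/// Stand-in transcriber that just reports the chunk length.
struct Transcriber;

impl Stage<Vec<f32>, String> for Transcriber {
    fn process(&mut self, samples: Vec<f32>) -> String {
        format!("[{} samples transcribed]", samples.len())
    }
}

fn main() {
    let mut vad = Vad { threshold: 0.01 };
    let mut asr = Transcriber;
    let chunk = vec![0.5_f32; 160]; // a loud 10 ms chunk at 16 kHz
    if let Some(speech) = vad.process(chunk) {
        println!("{}", asr.process(speech));
    }
}
```

Because each stage only knows its input and output types, new stages (encoders, chunkers, stream sources) can be slotted in without touching the others.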
## Installation

### Rust toolchain

Scribble targets stable Rust (tracked via `rust-toolchain.toml`).
Clone the repository and build all binaries:

```bash
./scripts/build-all.sh
```

Or build a single binary into a target directory:

```bash
./scripts/build.sh scribble-cli ./dist
```
This will produce the following binaries:

- `scribble-cli` — transcribe audio/video (decodes + normalizes to mono 16 kHz)
- `scribble-server` — HTTP server for transcription
- `model-downloader` — download Whisper and VAD models
### GPU acceleration (feature flags)

Scribble exposes whisper-rs GPU backend features as Cargo features. Enable the backend you want in `Cargo.toml` or via `--features`:

```toml
[dependencies]
scribble = { version = "0.5", features = ["cuda"] }
```

```bash
cargo run --features "bin-scribble-cli,cuda" --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --input ./input.wav
```
Available GPU feature flags:

- `cuda` (NVIDIA CUDA)
- `metal` (Apple Metal)
- `hipblas` (AMD ROCm)
- `vulkan` (Vulkan)
- `coreml` (Apple CoreML)
These are passthrough features; you still need the corresponding system dependencies installed for your platform. See the whisper-rs documentation for backend setup details.
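In `Cargo.toml` terms, a passthrough feature simply forwards to the dependency's feature of the same name. The fragment below is an illustrative sketch of that wiring, not a copy of Scribble's manifest:

```toml
[features]
# Each GPU flag re-exports the matching whisper-rs feature
# (illustrative; see the crate's Cargo.toml for the real list).
cuda = ["whisper-rs/cuda"]
metal = ["whisper-rs/metal"]
```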
## model-downloader

`model-downloader` is a small helper CLI for downloading known-good Whisper and Whisper-VAD models into a local directory.

### List available models

```bash
cargo run --features bin-model-downloader --bin model-downloader -- --list
```
Example output:

```text
Whisper models:
- tiny
- base.en
- large-v3-turbo
- large-v3-turbo-q8_0
...

VAD models:
- silero-v5.1.2
- silero-v6.2.0
```
### Download a model

```bash
cargo run --features bin-model-downloader --bin model-downloader -- --name large-v3-turbo
```

By default, models are downloaded into `./models`.
### Download into a custom directory

```bash
cargo run --features bin-model-downloader --bin model-downloader -- \
  --name silero-v6.2.0 \
  --dir /opt/scribble/models
```
Downloads are performed safely:

- written to a `*.part` temp file
- fsynced
- atomically renamed into place
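This write-fsync-rename sequence is a standard pattern for making partially downloaded files invisible to readers. A minimal sketch (illustrative only, not model-downloader's actual code):

```rust
// Sketch of an atomic file write: stage the bytes in a sibling
// `.part` file, fsync, then rename into place so readers never
// observe a partially written model. Illustrative only.
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

fn atomic_write(dest: &Path, bytes: &[u8]) -> std::io::Result<()> {
    // Sibling temp file, e.g. `model.bin` -> `model.part`.
    let part = dest.with_extension("part");
    let mut f = File::create(&part)?;
    f.write_all(bytes)?;
    f.sync_all()?; // flush data to disk before the rename
    fs::rename(&part, dest)?; // atomic on POSIX within one filesystem
    Ok(())
}

fn main() -> std::io::Result<()> {
    let dest = std::env::temp_dir().join("model.bin");
    atomic_write(&dest, b"fake model bytes")?;
    println!("wrote {}", dest.display());
    Ok(())
}
```

If the process dies mid-download, only the `.part` file is left behind; the destination path either holds the old complete file or the new complete file, never a truncated one.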
## scribble-cli

`scribble-cli` is the main transcription CLI. It accepts audio or video containers and normalizes them internally to the mono 16 kHz input Whisper requires. Provide:

- an input media path (e.g. MP4, MP3, WAV, FLAC, OGG, WebM, MKV), or `-` to stream from stdin
- a Whisper model
- a Whisper-VAD model (used when `--enable-vad` is set)
### Basic transcription (VTT output)

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.mp4
```
Output is written to stdout in WebVTT format by default.
### Stream a live URL into scribble-cli (via ffmpeg)

If you have a live audio stream URL (MP3/AAC/etc.), you can decode it to Whisper-friendly WAV and pipe it into scribble-cli via stdin:

```bash
ffmpeg -re -loglevel error -nostats \
  -i "https://stream.example.com/live.mp3?session-id=REDACTED" \
  -f wav -ac 1 -ar 16000 - \
  | scribble-cli \
      --model ./models/ggml-tiny.bin \
      --vad-model ./models/ggml-silero-v6.2.0.bin \
      --enable-vad \
      --input -
```
### Stream a Twitch channel into scribble-cli (via streamlink + ffmpeg)

If you have streamlink installed, you can pull a Twitch stream to stdout and feed it through ffmpeg:

```bash
streamlink --stdout https://www.twitch.tv/dougdoug best \
  | ffmpeg -hide_banner -loglevel error -i pipe:0 -vn -ac 1 -ar 16000 -f wav pipe:1 \
  | scribble-cli \
      --model ./models/ggml-tiny.bin \
      --vad-model ./models/ggml-silero-v6.2.0.bin \
      --enable-vad \
      --input -
```
## scribble-server

`scribble-server` is a long-running HTTP server that loads models once at startup and accepts transcription requests over HTTP.
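Loading once and sharing across requests is the usual server pattern for expensive models. A std-only sketch of the idea, with hypothetical stand-in types (not Scribble's actual server code):

```rust
// Sketch of the load-once, share-everywhere pattern: an expensive
// model is wrapped in Arc and cloned cheaply into each request
// handler thread. Stand-in types, illustrative only.
use std::sync::Arc;
use std::thread;

/// Stand-in for a loaded Whisper model (expensive to construct).
struct Model {
    name: String,
}

impl Model {
    fn transcribe(&self, req_id: usize) -> String {
        format!("req {} handled by {}", req_id, self.name)
    }
}

fn main() {
    // Load once at startup...
    let model = Arc::new(Model { name: "ggml-large-v3-turbo".into() });

    // ...then share the same instance across concurrent requests.
    let handles: Vec<_> = (0..4)
        .map(|id| {
            let m = Arc::clone(&model);
            thread::spawn(move || m.transcribe(id))
        })
        .collect();

    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```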
Start the server
cargo run --features bin-scribble-server --bin scribble-server -- \
--model ./models/ggml-large-v3-turbo.bin \
--vad-model ./models/ggml-silero-v6.2.0.bin \
--host 127.0.0.1 \
--port 8080
### Transcribe via HTTP

```bash
curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=vtt" \
  > transcript.vtt
```
For JSON output:

```bash
curl -sS --data-binary @./input.wav \
  "http://127.0.0.1:8080/transcribe?output=json" \
  > transcript.json
```
Example using all query params:

```bash
curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=json&output_type=json&model_key=ggml-large-v3-turbo.bin&enable_vad=true&translate_to_english=true&language=en" \
  > transcript.json
```
### Prometheus metrics

scribble-server exposes Prometheus metrics at `GET /metrics`.

```bash
curl -sS "http://127.0.0.1:8080/metrics"
```
Key metrics:

- `scribble_http_requests_total` (labels: `status`)
- `scribble_http_request_duration_seconds` (labels: `status`)
- `scribble_http_in_flight_requests`
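The `/metrics` endpoint serves these in the Prometheus text exposition format: one `# TYPE` line per metric, then one line per label combination. A tiny sketch of rendering a labeled counter (illustrative, not the server's code):

```rust
// Minimal sketch of the Prometheus text exposition format for a
// labeled counter, as served at /metrics. Illustrative only.
fn render_counter(name: &str, label: (&str, &str), value: u64) -> String {
    format!(
        "# TYPE {name} counter\n{name}{{{}=\"{}\"}} {value}\n",
        label.0, label.1
    )
}

fn main() {
    print!(
        "{}",
        render_counter("scribble_http_requests_total", ("status", "200"), 42)
    );
}
```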
## Logging

All binaries emit structured JSON logs to stderr.

- Default level: `error`
- Override with `SCRIBBLE_LOG` (e.g. `SCRIBBLE_LOG=info`)
### JSON output

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type json
```
### Enable voice activity detection (VAD)

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --enable-vad \
  --input ./input.wav
```
When VAD is enabled:
- non-speech regions are suppressed
- if no speech is detected, no output is produced
### Specify language explicitly

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --language en
```
If `--language` is omitted, Whisper auto-detects the language.
### Write output to a file

```bash
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type vtt \
  > transcript.vtt
```
## Library usage

Scribble is also designed to be embedded as a library. High-level usage looks like:
```rust
use scribble::{Opts, OutputType, Scribble};
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut scribble = Scribble::new(
        ["./models/ggml-large-v3-turbo.bin"],
        "./models/ggml-silero-v6.2.0.bin",
    )?;

    let mut input = File::open("audio.wav")?;
    let mut output = Vec::new();

    let opts = Opts {
        model_key: None,
        enable_translate_to_english: false,
        enable_voice_activity_detection: true,
        language: None,
        output_type: OutputType::Json,
        incremental_min_window_seconds: 1,
    };

    scribble.transcribe(&mut input, &mut output, &opts)?;

    let json = String::from_utf8(output)?;
    println!("{json}");
    Ok(())
}
```
## Goals
- [X] Make VAD streaming-capable
- [X] Support streaming and incremental transcription
- [X] Select the primary audio track in multi-track video containers
- [X] Implement a web server
- [X] Add Prometheus metrics endpoint
- [X] Add structured logs (tracing)
- [X] Expand test coverage to 80%+
## Coverage

This project uses cargo-llvm-cov for coverage locally and in CI.

One-time setup:

```bash
rustup component add llvm-tools-preview
cargo install cargo-llvm-cov
```
Run coverage locally:

```bash
# Print a summary to stdout
cargo llvm-cov --features bin-scribble-cli,bin-model-downloader,bin-scribble-server --all-targets

# Generate an HTML report (writes to ./target/llvm-cov/html)
cargo llvm-cov --features bin-scribble-cli,bin-model-downloader,bin-scribble-server --all-targets --html
```
