CAPTION.Ninja

A free-to-use captioning, transcription, and real-time translation tool for live streams, presentations, and more.

Demo video: https://www.youtube.com/watch?v=v7172QO8z6c

Quick Start Guide

Open https://caption.ninja in a supported browser (Chrome or Edge recommended)
Accept microphone permissions when prompted
Start speaking; your words will be transcribed automatically
Access the overlay URL (provided on the page) to display captions in OBS or other streaming software

How It Works

CAPTION.Ninja leverages your browser's built-in speech recognition capabilities to perform real-time transcription:

Your browser captures audio from your default microphone (or virtual audio device)
Browser-based speech recognition converts the audio to text
The text is sent through a websocket server to any connected overlay pages
Overlay pages display the text with customizable formatting

The application runs entirely in your browser, so no software installation is required. Speech-to-text is handled by the browser's own speech recognition service (Google's in Chrome, Microsoft's in Edge). Optional translation uses Mozilla's free translation service, the Google Cloud Translation API, or another remote provider (see Translation Features).
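The flow above works like a room-based publish/subscribe relay. The sketch below models that relay in memory, with no network, to show how one capture page fans text out to every overlay in the same room. The RoomRelay class and the message fields (room, text, final) are hypothetical. CAPTION.Ninja's actual websocket protocol is not documented here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# Hypothetical message shape; the real relay's format may differ.
Message = Dict[str, object]
Subscriber = Callable[[Message], None]


class RoomRelay:
    """In-memory stand-in for the websocket server linking capture and overlays."""

    def __init__(self) -> None:
        self._rooms: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, room: str, overlay: Subscriber) -> None:
        # An overlay page joins a room and waits for caption messages.
        self._rooms[room].append(overlay)

    def publish(self, room: str, text: str, final: bool) -> int:
        # The capture page sends recognized text; every overlay in the room gets it.
        subscribers = self._rooms.get(room, [])  # .get() does not create empty rooms
        message: Message = {"room": room, "text": text, "final": final}
        for overlay in subscribers:
            overlay(message)
        return len(subscribers)


relay = RoomRelay()
received: List[Message] = []
relay.subscribe("abc123", received.append)
relay.publish("abc123", "hello", final=False)       # interim result
relay.publish("abc123", "hello world", final=True)  # final result
relay.publish("other-room", "not delivered", final=True)
print(received)
```

Keying everything on the room value means the capture page needs no knowledge of how many overlays are listening. This is why one transcription can feed several overlays.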

Browser Compatibility

For best results, use Google Chrome or Microsoft Edge. These browsers provide the most reliable speech recognition services.

Important Note: Firefox does not currently include free speech-to-text capabilities, making it unsuitable for the main transcription page. However, Firefox can still be used for displaying the overlay page.

Some users report that Chrome truncates text, so Edge may give more consistent results.

Setting Up for Streaming

Basic Setup

Open CAPTION.Ninja in Chrome/Edge and allow microphone access
Copy the overlay URL provided on the page
Add the overlay URL as a Browser Source in OBS Studio, vMix, or similar software
Customize the appearance using CSS as needed (see customization section below)
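The capture page and its overlay are linked by the room parameter, as the example URLs under Translation Features show. The sketch below derives an overlay URL from a capture URL, assuming the overlay needs only that room value. overlay_url_for is a hypothetical helper. In practice, copy the URL the page shows you.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def overlay_url_for(capture_url: str, base: str = "https://caption.ninja/overlay") -> str:
    """Hypothetical helper: reuse the capture page's room in an overlay URL."""
    params = parse_qs(urlsplit(capture_url).query)
    rooms = params.get("room")
    if not rooms:
        raise ValueError("capture URL has no room parameter")
    return f"{base}?{urlencode({'room': rooms[0]})}"


print(overlay_url_for("https://caption.ninja/?room=abc123&lang=en-US"))
```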

Using with Electron Capture

To display the captions overlay on your desktop, rather than in streaming software, use the Electron Capture app: https://github.com/steveseguin/electroncapture

This allows you to pin the captions on top of other applications on your desktop.

Using Non-Microphone Audio Sources

CAPTION.Ninja uses your system's default recording device. To capture audio from other sources:

Virtual Audio Cable Method

Using a virtual audio cable allows you to route audio from any application to CAPTION.Ninja:

Install a virtual audio cable solution like VB-Audio Cable
Set the virtual cable as your default recording device in your system sound settings
Route audio from your desired source (media player, streaming site, etc.) to the virtual cable
CAPTION.Ninja will now transcribe audio from any application sending to the virtual cable

This technique works for:

YouTube or Twitch live streams
Audio from video files
System sounds
Audio from other applications like Zoom or Teams
Game audio

The virtual audio cable acts as a bridge between your audio sources and CAPTION.Ninja, effectively turning any audio into captions.

Translation Features

CAPTION.Ninja offers multiple ways to translate content:

Method 1: Dedicated Translation Page

Use https://caption.ninja/translate for real-time translation capabilities:

Select source and target languages from the dropdown menus
Browser-based transcription + Mozilla's free translation service (17 languages)
Optional Google Cloud Translation integration for premium results (100+ languages)
Works with the same overlay system

Free translation languages supported: Bulgarian (bg), Czech (cs), Dutch (nl), English (en), Estonian (et), German (de), French (fr), Icelandic (is), Italian (it), Norwegian Bokmål (nb), Norwegian Nynorsk (nn), Persian (fa), Polish (pl), Portuguese (pt), Russian (ru), Spanish (es), Ukrainian (uk)
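If you generate overlay links with a script, you can check a target language against the list above. This tells you whether the free service covers it or whether you need Google Cloud or another remote provider. This is a sketch. Treating a regional tag such as pt-BR as its base code pt is an assumption.

```python
# The 17 codes listed above for Mozilla's free translation service.
FREE_LANGUAGES = frozenset({
    "bg", "cs", "nl", "en", "et", "de", "fr", "is", "it",
    "nb", "nn", "fa", "pl", "pt", "ru", "es", "uk",
})


def is_free_language(code: str) -> bool:
    """True if the code, or its base (pt for pt-BR, an assumption), is in the free list."""
    base = code.strip().lower().split("-")[0]
    return base in FREE_LANGUAGES


for target in ("es", "pt-BR", "ja"):
    print(target, "free" if is_free_language(target) else "needs Google Cloud or another provider")
```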

Method 2: Multiple Language Outputs from Single Source

A more efficient approach for multiple language support:

Use the standard capture page (index.html) with your preferred input language
Create multiple overlay pages with different target languages by adding the &translate=XX parameter
Share these overlay URLs with viewers who need different languages

Example:

Main Capture: https://caption.ninja/?room=abc123&lang=en-US
English Overlay: https://caption.ninja/overlay?room=abc123
Spanish Overlay: https://caption.ninja/overlay?room=abc123&translate=es
French Overlay: https://caption.ninja/overlay?room=abc123&translate=fr
German Overlay: https://caption.ninja/overlay?room=abc123&translate=de
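Generating overlay links for several languages is just a loop over target codes. The sketch below reproduces the URL pattern from the example above. The overlay_urls helper is hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlencode


def overlay_urls(room: str, targets: List[str],
                 base: str = "https://caption.ninja/overlay") -> Dict[str, str]:
    """Hypothetical helper: one untranslated overlay plus one per target language."""
    urls = {"original": f"{base}?{urlencode({'room': room})}"}
    for code in targets:
        urls[code] = f"{base}?{urlencode({'room': room, 'translate': code})}"
    return urls


for label, url in overlay_urls("abc123", ["es", "fr", "de"]).items():
    print(f"{label}: {url}")
```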

Benefits of this approach:

Single transcription source with multiple translation outputs
No need to run multiple browser tabs for different languages
Lower resource usage on the broadcasting computer
Viewers select their preferred language by accessing the appropriate URL
Translation processing happens in the viewer's browser

Note: The translation quality using this method relies on the viewer's browser capabilities and may vary compared to the dedicated translation page.

Method 2.5: Show Translation + Transcript in One Overlay

Prefer one browser source that shows both languages? Add &dual=1 (or &view=dual) to any translated overlay URL. The translated line renders first and the original transcript appears beneath it in a compact style, so the overlay stays roughly the same height.

https://caption.ninja/overlay?room=abc123&translate=ja&googlekey=YOUR_API_KEY&dual=1

&clear, &showtime, &maxlines, and other history flags still apply to the combined block. TTS speaks only the translated text; the original is displayed but not read aloud.
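The dual layout described above can be modeled as data: the translated line on top, the original beneath it in a compact style, and only the translation passed to TTS. The DualBlock type and the CSS class names are hypothetical. This sketches the described behavior and is not the overlay's actual code.

```python
from typing import List, NamedTuple, Tuple


class DualBlock(NamedTuple):
    lines: List[Tuple[str, str]]  # (hypothetical CSS class, text), top to bottom
    speak: str                    # the only text handed to TTS


def render_dual(translated: str, original: str) -> DualBlock:
    """Sketch of &dual=1: translation first, original beneath in a compact style."""
    return DualBlock(
        lines=[("translated", translated), ("original compact", original)],
        speak=translated,
    )


block = render_dual("Hola a todos", "Hello everyone")
for css_class, text in block.lines:
    print(f"[{css_class}] {text}")
print("TTS:", block.speak)
```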

Method 3: Premium/Remote Translation in Overlay

For professional-quality translation with 100+ language support, you can use Google Cloud or other remote providers directly in the overlay:

https://caption.ninja/overlay?room=abc123&translate=ja&googlekey=YOUR_API_KEY

Features:

Context-aware translation: Add &context=1 for better accuracy in conversations
Adjustable context size: Use &contextsize=5 to include more previous messages
Force local translation: Add &forcelocal=1 to use Mozilla even with API key
Override source language: Use &fromlang=es if auto-detection isn't working

Example with all features:

https://caption.ninja/overlay?room=abc123&translate=ko&googlekey=KEY&context=1&contextsize=3

This provides professional-grade translation quality while maintaining the simple overlay system.

Example (OpenAI-compatible provider):

https://caption.ninja/overlay?room=abc123&translate=ja&tprovider=openai&tmodel=gpt-4o-mini&tkey=YOUR_API_KEY
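Context-aware translation (&context=1) can be pictured as a sliding window: each new caption is sent along with the previous contextsize captions (default 2). This README does not specify how CAPTION.Ninja formats that context for the provider. The ContextWindow class below is therefore only a sketch of the windowing itself.

```python
from collections import deque
from typing import Deque, List


class ContextWindow:
    """Sketch: remember the last `size` captions and send them with each new one."""

    def __init__(self, size: int = 2) -> None:  # 2 matches the documented contextsize default
        self._recent: Deque[str] = deque(maxlen=size)

    def build(self, text: str) -> List[str]:
        # Earlier captions first, then the caption to translate.
        payload = [*self._recent, text]
        self._recent.append(text)  # the deque drops its oldest entry once full
        return payload


window = ContextWindow(size=2)
for caption in ["Hi.", "How are you?", "Fine, thanks.", "Great."]:
    print(window.build(caption))
```

A larger window gives the translator more conversational context. The trade-off is more text per request, which matters for paid providers.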

Translation Parameters Reference

| Parameter | Description | Example |
|-----------|-------------|---------|
| translate=XX or lang=XX or ln=XX | Target translation language | &translate=es |
| fromlang=XX | Override source language detection | &fromlang=en |
| googlekey=KEY or gkey=KEY | Google Cloud Translation API key | &googlekey=YOUR_KEY |
| tprovider=google\|openai\|anthropic\|ollama\|local | Select remote/local translation provider | &tprovider=openai |
| tkey=KEY or translatekey=KEY | API key for non-Google remote providers | &tkey=YOUR_KEY |
| turl=URL | Custom API base URL for proxy/self-hosted providers | &turl=http://127.0.0.1:11434/v1 |
| tmodel=MODEL | Model ID for OpenAI-compatible/Anthropic/Ollama providers | &tmodel=gpt-4o-mini |
| context=1 | Enable context-aware translation | &context=1 |
| contextsize=N | Number of previous messages used as context (default: 2) | &contextsize=5 |
| forcelocal=1 | Force Mozilla translation even when an API key is set | &forcelocal=1 |
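Several parameters in the table have aliases. The sketch below reads them from an overlay URL using Python's standard library. Two details are assumptions: the option names on the left, and the rule that the first alias present wins.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlsplit

# Alias groups from the table; the first alias present wins (an assumption).
ALIASES: Dict[str, Tuple[str, ...]] = {
    "target": ("translate", "lang", "ln"),
    "google_key": ("googlekey", "gkey"),
    "remote_key": ("tkey", "translatekey"),
}


def read_translation_options(url: str) -> Dict[str, object]:
    # Keep the first value of each query parameter.
    query = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    options: Dict[str, object] = {
        name: next((query[k] for k in keys if k in query), None)
        for name, keys in ALIASES.items()
    }
    options["fromlang"] = query.get("fromlang")
    options["provider"] = query.get("tprovider")
    options["model"] = query.get("tmodel")
    options["context"] = query.get("context") == "1"
    options["contextsize"] = int(query.get("contextsize", "2"))  # documented default: 2
    options["forcelocal"] = query.get("forcelocal") == "1"
    return options


print(read_translation_options(
    "https://caption.ninja/overlay?room=abc123&ln=ko&gkey=KEY&context=1&contextsize=3"))
```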

For a comprehensive guide to all translation features, visit: https://caption.ninja/translation-guide.html

TTS Integration

Caption.Ninja can read captions aloud using browser/system TTS or the tts.rocks engine (Kokoro, Piper, ElevenLabs, Google, OpenAI, etc.). TTS is off by default; enable it with URL parameters.

Overlay readout: overlay.html?room=abc123&tts=en-US
- Built‑in providers: &ttsprovider=google&ttskey=YOUR_KEY, &ttsprovider=elevenlabs&elevenlabskey=KEY&voice11=VOICE_ID
- Use tts.rocks engine: &ttslib=rocks&ttsprovider=kokoro&voicekokoro=af_aoede&korospeed=1.0
- Optional interim streaming: &ttsstream=1
Capture readout: index.html?room=abc123&lang=en-US&tts=en-US
Manual readout: manual.html?room=abc123&tts=en-US
Pop‑out TTS window (tts.rocks bridge): add &ttspopout=1 to auto‑open, or go directly:
- tts.rocks/caption-bridge.html?room=abc123&tts=en-US&ttsprovider=kokoro
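The readout URLs above are ordinary query strings. The sketch below builds an overlay readout URL and checks that the provider-specific parameters shown in the examples are present. tts_overlay_url and the REQUIRED mapping are hypothetical. They are inferred from the examples above, not from a documented list of required parameters.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

# Hypothetical: parameters the examples above pair with each provider.
REQUIRED: Dict[str, Tuple[str, ...]] = {
    "google": ("ttskey",),
    "elevenlabs": ("elevenlabskey", "voice11"),
}


def tts_overlay_url(room: str, tts_lang: str, provider: Optional[str] = None,
                    base: str = "https://caption.ninja/overlay", **extra: str) -> str:
    """Sketch: build an overlay readout URL such as overlay?room=abc123&tts=en-US."""
    params = {"room": room, "tts": tts_lang}
    if provider:
        missing = [p for p in REQUIRED.get(provider, ()) if p not in extra]
        if missing:
            raise ValueError(f"{provider} readout is missing: {', '.join(missing)}")
        params["ttsprovider"] = provider
    params.update(extra)  # e.g. ttslib, voicekokoro, korospeed, ttsstream
    return f"{base}?{urlencode(params)}"


print(tts_overlay_url("abc123", "en-US"))
print(tts_overlay_url("abc123", "en-US", "kokoro", ttslib="rocks", voicekokoro="af_aoede"))
```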

Quick discovery (GUI):

The tts.rocks homepage includes a “Use with Caption.Ninja” panel that generates ready-to-use links (Overlay, Capture, Manual, Bridge) based on your chosen engine, keys, voices, and rates.
Voice picker and URL builder: tts.rocks/tts.html lists local voices and generates example URLs.

Security note: API keys in URLs are visible to anyone with the link. Prefer local/native providers when possible, or only share overlays that do not embed keys.
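Following the security note, you can strip key-bearing parameters from a URL before sharing it. The sketch below does this. The parameter set covers only the key names mentioned in this README and may not be exhaustive. Features that depend on a removed key will not be available from the stripped link.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Key-bearing parameter names mentioned in this README (not necessarily exhaustive).
SECRET_PARAMS = {"googlekey", "gkey", "tkey", "translatekey", "ttskey", "elevenlabskey"}


def strip_keys(url: str) -> str:
    """Return the URL with API-key parameters removed.

    Note: a valueless flag such as "&dual" is re-encoded as "dual=".
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SECRET_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_keys(
    "https://caption.ninja/overlay?room=abc123&translate=ja&googlekey=YOUR_API_KEY&dual=1"))
```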

Language Support

The default recognition language is en-US (&lang=en-US). To use another language, set &lang= to a supported language code, for example &lang=es-ES.

Supported language codes: https://cloud.google.com/speech-to-text/docs/languages

Manual Text Entry Mode

For situations where automatic transcription isn't ideal, use manual text entry: https://caption.ninja/manual.html

This lets you type captions directly, which appear on the same overlay system.
