Captionninja
Closed-captioning transcription/translation tool that generates text overlays using real-time ML.
CAPTION.Ninja
A free-to-use captioning, transcription, and real-time translation tool for live streams, presentations, and more.
Demo video: https://www.youtube.com/watch?v=v7172QO8z6c

Quick Start Guide
- Open https://caption.ninja in a supported browser (Chrome or Edge recommended)
- Accept microphone permissions when prompted
- Start speaking: your words are transcribed automatically
- Access the overlay URL (provided on the page) to display captions in OBS or other streaming software
How It Works
CAPTION.Ninja leverages your browser's built-in speech recognition capabilities to perform real-time transcription:
- Your browser captures audio from your default microphone (or virtual audio device)
- Browser-based speech recognition converts the audio to text
- The text is sent through a websocket server to any connected overlay pages
- Overlay pages display the text with customizable formatting
The application runs in your browser, so no software installation is required. Speech-to-text processing is handled by Google's speech recognition services (through the browser), while optional translation features use Mozilla's free translation service or a remote provider such as the Google Cloud Translation API.
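The flow above amounts to a publish/subscribe relay keyed by room. The sketch below illustrates only that fan-out idea, in memory. It is not CAPTION.Ninja's server code: the CaptionRelay class and its join/publish methods are hypothetical names, and the real service sends text over a websocket server whose message format is not described here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical stand-in for an overlay page: any callable that receives caption text.
Subscriber = Callable[[str], None]


class CaptionRelay:
    """In-memory sketch of room-based fan-out (no real websockets involved)."""

    def __init__(self) -> None:
        # Maps a room name (e.g. "abc123") to the overlays listening on it.
        self._rooms: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def join(self, room: str, overlay: Subscriber) -> None:
        """Register an overlay page as a listener for one room."""
        self._rooms[room].append(overlay)

    def publish(self, room: str, text: str) -> int:
        """Forward recognized text to every overlay in the room; return how many received it."""
        listeners = self._rooms.get(room, [])
        for overlay in listeners:
            overlay(text)
        return len(listeners)


if __name__ == "__main__":
    relay = CaptionRelay()
    same_room: List[str] = []
    other_room: List[str] = []
    relay.join("abc123", same_room.append)
    relay.join("xyz789", other_room.append)
    delivered = relay.publish("abc123", "Hello everyone")
    print(delivered, same_room, other_room)  # 1 ['Hello everyone'] []
```

The room name is what isolates streams from each other: only overlays that joined the same room see the text.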
Browser Compatibility
For best results, use Google Chrome or Microsoft Edge. These browsers provide the most reliable speech recognition services.
Important Note: Firefox does not currently include free speech-to-text capabilities, making it unsuitable for the main transcription page. However, Firefox can still be used for displaying the overlay page.
Some users report Chrome has issues with text truncation, so Edge may provide more consistent results.
Setting Up for Streaming
Basic Setup
- Open CAPTION.Ninja in Chrome/Edge and allow microphone access
- Copy the overlay URL provided on the page
- Add the overlay URL as a Browser Source in OBS Studio, vMix, or similar software
- Customize the appearance using CSS as needed (see customization section below)
Using with Electron Capture
To overlay captions on top of other desktop applications, use the Electron Capture app: https://github.com/steveseguin/electroncapture
It lets you pin the captions above other windows on your desktop.
Using Non-Microphone Audio Sources
CAPTION.Ninja uses your system's default recording device. To capture audio from other sources:
Virtual Audio Cable Method
Using a virtual audio cable allows you to route audio from any application to CAPTION.Ninja:
- Install a virtual audio cable solution like VB-Audio Cable
- Set the virtual cable as your default recording device in your system sound settings
- Route audio from your desired source (media player, streaming site, etc.) to the virtual cable
- CAPTION.Ninja will now transcribe audio from any application sending to the virtual cable
This technique works for:
- YouTube or Twitch live streams
- Audio from video files
- System sounds
- Audio from other applications like Zoom or Teams
- Game audio
The virtual audio cable acts as a bridge between your audio sources and CAPTION.Ninja, effectively turning any audio into captions.
Translation Features
CAPTION.Ninja offers multiple ways to translate content:
Method 1: Dedicated Translation Page
Use https://caption.ninja/translate for real-time translation capabilities:
- Select source and target languages from the dropdown menus
- Browser-based transcription + Mozilla's free translation service (17 languages)
- Optional Google Cloud Translation integration for premium results (100+ languages)
- Works with the same overlay system
Free translation languages supported: Bulgarian (bg), Czech (cs), Dutch (nl), English (en), Estonian (et), German (de), French (fr), Icelandic (is), Italian (it), Norwegian Bokmål (nb), Norwegian Nynorsk (nn), Persian (fa), Polish (pl), Portuguese (pt), Russian (ru), Spanish (es), Ukrainian (uk)
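Because the free path covers only the 17 codes listed above, an integration might check the target language before deciding whether a Google key is needed. This is a hypothetical helper (pick_translation_path is not part of CAPTION.Ninja), and its rules are assumptions: it strips region suffixes such as -BR, and it returns None when no available path covers the target.

```python
from typing import Optional

# The 17 target codes listed above for the free (Mozilla) translation path.
FREE_LANGUAGES = frozenset({
    "bg", "cs", "nl", "en", "et", "de", "fr", "is", "it",
    "nb", "nn", "fa", "pl", "pt", "ru", "es", "uk",
})


def pick_translation_path(target: str, has_google_key: bool, force_local: bool = False) -> Optional[str]:
    """Hypothetical helper: return "mozilla", "google", or None if no path covers the target."""
    code = target.lower().split("-")[0]  # assumed normalization: "pt-BR" -> "pt"
    if force_local or not has_google_key:
        # Free path only (mirrors forcelocal=1, or no API key supplied).
        return "mozilla" if code in FREE_LANGUAGES else None
    return "google"


if __name__ == "__main__":
    print(pick_translation_path("ja", has_google_key=False))  # None
    print(pick_translation_path("ja", has_google_key=True))   # google
```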
Method 2: Multiple Language Outputs from Single Source
A more efficient approach for multiple language support:
- Use the standard capture page (index.html) with your preferred input language
- Create multiple overlay pages with different target languages by adding the &translate=XX parameter
- Share these overlay URLs with viewers who need different languages
Example:
Main Capture: https://caption.ninja/?room=abc123&lang=en-US
English Overlay: https://caption.ninja/overlay?room=abc123
Spanish Overlay: https://caption.ninja/overlay?room=abc123&translate=es
French Overlay: https://caption.ninja/overlay?room=abc123&translate=fr
German Overlay: https://caption.ninja/overlay?room=abc123&translate=de
Benefits of this approach:
- Single transcription source with multiple translation outputs
- No need to run multiple browser tabs for different languages
- Lower resource usage on the broadcasting computer
- Viewers select their preferred language by accessing the appropriate URL
- Translation processing happens in the viewer's browser
Note: The translation quality using this method relies on the viewer's browser capabilities and may vary compared to the dedicated translation page.
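If you manage several rooms or languages, the overlay URLs from the example above can be generated rather than typed. A minimal sketch using only Python's standard library; overlay_urls is a hypothetical helper, and it assumes the overlay accepts exactly the room and translate parameters shown in the example.

```python
from urllib.parse import urlencode

OVERLAY_BASE = "https://caption.ninja/overlay"


def overlay_urls(room: str, targets: list[str]) -> dict[str, str]:
    """Build one overlay URL per target language; an empty string means untranslated."""
    urls = {}
    for lang in targets:
        params = {"room": room}
        if lang:
            params["translate"] = lang  # same &translate=XX parameter as in the example
        urls[lang or "original"] = f"{OVERLAY_BASE}?{urlencode(params)}"
    return urls


if __name__ == "__main__":
    for label, url in overlay_urls("abc123", ["", "es", "fr", "de"]).items():
        print(f"{label}: {url}")
```

Using urlencode rather than string concatenation keeps room names with spaces or symbols correctly escaped.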
Method 2.5: Show Translation + Transcript in One Overlay
Prefer one browser source that shows both languages? Add &dual=1 (or &view=dual) to any translated overlay URL. The translated line renders first and the original transcript appears beneath it in a compact style, so the overlay stays roughly the same height.
https://caption.ninja/overlay?room=abc123&translate=ja&googlekey=YOUR_API_KEY&dual=1
&clear, &showtime, &maxlines, and other history flags still apply to the combined block, and TTS continues speaking the translated text while the original is just displayed.
Method 3: Premium/Remote Translation in Overlay
For professional-quality translation with 100+ language support, you can use Google Cloud or other remote providers directly in the overlay:
https://caption.ninja/overlay?room=abc123&translate=ja&googlekey=YOUR_API_KEY
Features:
- Context-aware translation: Add &context=1 for better accuracy in conversations
- Adjustable context size: Use &contextsize=5 to include more previous messages
- Force local translation: Add &forcelocal=1 to use Mozilla even with an API key
- Override source language: Use &fromlang=es if auto-detection isn't working
Example with all features:
https://caption.ninja/overlay?room=abc123&translate=ko&googlekey=KEY&context=1&contextsize=3
This provides professional-grade translation quality while maintaining the simple overlay system.
Example (OpenAI-compatible provider):
https://caption.ninja/overlay?room=abc123&translate=ja&tprovider=openai&tmodel=gpt-4o-mini&tkey=YOUR_API_KEY
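Conceptually, context=1 with contextsize=N means each new line is translated alongside the N lines that preceded it. The sketch below models that sliding window with a bounded deque. The TranslationContext class and its request shape are assumptions for illustration, not the overlay's actual payload to Google or any other provider.

```python
from collections import deque


class TranslationContext:
    """Sketch of context=1 / contextsize=N: remember the last N source lines."""

    def __init__(self, size: int = 2) -> None:  # the parameter table lists 2 as the default
        self._history: deque = deque(maxlen=size)

    def build_request(self, text: str) -> dict:
        """Hypothetical payload: the new line plus up to N previous lines as context."""
        request = {"text": text, "context": list(self._history)}
        self._history.append(text)  # once full, the oldest line drops out automatically
        return request


if __name__ == "__main__":
    ctx = TranslationContext(size=2)
    for line in ["Hi.", "How are you?", "Fine, thanks."]:
        print(ctx.build_request(line))
```

A larger contextsize gives the translator more to disambiguate pronouns and short replies, at the cost of longer (and, with paid APIs, more expensive) requests.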
Translation Parameters Reference
| Parameter | Description | Example |
|-----------|-------------|---------|
| translate=XX or lang=XX or ln=XX | Target translation language | &translate=es |
| fromlang=XX | Override source language detection | &fromlang=en |
| googlekey=KEY or gkey=KEY | Google Cloud Translation API key | &googlekey=YOUR_KEY |
| tprovider=NAME (google, openai, anthropic, ollama, or local) | Select remote/local translation provider | &tprovider=openai |
| tkey=KEY or translatekey=KEY | API key for non-Google remote providers | &tkey=YOUR_KEY |
| turl=URL | Custom API base URL for proxy/self-hosted providers | &turl=http://127.0.0.1:11434/v1 |
| tmodel=MODEL | Model id for OpenAI-compatible/Anthropic/Ollama | &tmodel=gpt-4o-mini |
| context=1 | Enable context-aware translation | &context=1 |
| contextsize=N | Number of previous messages for context (default: 2) | &contextsize=5 |
| forcelocal=1 | Force Mozilla translation even with API key | &forcelocal=1 |
For a comprehensive guide to all translation features, visit: https://caption.ninja/translation-guide.html
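The table's aliases and defaults can be summarized as a small query-string parser. This is a hypothetical sketch that mirrors the table, not the overlay's own parsing code; in particular, which alias wins when several are present is an assumption.

```python
from urllib.parse import parse_qs, urlparse

# Alias groups from the parameter table; the first name listed wins if several appear (assumption).
ALIASES = {
    "target": ("translate", "lang", "ln"),
    "google_key": ("googlekey", "gkey"),
    "remote_key": ("tkey", "translatekey"),
}


def read_translation_settings(url: str) -> dict:
    """Hypothetical parser mirroring the parameter table; not the overlay's actual code."""
    query = parse_qs(urlparse(url).query)

    def first(*names: str):
        # Return the value of the first parameter name present, else None.
        for name in names:
            if name in query:
                return query[name][0]
        return None

    settings = {key: first(*names) for key, names in ALIASES.items()}
    settings["source"] = first("fromlang")
    settings["provider"] = first("tprovider")
    settings["base_url"] = first("turl")
    settings["model"] = first("tmodel")
    settings["context"] = first("context") == "1"
    settings["context_size"] = int(first("contextsize") or 2)  # table default: 2
    settings["force_local"] = first("forcelocal") == "1"
    return settings


if __name__ == "__main__":
    print(read_translation_settings(
        "https://caption.ninja/overlay?room=abc123&translate=ko&googlekey=KEY&context=1&contextsize=3"
    ))
```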
TTS Integration
Caption.Ninja can read captions aloud using browser/system TTS or the tts.rocks engine (Kokoro, Piper, ElevenLabs, Google, OpenAI, etc.). Enable it via URL parameters; nothing changes by default.
- Overlay readout: overlay.html?room=abc123&tts=en-US
  - Built-in providers: &ttsprovider=google&ttskey=YOUR_KEY, or &ttsprovider=elevenlabs&elevenlabskey=KEY&voice11=VOICE_ID
  - Use the tts.rocks engine: &ttslib=rocks&ttsprovider=kokoro&voicekokoro=af_aoede&korospeed=1.0
  - Optional interim streaming: &ttsstream=1
- Capture readout: index.html?room=abc123&lang=en-US&tts=en-US
- Manual readout: manual.html?room=abc123&tts=en-US
- Pop-out TTS window (tts.rocks bridge): add &ttspopout=1 to auto-open it, or go directly to tts.rocks/caption-bridge.html?room=abc123&tts=en-US&ttsprovider=kokoro
Quick discovery (GUI):
- The tts.rocks homepage now includes a “Use with Caption.Ninja” panel that generates ready-to-use links (Overlay, Capture, Manual, Bridge) based on your chosen engine, keys, voices, and rates.
- Voice picker and URL builder: tts.rocks/tts.html lists local voices and generates example URLs.
Security note: API keys in URLs are visible to anyone with the link. Prefer local/native providers when possible, or only share overlays that do not embed keys.
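One way to follow this advice is to strip key-bearing parameters before sharing a link. A minimal sketch, assuming the key parameters are the ones named in this README (googlekey, gkey, tkey, translatekey, ttskey, elevenlabskey); strip_keys is a hypothetical helper. The stripped overlay no longer carries those credentials, so any feature that depends on them will not work from that copy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Key-bearing parameters named in this README; extend the set if you use other providers.
SECRET_PARAMS = {"googlekey", "gkey", "tkey", "translatekey", "ttskey", "elevenlabskey"}


def strip_keys(url: str) -> str:
    """Return the URL with API-key parameters removed, keeping the rest in order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SECRET_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_keys(
        "https://caption.ninja/overlay?room=abc123&translate=ja&googlekey=YOUR_API_KEY&dual=1"
    ))  # https://caption.ninja/overlay?room=abc123&translate=ja&dual=1
```

Note that urlencode re-escapes the remaining values, so a parameter such as turl=http://... comes back percent-encoded; browsers decode it the same way.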
Language Support
The default language is en-US (&lang=en-US). To transcribe another language, add a different language code to the capture URL, for example &lang=fr-FR.
Supported language codes: https://cloud.google.com/speech-to-text/docs/languages
Manual Text Entry Mode
For situations where automatic transcription isn't ideal, use manual text entry: https://caption.ninja/manual.html
This lets you type captions directly, which appear on the same overlay system.
