SubtitleLLM

An intelligent automatic video subtitle generation system that combines OpenAI Whisper speech recognition with LLM correction for high-quality subtitle creation.

Features

Automatic Audio Extraction: Extract audio from various video formats (MP4, AVI, MOV, MKV, WMV, FLV)
Speech Recognition: Convert speech to text using OpenAI Whisper models
AI-Powered Correction: Improve transcription accuracy using LLM correction (OpenAI GPT or Google Gemini)
Flexible Configuration: Customizable settings for different use cases
Command Line Interface: Easy-to-use CLI with extensive options

Installation

Clone the repository:

git clone <repository-url>
cd subtitleLLM

Install dependencies:

pip install -r requirements.txt

Set up environment variables (copy .env.example to .env, then fill in the keys you need):

# For OpenAI (optional, for LLM correction)
OPENAI_API_KEY=your_openai_api_key

# For Google Gemini (optional, alternative to OpenAI)
GEMINI_API_KEY=your_gemini_api_key

Requirements

Python 3.11+
FFmpeg (for audio processing)
GPU support recommended for larger Whisper models

Dependencies

openai-whisper: Speech recognition
openai: GPT API access
google-generativeai: Gemini API access
ffmpeg-python: Audio processing
moviepy: Video file handling
python-dotenv: Environment variable management

Usage

Basic Usage

python main.py video.mp4

Advanced Usage Examples

# Specify output directory
python main.py video.mp4 -o ./output

# Disable LLM correction (faster processing)
python main.py video.mp4 --no-correction

# Use different Whisper model
python main.py video.mp4 --model small

# Generate English subtitles
python main.py video.mp4 --lang en

# Use Google Gemini instead of OpenAI
python main.py video.mp4 --provider gemini

# Keep temporary files for debugging
python main.py video.mp4 --keep-temp

# Enable debug logging
python main.py video.mp4 --log-level DEBUG

Command Line Options

video_path: Path to input video file (required)
-o, --output-dir: Output directory (default: same as video directory)
--model: Whisper model size - tiny, base, small, medium, large (default: base)
--lang: Language code (default: zh for Chinese)
--provider: LLM provider - openai, gemini (default: openai)
--no-correction: Disable LLM correction
--keep-temp: Keep temporary files
--log-level: Logging level - DEBUG, INFO, WARNING, ERROR (default: INFO)
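
The options above map naturally onto Python's argparse. The following is a minimal sketch of such a parser, built only from the option list in this README; it is not the project's actual main.py, and the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Generate subtitles for a video file.",
    )
    parser.add_argument("video_path", help="Path to input video file")
    parser.add_argument("-o", "--output-dir", default=None,
                        help="Output directory (default: same as video directory)")
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size")
    parser.add_argument("--lang", default="zh", help="Language code")
    parser.add_argument("--provider", default="openai",
                        choices=["openai", "gemini"], help="LLM provider")
    parser.add_argument("--no-correction", action="store_true",
                        help="Disable LLM correction")
    parser.add_argument("--keep-temp", action="store_true",
                        help="Keep temporary files")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        help="Logging level")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["video.mp4", "--model", "small", "--no-correction"])
```

argparse converts dashes in option names to underscores, so --no-correction is read as args.no_correction and --output-dir as args.output_dir.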

Configuration

SubtitleLLM uses hierarchical configuration: defaults can be set through environment variables and overridden on the command line.

Environment Variables

OPENAI_API_KEY: OpenAI API key for GPT models
GEMINI_API_KEY: Google Gemini API key
WHISPER_MODEL: Default Whisper model name
DEFAULT_LANGUAGE: Default language for transcription

Configuration Classes

The application uses structured configuration with the following sections:

WhisperConfig: Whisper model and language settings
LLMConfig: LLM provider, API keys, and model parameters
SubtitleConfig: Subtitle formatting options
ProcessingConfig: Processing pipeline settings
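
One way to picture the structured configuration is as a set of dataclasses filled in from the command line first, then the environment, then built-in defaults. The sketch below assumes that precedence and invents its field names, the AppConfig wrapper, and the load_config helper for illustration; the real definitions live in config/settings.py and may differ. SubtitleConfig and ProcessingConfig are left out because this README does not describe their fields.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass
class WhisperConfig:
    model: str = "base"      # hypothetical field names throughout
    language: str = "zh"


@dataclass
class LLMConfig:
    provider: str = "openai"
    api_key: Optional[str] = None
    enabled: bool = True


@dataclass
class AppConfig:
    whisper: WhisperConfig = field(default_factory=WhisperConfig)
    llm: LLMConfig = field(default_factory=LLMConfig)


def load_config(cli: Mapping[str, object],
                env: Mapping[str, str] = os.environ) -> AppConfig:
    """Resolve each setting as: CLI value, else environment variable, else default.

    A missing or None CLI value means the option was not given.
    """
    whisper = WhisperConfig(
        model=cli.get("model") or env.get("WHISPER_MODEL") or "base",
        language=cli.get("lang") or env.get("DEFAULT_LANGUAGE") or "zh",
    )
    provider = cli.get("provider") or "openai"
    # Pick the API key that matches the selected provider.
    key_var = "OPENAI_API_KEY" if provider == "openai" else "GEMINI_API_KEY"
    llm = LLMConfig(
        provider=provider,
        api_key=env.get(key_var),
        enabled=not cli.get("no_correction", False),
    )
    return AppConfig(whisper=whisper, llm=llm)
```

Passing the environment in as a plain mapping keeps the resolution logic easy to test without touching the real process environment.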

Supported Languages

The system supports every language that OpenAI Whisper can transcribe, including:

English
Chinese

Performance Considerations

Whisper Model Selection

tiny: Fastest, lowest accuracy (~39M parameters)
base: Good balance of speed and accuracy (~74M parameters)
small: Better accuracy, slower (~244M parameters)
medium: High accuracy (~769M parameters)
large: Best accuracy, slowest (~1550M parameters)
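
The model name chosen from this list is what gets passed to Whisper's loader. Here is a minimal sketch of that step using the openai-whisper package directly; the transcribe wrapper and the WHISPER_MODELS tuple are hypothetical names, and core/whisper_transcriber.py may be organized differently.

```python
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def transcribe(audio_path: str, model_name: str = "base",
               language: str = "zh") -> list[dict]:
    """Transcribe an audio file and return Whisper's timed segments."""
    if model_name not in WHISPER_MODELS:
        raise ValueError(f"unknown Whisper model: {model_name!r}")

    # Imported lazily because the package pulls in PyTorch on import.
    import whisper

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path, language=language)

    # Each segment has "start" and "end" times in seconds plus the recognized "text".
    return [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in result["segments"]
    ]
```

Validating the name before loading gives a clear error message up front instead of a failure partway through model setup.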

LLM Correction

Improves transcription accuracy significantly
Adds processing time and API costs
Can be disabled with --no-correction for faster processing
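
A correction pass has to change the text while leaving each subtitle's timing alone. The sketch below shows one way to do that under stated assumptions: segments are dicts with "start", "end", and "text" keys, and correct_fn is a hypothetical callable standing in for the OpenAI or Gemini request. It is not taken from core/llm_corrector.py.

```python
from typing import Callable, Sequence


def correct_segments(
    segments: Sequence[dict],
    correct_fn: Callable[[list[str]], list[str]],
    batch_size: int = 20,
) -> list[dict]:
    """Correct segment text in batches while leaving timestamps untouched."""
    corrected: list[dict] = []
    for i in range(0, len(segments), batch_size):
        batch = list(segments[i:i + batch_size])
        originals = [seg["text"] for seg in batch]
        try:
            fixed = correct_fn(originals)
        except Exception:
            fixed = originals  # on API failure, keep the raw transcription
        if len(fixed) != len(originals):
            fixed = originals  # line count changed, so alignment is unsafe
        for seg, text in zip(batch, fixed):
            corrected.append({**seg, "text": text})
    return corrected
```

Batching keeps the number of API calls (and therefore the cost) down, and the fallback to the original text means a failed or malformed response never shifts text onto the wrong timestamps.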

Development

Project Structure

subtitleLLM/
├── main.py                 # Main entry point
├── requirements.txt        # Dependencies
├── config/
│   ├── __init__.py
│   └── settings.py        # Configuration classes
├── core/
│   ├── __init__.py
│   ├── audio_extractor.py # Audio extraction
│   ├── whisper_transcriber.py # Speech recognition
│   ├── llm_corrector.py   # LLM correction
│   ├── subtitle_generator.py # Subtitle generation
│   └── video_processor.py # Main coordinator
├── tests/
│   ├── __init__.py
│   └── test_basic.py      # Basic tests
└── utils/
    └── __init__.py
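
The last stage in core/ turns timed segments into a subtitle file. This README does not name the output format, so the sketch below assumes SubRip (SRT) and segments shaped as dicts with "start", "end", and "text" keys; both function names are hypothetical and not taken from subtitle_generator.py.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

Working in whole milliseconds avoids floating-point rounding pushing a value such as 59.9996 seconds into an invalid "60" seconds field.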

License

This project is licensed under the MIT License - see the LICENSE file for details.

Future Work

Additional Output Formats: Support for VTT, ASS, and other subtitle formats
GUI Interface: Development of a graphical user interface for easier operation
Batch Processing: Support for processing multiple video files simultaneously

Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
