# SubtitleLLM
An intelligent automatic video subtitle generation system that combines OpenAI Whisper speech recognition with LLM correction for high-quality subtitle creation.
## Features
- Automatic Audio Extraction: Extract audio from various video formats (MP4, AVI, MOV, MKV, WMV, FLV)
- Speech Recognition: Convert speech to text using OpenAI Whisper models
- AI-Powered Correction: Improve transcription accuracy using LLM correction (OpenAI GPT or Google Gemini)
- Flexible Configuration: Customizable settings for different use cases
- Command Line Interface: Easy-to-use CLI with extensive options
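The features above form a four-stage pipeline: extract audio, transcribe, optionally correct, then write subtitles. A minimal sketch of that flow with each stage stubbed out (function names here are illustrative, not the project's actual API):

```python
from pathlib import Path

def extract_audio(video: Path) -> Path:
    # Stage 1: pull the audio track out of the video (stubbed here).
    return video.with_suffix(".wav")

def transcribe(audio: Path) -> list[str]:
    # Stage 2: run speech recognition on the audio (stubbed here).
    return [f"transcribed segment from {audio.name}"]

def correct(lines: list[str]) -> list[str]:
    # Stage 3: optional LLM cleanup of the raw transcript (stubbed here).
    return [line.strip() for line in lines]

def write_subtitles(lines: list[str], out: Path) -> Path:
    # Stage 4: serialize corrected lines next to the input video (stubbed here).
    return out.with_suffix(".srt")

def run_pipeline(video: Path) -> Path:
    audio = extract_audio(video)
    lines = correct(transcribe(audio))
    return write_subtitles(lines, video)
```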
## Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd subtitleLLM
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables (copy `.env.example` to `.env`):

   ```
   # For OpenAI (optional, for LLM correction)
   OPENAI_API_KEY=your_openai_api_key

   # For Google Gemini (optional, alternative to OpenAI)
   GEMINI_API_KEY=your_gemini_api_key
   ```
## Requirements
- Python 3.11+
- FFmpeg (for audio processing)
- GPU support recommended for larger Whisper models
### Dependencies

- `openai-whisper`: Speech recognition
- `openai`: GPT API access
- `google-generativeai`: Gemini API access
- `ffmpeg-python`: Audio processing
- `moviepy`: Video file handling
- `python-dotenv`: Environment variable management
## Usage
### Basic Usage

```bash
python main.py video.mp4
```
### Advanced Usage Examples

```bash
# Specify output directory
python main.py video.mp4 -o ./output

# Disable LLM correction (faster processing)
python main.py video.mp4 --no-correction

# Use a different Whisper model
python main.py video.mp4 --model small

# Generate English subtitles
python main.py video.mp4 --lang en

# Use Google Gemini instead of OpenAI
python main.py video.mp4 --provider gemini

# Keep temporary files for debugging
python main.py video.mp4 --keep-temp

# Enable debug logging
python main.py video.mp4 --log-level DEBUG
```
### Command Line Options

- `video_path`: Path to input video file (required)
- `-o, --output-dir`: Output directory (default: same as video directory)
- `--model`: Whisper model size - tiny, base, small, medium, large (default: base)
- `--lang`: Language code (default: zh for Chinese)
- `--provider`: LLM provider - openai, gemini (default: openai)
- `--no-correction`: Disable LLM correction
- `--keep-temp`: Keep temporary files
- `--log-level`: Logging level - DEBUG, INFO, WARNING, ERROR (default: INFO)
## Configuration
The system uses a hierarchical configuration system with support for environment variables and command-line overrides.
### Environment Variables

- `OPENAI_API_KEY`: OpenAI API key for GPT models
- `GEMINI_API_KEY`: Google Gemini API key
- `WHISPER_MODEL`: Default Whisper model name
- `DEFAULT_LANGUAGE`: Default language for transcription
### Configuration Classes
The application uses structured configuration with the following sections:
- WhisperConfig: Whisper model and language settings
- LLMConfig: LLM provider, API keys, and model parameters
- SubtitleConfig: Subtitle formatting options
- ProcessingConfig: Processing pipeline settings
## Supported Languages
The system supports all languages supported by OpenAI Whisper, including:
- English
- Chinese
## Performance Considerations
### Whisper Model Selection
- tiny: Fastest, lowest accuracy (~39 MB)
- base: Good balance of speed and accuracy (~74 MB)
- small: Better accuracy, slower (~244 MB)
- medium: High accuracy (~769 MB)
- large: Best accuracy, slowest (~1550 MB)
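One way to use this table programmatically is to pick the most accurate model whose download fits a size budget. The sizes below are taken from the list above; the helper function is illustrative:

```python
# Approximate download sizes from the list above, in MB.
MODEL_SIZES_MB = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}

def largest_model_within(budget_mb: int) -> str:
    """Pick the most accurate Whisper model whose download fits the budget
    (larger models are more accurate, so pick the biggest that fits)."""
    candidates = [m for m, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    if not candidates:
        raise ValueError(f"no model fits within {budget_mb} MB")
    return max(candidates, key=MODEL_SIZES_MB.get)
```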
### LLM Correction

- Improves transcription accuracy significantly
- Adds processing time and API costs
- Can be disabled with `--no-correction` for faster processing
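Because each correction request costs an API call, implementations like this one typically batch transcript segments rather than sending them one at a time. A sketch of such a batcher (the function and the 2000-character budget are illustrative assumptions, not the project's actual code):

```python
def batch_segments(segments: list[str], max_chars: int = 2000) -> list[list[str]]:
    """Group transcript segments into batches so each LLM correction
    request stays under a rough character budget (one API call per batch)."""
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for seg in segments:
        if current and size + len(seg) > max_chars:
            batches.append(current)   # flush the full batch
            current, size = [], 0
        current.append(seg)
        size += len(seg)
    if current:
        batches.append(current)
    return batches
```

Fewer, larger batches reduce per-request overhead but risk hitting the model's context limit, so the budget is a tuning knob.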
## Development
### Project Structure

```
subtitleLLM/
├── main.py                    # Main entry point
├── requirements.txt           # Dependencies
├── config/
│   ├── __init__.py
│   └── settings.py            # Configuration classes
├── core/
│   ├── __init__.py
│   ├── audio_extractor.py     # Audio extraction
│   ├── whisper_transcriber.py # Speech recognition
│   ├── llm_corrector.py       # LLM correction
│   ├── subtitle_generator.py  # Subtitle generation
│   └── video_processor.py     # Main coordinator
├── tests/
│   ├── __init__.py
│   └── test_basic.py          # Basic tests
└── utils/
    └── __init__.py
```
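The `subtitle_generator.py` module presumably emits SRT, since Future Work lists VTT and ASS as formats still to be added. A minimal sketch of SRT timestamp formatting (a hypothetical helper, not the project's actual code):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```

Note that SRT uses a comma as the millisecond separator, while VTT uses a period, which is one reason multiple output formats need distinct writers.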
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Future Work
- Additional Output Formats: Support for VTT, ASS, and other subtitle formats
- GUI Interface: Development of a graphical user interface for easier operation
- Batch Processing: Support for processing multiple video files simultaneously
## Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.