> **Sponsor: Recall.ai – Meeting Transcription API**
> If you are looking for a transcription API for meetings, consider checking out Recall.ai, an API that works with Zoom, Google Meet, Microsoft Teams, and more.
# pyVideoTrans

<div align="center">

A powerful open-source video translation / audio transcription / AI dubbing / subtitle translation tool

中文 | Documentation | Online Q&A

</div>

pyVideoTrans is dedicated to seamlessly converting videos from one language to another, offering a complete workflow that includes speech recognition, subtitle translation, multi-role dubbing, and audio-video synchronization. It supports both local offline deployment and a wide range of mainstream online APIs.
<img width="1658" height="935" alt="image" src="https://github.com/user-attachments/assets/c5959e59-6014-480c-9a7d-44c2b1729d36" />

## ✨ Core Features
- 🎥 Fully Automatic Video Translation: a one-click workflow of Speech Recognition (ASR) -> Subtitle Translation -> Speech Synthesis (TTS) -> Video Synthesis.
- 🎙️ Audio Transcription / Subtitle Generation: Batch convert audio/video to SRT subtitles, supporting Speaker Diarization to distinguish between different roles.
- 🗣️ Multi-Role AI Dubbing: Assign different AI dubbing voices to different speakers.
- 🧬 Voice Cloning: Integrates models like F5-TTS, CosyVoice, GPT-SoVITS for zero-shot voice cloning.
- 🧠 Powerful Model Support:
  - ASR: Faster-Whisper (local), OpenAI Whisper, Alibaba Qwen, ByteDance Volcano, Azure, Google, etc.
  - LLM Translation: DeepSeek, ChatGPT, Claude, Gemini, MiniMax, Ollama (local), Alibaba Bailian, etc.
  - TTS: Edge-TTS (free), OpenAI, Azure, MiniMax, ChatTTS, ChatterBox, etc.
- 🖥️ Interactive Editing: Supports pausing and manual proofreading at each stage (recognition, translation, dubbing) to ensure accuracy.
- 🛠️ Utility Toolkit: Includes auxiliary tools such as vocal separation, video/subtitle merging, audio-video alignment, and transcript matching.
- 💻 Command Line Interface (CLI): Supports headless operation, convenient for server deployment or batch processing.
## 🚀 Quick Start (Windows Users)

We provide a pre-packaged .exe version for Windows 10/11 users, requiring no Python environment configuration.

- Download: click to download the latest pre-packaged version.
- Unzip: extract the archive to a path such as `D:\pyVideoTrans`.
- Run: double-click `sp.exe` inside the folder to launch.
Note:
- Do not run directly from within the compressed archive.
- To use GPU acceleration, ensure CUDA 12.8 and cuDNN 9.11 are installed.
## 🛠️ Source Deployment (macOS / Linux / Windows Developers)

We recommend using uv for package management: it is faster and provides better environment isolation.
### 1. Prerequisites

- Python: version 3.10 – 3.12 recommended
- FFmpeg: must be installed and available on the system PATH.
  - macOS: `brew install ffmpeg libsndfile git`
  - Linux (Ubuntu/Debian): `sudo apt-get install ffmpeg libsndfile1-dev`
  - Windows: download FFmpeg and add it to `Path`, or place `ffmpeg.exe` and `ffprobe.exe` directly in the project directory.
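Before continuing, you can confirm that FFmpeg is actually reachable from your shell. This is just a sanity check, not an official install step:

```shell
# Print the installed FFmpeg version, or warn if it is not on PATH
if command -v ffmpeg >/dev/null 2>&1; then
  ffmpeg -version | head -n1
else
  echo "ffmpeg not found - install it or add it to PATH before continuing"
fi
```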
### 2. Install uv (if not installed)

```sh
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
```

```powershell
# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```
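After the installer finishes, open a new shell and verify that uv is visible (the `~/.local/bin` hint assumes the default install location of the script above):

```shell
# Show the installed uv version, or warn if the shell cannot see it yet
command -v uv >/dev/null 2>&1 && uv --version \
  || echo "uv not found - restart your shell or check that ~/.local/bin is on PATH"
```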
### 3. Clone and Install

```sh
# 1. Clone the repository (ensure the path has no spaces or Chinese characters)
git clone https://github.com/jianchang512/pyvideotrans.git
cd pyvideotrans

# 2. Install dependencies (uv automatically syncs the environment)
uv sync

# If you need the local qwen-tts and qwen-asr channels, run:
# uv sync --extra qwen-tts --extra qwen-asr
```
### 4. Launch the Software

Launch the GUI:

```sh
uv run sp.py
```

Use the CLI:

```sh
# Video translation example
uv run cli.py --task vtv --name "./video.mp4" --source_language_code zh --target_language_code en

# Audio-to-subtitle example
uv run cli.py --task stt --name "./audio.wav" --model_name large-v3
```
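The CLI also lends itself to batch processing on a server. A minimal sketch, assuming the `--task vtv` flags shown above (the `videos/` directory name and the `RUNNER` variable are illustrative, not part of the project):

```shell
# Translate every .mp4 in ./videos with the same language pair.
# RUNNER is overridable so the loop can be dry-run tested with `echo`.
RUNNER="${RUNNER:-uv run cli.py}"
for f in ./videos/*.mp4; do
  [ -e "$f" ] || continue   # skip when the glob matches nothing
  echo "Translating $f ..."
  $RUNNER --task vtv --name "$f" --source_language_code zh --target_language_code en
done
```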
### 5. (Optional) GPU Acceleration Configuration

If you have an NVIDIA graphics card, run the following commands to install the CUDA-enabled PyTorch build:

```sh
# Uninstall the CPU version
uv remove torch torchaudio

# Install the CUDA version (example for CUDA 12.x)
uv add torch==2.7 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
uv add nvidia-cublas-cu12 nvidia-cudnn-cu12
```
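After switching to the CUDA build, you can check whether PyTorch actually sees the GPU. A quick sketch; run it inside the project environment (e.g. prefixed with `uv run`):

```shell
# Print the torch version and whether CUDA is visible; fall back with a
# hint if torch cannot be imported in the current environment
python -c 'import torch; print(torch.__version__, "CUDA:", torch.cuda.is_available())' 2>/dev/null \
  || echo "torch not importable - re-check the uv add step above"
```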
## 🧩 Supported Channels & Models (Partial)

| Category | Channel/Model | Description |
| :--- | :--- | :--- |
| ASR (Speech Recognition) | Faster-Whisper (Local) | Recommended; fast and accurate |
| | WhisperX / Parakeet | Supports timestamp alignment & speaker diarization |
| | Alibaba Qwen3-ASR / ByteDance Volcano | Online API, excellent for Chinese |
| Translation (LLM/MT) | DeepSeek / ChatGPT | Supports context understanding, more natural translation |
| | MiniMax AI | MiniMax M2.7 LLM, latest flagship model, OpenAI-compatible |
| | Google / Microsoft | Traditional machine translation, fast |
| | Ollama / M2M100 | Fully local offline translation |
| TTS (Speech Synthesis) | Edge-TTS | Free Microsoft interface, natural-sounding |
| | F5-TTS / CosyVoice | Supports voice cloning; requires local deployment |
| | GPT-SoVITS / ChatTTS | High-quality open-source TTS |
| | 302.AI / OpenAI / Azure | High-quality commercial APIs |
## 📚 Documentation & Support
- Official Documentation: https://pyvideotrans.com (Includes detailed tutorials, API configuration guides, FAQ)
- Online Q&A Community: https://bbs.pyvideotrans.com (Submit error logs for automated AI analysis and answers)
## ⚠️ Disclaimer
This software is an open-source, free, non-commercial project. Users are solely responsible for any legal consequences arising from the use of this software (including but not limited to calling third-party APIs or processing copyrighted video content). Please comply with local laws and regulations and the terms of use of relevant service providers.
## 🙏 Acknowledgements

This project relies mainly on the following open-source projects (partial list):
Created by jianchang512