OCR
A Simple tool to OCR Muse HardSub subtitles.
TPN-Team's OCR tool
How it works
Diff a hardsubbed video against a non-hardsubbed one to detect subtitle frames (default method: VapourSynth). Alternatively, use VideoSubFinder to extract the subtitle frames. Extract those frames as images, OCR them with Google Lens, and merge the resulting text into an SRT subtitle file.
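The detection step can be pictured with a few lines of VapourSynth. The snippet below is only a minimal sketch of the diff idea, not the project's actual filter chain; loading with bestsource, the file names, and the 0.01 threshold are assumptions.

```python
# detect_sketch.py - minimal sketch of the detection idea, not the real filter chain.
# Assumes both clips are synchronized and share the same resolution and format.
from vapoursynth import core

clean = core.bs.VideoSource(source="clean.mkv")  # non-hardsubbed source
sub = core.bs.VideoSource(source="sub.mp4")      # Muse hardsubbed source

# Per-pixel absolute difference: frames with burned-in text differ noticeably.
diff = core.std.PlaneStats(core.std.Expr([clean, sub], "x y - abs"))

subtitle_frames = []
for n, frame in enumerate(diff.frames()):
    # 0.01 is an assumed threshold; the real script tunes this per source.
    if frame.props["PlaneStatsAverage"] > 0.01:
        subtitle_frames.append(n)
print(f"{len(subtitle_frames)} frames look like they contain subtitles")
```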
Accuracy
Guess this :)
Setup
For Windows
Step 1: Install Python
Step 2: Create the Virtual Environment: python -m venv .venv
Step 3: Activate the Virtual Environment:
# window cmd
.venv\Scripts\activate.bat
# or
# window powershell
Set-ExecutionPolicy Unrestricted -Scope Process
.venv\Scripts\Activate.ps1
Step 4: Install Python libraries:
pip install -r requirements.txt
[!NOTE]
If you intend to OCR only images extracted by VideoSubFinder, you can skip the steps below.
Step 5: Install VapourSynth
Step 6: Install vsrepo
git clone https://github.com/vapoursynth/vsrepo
Step 7: Install vapoursynth plugins:
python ./vsrepo/vsrepo.py install acrop hysteresis bestsource misc tcanny tedgemask resize2 imwri akarin vszip
Step 8: Install vsjetpack
pip install vsjetpack==0.6.2 vspreview
For Arch linux
Step 1: Install Python: yay -S python
Step 2: Create the Virtual Environment: python -m venv .venv
Step 3: Activate the Virtual Environment:
# for fish shell
. .venv/bin/activate.fish
# bash shell
. .venv/bin/activate
Step 4: Install Python libraries: pip install -r requirements.txt
[!NOTE]
If you intend to OCR only images extracted by VideoSubFinder, you can skip the steps below.
Step 5: Install VapourSynth + ffmpeg: yay -S vapoursynth ffmpeg
Step 6: Install vapoursynth plugins:
yay -S vapoursynth-plugin-imwri-git vapoursynth-plugin-bestsource-git vapoursynth-plugin-misc-git vapoursynth-plugin-resize2-git vapoursynth-plugin-tcanny-git vapoursynth-plugin-tedgemask-git vapoursynth-plugin-vszip-git
git clone https://github.com/vapoursynth/vsrepo
# Install hysteresis plugin
sudo python ./vsrepo/vsrepo.py update
sudo python ./vsrepo/vsrepo.py install hysteresis
# The following commands to build acrop plugin
git clone https://github.com/Irrational-Encoding-Wizardry/vapoursynth-autocrop
# Link C interfaces to build acrop
cp -R /usr/include/vapoursynth/*.h ./vapoursynth-autocrop/
# Build and install acrop plugin
cd ./vapoursynth-autocrop/ && sudo g++ -std=c++11 -shared -fPIC -O2 ./autocrop.cpp -o /usr/lib/vapoursynth/libautocrop.so && cd ..
# Install vsjetpack
pip install vsjetpack==0.6.2 vspreview
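On either platform, you can optionally verify that VapourSynth can load the installed plugins. This is a quick check only; it assumes the plugin namespaces match the vsrepo/package names used above.

```python
# check_plugins.py - quick sanity check that the required plugins are loadable.
from vapoursynth import core

print(core.version())
for ns in ("bs", "imwri", "akarin", "misc", "tcanny",
           "tedgemask", "resize2", "vszip", "acrop", "hysteresis"):
    getattr(core, ns)  # raises AttributeError if that plugin is not loaded
print("All plugins found.")
```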
Usage
Vapoursynth Method
Prepare two sources.
- Muse hardsubbed source from YouTube; the 720p AVC format is recommended.
- Non-hardsubbed source; it should have the same resolution as the hardsubbed source. Higher resolutions take longer to process.
The two sources must be synchronized. If they are not, adjust the offset arguments.
python run.py clean.mkv sub.mp4
Batch mode: prepare two folders, one containing the hardsubbed videos and one containing the non-hardsubbed videos. Episode names must match between the two folders. Pass the paths of these two folders to the program.
python run.py clean sub
For non-Muse sources, you need to adjust the crop parameters to the subtitle area, and you may also need to adjust the SceneDetect threshold. Do this in filter.py, using the preview.
Modify the Filter call at the end of the filter.py file:
filter = Filter(r"clean.mkv", 0, r"sub.mkv", 0, images_dir=Path("images"))
python -m vspreview filter.py
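If you are not sure which crop values isolate the subtitle area, a small throwaway script in vspreview makes it easy to eyeball. The values below are placeholders for a 1280x720 source, not the project's Filter parameters.

```python
# crop_preview.py - throwaway script to find the subtitle area visually.
from vapoursynth import core

clip = core.bs.VideoSource(source="sub.mp4")
clip.set_output(0)  # output 0: full frame

# Placeholder crop: bottom band of a 1280x720 frame. Adjust until only
# the subtitle region remains, then copy the values into filter.py.
crop = core.std.CropAbs(clip, width=1280, height=200, left=0, top=520)
crop.set_output(1)  # output 1: cropped region (switch outputs in vspreview)
```

Preview it with python -m vspreview crop_preview.py and switch between the two outputs.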
VideoSubFinder Method
If the two sources are hard to synchronize, use VideoSubFinder (VSF) instead to generate the subtitle frames.
python run.py --engine videosubfinder -vsf {Path to VideoSubFinderWXW} -i {Path to Video Directory or Video File}
On Windows, the VideoSubFinderWXW path must include the ".exe" suffix, e.g. blabla/VideoSubFinderWXW.exe.
If VideoSubFinderWXW is already in PATH, you do not need to specify its path.
For more VideoSubFinder tuning parameters, see:
python run.py --help
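VideoSubFinder encodes the start and end time of each detected subtitle in the image file name (a pattern like 0_00_01_123__0_00_03_456_...). The sketch below shows how such a name maps to SRT timings; the regular expression and helper function are illustrative assumptions, not the project's actual parser.

```python
# Sketch: turning a VideoSubFinder-style image name into SRT timestamps.
# Assumes filenames begin with start/end times as H_MM_SS_mmm__H_MM_SS_mmm.
import re

def parse_vsf_name(name: str) -> tuple[str, str]:
    m = re.match(r"(\d+)_(\d+)_(\d+)_(\d+)__(\d+)_(\d+)_(\d+)_(\d+)", name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
    start = f"{h1:02}:{m1:02}:{s1:02},{ms1:03}"
    end = f"{h2:02}:{m2:02}:{s2:02},{ms2:03}"
    return start, end

print(parse_vsf_name("0_00_01_123__0_00_03_456_0000000123.jpeg"))
# ('00:00:01,123', '00:00:03,456')
```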
OCR Engine
OCR Engine Settings:
--ocr_engine OCR_ENGINE
Select OCR engine. Choices: ['gglens', 'gemini']. Default: gglens
--gglens_thread GGLENS_THREAD
Google Lens OCR threads.
--gemini_model GEMINI_MODEL
Gemini model name. Default: gemini-2.5-flash
--gemini_batch_size GEMINI_BATCH_SIZE
Gemini batch size for processing multiple images. Default: 50
--gemini_prompt GEMINI_PROMPT
Custom context prompt for Gemini OCR processing
--gemini_max_retries GEMINI_MAX_RETRIES
Maximum retry attempts for failed Gemini API calls. Default: 3
--gemini_retry_delay GEMINI_RETRY_DELAY
Delay between Gemini retry attempts in seconds. Default: 5.0
--gemini_max_workers GEMINI_MAX_WORKERS
Maximum concurrent workers for Gemini batch processing. Default: 3
For Gemini, you need to set GOOGLE_API_KEY or GEMINI_API_KEY in the environment. Example on Windows with PowerShell:
$env:GOOGLE_API_KEY = "your key"
Linux with Bash:
export GOOGLE_API_KEY="your key"
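As an illustration, a Gemini run combining a few of the flags above might look like this (the values are arbitrary, not recommendations):
python run.py clean.mkv sub.mp4 --ocr_engine gemini --gemini_model gemini-2.5-flash --gemini_batch_size 20 --gemini_max_workers 3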
Acknowledgement
