whisply

<img src="assets/whisply.png" width="25%">

Transcribe, translate, annotate and subtitle audio and video files with OpenAI's Whisper ... fast!
whisply combines faster-whisper and mlx-whisper to offer an easy-to-use solution for batch processing files on Windows, Linux and Mac. It also enables word-level speaker annotation by integrating whisperX and pyannote.
Features

- Performance: `whisply` selects the fastest Whisper implementation based on your hardware:
  - CPU / GPU (Nvidia CUDA): `faster-whisper` or `whisperX`
  - MLX (Apple M1-M5): `mlx-whisper`
- large-v3-turbo ready: Support for `whisper-large-v3-turbo` on all devices. Note: subtitling and annotations on CPU/GPU use `whisperX` for accurate timestamps, but `whisper-large-v3-turbo` isn't currently available for `whisperX`.
- Auto device selection: `whisply` automatically chooses `faster-whisper` (CPU, Nvidia GPU) or `whisper-MLX` (Apple M1-M5) for transcription and translation unless a specific `--device` option is passed.
- Word-level annotations: Enabling `--subtitle` or `--annotate` uses `whisperX` and `pyannote` for word segmentation and speaker annotations; `whisply` approximates missing timestamps for numeric words.
- Customizable subtitles: Specify the number of words per subtitle block (e.g. "5") to generate `.srt`, `.vtt` and `.webvtt` files with fixed word counts and timestamps.
- Batch processing: Handle single files, folders, URLs, or lists via `.list` documents. See the Batch processing section for details.
- CLI / app: `whisply` can be run directly from the CLI or as a browser app.
- Export formats:
  - Structured: `.json`, `.rttm`
  - Unstructured: `.txt`, `.txt` (annotated)
  - Markup: `.html` (compatible with noScribe's editor)
  - Subtitles: `.srt`, `.webvtt`, `.vtt`
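To illustrate the batch-processing feature, here is a minimal sketch of a `.list` document. The file names and URL are placeholders, and the `whisply` invocation is guarded so the snippet degrades gracefully when `whisply` is not installed:

```shell
# Create a hypothetical .list document: one input (file, folder or URL) per line.
cat > batch.list <<'EOF'
interviews/episode_01.mp3
interviews/episode_02.wav
https://example.com/talk.mp4
EOF

# Process the whole list in one run (skipped if whisply is not on PATH).
command -v whisply >/dev/null && whisply run --files batch.list \
  || echo "whisply not installed; skipping"
```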
Requirements
- FFmpeg
- Python >= 3.10 and < 3.14
- GPU processing requires:
- Nvidia GPU (CUDA: cuBLAS and cuDNN for CUDA 12)
- Apple Silicon (Mac M1-M5)
- Speaker annotation requires a HuggingFace Access Token
Installation
Install ffmpeg
# --- macOS ---
brew install ffmpeg
# --- Linux ---
sudo apt-get update
sudo apt-get install ffmpeg
# --- Windows ---
winget install Gyan.FFmpeg
For more information, visit the FFmpeg website.
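A quick way to confirm the install succeeded (guarded so it degrades gracefully if `ffmpeg` is still missing from your PATH):

```shell
# Print the FFmpeg version banner if the binary is on PATH.
command -v ffmpeg >/dev/null && ffmpeg -version | head -n 1 \
  || echo "ffmpeg not found on PATH"
```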
Installation with pip
`pip install whisply` installs CPU + annotation dependencies (torch, torchaudio, pyannote) out of the box. Add one of the extras below if you want MLX acceleration or the whisply browser app.
- Create a Python virtual environment
python3 -m venv venv
- Activate the environment
# --- Linux & macOS ---
source venv/bin/activate
# --- Windows ---
venv\Scripts\activate
- Install whisply
pip install whisply
- (Optional) Install extras if you need them
pip install "whisply[app]" # For running the whisply browser app
pip install "whisply[mlx]" # For running whisply-MLX on Apple M1-M5
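After installation, a first run might look like this. `my_audio.mp3` is a placeholder file name, and the command is guarded in case `whisply` is not on your PATH:

```shell
# Transcribe a single file with the defaults (large-v3-turbo model, auto device
# selection, auto language detection); results land in ./transcriptions.
command -v whisply >/dev/null \
  && whisply run --files my_audio.mp3 --output_dir transcriptions \
  || echo "whisply not on PATH"
```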
Installation from source
- Clone this repository
git clone https://github.com/tsmdt/whisply.git
- Change to project folder
cd whisply
- Create a Python virtual environment
python3 -m venv venv
- Activate the Python virtual environment
# --- Linux & macOS ---
source venv/bin/activate
# --- Windows ---
venv\Scripts\activate
- Install whisply
pip install .
- (Optional) Install whisply extras
pip install -e ".[mlx,app]"
Nvidia GPU fix (November 2025)
<details> <summary><i>Could not load library libcudnn_ops.so.9</i> (<b>click to expand</b>)</summary> <br>If you use <b>whisply</b> with a Nvidia GPU and encounter this error:<br><br>Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
<b>Use the following steps to fix the issue</b>:
- In your activated Python environment, run `pip list` and check that `torch==2.8.0` and `torchaudio==2.8.0` are installed.
- If yes, run `pip install ctranslate2==4.6.0`.
- Export the following environment variable in your shell:
export LD_LIBRARY_PATH="$(
python - <<'PY'
import importlib.util, pathlib
spec = importlib.util.find_spec("nvidia.cudnn")
if not spec or not spec.submodule_search_locations:
    raise SystemExit("Could not locate nvidia.cudnn package")
pkg_dir = pathlib.Path(spec.submodule_search_locations[0])
lib_dir = pkg_dir / "lib"
print(lib_dir)
PY
):${LD_LIBRARY_PATH}"
- To make the change permanent, run this bash command while your python environment is activated:
printf '\n# --- add cuDNN wheel dir ---\nexport LD_LIBRARY_PATH="$(python - <<'"'"'PY'"'"'\nimport importlib.util, pathlib\nspec = importlib.util.find_spec("nvidia.cudnn")\npkg_dir = pathlib.Path(spec.submodule_search_locations[0])\nprint(pkg_dir / "lib")\nPY\n):${LD_LIBRARY_PATH}"\n' >> "$VIRTUAL_ENV/bin/activate"
Finally, deactivate the environment and reactivate it to apply the changes.
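As a quick sanity check (a sketch, assuming the wheel-provided cuDNN layout used above), you can confirm after reactivating that a `nvidia/cudnn/lib` entry is now on `LD_LIBRARY_PATH`:

```shell
# Look for the cuDNN wheel directory among the LD_LIBRARY_PATH entries.
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -q 'nvidia/cudnn/lib' \
  && echo "cuDNN dir found" \
  || echo "cuDNN dir not found"
```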
Find additional information at <a href="https://github.com/SYSTRAN/faster-whisper" target="_blank">faster-whisper</a>'s GitHub page.
</details>

Usage
CLI
Three CLI commands are available:
- `whisply run`: Run a transcription task
- `whisply app`: Start the whisply browser app
- `whisply list`: List available models
$ whisply run
Usage: whisply run [OPTIONS]
Transcribe files with whisply
Options:
  --files              -f           TEXT         Path to file, folder, URL or .list to process.
  --output_dir         -o           DIRECTORY    Output folder [default: transcriptions]
  --device             -d           [auto|cpu|gpu|mlx]  CPU, GPU (Nvidia), MLX (Mac M1-M5) [default: auto]
  --model              -m           TEXT         Whisper model (run "whisply list" to see options) [default: large-v3-turbo]
  --language           -l           TEXT         Language of your file(s) ("en", "de") (Default: auto-detect)
  --annotate           -a                        Enable speaker annotation (Default: False)
  --num_speakers       -num         INTEGER      Number of speakers to annotate (Default: auto-detect)
  --hf_token           -hf          TEXT         HuggingFace Access token required for speaker annotation
  --subtitle           -s                        Create subtitles (Default: False)
  --subtitle_length    -sub_length  INTEGER      Subtitle segment length in words [default: 5]
  --translate          -t                        Translate transcription to English (Default: False)
  --export             -e           [all|json|txt|rttm|vtt|webvtt|srt|html]  Choose the export format [default: all]
  --del_originals      -del                      Delete input files after file conversion (Default: False)
  --download_language  -dl          TEXT         Specify a language code ("en", "de", ...) to transcribe a specific audio track of a URL (Default: auto-detect)
  --config             -c           PATH         Path to configuration file
  --post_correction    -post        PATH         Path to YAML file for post-correction
  --verbose            -v                        Print text chunks during transcription (Default: False)
  --help                                         Show this message and exit.
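To illustrate how these options combine, here are two hedged examples; the input file names and the `$HF_TOKEN` variable are placeholders, and the whole snippet is skipped when `whisply` is not installed:

```shell
if ! command -v whisply >/dev/null; then
  echo "whisply not installed; showing commands only"
else
  # Word-level speaker annotation (requires a HuggingFace Access Token):
  whisply run -f interview.mp3 --annotate --hf_token "$HF_TOKEN"

  # Five-word subtitle blocks, exported as .srt only:
  whisply run -f lecture.mp4 --subtitle --subtitle_length 5 --export srt
fi
```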
