
whisply


<img src="assets/whisply.png" width="25%">

Transcribe, translate, annotate and subtitle audio and video files with OpenAI's Whisper ... fast!

whisply combines faster-whisper and mlx-whisper to offer an easy-to-use solution for batch processing files on Windows, Linux and Mac. It also enables word-level speaker annotation by integrating whisperX and pyannote.

Features

  • ๐Ÿšดโ€โ™‚๏ธ Performance: whisply selects the fastest Whisper implementation based on your hardware:

    • CPU/GPU (Nvidia CUDA): faster-whisper or whisperX
    • MLX (Apple M1-M5): mlx-whisper
  • โฉ large-v3-turbo Ready: Support for whisper-large-v3-turbo on all devices. Note: Subtitling and annotations on CPU/GPU use whisperX for accurate timestamps, but whisper-large-v3-turbo isnโ€™t currently available for whisperX.

  • ✅ Auto Device Selection: whisply automatically chooses faster-whisper (CPU, Nvidia GPU) or whisper-MLX (Apple M1-M5) for transcription and translation unless a specific --device option is passed.

  • ๐Ÿ—ฃ๏ธ Word-level Annotations: Enabling --subtitle or --annotate uses whisperX and pyannote for word segmentation and speaker annotations. whisply approximates missing timestamps for numeric words.

  • 💬 Customizable Subtitles: Specify words per subtitle block (e.g., "5") to generate .srt, .vtt and .webvtt files with fixed word counts and timestamps.

  • 📦 Batch Processing: Handle single files, folders, URLs, or lists via .list documents. See the Batch processing section for details.

  • ๐Ÿ‘ฉโ€๐Ÿ’ป CLI / App: whisply can be run directly from CLI or as a browser app.

  • โš™๏ธ Export Formats:

    • Structured: .json, .rttm
    • Unstructured: .txt, .txt (annotated)
    • Markup: .html (compatible with noScribe's editor)
    • Subtitles: .srt, .webvtt, .vtt
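
As a sketch of the batch workflow: a .list document is simply one input (local path or URL) per line. The entries below are hypothetical placeholders, not files shipped with whisply.

```shell
# Build a .list document: one local path or URL per line
# (these entries are made-up examples; use your own inputs).
cat > files.list <<'EOF'
recordings/interview_01.mp3
recordings/interview_02.wav
https://example.com/talk.mp4
EOF

cat files.list

# Once whisply is installed, the whole list can be processed in one call:
# whisply run -f files.list
```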

Requirements

  • FFmpeg
  • Python >= 3.10, < 3.14
  • GPU processing requires:
    • Nvidia GPU (CUDA: cuBLAS and cuDNN for CUDA 12)
    • Apple Silicon (Mac M1-M5)
  • Speaker annotation requires a HuggingFace Access Token
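
A minimal sketch for sanity-checking the first two requirements before installing; it only reports, it does not install anything.

```shell
# Report whether the current interpreter falls in whisply's supported
# range (>= 3.10, < 3.14).
python3 - <<'PY'
import sys
major, minor = sys.version_info[:2]
supported = (3, 10) <= (major, minor) < (3, 14)
print(f"Python {major}.{minor}: "
      + ("supported" if supported else "not in supported range"))
PY

# Report whether FFmpeg is on PATH.
if command -v ffmpeg >/dev/null 2>&1; then
    ffmpeg -version | head -n 1
else
    echo "ffmpeg not found on PATH"
fi
```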

Installation

Install ffmpeg

# --- macOS ---
brew install ffmpeg

# --- Linux ---
sudo apt-get update
sudo apt-get install ffmpeg

# --- Windows ---
winget install Gyan.FFmpeg

For more information, visit the FFmpeg website.

Installation with pip

pip install whisply installs CPU + annotation dependencies (torch, torchaudio, pyannote) out of the box. Add one of the extras below if you want MLX acceleration or the whisply browser app.

  1. Create a Python virtual environment
python3 -m venv venv
  2. Activate the environment
# --- Linux & macOS ---
source venv/bin/activate

# --- Windows ---
venv\Scripts\activate
  3. Install whisply
pip install whisply
  4. (Optional) Install extras if you need them
pip install "whisply[app]"  # For running the whisply browser app
pip install "whisply[mlx]"  # For running whisply-MLX on Apple M1-M5

Installation from source

  1. Clone this repository
git clone https://github.com/tsmdt/whisply.git
  2. Change to the project folder
cd whisply
  3. Create a Python virtual environment
python3 -m venv venv
  4. Activate the Python virtual environment
# --- Linux & macOS ---
source venv/bin/activate

# --- Windows ---
venv\Scripts\activate
  5. Install whisply
pip install .
  6. (Optional) Install whisply extras
pip install -e ".[mlx,app]"

Nvidia GPU fix (November 2025)

<details> <summary><i>Could not load library libcudnn_ops.so.9</i> (<b>click to expand</b>)</summary> <br>If you use <b>whisply</b> with an Nvidia GPU and encounter this error:<br><br>
Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}

<b>Use the following steps to fix the issue</b>:

  1. In your activated Python environment, run pip list and check that torch==2.8.0 and torchaudio==2.8.0 are installed.
  2. If they are, run pip install ctranslate2==4.6.0.
  3. Export the following environment variable in your shell:
export LD_LIBRARY_PATH="$(
python - <<'PY'
import importlib.util, pathlib

spec = importlib.util.find_spec("nvidia.cudnn")
if not spec or not spec.submodule_search_locations:
    raise SystemExit("Could not locate nvidia.cudnn package")

pkg_dir = pathlib.Path(spec.submodule_search_locations[0])
lib_dir = pkg_dir / "lib"
print(lib_dir)
PY
):${LD_LIBRARY_PATH}"
  4. To make the change permanent, run this bash command while your Python environment is activated:
printf '\n# --- add cuDNN wheel dir ---\nexport LD_LIBRARY_PATH="$(python - <<'"'"'PY'"'"'\nimport importlib.util, pathlib\nspec = importlib.util.find_spec("nvidia.cudnn")\npkg_dir = pathlib.Path(spec.submodule_search_locations[0])\nprint(pkg_dir / "lib")\nPY\n):${LD_LIBRARY_PATH}"\n' >> "$VIRTUAL_ENV/bin/activate"

Finally, deactivate the environment and reactivate it to apply the changes.
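
To check whether the fix took effect after reactivating, a one-liner like the following inspects the loader search path (a read-only sketch; it prints matching entries or a hint, and modifies nothing):

```shell
# List any cuDNN wheel directories currently on LD_LIBRARY_PATH;
# fall back to a hint if none are present yet.
echo "${LD_LIBRARY_PATH:-}" | tr ':' '\n' | grep "nvidia/cudnn/lib" \
    || echo "cuDNN wheel dir not on LD_LIBRARY_PATH yet"
```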

Find additional information at <a href="https://github.com/SYSTRAN/faster-whisper" target="_blank">faster-whisper</a>'s GitHub page.

</details>

Usage

CLI

Three CLI commands are available:

  1. whisply run: Running a transcription task
  2. whisply app: Starting the whisply browser app
  3. whisply list: Listing available models
$ whisply run

 Usage: whisply run [OPTIONS]

 Transcribe files with whisply

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --files              -f               TEXT                                     Path to file, folder, URL or .list to process.               │
│ --output_dir         -o               DIRECTORY                                Output folder [default: transcriptions]                      │
│ --device             -d               [auto|cpu|gpu|mlx]                       CPU, GPU (NVIDIA), MLX (Mac M1-M5) [default: auto]           │
│ --model              -m               TEXT                                     Whisper model (run "whisply list" to see options)            │
│                                                                                [default: large-v3-turbo]                                    │
│ --language           -l               TEXT                                     Language of your file(s) ("en", "de") (Default: auto-detect) │
│ --annotate           -a                                                        Enable speaker annotation (Default: False)                   │
│ --num_speakers       -num             INTEGER                                  Number of speakers to annotate (Default: auto-detect)        │
│ --hf_token           -hf              TEXT                                     HuggingFace Access token required for speaker annotation     │
│ --subtitle           -s                                                        Create subtitles (Default: False)                            │
│ --subtitle_length    -sub_length      INTEGER                                  Subtitle segment length in words [default: 5]                │
│ --translate          -t                                                        Translate transcription to English (Default: False)          │
│ --export             -e               [all|json|txt|rttm|vtt|webvtt|srt|html]  Choose the export format [default: all]                      │
│ --del_originals      -del                                                      Delete input files after file conversion. (Default: False)   │
│ --download_language  -dl              TEXT                                     Specify a language code ("en", "de" ...) to transcribe a     │
│                                                                                specific audio track of a URL. (Default: auto-detect)        │
│ --config             -c               PATH                                     Path to configuration file                                   │
│ --post_correction    -post            PATH                                     Path to YAML file for post-correction                        │
│ --verbose            -v                                                        Print text chunks during transcription (Default: False)      │
│ --help                                                                         Show this message and exit.                                  │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
