
whisply


<img src="assets/whisply.png" width="25%">

Transcribe, translate, annotate and subtitle audio and video files with OpenAI's Whisper ... fast!

whisply combines faster-whisper and mlx-whisper to offer an easy-to-use solution for batch processing files on Windows, Linux and Mac. It also enables word-level speaker annotation by integrating whisperX and pyannote.

Features

  • ๐Ÿšดโ€โ™‚๏ธ Performance: whisply selects the fastest Whisper implementation based on your hardware:

    • CPU/GPU (Nvidia CUDA): faster-whisper or whisperX
    • MLX (Apple M1-M5): mlx-whisper
  • โฉ large-v3-turbo Ready: Support for whisper-large-v3-turbo on all devices. Note: Subtitling and annotations on CPU/GPU use whisperX for accurate timestamps, but whisper-large-v3-turbo isnโ€™t currently available for whisperX.

  • ✅ Auto Device Selection: whisply automatically chooses faster-whisper (CPU, Nvidia GPU) or whisper-MLX (Apple M1-M5) for transcription and translation unless a specific --device option is passed.

  • ๐Ÿ—ฃ๏ธ Word-level Annotations: Enabling --subtitle or --annotate uses whisperX and pyannote for word segmentation and speaker annotations. whisply approximates missing timestamps for numeric words.

  • 💬 Customizable Subtitles: Specify words per subtitle block (e.g., "5") to generate .srt, .vtt and .webvtt files with fixed word counts and timestamps.

  • 📦 Batch Processing: Handle single files, folders, URLs, or lists via .list documents. See the Batch processing section for details.

  • ๐Ÿ‘ฉโ€๐Ÿ’ป CLI / App: whisply can be run directly from CLI or as a browser app.

  • โš™๏ธ Export Formats:

    • Structured: .json, .rttm
    • Unstructured: .txt, .txt (annotated)
    • Markup: .html (compatible with noScribe's editor)
    • Subtitles: .srt, .webvtt, .vtt
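
As a sketch of the batch workflow: a .list document is simply one input (local path or URL) per line. The entries below are hypothetical placeholders, not files shipped with whisply.

```shell
# Build a .list document: one local path or URL per line
# (these entries are made-up examples; use your own inputs).
cat > files.list <<'EOF'
recordings/interview_01.mp3
recordings/interview_02.wav
https://example.com/talk.mp4
EOF

cat files.list

# Once whisply is installed, the whole list can be processed in one call:
# whisply run -f files.list
```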

Requirements

  • FFmpeg
  • Python >= 3.10, < 3.14
  • GPU processing requires:
    • Nvidia GPU (CUDA: cuBLAS and cuDNN for CUDA 12)
    • Apple Silicon (Mac M1-M5)
  • Speaker annotation requires a HuggingFace Access Token
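
A minimal sketch for sanity-checking the first two requirements before installing; it only reports, it does not install anything.

```shell
# Report whether the current interpreter falls in whisply's supported
# range (>= 3.10, < 3.14).
python3 - <<'PY'
import sys
major, minor = sys.version_info[:2]
supported = (3, 10) <= (major, minor) < (3, 14)
print(f"Python {major}.{minor}: "
      + ("supported" if supported else "not in supported range"))
PY

# Report whether FFmpeg is on PATH.
if command -v ffmpeg >/dev/null 2>&1; then
    ffmpeg -version | head -n 1
else
    echo "ffmpeg not found on PATH"
fi
```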

Installation

Install ffmpeg

# --- macOS ---
brew install ffmpeg

# --- Linux ---
sudo apt-get update
sudo apt-get install ffmpeg

# --- Windows ---
winget install Gyan.FFmpeg

For more information, visit the FFmpeg website.

Installation with pip

pip install whisply installs CPU + annotation dependencies (torch, torchaudio, pyannote) out of the box. Add one of the extras below if you want MLX acceleration or the whisply browser app.

  1. Create a Python virtual environment
python3 -m venv venv
  2. Activate the environment
# --- Linux & macOS ---
source venv/bin/activate

# --- Windows ---
venv\Scripts\activate
  3. Install whisply
pip install whisply
  4. (Optional) Install extras if you need them
pip install "whisply[app]"  # For running the whisply browser app
pip install "whisply[mlx]"  # For running whisply-MLX on Apple M1-M5

Installation from source

  1. Clone this repository
git clone https://github.com/tsmdt/whisply.git
  2. Change to the project folder
cd whisply
  3. Create a Python virtual environment
python3 -m venv venv
  4. Activate the Python virtual environment
# --- Linux & macOS ---
source venv/bin/activate

# --- Windows ---
venv\Scripts\activate
  5. Install whisply
pip install .
  6. (Optional) Install whisply extras
pip install -e ".[mlx,app]"

Nvidia GPU fix (November 2025)

<details> <summary><i>Could not load library libcudnn_ops.so.9</i> (<b>click to expand</b>)</summary> <br>If you use <b>whisply</b> with an Nvidia GPU and encounter this error:<br><br>
Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}

<b>Use the following steps to fix the issue</b>:

  1. In your activated Python environment, run pip list and check that torch==2.8.0 and torchaudio==2.8.0 are installed.
  2. If they are, run pip install ctranslate2==4.6.0.
  3. Export the following environment variable in your shell:
export LD_LIBRARY_PATH="$(
python - <<'PY'
import importlib.util, pathlib

spec = importlib.util.find_spec("nvidia.cudnn")
if not spec or not spec.submodule_search_locations:
    raise SystemExit("Could not locate nvidia.cudnn package")

pkg_dir = pathlib.Path(spec.submodule_search_locations[0])
lib_dir = pkg_dir / "lib"
print(lib_dir)
PY
):${LD_LIBRARY_PATH}"
  4. To make the change permanent, run this bash command while your Python environment is activated:
printf '\n# --- add cuDNN wheel dir ---\nexport LD_LIBRARY_PATH="$(python - <<'"'"'PY'"'"'\nimport importlib.util, pathlib\nspec = importlib.util.find_spec("nvidia.cudnn")\npkg_dir = pathlib.Path(spec.submodule_search_locations[0])\nprint(pkg_dir / "lib")\nPY\n):${LD_LIBRARY_PATH}"\n' >> "$VIRTUAL_ENV/bin/activate"

Finally, deactivate the environment and reactivate it to apply the changes.
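
To check whether the fix took effect after reactivating, a one-liner like the following inspects the loader search path (a read-only sketch; it prints matching entries or a hint, and modifies nothing):

```shell
# List any cuDNN wheel directories currently on LD_LIBRARY_PATH;
# fall back to a hint if none are present yet.
echo "${LD_LIBRARY_PATH:-}" | tr ':' '\n' | grep "nvidia/cudnn/lib" \
    || echo "cuDNN wheel dir not on LD_LIBRARY_PATH yet"
```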

Find additional information at <a href="https://github.com/SYSTRAN/faster-whisper" target="_blank">faster-whisper</a>'s GitHub page.

</details>

Usage

CLI

Three CLI commands are available:

  1. whisply run: Running a transcription task
  2. whisply app: Starting the whisply browser app
  3. whisply list: Listing available models
$ whisply run

 Usage: whisply run [OPTIONS]

 Transcribe files with whisply

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --files              -f               TEXT                                     Path to file, folder, URL or .list to process.               │
│ --output_dir         -o               DIRECTORY                                Output folder [default: transcriptions]                      │
│ --device             -d               [auto|cpu|gpu|mlx]                       CPU, GPU (NVIDIA), MLX (Mac M1-M5) [default: auto]           │
│ --model              -m               TEXT                                     Whisper model (run "whisply list" to see options)            │
│                                                                                [default: large-v3-turbo]                                    │
│ --language           -l               TEXT                                     Language of your file(s) ("en", "de") (Default: auto-detect) │
│ --annotate           -a                                                        Enable speaker annotation (Default: False)                   │
│ --num_speakers       -num             INTEGER                                  Number of speakers to annotate (Default: auto-detect)        │
│ --hf_token           -hf              TEXT                                     HuggingFace Access token required for speaker annotation     │
│ --subtitle           -s                                                        Create subtitles (Default: False)                            │
│ --subtitle_length    -sub_length      INTEGER                                  Subtitle segment length in words [default: 5]                │
│ --translate          -t                                                        Translate transcription to English (Default: False)          │
│ --export             -e               [all|json|txt|rttm|vtt|webvtt|srt|html]  Choose the export format [default: all]                      │
│ --del_originals      -del                                                      Delete input files after file conversion. (Default: False)   │
│ --download_language  -dl              TEXT                                     Specify a language code ("en", "de" ...) to transcribe a     │
│                                                                                specific audio track of a URL. (Default: auto-detect)        │
│ --config             -c               PATH                                     Path to configuration file                                   │
│ --post_correction    -post            PATH                                     Path to YAML file for post-correction                        │
│ --verbose            -v                                                        Print text chunks during transcription (Default: False)      │
│ --help                                                                         Show this message and exit.                                  │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
