SkillAgentSearch skills...

Whisper.cpp

Port of OpenAI's Whisper model in C/C++

Install / Use

/learn @ggml-org/Whisper.cpp
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

whisper.cpp

whisper.cpp

Actions Status License: MIT Conan Center npm

Stable: v1.8.1 / Roadmap

High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model:

Supported platforms:

The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library.

Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications. As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: whisper.objc

https://user-images.githubusercontent.com/1991296/197385372-962a6dea-bca1-4d50-bf96-1d8c27b98c81.mp4

You can also easily make your own offline voice assistant application: command

https://user-images.githubusercontent.com/1991296/204038393-2f846eae-c255-4099-a76d-5735c25c49da.mp4

On Apple Silicon, the inference runs fully on the GPU via Metal:

https://github.com/ggml-org/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225

Quick start

First clone the repository:

git clone https://github.com/ggml-org/whisper.cpp.git

Navigate into the directory:

cd whisper.cpp

Then, download one of the Whisper models converted in ggml format. For example:

sh ./models/download-ggml-model.sh base.en

Now build the whisper-cli example and transcribe an audio file like this:

# build the project
cmake -B build
cmake --build build -j --config Release

# transcribe an audio file
./build/bin/whisper-cli -f samples/jfk.wav

For a quick demo, simply run make base.en.

The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples.

For detailed usage instructions, run: ./build/bin/whisper-cli -h

Note that the whisper-cli example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool. For example, you can use ffmpeg like this:

ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav

More audio samples

If you want some extra audio samples to play with, simply run:

make -j samples

This will download a few more audio files from Wikipedia and convert them to 16-bit WAV format via ffmpeg.

You can download and run the other models as follows:

make -j tiny.en
make -j tiny
make -j base.en
make -j base
make -j small.en
make -j small
make -j medium.en
make -j medium
make -j large-v1
make -j large-v2
make -j large-v3
make -j large-v3-turbo

Memory usage

| Model | Disk | Mem | | ------ | ------- | ------- | | tiny | 75 MiB | ~273 MB | | base | 142 MiB | ~388 MB | | small | 466 MiB | ~852 MB | | medium | 1.5 GiB | ~2.1 GB | | large | 2.9 GiB | ~3.9 GB |

POWER VSX Intrinsics

whisper.cpp supports POWER architectures and includes code which significantly speeds operation on Linux running on POWER9/10, making it capable of faster-than-realtime transcription on underclocked Raptor Talos II. Ensure you have a BLAS package installed, and replace the standard cmake setup with:

# build with GGML_BLAS defined
cmake -B build -DGGML_BLAS=1
cmake --build build -j --config Release
./build/bin/whisper-cli [ .. etc .. ]

Quantization

whisper.cpp supports integer quantization of the Whisper ggml models. Quantized models require less memory and disk space and depending on the hardware can be processed more efficiently.

Here are the steps for creating and using a quantized model:

# quantize a model with Q5_0 method
cmake -B build
cmake --build build -j --config Release
./build/bin/quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0

# run the examples as usual, specifying the quantized model file
./build/bin/whisper-cli -m models/ggml-base.en-q5_0.bin ./samples/gb0.wav

Core ML support

On Apple Silicon devices, the Encoder inference can be executed on the Apple Neural Engine (ANE) via Core ML. This can result in significant speed-up - more than x3 faster compared with CPU-only execution. Here are the instructions for generating a Core ML model and using it with whisper.cpp:

  • Install Python dependencies needed for the creation of the Core ML model:

    pip install ane_transformers
    pip install openai-whisper
    pip install coremltools
    
    • To ensure coremltools operates correctly, please confirm that Xcode is installed and execute xcode-select --install to install the command-line tools.
    • Python 3.11 is recommended.
    • MacOS Sonoma (version 14) or newer is recommended, as older versions of MacOS might experience issues with transcription hallucination.
    • [OPTIONAL] It is recommended to utilize a Python version management system, such as Miniconda for this step:
      • To create an environment, use: conda create -n py311-whisper python=3.11 -y
      • To activate the environment, use: conda activate py311-whisper
  • Generate a Core ML model. For example, to generate a base.en model, use:

    ./models/generate-coreml-model.sh base.en
    

    This will generate the folder models/ggml-base.en-encoder.mlmodelc

  • Build whisper.cpp with Core ML support:

    # using CMake
    cmake -B build -DWHISPER_COREML=1
    cmake --build build -j --config Release
    
  • Run the examples as usual. For example:

    $ ./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
    
    ...
    
    whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
    whisper_init_state: first run on a device may take a while ...
    whisper_init_state: Core ML model loaded
    
    system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | COREML = 1 |
    
    ...
    

    The first run on a device is slow, since the ANE service compiles the Core ML model to some device-specific format. Next runs are faster.

For more information about the Core ML implementation please refer to PR #566.

OpenVINO support

On platforms that support OpenVINO, the Encoder inference can be executed on OpenVINO-supported devices including x86 CPUs and Intel GPUs (integrated & discrete).

This can result in significant speedup in encoder performance. Here are the instructions for generating the OpenVINO model and using it with whisper.cpp:

  • First, setup python virtual env. and install python dependencies. Python 3.10 is recommended.

    Windows:

    cd models
    python -m venv openvino_conv_env
    openvino_conv_env\Scripts\activate
    python -m pip install --upgrade pip
    pip install -r requirements-openvino.txt
    

    Linux and macOS:

    cd models
    python3 -m venv openvino_conv_env
    source openvino_conv_env/bin/activate
    python -m pip install --upgrade pip
    pip install -r requirements-openvino.txt
    
  • Generate an OpenVINO encoder model. For example, to generate a base.en model, use:

    python convert-whisper-to-openvino.py --model base.en
    

    This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as ggml models, as that is the default location that the OpenVINO extension will search at runtime.

  • Build whisper.cpp with OpenVINO support:

    Download OpenVINO package from release page.

View on GitHub
GitHub Stars48.1k
CategoryDevelopment
Updated13m ago
Forks5.4k

Languages

C++

Security Score

100/100

Audited on Mar 29, 2026

No findings