# whisper.cpy

Python wrapper for whisper.cpp
## Highlight

- Lightweight: uses `ctypes.CDLL` to call functions from the `libwhisper` shared library (see the sketch below).
- Ports the `whisper-stream` functionality to handle the live-streaming case with asynchronous processing.
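To make the first point concrete, here is a minimal sketch (not whisper.cpy's actual internals) of loading `libwhisper` with `ctypes.CDLL` and calling one function from the whisper.cpp C API; the library path assumes a macOS build:

```python
import ctypes

# Minimal sketch, assuming a macOS build of whisper.cpp; adjust the
# suffix (.so / .dll) and path for your platform and build layout.
lib = ctypes.CDLL("whisper.cpp/build/src/libwhisper.dylib")

# whisper_print_system_info() is part of the whisper.cpp C API and
# returns a C string describing the enabled backends/SIMD features.
lib.whisper_print_system_info.restype = ctypes.c_char_p
print(lib.whisper_print_system_info().decode())
```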
## Preparing
### 1. Prepare the whisper.cpp library

Clone whisper.cpp, then build it:

```sh
# clone whisper.cpp
git clone https://github.com/ggml-org/whisper.cpp

# navigate into the folder
cd whisper.cpp/

# check out the stable version (currently supported)
git checkout v1.8.0

# build whisper.cpp
cmake -B build
cmake --build build --config Release
```
Download the ggml models:

```sh
# ASR model
sh ./models/download-ggml-model.sh [tiny|base|small|large]

# VAD model
sh ./models/download-vad-model.sh silero-v5.1.2
```
### 2. Install whisper.cpy

Install from source:

```sh
pip install git+https://github.com/fann1993814/whisper.cpy
```
## Usage

### Basic Audio Transcribe and Voice Activity Detection

Follow the steps below, and trace `transcribe.py`.
1. Share library, model, and testing audio setting
# WHISPER_CPP_PATH is the whisper.cpp project location
audio_wav = f"{WHISPER_CPP_PATH}/samples/jfk.wav"
asr_model_path = f"{WHISPER_CPP_PATH}/models/ggml-tiny.bin"
vad_model_path = f"{WHISPER_CPP_PATH}/models/ggml-silero-v5.1.2.bin"
library_path = f"{WHISPER_CPP_PATH}/build/src/libwhisper.dylib" # Mac: dylib, Linux: so, Win: dll
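If the script needs to run on more than one OS, a small sketch (assuming whisper.cpp's default CMake output layout of `build/src/` on every platform) can derive the suffix automatically:

```python
import platform

# Map the OS to its shared-library suffix; WHISPER_CPP_PATH is the
# whisper.cpp project location, as above.
suffix = {"Darwin": "dylib", "Linux": "so", "Windows": "dll"}[platform.system()]
library_path = f"{WHISPER_CPP_PATH}/build/src/libwhisper.{suffix}"
```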
2. Read the test audio shipped with whisper.cpp:

```python
import soundfile as sf

data, sr = sf.read(audio_wav, dtype='float32')
```
3. Load the library and models with whisper.cpy, transcribe, and get the transcript results:

```python
from whispercpy import WhisperCPP
from whispercpy.utils import to_timestamp

model = WhisperCPP(library_path, asr_model_path,
                   vad_model_path, use_gpu=True, verbose=False)

print('--------- Lib Version ---------')
print(model.get_version())
# --------- Lib Version ---------
# Ver: 1.8.0

print('--------- VAD Result ----------')
for segment in model.vad(data):
    print(f'[{to_timestamp(segment.t0, False)}' +
          " --> " + f'{to_timestamp(segment.t1, False)}]')
# --------- VAD Result ----------
# [00:00:00.290 --> 00:00:02.210]
# [00:00:03.300 --> 00:00:03.770]
# [00:00:04.000 --> 00:00:04.350]
# [00:00:05.380 --> 00:00:07.650]
# [00:00:08.160 --> 00:00:10.590]

print('--------- ASR Result ----------')
for segment in model.transcribe(data, language='en', beam_size=5, token_timestamps=True):
    print(f'[{to_timestamp(segment.t0, False)}' +
          " --> " + f'{to_timestamp(segment.t1, False)}] ' + segment.text)
    print('--------- Token Info ----------')
    print('\n'.join([f'[{to_timestamp(token.t0, False)}' +
                     " --> " + f'{to_timestamp(token.t1, False)}] {token.text}' for token in segment.tokens]))
    print('-------------------------------')
# --------- ASR Result ----------
# [00:00:00.000 --> 00:00:10.400] And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
# --------- Token Info ----------
# [00:00:00.000 --> 00:00:00.000] [_BEG_]
# [00:00:00.320 --> 00:00:00.320] And
# [00:00:00.330 --> 00:00:00.530] so
# [00:00:00.680 --> 00:00:00.740] ,
# [00:00:00.740 --> 00:00:00.950] my
# [00:00:00.950 --> 00:00:01.590] fellow
# [00:00:01.590 --> 00:00:02.100] Americans
# [00:00:02.550 --> 00:00:03.000] ,
# [00:00:03.290 --> 00:00:03.650] ask
# [00:00:04.010 --> 00:00:04.280] not
# [00:00:04.650 --> 00:00:05.200] what
# [00:00:05.410 --> 00:00:05.560] your
# [00:00:05.650 --> 00:00:06.410] country
# [00:00:06.410 --> 00:00:06.750] can
# [00:00:06.750 --> 00:00:06.920] do
# [00:00:07.010 --> 00:00:07.490] for
# [00:00:07.490 --> 00:00:07.970] you
# [00:00:08.170 --> 00:00:08.170] ,
# [00:00:08.190 --> 00:00:08.430] ask
# [00:00:08.430 --> 00:00:08.750] what
# [00:00:08.910 --> 00:00:09.040] you
# [00:00:09.040 --> 00:00:09.350] can
# [00:00:09.350 --> 00:00:09.500] do
# [00:00:09.500 --> 00:00:09.710] for
# [00:00:09.720 --> 00:00:09.980] your
# [00:00:09.990 --> 00:00:10.350] country
# [00:00:10.470 --> 00:00:10.500] .
# [00:00:10.500 --> 00:00:10.500] [_TT_525]
# -------------------------------
```
`to_timestamp` converts whisper.cpp time units into a formal timestamp representation.
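For instance, a quick sanity check (the 10 ms unit is inferred from the outputs above, and the boolean is presumed to toggle comma-style SRT separators; treat both as assumptions):

```python
from whispercpy.utils import to_timestamp

# whisper.cpp reports times in 10 ms units, so 1040 units ~= 10.4 s.
print(to_timestamp(1040, False))  # expected: 00:00:10.400
```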
### Live Streaming

Follow the steps below, and trace `live.py`.

Note: for real-time inference, use `tiny`/`base`/`small` on CPU, and `medium`/`large`/`large-v2`/`large-v3` on GPU.
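As a sketch of that recommendation (the `use_gpu` flag and file names here are illustrative; the names follow the download script above):

```python
# Illustrative: choose a model size that matches the hardware budget.
use_gpu = False  # set according to your machine
model_name = "large-v3" if use_gpu else "base"
model_path = f"{WHISPER_CPP_PATH}/models/ggml-{model_name}.bin"
```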
1. Load the core engine and streaming decoder with the library and model:

```python
from whispercpy import WhisperCPP, WhisperStream

core = WhisperCPP(lib_path, model_path, use_gpu=False)
asr = WhisperStream(core, language='en', return_token=True)
```
2. Set up the callback:

```python
count = 0

def callback(indata, frames, time, status):
    global count
    chunk = indata.copy().tobytes()
    asr.pipe(chunk)

    transcript = asr.get_transcript()
    transcripts = asr.get_transcripts()

    if len(transcripts) > count:
        print("\r" + transcripts[-1].text)
        print('--')
        count += 1
    else:
        print(f"\r{transcript.text}", end="", flush=True)
```

- `asr.pipe`: a threaded function that processes audio asynchronously for continuous transcription
- `asr.get_transcript`: get the current (in-progress) transcription
- `asr.get_transcripts`: get all completed transcriptions
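The streaming API is not tied to a microphone; as a minimal sketch, assuming a 16 kHz mono WAV and an illustrative 0.25 s chunk size, you can drive the same `asr` decoder from a file:

```python
import soundfile as sf
from whispercpy.constant import STREAMING_ENDING

# Feed a WAV file through the streaming decoder in 0.25 s chunks.
data, sr = sf.read(f"{WHISPER_CPP_PATH}/samples/jfk.wav", dtype='float32')
chunk_size = int(sr * 0.25)
for i in range(0, len(data), chunk_size):
    asr.pipe(data[i:i + chunk_size].tobytes())

# Signal the end of the stream and wait for the last worker to finish.
asr.pipe(STREAMING_ENDING).join()

for t in asr.get_transcripts():
    print(t.text)
```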
3. Set up microphone recording:

```python
import sounddevice as sd
from whispercpy.constant import STREAMING_ENDING

samplerate = 16000
block_duration = 0.25
block_size = int(samplerate * block_duration)
channels = 1

# Recording
try:
    with sd.InputStream(
            samplerate=samplerate,
            channels=channels,
            callback=callback, blocksize=block_size, dtype='float32'):
        print("🎤 Recording for ASR... Press Ctrl+C to stop.")
        while True:
            sd.sleep(1000)
except KeyboardInterrupt:
    print("⏹️ Recording stopped.")
    # send end signal
    asr.pipe(STREAMING_ENDING).join()

# Result
#
# 🎤 Recording for ASR... Press Ctrl+C to stop.
# This is my voice test.
# --
# Can you hear me?
# --
# ^C⏹️ Recording stopped.
# [00:00:00.800 --> 00:00:11.000] This is my voice test.
# -------------------------------
# [00:00:00.800 --> 00:00:00.800] [_BEG_]
# [00:00:00.800 --> 00:00:01.490] This
# [00:00:01.850 --> 00:00:01.850] is
# [00:00:01.890 --> 00:00:02.200] my
# [00:00:02.200 --> 00:00:03.080] voice
# [00:00:03.080 --> 00:00:03.710] test
# [00:00:03.710 --> 00:00:08.290] .
# [00:00:08.300 --> 00:00:11.000] [_TT_150]
# -------------------------------
# [00:00:11.300 --> 00:00:21.500] Can you hear me?
# -------------------------------
# [00:00:11.300 --> 00:00:11.300] [_BEG_]
# [00:00:11.300 --> 00:00:11.790] Can
# [00:00:12.050 --> 00:00:12.280] you
# [00:00:12.280 --> 00:00:12.940] hear
# [00:00:12.940 --> 00:00:13.070] me
# [00:00:13.070 --> 00:00:17.960] ?
# [00:00:17.960 --> 00:00:21.500] [_TT_100]
# -------------------------------
```
`STREAMING_ENDING`: a signal to stop transcribing; use `join()` to wait for the last thread to complete.
## License

This project follows the whisper.cpp license (MIT).
Related Skills
node-connect
350.8kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
110.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
350.8kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
350.8kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
