
whisper.rn


React Native binding of whisper.cpp.

whisper.cpp: High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model

Screenshots

| <img src="https://github.com/mybigday/whisper.rn/assets/3001525/2fea7b2d-c911-44fb-9afc-8efc7b594446" width="300" /> | <img src="https://github.com/mybigday/whisper.rn/assets/3001525/a5005a6c-44f7-4db9-95e8-0fd951a2e147" width="300" /> |
| :---: | :---: |
| iOS: Tested on iPhone 13 Pro Max | Android: Tested on Pixel 6 |
| (tiny.en, Core ML enabled, release mode + archive) | (tiny.en, armv8.2-a+fp16, release mode) |

Installation

npm install whisper.rn

iOS

After installation, run npx pod-install.

By default, whisper.rn will use pre-built rnwhisper.xcframework for iOS. If you want to build from source, please set RNWHISPER_BUILD_FROM_SOURCE to 1 in your Podfile.

If you want to use the medium or large model, it is recommended to enable the Extended Virtual Addressing capability on your iOS project.

Android

Add a ProGuard rule if ProGuard is enabled in your project (android/app/proguard-rules.pro):

# whisper.rn
-keep class com.rnwhisper.** { *; }

It's recommended to use ndkVersion = "24.0.8215888" (or above) in your root project build configuration for Apple Silicon Macs. Otherwise, please follow this troubleshooting issue.

Expo

You will need to prebuild the project before using it. See Expo guide for more details.

Tips & Tricks

The Tips & Tricks document is a collection of tips and tricks for using whisper.rn.

Usage

import { initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: 'file://.../ggml-tiny.en.bin',
})

const sampleFilePath = 'file://.../sample.wav'
const options = { language: 'en' }
const { stop, promise } = whisperContext.transcribe(sampleFilePath, options)

const { result } = await promise
// result: (The inference text result from audio file)

Voice Activity Detection (VAD)

Voice Activity Detection allows you to detect speech segments in audio data using the Silero VAD model.

Initialize VAD Context

import { initWhisperVad } from 'whisper.rn'

const vadContext = await initWhisperVad({
  filePath: require('./assets/ggml-silero-v6.2.0.bin'), // VAD model file
  useGpu: true, // Use GPU acceleration (iOS only)
  nThreads: 4, // Number of threads for processing
})

Detect Speech Segments

From Audio Files
// Detect speech in audio file (supports same formats as transcribe)
const segments = await vadContext.detectSpeech(require('./assets/audio.wav'), {
  threshold: 0.5, // Speech probability threshold (0.0-1.0)
  minSpeechDurationMs: 250, // Minimum speech duration in ms
  minSilenceDurationMs: 100, // Minimum silence duration in ms
  maxSpeechDurationS: 30, // Maximum speech duration in seconds
  speechPadMs: 30, // Padding around speech segments in ms
  samplesOverlap: 0.1, // Overlap between analysis windows
})

// Also supports:
// - File paths: vadContext.detectSpeech('path/to/audio.wav', options)
// - HTTP URLs: vadContext.detectSpeech('https://example.com/audio.wav', options)
// - Base64 WAV: vadContext.detectSpeech('data:audio/wav;base64,...', options)
// - Assets: vadContext.detectSpeech(require('./assets/audio.wav'), options)
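
The thresholding options above are easiest to understand on raw data. The sketch below is a hypothetical illustration of their meaning, not whisper.rn's actual implementation: frames whose speech probability clears threshold are grouped into runs, runs shorter than minSpeechDurationMs are dropped, and surviving segments are padded by speechPadMs.

```typescript
// Hypothetical sketch of how VAD options shape segment output.
// NOT whisper.rn's implementation -- just an illustration of the meaning
// of threshold, minSpeechDurationMs and speechPadMs.
interface Segment { t0: number; t1: number } // seconds

function probsToSegments(
  probs: number[],              // one speech probability per frame
  frameMs: number,              // frame duration in milliseconds
  threshold: number,            // speech probability threshold (0.0-1.0)
  minSpeechDurationMs: number,  // runs shorter than this are discarded
  speechPadMs: number,          // padding added around each kept segment
): Segment[] {
  const segments: Segment[] = []
  let start = -1
  // Iterate one index past the end so a run touching the last frame is closed
  for (let i = 0; i <= probs.length; i++) {
    const isSpeech = i < probs.length && probs[i] >= threshold
    if (isSpeech && start < 0) start = i
    if (!isSpeech && start >= 0) {
      const durMs = (i - start) * frameMs
      if (durMs >= minSpeechDurationMs) {
        // Pad segment boundaries, clamped to the audio range
        segments.push({
          t0: Math.max(0, start * frameMs - speechPadMs) / 1000,
          t1: Math.min(probs.length * frameMs, i * frameMs + speechPadMs) / 1000,
        })
      }
      start = -1
    }
  }
  return segments
}
```
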
From Raw Audio Data
// Detect speech in base64 encoded float32 PCM data
const segments = await vadContext.detectSpeechData(base64AudioData, {
  threshold: 0.5,
  minSpeechDurationMs: 250,
  minSilenceDurationMs: 100,
  maxSpeechDurationS: 30,
  speechPadMs: 30,
  samplesOverlap: 0.1,
})
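
detectSpeechData takes base64-encoded float32 PCM. A hypothetical encoding helper is sketched below; it uses Node's Buffer, so in React Native you would typically swap in a base64 utility such as the buffer polyfill:

```typescript
// Hypothetical helper: encode a Float32Array of mono PCM samples as the
// base64 float32 data detectSpeechData expects. Uses Node's Buffer; in
// React Native, substitute a base64 utility (e.g. the 'buffer' polyfill).
function float32ToBase64(samples: Float32Array): string {
  // View the float32 sample memory as raw bytes, then base64-encode it
  return Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength)
    .toString('base64')
}
```
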

Process Results

segments.forEach((segment, index) => {
  console.log(
    `Segment ${index + 1}: ${segment.t0.toFixed(2)}s - ${segment.t1.toFixed(
      2,
    )}s`,
  )
  console.log(`Duration: ${(segment.t1 - segment.t0).toFixed(2)}s`)
})
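
Since each segment carries start/end timestamps, simple aggregate statistics fall out directly. A small hypothetical helper (assuming t0/t1 are in seconds, as in the logging example above):

```typescript
// Hypothetical helper, not a whisper.rn API: total detected speech time.
interface SpeechSegment { t0: number; t1: number } // seconds

function totalSpeechSeconds(segments: SpeechSegment[]): number {
  // Sum of per-segment durations (VAD segments do not overlap)
  return segments.reduce((sum, s) => sum + (s.t1 - s.t0), 0)
}
```

This can be useful, for example, to skip transcription entirely when a recording contains little or no speech.
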

Release VAD Context

await vadContext.release()
// Or release all VAD contexts
await releaseAllWhisperVad()

Realtime Transcription

The new RealtimeTranscriber provides enhanced realtime transcription with features like Voice Activity Detection (VAD), auto-slicing, and memory management.

// If your RN packager does not enable package exports support, import from whisper.rn/src/realtime-transcription instead
import { RealtimeTranscriber } from 'whisper.rn/realtime-transcription'
import { AudioPcmStreamAdapter } from 'whisper.rn/realtime-transcription/adapters'
import RNFS from 'react-native-fs' // or any compatible filesystem

// Dependencies
const whisperContext = await initWhisper({
  /* ... */
})
const vadContext = await initWhisperVad({
  /* ... */
})
const audioStream = new AudioPcmStreamAdapter() // requires @fugood/react-native-audio-pcm-stream

// Create transcriber
const transcriber = new RealtimeTranscriber(
  { whisperContext, vadContext, audioStream, fs: RNFS },
  {
    audioSliceSec: 30,
    vadPreset: 'default',
    autoSliceOnSpeechEnd: true,
    transcribeOptions: { language: 'en' },
  },
  {
    onTranscribe: (event) => console.log('Transcription:', event.data?.result),
    onVad: (event) => console.log('VAD:', event.type, event.confidence),
    onStatusChange: (isActive) =>
      console.log('Status:', isActive ? 'ACTIVE' : 'INACTIVE'),
    onError: (error) => console.error('Error:', error),
  },
)

// Start/stop transcription
await transcriber.start()
await transcriber.stop()

Dependencies:

  • @fugood/react-native-audio-pcm-stream for AudioPcmStreamAdapter
  • Compatible filesystem module (e.g., react-native-fs). See filesystem interface for TypeScript definition

Custom Audio Adapters: You can create custom audio stream adapters by implementing the AudioStreamInterface. This allows integration with different audio sources or custom audio processing pipelines.
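
As a rough sketch of what a custom adapter could look like, the class below streams PCM chunks from memory instead of a microphone. The method names here (start, stop, onData) are assumptions for illustration; check the AudioStreamInterface TypeScript definition in whisper.rn for the actual contract.

```typescript
// Hypothetical custom audio stream adapter sketch. The method names
// (start/stop/onData) are illustrative assumptions -- consult whisper.rn's
// AudioStreamInterface TypeScript definition for the real contract.
type DataCallback = (chunk: Uint8Array) => void

class InMemoryPcmStream {
  private listeners: DataCallback[] = []
  private active = false

  // Register a consumer for incoming PCM chunks
  onData(cb: DataCallback): void {
    this.listeners.push(cb)
  }

  async start(): Promise<void> {
    this.active = true
  }

  async stop(): Promise<void> {
    this.active = false
  }

  // Source side: push a chunk of PCM bytes to all listeners while active
  feed(chunk: Uint8Array): void {
    if (!this.active) return
    this.listeners.forEach((cb) => cb(chunk))
  }
}
```

The same shape works for any source: a network socket, a file reader, or a test harness that replays recorded audio.
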

Example: See complete example for full implementation including file simulation and UI.

Please visit the Documentation for more details.

Usage with assets

You can also use the model file / audio file from assets:

import { initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: require('../assets/ggml-tiny.en.bin'),
})

const { stop, promise } = whisperContext.transcribe(
  require('../assets/sample.wav'),
  options,
)

// ...

This requires editing the metro.config.js to support assets:

// ...
const defaultAssetExts = require('metro-config/src/defaults/defaults').assetExts

module.exports = {
  // ...
  resolver: {
    // ...
    assetExts: [
      ...defaultAssetExts,
      'bin', // whisper.rn: ggml model binary
      'mil', // whisper.rn: CoreML model asset
    ],
  },
}

Please note that:

  • It will significantly increase the size of the app in release mode.
  • The RN packager does not allow files larger than 2GB, so the original f16 large model (2.9GB) cannot be bundled; use quantized models instead.

Core ML support

Platform: iOS 15.0+, tvOS 15.0+

To use Core ML on iOS, you will need to have the Core ML model files.

Which .mlmodelc model file is loaded depends on the ggml model file path. For example, if your ggml model path is ggml-tiny.en.bin, the Core ML model path will be ggml-tiny.en-encoder.mlmodelc. Please note that the ggml model is still needed, as the decoder and as an encoder fallback.
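
This naming convention is mechanical, so it can be expressed as a one-liner. A hypothetical helper (not a whisper.rn API):

```typescript
// Derive the Core ML encoder path from a ggml model path, per the
// convention above: ggml-<name>.bin -> ggml-<name>-encoder.mlmodelc
// (hypothetical helper for illustration, not part of whisper.rn).
function coreMLPathFor(ggmlModelPath: string): string {
  return ggmlModelPath.replace(/\.bin$/, '-encoder.mlmodelc')
}
```
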

The Core ML models are hosted here: https://huggingface.co/ggerganov/whisper.cpp/tree/main

If you want to download the model at runtime and the hosted file is an archive, you will need to unzip it to get the .mlmodelc directory; you can use a library like react-native-zip-archive, or host the individual files for download yourself.

The .mlmodelc is a directory that usually includes 5 files (3 of them required):

[
  'model.mil',
  'coremldata.bin',
  'weights/weight.bin',
  // Not required:
  // 'metadata.json', 'analytics/coremldata.bin',
]

Or just use require to bundle that in your app, like the example app does, but this would increase the app size significantly.

const whisperContext = await initWhisper({
  filePath: require('../assets/ggml-tiny.en.bin'),
  coreMLModelAsset:
    Platform.OS === 'ios'
      ? {
          filename: 'ggml-tiny.en-encoder.mlmodelc',
          assets: [
            require('../assets/ggml-tiny.en-encoder.mlmodelc/weights/weight.bin'),
            require('../assets/ggml-tiny.en-encoder.mlmodelc/model.mil'),
            require('../assets/ggml-tiny.en-encoder.mlmodelc/coremldata.bin'),
          ],
        }
      : undefined,
})