Yapykaldi
Yet another PyKaldi
This repository is under active development
This is a simple Python wrapper around parts of kaldi that is intended to be easy to install and set up, with or without a ROS environment.
The wrappers are generated with pybind11.
The target audience is developers who work with Docker and/or ROS and would like to use kaldi-asr as the speech recognition system in their application on GNU/Linux operating systems (preferably Ubuntu >= 18.04).
Getting Started
Requirements
- Python 2.7 or 3.6
- numpy
- pybind11
- setuptools
- pkgconfig
- pyaudio
- future
- kaldi-asr or tue-robotics fork of kaldi-asr
Installation
Using tue-env (Recommended for members of tue-robotics)
```bash
tue-get install python-yapykaldi
```
This does not install the test scripts and data directory at the moment.
From source
- Install dependencies

  ```bash
  sudo apt-get install build-essential portaudio19-dev
  pip install setuptools numpy pybind11 pkgconfig pyaudio
  ```
- [Recommended] Install kaldi-asr from the tue-robotics fork. This fork has some modifications to the CMake generate script and comes with installation scripts that ensure the pkgconfig file is generated correctly and is available to the bash environment.

  ```bash
  git clone https://github.com/tue-robotics/kaldi.git
  cd kaldi
  ./install.bash --tue
  echo "source ~/kaldi/setup.bash" >> ~/.bashrc
  ```
- [Alternative] Install kaldi-asr from the upstream kaldi repository using CMake with `-DBUILD_SHARED_LIBS=ON`. Create a pkgconfig file in `dist/lib/pkgconfig/` (relative to the repository root) and add the following to `~/.bashrc`:

  ```bash
  if [[ :$PKG_CONFIG_PATH: != *:$KALDI_ROOT/dist/lib/pkgconfig:* ]]
  then
      export PKG_CONFIG_PATH=$KALDI_ROOT/dist/lib/pkgconfig${PKG_CONFIG_PATH:+:${PKG_CONFIG_PATH}}
  fi
  ```
- Install yapykaldi

  ```bash
  git clone https://github.com/ar13pit/yapykaldi
  cd yapykaldi
  pip install .
  ```
- [Optional] Download nnet3 models to run the examples

  ```bash
  cd yapykaldi/data
  wget https://github.com/tue-robotics/yapykaldi/releases/download/v0.1.0/kaldi-generic-en-tdnn_fl-r20190609.tar.xz
  tar xf kaldi-generic-en-tdnn_fl-r20190609.tar.xz
  mv kaldi-generic-en-tdnn_fl-r20190609 kaldi-generic-en-tdnn_fl-latest
  ```
Examples
To run the examples, the optional model-download step from the source installation must be completed first.
- Test kaldi nnet3 model using test_nnet3.py
- Test simple audio recording using test_audio.py
- Test continuous live speech recognition using test_live.py
Developer Guide
Basic Workflow
Open Grammar
yapykaldi can be used for both online and offline speech recognition with open grammar.
In the online speech recognition workflow, a microphone stream (created using pyaudio) is connected to a yapykaldi OnlineDecoder object via IPC. The microphone process writes stream chunks to a shared queue, which the OnlineDecoder reads sequentially to generate a stream of recognized words. A signal handler listens for an interrupt signal, which tells the OnlineDecoder to stop and finalize the recognition, and signals the microphone process to cleanly close the stream and write the recorded audio to a wav file. Refer to test_live.py for this workflow.
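The producer-consumer shape of this workflow can be sketched with Python's standard library. The `StubDecoder` below is a hypothetical stand-in for yapykaldi's OnlineDecoder, and every name in the sketch (`decode_chunk`, `finalize`, the sentinel convention) is an illustrative assumption, not yapykaldi's actual API:

```python
import queue
import threading

CHUNK = b"\x00" * 1024  # stand-in for one raw audio buffer


class StubDecoder:
    """Illustrative stand-in for yapykaldi's OnlineDecoder (not its real API)."""

    def __init__(self):
        self.words = []

    def decode_chunk(self, chunk):
        # A real decoder would run nnet3 inference on the chunk here.
        self.words.append("word%d" % len(self.words))

    def finalize(self):
        return " ".join(self.words)


def microphone(q, n_chunks):
    # Producer: in test_live.py this role is played by the pyaudio stream.
    for _ in range(n_chunks):
        q.put(CHUNK)
    q.put(None)  # sentinel: stream closed (interrupt received)


def recognize(n_chunks=3):
    q = queue.Queue()
    dec = StubDecoder()
    t = threading.Thread(target=microphone, args=(q, n_chunks))
    t.start()
    while True:
        chunk = q.get()
        if chunk is None:  # stop signal: finalize the recognition
            break
        dec.decode_chunk(chunk)
    t.join()
    return dec.finalize()
```

The queue decouples audio capture from decoding, so a slow decoder never blocks the microphone callback; the sentinel value plays the role of the interrupt signal described above.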
The offline speech recognition workflow supports two approaches: reading an entire wav file and recognizing it in one pass, or creating a data stream from the wav file to emulate a microphone and recognizing the data in chunks. Refer to test_nnet3.py for this workflow.
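A minimal sketch of the second approach, emulating a microphone by streaming a wav file in fixed-size chunks; the helper name and chunk size are assumptions for illustration, not part of yapykaldi:

```python
import wave


def stream_wav_chunks(path, chunk_frames=1024):
    """Yield raw audio buffers from a wav file, emulating a live microphone."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(chunk_frames)
            if not data:  # end of file reached
                break
            yield data
```

Each yielded buffer can then be fed to the decoder exactly as a live microphone chunk would be, so the same recognition loop serves both the online and the emulated-offline case.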