Yapykaldi
Yet another PyKaldi
This repository is under active development
This is a simple Python wrapper around parts of kaldi that is intended to be easy to install and set up, with or without a ROS environment.
The wrappers are generated with pybind11.
The target audience is developers who work with Docker and/or ROS and would like to use kaldi-asr as the speech recognition system in their application on GNU/Linux operating systems (preferably Ubuntu >= 18.04).
Getting Started
Requirements
- Python 2.7 or 3.6
- numpy
- pybind11
- setuptools
- pkgconfig
- pyaudio
- future
- kaldi-asr or tue-robotics fork of kaldi-asr
Installation
Using tue-env (Recommended for members of tue-robotics)
```bash
tue-get install python-yapykaldi
```
This does not install the test scripts and data directory at the moment.
From source
- Install dependencies

  ```bash
  sudo apt-get install build-essential portaudio19-dev
  pip install setuptools numpy pybind11 pkgconfig pyaudio
  ```
- [Recommended] Install kaldi-asr from the tue-robotics fork. This fork has some modifications to the CMake generate script and comes with installation scripts that ensure the pkgconfig file is generated correctly and is available to the bash environment.

  ```bash
  git clone https://github.com/tue-robotics/kaldi.git
  cd kaldi
  ./install.bash --tue
  echo "source ~/kaldi/setup.bash" >> ~/.bashrc
  ```
- [Alternative] Install kaldi-asr from the upstream kaldi repository using CMake with `-DBUILD_SHARED_LIBS=ON`. Create a pkgconfig file in `dist/lib/pkgconfig/` (relative to the repository root) and add the following to `~/.bashrc`:

  ```bash
  if [[ :$PKG_CONFIG_PATH: != *:$KALDI_ROOT/dist/lib/pkgconfig:* ]]
  then
      export PKG_CONFIG_PATH=$KALDI_ROOT/dist/lib/pkgconfig${PKG_CONFIG_PATH:+:${PKG_CONFIG_PATH}}
  fi
  ```
- Install yapykaldi

  ```bash
  git clone https://github.com/ar13pit/yapykaldi
  cd yapykaldi
  pip install .
  ```
- [Optional] Download nnet3 models to run the examples

  ```bash
  cd yapykaldi/data
  wget https://github.com/tue-robotics/yapykaldi/releases/download/v0.1.0/kaldi-generic-en-tdnn_fl-r20190609.tar.xz
  tar xf kaldi-generic-en-tdnn_fl-r20190609.tar.xz
  mv kaldi-generic-en-tdnn_fl-r20190609 kaldi-generic-en-tdnn_fl-latest
  ```
Examples
To run the examples, the optional model-download step from the source installation must be completed first.
- Test kaldi nnet3 model using test_nnet3.py
- Test simple audio recording using test_audio.py
- Test continuous live speech recognition using test_live.py
Developer Guide
Basic Workflow
Open Grammar
yapykaldi can be used for both online and offline speech recognition with open grammar.
In the online speech recognition workflow, a microphone stream (created using pyaudio) is connected to a yapykaldi OnlineDecoder object via IPC. The microphone process writes stream chunks to a shared queue, which the OnlineDecoder reads sequentially to generate a stream of recognized words. A signal handler listens for an interrupt signal, which tells the OnlineDecoder to stop and finalize the recognition, and signals the microphone process to cleanly close the stream and write the recorded audio to a wav file. Refer to test_live.py for this workflow.
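The producer-consumer shape of this workflow can be sketched with Python's standard library. The `StubDecoder` below is a hypothetical stand-in for yapykaldi's OnlineDecoder, and every name in the sketch (`decode_chunk`, `finalize`, the sentinel convention) is an illustrative assumption, not yapykaldi's actual API:

```python
import queue
import threading

CHUNK = b"\x00" * 1024  # stand-in for one raw audio buffer


class StubDecoder:
    """Illustrative stand-in for yapykaldi's OnlineDecoder (not its real API)."""

    def __init__(self):
        self.words = []

    def decode_chunk(self, chunk):
        # A real decoder would run nnet3 inference on the chunk here.
        self.words.append("word%d" % len(self.words))

    def finalize(self):
        return " ".join(self.words)


def microphone(q, n_chunks):
    # Producer: in test_live.py this role is played by the pyaudio stream.
    for _ in range(n_chunks):
        q.put(CHUNK)
    q.put(None)  # sentinel: stream closed (interrupt received)


def recognize(n_chunks=3):
    q = queue.Queue()
    dec = StubDecoder()
    t = threading.Thread(target=microphone, args=(q, n_chunks))
    t.start()
    while True:
        chunk = q.get()
        if chunk is None:  # stop signal: finalize the recognition
            break
        dec.decode_chunk(chunk)
    t.join()
    return dec.finalize()
```

The queue decouples audio capture from decoding, so a slow decoder never blocks the microphone callback; the sentinel value plays the role of the interrupt signal described above.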
The offline speech recognition workflow supports two approaches: reading an entire wav file and recognizing it in one pass, or creating a data stream from the wav file to emulate a microphone and recognizing the data in chunks. Refer to test_nnet3.py for this workflow.
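A minimal sketch of the second approach, emulating a microphone by streaming a wav file in fixed-size chunks; the helper name and chunk size are assumptions for illustration, not part of yapykaldi:

```python
import wave


def stream_wav_chunks(path, chunk_frames=1024):
    """Yield raw audio buffers from a wav file, emulating a live microphone."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(chunk_frames)
            if not data:  # end of file reached
                break
            yield data
```

Each yielded buffer can then be fed to the decoder exactly as a live microphone chunk would be, so the same recognition loop serves both the online and the emulated-offline case.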