
LookOnceToHear

A novel human-interaction method for real-time speech extraction on headphones.


Look Once to Hear

Gradio demo

This repository provides code for the paper "Look Once to Hear: Target Speech Hearing with Noisy Examples". Look Once to Hear is an intelligent hearable system that lets users choose to hear a target speaker simply by looking at them for a few seconds. The paper received a Best Paper Honorable Mention 🏆 at CHI 2024.

https://github.com/vb000/LookOnceToHear/assets/16723254/49483e4d-9ebe-4c56-a84e-43c30d1cc3b9
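One rough intuition for why looking at the speaker helps: a source straight ahead reaches both ears at nearly the same time, so its interaural lag is close to zero, while off-axis interferers arrive at the two ears with a measurable offset. The sketch below estimates that lag by brute-force cross-correlation in NumPy. It illustrates only the geometric cue and is an assumption about the intuition, not the paper's enrollment procedure or model. The function name, the 16 kHz sample rate, and the ~11-sample search window (about 0.7 ms, roughly the largest human interaural delay) are hypothetical choices.

```python
import numpy as np

def estimate_interaural_lag(left: np.ndarray, right: np.ndarray, max_lag: int) -> int:
    """Return the lag (in samples) that maximizes the cross-correlation of the two ear signals.

    Positive values mean the right channel lags (arrives after) the left channel.
    """
    n = len(left)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Compare left[i] with right[i + lag].
            score = np.dot(left[: n - lag], right[lag:])
        else:
            # Compare left[i - lag] with right[i].
            score = np.dot(left[-lag:], right[: n + lag])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

A source whose estimated lag is within about one sample of zero is roughly frontal. For example, `estimate_interaural_lag(x, x, 11)` returns `0` for any non-silent `x`.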

Setup

conda create -n ts-hear python=3.9
conda activate ts-hear
pip install -r requirements.txt

Training

Training data includes clean speech, background sounds, head-related transfer functions (HRTFs) and binaural room impulse responses (BRIRs). We use the Scaper toolkit to synthetically generate audio mixtures. Each mixture is generated on the fly, during training or evaluation, by calling Scaper's generate_from_jams function on a .jams specification file.
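In the repository, Scaper does the actual mixing from the .jams specifications. The NumPy sketch below only illustrates the two core operations such a binaural mixture involves: spatializing a mono source with a two-channel impulse response, then adding background at a chosen signal-to-noise ratio. It is not Scaper's API or this repository's pipeline. The function names are hypothetical, and the random arrays in the usage note stand in for real speech, BRIRs and background recordings.

```python
import numpy as np

def spatialize(mono: np.ndarray, brir: np.ndarray) -> np.ndarray:
    """Convolve a mono source with a (2, L) binaural impulse response -> (2, T) signal.

    The output is truncated to the input length, which drops the reverb tail (a simplification).
    """
    return np.stack([np.convolve(mono, brir[ch])[: len(mono)] for ch in range(2)])

def mix_at_snr(target: np.ndarray, background: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `background` so the target-to-background power ratio equals snr_db, then sum."""
    p_target = np.mean(target ** 2)
    p_background = np.mean(background ** 2)
    gain = np.sqrt(p_target / (p_background * 10 ** (snr_db / 10)))
    return target + gain * background
```

For example, `mix_at_snr(spatialize(speech, brir), noise, 5.0)` returns a `(2, T)` mixture in which the spatialized speech sits 5 dB above the background.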

We provide self-contained datasets here, along with the source .jams specifications we used for training. To perform a training run, download the .zip files provided there, unzip their contents into the data/ directory, and run:

python -m src.trainer --config <configs/tsh.json> --run_dir <runs/tsh> [--frac <0.05 (% train/val batches)>]
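The optional --frac flag is documented only as a percentage of train/val batches. Assuming it means "run only that fraction of the batches each epoch", which is handy for quick smoke tests, the idea can be sketched as follows. Both the interpretation and the helper name are assumptions, not the trainer's actual code.

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def limit_batches(batches: Iterable[T], num_batches: int, frac: float = 1.0) -> Iterator[T]:
    """Yield roughly frac * num_batches batches from `batches` (always at least one)."""
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must be in (0, 1]")
    keep = max(1, round(frac * num_batches))
    return itertools.islice(batches, keep)
```

With `frac=0.05` and a 100-batch loader, only the first 5 batches would be iterated.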

To resume a partial run, rerun the trainer with the same --run_dir:

python -m src.trainer --config <configs/tsh.json> --run_dir <runs/tsh>

Evaluation

Evaluation is performed on speech mixtures in a format similar to the training samples. Checkpoints for the embedding model and the target speech hearing (TSH) model are available here.

python -m src.ts_hear_test
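The README does not say which metrics src.ts_hear_test reports. Target speech extraction is commonly scored with scale-invariant SNR (SI-SNR) and its improvement over the unprocessed mixture (SI-SNRi). The following is a minimal single-channel NumPy sketch of those standard definitions. It is offered as an illustration of that kind of metric, not as the evaluation script's code. For binaural output, one assumed option is to score each channel and average the results.

```python
import numpy as np

def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between a 1-D estimate and a 1-D reference."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)))

def si_snr_improvement(estimate: np.ndarray, mixture: np.ndarray, reference: np.ndarray) -> float:
    """SI-SNRi: gain of the model output over the unprocessed mixture, in dB."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Because of the projection step, rescaling the estimate (for example, a louder or quieter output) leaves SI-SNR unchanged.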