Bregmisi

Phase recovery with the Bregman divergence for audio source separation

Phase recovery with Bregman divergences for audio source separation

This repository contains the code for reproducing the experiments in our paper entitled Phase recovery with Bregman divergences for audio source separation, published at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021.

Getting the data

After cloning or downloading this repository, you will need to get the speech and noise data to reproduce the results.

The speech data is obtained from the VoiceBank dataset available here. You should download the clean_testset_wav.zip file, and unzip it in the data/VoiceBank/ folder.
The noise data is obtained from the DEMAND dataset available here. You should download the DLIVING_16k.zip, SPSQUARE_16k.zip and TBUS_16k.zip files, and unzip them in the data/DEMAND/ folder.

Note that you can change the folder structures, as long as you change the speech and noise directory paths accordingly in the code.

Then, simply execute the prepare_data.py script to create the noisy mixtures.
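
As a rough illustration of what creating a noisy mixture involves, here is a minimal sketch that adds a noise signal to a speech signal at a chosen input signal-to-noise ratio. It is not the code of prepare_data.py: the function name mix_at_snr, the snr_db parameter and the random test signals are all hypothetical, and the assumption that mixtures are built at a target SNR is ours.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return (mixture, scaled_noise) with the requested input SNR in dB.

    Hypothetical helper for illustration; not taken from prepare_data.py.
    """
    # Truncate both signals to a common length.
    n = min(len(speech), len(noise))
    speech = np.asarray(speech[:n], dtype=float)
    noise = np.asarray(noise[:n], dtype=float)

    # Average power of each signal.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Gain such that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


# Toy signals standing in for one VoiceBank utterance and one DEMAND excerpt.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(20000)
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=0.0)
```

Returning the scaled noise along with the mixture keeps the ground-truth sources available for later evaluation.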

Getting the pre-trained model

To run the experiments, you will first need to estimate the spectrograms of the sources, which is done using the PyTorch implementation of the Open-Unmix model trained for a speech enhancement task.

The pre-trained model for estimating the speech and noise spectrograms is available here. You should place the .json and .pth files in the open_unmx/ folder. Note that you should also rename the .pth files to speech.pth and noise.pth.
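
The renaming step can be scripted if you prefer. The sketch below assumes that each downloaded weight file starts with the source name (for instance something like speech-XXXX.pth); this naming pattern and the helper rename_weights are assumptions made for illustration, so check the actual file names before using it.

```python
from pathlib import Path


def rename_weights(model_dir, targets=("speech", "noise")):
    """Rename '<target>*.pth' to '<target>.pth' inside model_dir.

    Hypothetical helper: assumes each downloaded file name starts
    with the name of the source it models.
    """
    model_dir = Path(model_dir)
    renamed = []
    for target in targets:
        wanted = model_dir / f"{target}.pth"
        if wanted.exists():
            continue  # already has the expected name
        matches = sorted(model_dir.glob(f"{target}*.pth"))
        if len(matches) != 1:
            raise FileNotFoundError(
                f"expected exactly one '{target}*.pth' file in {model_dir}, "
                f"found {len(matches)}"
            )
        matches[0].rename(wanted)
        renamed.append(wanted)
    return renamed
```

Calling rename_weights with the path of the model folder returns the list of files it renamed, and raises an error rather than guessing when the match is ambiguous.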

Reproducing the experiments

Now that you're all set, simply run the following scripts:

validation.py will perform a grid search over the gradient step size on the validation subset to determine its optimal value for every setting. It will also reproduce Fig. 1 from the paper.
testing.py will run the algorithms (proposed gradient descent and MISI) on the test subset and plot the results corresponding to Fig. 2 in the paper.
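
To give an idea of what a grid search over the gradient step size looks like, here is a minimal sketch on a toy problem. It is not the code of validation.py: the quadratic loss, the candidate step sizes and the function names are hypothetical, and the selection criterion used in the actual experiments may be a separation quality metric rather than the final loss used here.

```python
def gradient_descent(step, n_iter=20, x0=5.0):
    """Minimize the toy loss f(x) = (x - 2)^2 and return the final loss."""
    x = x0
    for _ in range(n_iter):
        grad = 2.0 * (x - 2.0)  # derivative of (x - 2)^2
        x -= step * grad
    return (x - 2.0) ** 2


def grid_search(step_sizes, run=gradient_descent):
    """Run `run` for every candidate step size and keep the lowest score."""
    scores = {step: run(step) for step in step_sizes}
    best = min(scores, key=scores.get)
    return best, scores


# Candidate step sizes (hypothetical values chosen for this toy problem).
best_step, scores = grid_search([1e-3, 1e-2, 1e-1, 0.5, 1.1])
```

In this toy setting a step that is too small converges slowly and a step that is too large diverges, which is why the step size is tuned on validation data before running on the test subset.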

Reference

<details><summary>If you use any of this code for your research, please cite our paper:</summary>

@inproceedings{Magron2021,  
  author={P. Magron and P.-H. Vial and T. Oberlin and C. F{\'e}votte},  
  title={Phase recovery with {B}regman divergences for audio source separation},  
  booktitle={Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},  
  year={2021},
  month={June}
}

</details>
