MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training
This repository contains the audio samples and the source code that accompany the paper.
Audio samples
We provide audio samples to demonstrate the results of the MixCycle method on two different datasets: LibriMix and REAL-M.
We also provide audio samples from the baseline methods, PIT-DM and MixIT, on LibriMix.
Note that the provided REAL-M samples are the ones used in the informal listening test.
Source code
We provide the source code under the src directory for reproducibility.
Running the experiments
Prepare the datasets
Create the environment
Install Anaconda and run the following command:
$ conda env create -f environment.yml
See the conda documentation for more information on managing environments.
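For reference, a minimal conda environment file for a project like this might look like the sketch below. The package names and versions here are assumptions for illustration only; use the `environment.yml` shipped in the repository, which pins the actual dependencies.

```yaml
# Hypothetical sketch of a conda environment file.
# The repository's environment.yml is authoritative.
name: mixcycle
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.9
  - pytorch
  - pip
  - pip:
      - tensorboard
```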
Activate the environment
$ conda activate mixcycle
Run the experiments
$ cd src
$ python experiment.py --librimix-root ~/datasets/librimix --exp-root ~/experiments --run librimix_irm
$ python experiment.py --librimix-root ~/datasets/librimix --exp-root ~/experiments --run librimix_5p
$ python experiment.py --librimix-root ~/datasets/librimix --exp-root ~/experiments --run librimix_100p
$ python experiment.py --librimix-root ~/datasets/librimix --realm-root ~/datasets/REAL-M-v0.1.0 --exp-root ~/experiments --run realm
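The runs above train separation models with mixture permutation invariant training. As a rough illustration of the mixture-level PIT idea, and not the paper's exact objective, the sketch below scores every assignment of estimated sources to reference mixtures and keeps the best one. The function name `mix_pit_loss`, the MSE criterion, and the requirement that every mixture receive at least one source are all assumptions made for this example.

```python
import itertools

import numpy as np


def mix_pit_loss(est_sources, ref_mixtures):
    """Hypothetical mixture-level PIT loss (illustrative sketch only).

    est_sources:  array of shape (S, T), estimated source signals.
    ref_mixtures: array of shape (M, T), reference mixture signals.

    Each estimated source is assigned to exactly one mixture; the
    sources assigned to a mixture are summed and compared to it by
    MSE. The minimum over all assignments that cover every mixture
    is returned.
    """
    num_sources = est_sources.shape[0]
    num_mixtures = ref_mixtures.shape[0]
    best = np.inf
    # Enumerate every assignment of sources to mixtures.
    for assign in itertools.product(range(num_mixtures), repeat=num_sources):
        if len(set(assign)) < num_mixtures:
            # Skip assignments that leave some mixture without a source.
            continue
        sums = np.zeros_like(ref_mixtures)
        for src_idx, mix_idx in enumerate(assign):
            sums[mix_idx] += est_sources[src_idx]
        loss = np.mean((sums - ref_mixtures) ** 2)
        best = min(best, loss)
    return best
```

For two perfectly separated estimates, the loss is zero regardless of the order in which the references are given, which is the point of the permutation-invariant minimum.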
Optionally, you can monitor the training process with TensorBoard by running:
$ tensorboard --logdir ~/experiments
Citation (BibTeX)
If you find this repository useful, please cite our work:
@article{karamatli2022unsupervised,
  title={MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training},
  author={Karamatl{\i}, Ertu{\u{g}} and K{\i}rb{\i}z, Serap},
  journal={IEEE Signal Processing Letters},
  volume={29},
  pages={2637--2641},
  year={2022},
  doi={10.1109/LSP.2022.3232276}
}