UPIT
Utterance-level Permutation Invariant Training
This project implements uPIT training for two-speaker speech separation.
We use TensorFlow (1.0) LSTM (BLSTM) networks for PIT.
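The repository itself trains LSTM/BLSTM models in TensorFlow, but the core utterance-level PIT idea can be sketched framework-independently: for two speakers, compute the reconstruction error under both possible speaker assignments over the whole utterance, and train on the assignment with the smaller error. The sketch below uses NumPy; the function name and array shapes are illustrative, not the repo's actual API.

```python
import numpy as np

def upit_mse_loss(est, ref):
    """Utterance-level PIT loss for two speakers (illustrative sketch).

    est, ref: arrays of shape (2, T, F) -- estimated and reference
    magnitude spectrograms for the two speakers (T frames, F bins).
    The best speaker permutation is chosen once per utterance,
    not per frame -- that is what makes it "utterance-level" PIT.
    """
    # Error for the identity assignment: est[0]->ref[0], est[1]->ref[1]
    perm_a = np.mean((est[0] - ref[0]) ** 2) + np.mean((est[1] - ref[1]) ** 2)
    # Error for the swapped assignment: est[0]->ref[1], est[1]->ref[0]
    perm_b = np.mean((est[0] - ref[1]) ** 2) + np.mean((est[1] - ref[0]) ** 2)
    # Train on the cheaper of the two assignments
    return min(perm_a, perm_b) / 2.0
```

Because the permutation is resolved per utterance, a network whose two output streams are consistently swapped relative to the labels still receives (near-)zero loss, which avoids the label-permutation problem in multi-talker training.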
Reference:
Kolbæk, M., Yu, D., Tan, Z.-H., & Jensen, J. (2017). Multi-talker Speech Separation and Tracing with Permutation Invariant Training of Deep Recurrent Neural Networks. arXiv:1703.06284. http://arxiv.org/abs/1703.06284
Adapted from https://github.com/snsun/pit-speech-separation. Several improvements:
1. use the CMVN of the STFT features as the input features;
2. the learning rate can be larger, e.g., 0.0005.
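The first improvement, CMVN (cepstral mean-variance normalization) of the STFT features, can be sketched as a per-utterance normalization of each frequency bin to zero mean and unit variance. The function below is a minimal NumPy sketch under that assumption; the repo's actual normalization (e.g., whether statistics are per-utterance or global over the training set) may differ.

```python
import numpy as np

def cmvn(feats, eps=1e-8):
    """Per-utterance mean-variance normalization (illustrative sketch).

    feats: (T, F) array of STFT-derived features for one utterance
    (T frames, F frequency bins).
    Each frequency bin is normalized to zero mean and unit variance
    over the utterance's frames; eps guards against silent bins.
    """
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / (std + eps)
```

Normalized inputs keep activations in a well-scaled range, which is one reason a larger learning rate such as 0.0005 remains stable.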