# DuplexMamba
## Architecture

<img src="figures/DuplexMamba.png" alt="DuplexMamba" width="80%" align="center">
<img src="figures/duplex_decoding.png" alt="duplex_decoding" width="80%" align="center">

## Prerequisites
### Install Packages

```shell
conda create --name DuplexMamba python=3.9
conda activate DuplexMamba
pip install -r requirements.txt
pip install -e src/transformers/
pip install -e src/speechbrain/
```
You may need to install different versions of torch, torchaudio, causal-conv1d, and mamba-ssm depending on your hardware and system; make sure the versions you choose are mutually compatible. If the installation of causal-conv1d or mamba-ssm fails, you can manually download the corresponding .whl files from the causal-conv1d releases and mamba releases pages and install them directly.
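To check which versions are currently installed before debugging a compatibility problem, a small stdlib-only sketch (the package names are the four listed above; nothing else is assumed):

```python
from importlib import metadata

def report_versions(pkgs):
    """Return installed version (or 'not installed') for each package."""
    out = {}
    for pkg in pkgs:
        try:
            out[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            out[pkg] = "not installed"
    return out

# The four packages this README says must be mutually compatible:
for name, ver in report_versions(
    ("torch", "torchaudio", "causal-conv1d", "mamba-ssm")
).items():
    print(name, ver)
```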
### Pretrained Model and Checkpoints

- Download mamba-2.8b-hf into the `model` folder, then run:

  ```shell
  python safetensor2bin.py
  ```

- Download the checkpoint of our trained ASR model and the checkpoints for all four stages of the DuplexMamba model from DuplexMamba, and save them in the `checkpoints` folder. If you only need the model for inference, downloading the Stage 4 checkpoint is sufficient.
## Training

<img src="figures/table1.png" alt="training_data" width="60%" align="center">

### Datasets
- Our training code requires all data to be stored in a format similar to LibriSpeech.
- For the raw data of Stage 1 and Stage 2, you can download LibriSpeech, TED-LIUM, mls_eng_10k, and VoiceAssistant-400K.
- The state discrimination dataset we used can be accessed here.
- For the preprocessed data for Stage 3 and Stage 4, you can download it from here.
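A LibriSpeech-style layout puts each chapter's audio in `<root>/<speaker>/<chapter>/`, with one `.flac` file per utterance and a single `<speaker>-<chapter>.trans.txt` file mapping utterance IDs to transcripts. A minimal sketch (hypothetical helper, not part of this repo) that walks such a tree and pairs each utterance with its transcript:

```python
from pathlib import Path

def read_librispeech_split(root):
    """Pair each utterance ID with its (audio path, transcript).

    Expects LibriSpeech layout: root/<speaker>/<chapter>/ containing
    <speaker>-<chapter>.trans.txt plus one .flac file per utterance,
    where each transcript line is "<utt_id> <TEXT>".
    """
    pairs = {}
    for trans in Path(root).glob("*/*/*.trans.txt"):
        for line in trans.read_text().splitlines():
            utt_id, text = line.split(" ", 1)
            pairs[utt_id] = (trans.parent / f"{utt_id}.flac", text)
    return pairs
```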
### Stage 1: Multimodal Alignment

```shell
torchrun --nproc-per-node 6 train_stage1.py hparams/S2S/train_stage1.yaml --data_folder <YOUR_PATH_TO_DATASETS> --precision bf16
```

### Stage 2: Multimodal Instruction Tuning

```shell
torchrun --nproc-per-node 6 train_stage2.py hparams/S2S/train_stage2.yaml --data_folder <YOUR_PATH_TO_DATASETS> --precision bf16
```

### Stage 3: Input State Discrimination

```shell
torchrun --nproc-per-node 6 train_stage3.py hparams/S2S/train_stage3.yaml --data_folder <YOUR_PATH_TO_DATASETS> --precision bf16
```

### Stage 4: Streaming Alignment

```shell
torchrun --nproc-per-node 1 train_stage4.py hparams/S2S/train_stage4.yaml --data_folder <YOUR_PATH_TO_DATASETS> --precision bf16
```
## Inference

```shell
python CustomGenerator.py duplex/duplex.yaml --precision bf16 --wav_path example/rlhf-57762.flac
```
We also provide a duplex_voice_assistant() method in the duplex_inference.py script for simulating duplex conversations. Modify wav_list on line 236 and output_dir on line 239 of the script, then run the following command to start the simulation:
```shell
python duplex_inference.py duplex/duplex.yaml --precision bf16
```
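Conceptually, the duplex loop interleaves listening and speaking: every incoming chunk is classified by the input-state discriminator, a "complete" input triggers a reply, and new input arriving mid-reply interrupts generation. A toy stdlib-only sketch of that control flow (the completeness "classifier" here is a punctuation stub, not the trained discriminator):

```python
def run_duplex(chunks):
    """Simulate duplex turn-taking over a stream of events.

    chunks: sequence of ("user", text) or ("silence", None) events.
    Returns a log of assistant actions.
    """
    log = []
    speaking = False
    buffer = []
    for kind, text in chunks:
        if kind == "user":
            buffer.append(text)
            if speaking:
                log.append("interrupt")  # new input halts the ongoing reply
                speaking = False
            # Stub for the "complete" input state of the real discriminator:
            if text.endswith("?") or text.endswith("."):
                log.append(f"reply:{' '.join(buffer)}")
                buffer = []
                speaking = True
        else:
            # Silence chunk: keep streaming the ongoing reply, if any.
            log.append("speak" if speaking else "listen")
    return log
```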
## A Simple Case

<img src="figures/case_study.png" alt="case" width="75%" align="center">

## Acknowledgement
We acknowledge the wonderful work of Mamba, Vision Mamba, and ConMamba, and borrow their implementations of Mamba, bidirectional Mamba, and ConMamba. The training recipes are adapted from SpeechBrain.
## Citation

If you find this work helpful, please consider citing:

```bibtex
@misc{lu2025duplexmambaenhancingrealtimespeech,
      title={DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities},
      author={Xiangyu Lu and Wang Xu and Haoyu Wang and Hongyun Zhou and Haiyan Zhao and Conghui Zhu and Tiejun Zhao and Muyun Yang},
      year={2025},
      eprint={2502.11123},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.11123},
}
```
## License
This project is licensed under the GNU General Public License v3.0. It is based on Mamba-ASR, which is also licensed under the GPL.