LiteFocus
[Interspeech 2024] LiteFocus is a tool for accelerating diffusion-based text-to-audio (TTA) models. It is currently implemented on top of the AudioLDM2 base model.
<a href="https://arxiv.org/abs/2407.10468"><img src="https://img.shields.io/badge/ariXv-2407.10468-A42C25.svg" alt="arXiv"></a> <br>
LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis <br> Zhenxiong Tan, Xinyin Ma, Gongfan Fang, and Xinchao Wang <br> Learning and Vision Lab, National University of Singapore <br>
TL;DR (Too Long; Didn't Read)
LiteFocus accelerates diffusion-based text-to-audio (TTA) models and is currently implemented on top of AudioLDM2. It roughly doubles inference speed while also improving audio quality.
Setup
- Prepare Environment (optional)
conda create -n litefocus python=3.10
conda activate litefocus
- Install Base Model
pip3 install git+https://github.com/haoheliu/AudioLDM2.git
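Before running the examples, it can help to confirm that both packages are importable in the active environment. The sketch below is a hypothetical helper (`missing_packages` is not part of LiteFocus or AudioLDM2); the module names `audioldm2` and `litefocus` are taken from the import statements in the Basic Usage example.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed from the imports used in this README.
    missing = missing_packages(["audioldm2", "litefocus"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("Environment looks ready.")
```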
Usage
Basic Usage
from audioldm2 import text_to_audio, build_model
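import scipy.io.wavfile  # import the submodule explicitly; plain "import scipy" may not expose scipy.io on older SciPy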
+ from litefocus import inject_lite_focus, disable_lite_focus
model = build_model(model_name='audioldm2-full')
+ inject_lite_focus(model)
waveform = text_to_audio(
    latent_diffusion=model,
    duration=40,
    text='Musical constellations twinkling in the night sky, forming a cosmic melody.',
)
scipy.io.wavfile.write("out.wav", rate=16000, data=waveform)
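To check the speed-up on your own hardware, one option is to time the same generation call with and without LiteFocus. The helper below is a hypothetical sketch (`mean_runtime` is not part of either library); it accepts any zero-argument callable, so the `text_to_audio` call from the example can be wrapped in a `lambda`.

```python
import time


def mean_runtime(generate, repeats=3):
    """Call `generate` `repeats` times and return the mean wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        generate()
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


# Intended use (assumes `model` was built as in the example above):
#   run = lambda: text_to_audio(latent_diffusion=model, duration=40, text="...")
#   inject_lite_focus(model);  fast = mean_runtime(run)
#   disable_lite_focus(model); slow = mean_runtime(run)
#   print(f"speed-up: {slow / fast:.2f}x")
```

A warm-up call before timing is advisable, since the first run typically includes one-off setup costs.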
Disable LiteFocus
disable_lite_focus(model)
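If LiteFocus should only be active for part of a script, the inject/disable pair can be wrapped in a context manager. This is a hypothetical convenience wrapper, not part of the LiteFocus API; it takes the two functions as arguments and assumes only the call signatures shown in this README.

```python
from contextlib import contextmanager


@contextmanager
def lite_focus(model, inject, disable, config=None):
    """Enable LiteFocus on `model` inside a `with` block, then disable it."""
    if config is None:
        inject(model)
    else:
        inject(model, config)
    try:
        yield model
    finally:
        # Runs even if generation raises, so the model is always restored.
        disable(model)


# Intended use:
#   with lite_focus(model, inject_lite_focus, disable_lite_focus):
#       waveform = text_to_audio(latent_diffusion=model, duration=40, text="...")
```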
Configuration
config = {
    'same_frequency': True,
    'cross_frequency': True,
    'sparse_ratio': 0.1,
}
inject_lite_focus(model, config)
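To avoid typos in key names, a config can be built from the documented defaults. The helper below is a hypothetical sketch (`make_config` is not part of LiteFocus); the range check on `sparse_ratio` is an assumption, chosen because the value is described as a ratio.

```python
# Default values as listed in the parameter table.
DEFAULT_CONFIG = {
    "same_frequency": True,
    "cross_frequency": True,
    "sparse_ratio": 0.1,
}


def make_config(**overrides):
    """Return a full config dict: documented defaults updated with overrides."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    config = {**DEFAULT_CONFIG, **overrides}
    # Assumption: a ratio is only meaningful in the interval (0, 1].
    if not 0.0 < config["sparse_ratio"] <= 1.0:
        raise ValueError("sparse_ratio must be in (0, 1]")
    return config


# Intended use: inject_lite_focus(model, make_config(sparse_ratio=0.2))
```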
| Parameter | Description | Default Value |
| ----------------- | ---------------------------------------------------------------------- | ------------- |
| same_frequency    | Enables attention among tokens that share the same frequency.          | True          |
| cross_frequency   | Enables cross-frequency compensation (attention to tokens at other frequencies). | True |
| sparse_ratio      | Sparsity ratio used for cross_frequency attention.                     | 0.1           |
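To make the three parameters more concrete, the sketch below builds a toy attention mask over a flattened time × frequency token grid. It is an illustration only, not the LiteFocus implementation: the function `focus_mask`, the grid layout, and in particular the rule for choosing cross-frequency keys (a fixed stride along time derived from `sparse_ratio`) are assumptions made for this example.

```python
def focus_mask(num_time, num_freq, same_frequency=True,
               cross_frequency=True, sparse_ratio=0.1):
    """Boolean attention mask over a flattened (time, frequency) token grid.

    mask[i][j] is True when query token i may attend to key token j.
    """
    # Assumption: cross-frequency keys are sub-sampled along time with a
    # fixed stride; the real selection rule may differ.
    stride = max(1, round(1 / sparse_ratio))
    n = num_time * num_freq
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        t_i, f_i = divmod(i, num_freq)
        for j in range(n):
            t_j, f_j = divmod(j, num_freq)
            if f_i == f_j:
                # Same-frequency pairs are kept only if that branch is enabled.
                mask[i][j] = same_frequency
            else:
                # Cross-frequency pairs are kept only at strided time steps.
                mask[i][j] = cross_frequency and t_j % stride == 0
    return mask


mask = focus_mask(num_time=20, num_freq=4)
kept = sum(map(sum, mask)) / (len(mask) ** 2)
print(f"fraction of attention pairs kept: {kept:.3f}")
```

In this toy version, a smaller `sparse_ratio` masks out more cross-frequency pairs, so fewer attention entries need to be computed.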
To-Do
- [x] AudioLDM2 Integration
- [ ] Diffusers pipeline Integration
Citation
@inproceedings{tan2024litefocus,
  title={LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis},
  author={Tan, Zhenxiong and Ma, Xinyin and Fang, Gongfan and Wang, Xinchao},
  booktitle={Proc. Interspeech 2024},
  pages={4878--4882},
  year={2024}
}
