LiteFocus

[Interspeech 2024] LiteFocus is a tool for accelerating diffusion-based text-to-audio (TTA) models, currently implemented on top of AudioLDM2.

<a href="https://arxiv.org/abs/2407.10468"><img src="https://img.shields.io/badge/ariXv-2407.10468-A42C25.svg" alt="arXiv"></a> <br>

LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis <br> Zhenxiong Tan, Xinyin Ma, Gongfan Fang, and Xinchao Wang <br> Learning and Vision Lab, National University of Singapore <br>

TL;DR (Too Long; Didn't Read)

LiteFocus is a tool for accelerating diffusion-based text-to-audio (TTA) models, currently implemented on top of AudioLDM2. It doubles inference speed and improves audio quality.

Setup

  • Prepare Environment (optional)
conda create -n litefocus python=3.10
conda activate litefocus
  • Install Base Model
pip3 install git+https://github.com/haoheliu/AudioLDM2.git

Usage

Basic Usage

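from audioldm2 import text_to_audio, build_model
import scipy.io.wavfile

# LiteFocus: import the injection helpers
from litefocus import inject_lite_focus, disable_lite_focus

# Build the AudioLDM2 base model
model = build_model(model_name='audioldm2-full')

# LiteFocus: enable it on the model before generating
inject_lite_focus(model)

# Generate 40 seconds of audio from a text prompt
waveform = text_to_audio(
    latent_diffusion=model,
    duration=40,
    text='Musical constellations twinkling in the night sky, forming a cosmic melody.',
)

# Save the result as a 16 kHz WAV file
scipy.io.wavfile.write("out.wav", rate=16000, data=waveform)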
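To check the speed-up on your own hardware, one option is to time the same generation call with and without LiteFocus. The sketch below is a generic wall-clock harness, not part of the LiteFocus package: `mean_seconds` and `speedup` are hypothetical helper names, and the two callables are assumed to wrap your own `text_to_audio(...)` calls.

```python
import time


def mean_seconds(fn, repeats=3):
    """Average wall-clock seconds taken by calling fn() `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def speedup(baseline_fn, accelerated_fn, repeats=3):
    """Ratio of baseline time to accelerated time (values above 1 mean faster)."""
    return mean_seconds(baseline_fn, repeats) / mean_seconds(accelerated_fn, repeats)


# Hypothetical usage, assuming `generate` wraps text_to_audio(...) on one model:
#   baseline = mean_seconds(generate)      # before inject_lite_focus(model)
#   inject_lite_focus(model)
#   accelerated = mean_seconds(generate)   # after injection
#   print(baseline / accelerated)
```

The measured ratio depends on the GPU, the requested duration, and warm-up effects, so it is worth discarding the first run and repeating the measurement a few times.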

Disable LiteFocus

disable_lite_focus(model)
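If LiteFocus should only be active for part of a script, the inject and disable calls can be paired in a context manager so the model is restored even when generation raises. This is a minimal sketch: `lite_focus_enabled` is a hypothetical helper, not something the package provides, and it takes the two functions as arguments so it makes no assumptions beyond the call signatures shown in this README.

```python
from contextlib import contextmanager


@contextmanager
def lite_focus_enabled(model, inject, disable, config=None):
    """Enable LiteFocus on `model` inside a with-block, then disable it.

    `inject` and `disable` are expected to be inject_lite_focus and
    disable_lite_focus; they are passed in rather than imported here.
    """
    if config is None:
        inject(model)
    else:
        inject(model, config)
    try:
        yield model
    finally:
        # Runs on normal exit and on exceptions alike.
        disable(model)


# Hypothetical usage:
#   with lite_focus_enabled(model, inject_lite_focus, disable_lite_focus):
#       waveform = text_to_audio(latent_diffusion=model, duration=40, text="...")
```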

Configuration

config = {
    'same_frequency': True,
    'cross_frequency': True,
    'sparse_ratio': 0.1
}

inject_lite_focus(model, config)

| Parameter | Description | Default Value |
| --- | --- | --- |
| same_frequency | Enables attention to tokens sharing the same frequency. | True |
| cross_frequency | Enables attention to tokens used for cross-frequency compensation. | True |
| sparse_ratio | Specifies the sparsity ratio for cross_frequency. | 0.1 |
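To build intuition for how these three options interact, the toy sketch below constructs a boolean attention mask over a small time-by-frequency grid. It is an illustration only and does not reproduce the LiteFocus implementation: the row-major token layout, the random choice of shared cross-frequency keys, the self-attention fallback, and the names `toy_focus_mask` and `density` are all assumptions made for this example.

```python
import random


def toy_focus_mask(n_time, n_freq, same_frequency=True,
                   cross_frequency=True, sparse_ratio=0.1, seed=0):
    """Build a toy N x N attention mask, with N = n_time * n_freq.

    mask[q][k] is True when query token q may attend to key token k.
    Tokens are assumed to be flattened row-major: index = t * n_freq + f.
    """
    n = n_time * n_freq
    # Assumption: cross-frequency keys are a fixed random subset of all
    # tokens, sized by sparse_ratio and shared by every query.
    n_keep = round(sparse_ratio * n) if cross_frequency else 0
    shared_keys = set(random.Random(seed).sample(range(n), n_keep))

    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            same_band = same_frequency and (k % n_freq == q % n_freq)
            # Assumption: a token always sees itself, so no row is empty.
            row.append(k == q or same_band or k in shared_keys)
        mask.append(row)
    return mask


def density(mask):
    """Fraction of query-key pairs that are kept by the mask."""
    return sum(sum(row) for row in mask) / (len(mask) * len(mask[0]))


if __name__ == "__main__":
    # Compare how many query-key pairs survive under each setting.
    for cfg in ({"cross_frequency": False}, {"same_frequency": False}, {}):
        print(cfg, round(density(toy_focus_mask(8, 16, **cfg)), 3))
```

In this toy model, a lower mask density means fewer query-key products to compute, which is the general reason sparse attention patterns can reduce inference cost.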

To-Do

  • [x] AudioLDM2 Integration
  • [ ] Diffusers pipeline Integration

Citation

@inproceedings{tan2024litefocus,
  title={LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis},
  author={Tan, Zhenxiong and Ma, Xinyin and Fang, Gongfan and Wang, Xinchao},
  booktitle={Proc. Interspeech 2024},
  pages={4878--4882},
  year={2024}
}