# Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control
<p align="center"> 🎵 <a href="http://113.207.49.217:50000/">Demo</a> • 📄 <a href="https://arxiv.org/abs/2601.03973">Paper</a> • 📊 <a href="https://huggingface.co/datasets/bolshyC/Muse">Dataset</a> • 🤖 <a href="https://huggingface.co/bolshyC/models">Model</a> • 📚 <a href="#citation">Citation</a> </p>

This is the official repository for "Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control". It provides the Muse model, training and inference scripts, pretrained checkpoints, and evaluation pipelines.
## News and Updates
- 2026.02.11 🎵: Demo website is now publicly available!
- 2026.01.11 🔥: We are excited to announce that all datasets and models are now fully open-sourced! 🎶 The complete training dataset (116k songs), pretrained model weights, training and evaluation code, and data pipeline are publicly available.
## Installation
Requirements: Python 3.10.

To set up the environment for Muse:
- **For training**: Install the training framework:

  ```bash
  pip install ms-swift -U
  ```

- **For inference**: Install vLLM:

  ```bash
  pip install vllm
  ```

- **For audio encoding/decoding**: Some dependencies (e.g., `av`) require system-level packages. On Ubuntu/Debian, install FFmpeg 4.4+ first:

  ```bash
  sudo apt-get update
  sudo apt-get install -y software-properties-common
  sudo add-apt-repository ppa:savoury1/ffmpeg4 -y
  sudo apt-get update
  sudo apt-get install -y pkg-config ffmpeg libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
  ```

  We recommend creating a new conda environment with Python 3.10. Note: `omegaconf==2.0.6` is required and has compatibility issues with pip 24.1+, so downgrade pip first:

  ```bash
  pip install "pip<24.1"
  ```

  Then install the dependencies:

  ```bash
  pip install --default-timeout=1000 -r requirements_mucodec.txt
  ```

  For more details, please refer to the MuCodec official repository.
- **For data pipeline and evaluation**: To run the data processing scripts (lyrics generation, metadata processing) or the evaluation scripts, install the additional dependencies:

  ```bash
  pip install -r requirements_data_eval.txt
  ```
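The pip pin above is easy to forget after the environment is created. A small sketch for checking whether the active pip still satisfies the `pip<24.1` constraint before installing the MuCodec requirements (the helper below is our own illustration, not part of this repository):

```python
from importlib.metadata import version


def pip_compatible(pip_version: str) -> bool:
    """Return True if this pip version predates 24.1 (needed for omegaconf==2.0.6)."""
    major, minor = (int(part) for part in pip_version.split(".")[:2])
    return (major, minor) < (24, 1)


if __name__ == "__main__":
    v = version("pip")
    hint = "OK" if pip_compatible(v) else 'too new; run: pip install "pip<24.1"'
    print(f"pip {v}: {hint}")
```

Running this before `pip install -r requirements_mucodec.txt` gives a clearer error than the metadata failure pip 24.1+ produces for `omegaconf==2.0.6`.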
## Repository Structure
This repository contains the following main directories:
- `train/`: Training scripts and utilities for fine-tuning the Muse model. See `train/README.md` for details.
- `infer/`: Inference scripts for generating music with the Muse model. See `infer/README.md` for details.
- `eval_pipeline/`: Evaluation scripts for assessing model performance (Mulan-T, PER, AudioBox, SongEval, etc.).
- `data_pipeline/`: Scripts for building and processing training data, including lyrics generation, metadata processing, and music generation utilities.
## Model Architecture
<p align="center"> <img src="assets/intro.jpg" width="800"/> </p>

## Acknowledgments
We thank Qwen3 for providing the base language model, ms-swift for the training framework, and MuCodec for discrete audio tokenization.
## Citation
If you find our work useful, please cite our paper:
```bibtex
@article{jiang2026muse,
  title={Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control},
  author={Jiang, Changhao and Chen, Jiahao and Xiang, Zhenghao and Yang, Zhixiong and Wang, Hanchen and Zhuang, Jiabao and Che, Xinmeng and Sun, Jiajun and Li, Hui and Cao, Yifei and others},
  journal={arXiv preprint arXiv:2601.03973},
  year={2026}
}
```