# Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control
<p align="center"> 🎵 <a href="http://113.207.49.217:50000/">Demo</a> • 📄 <a href="https://arxiv.org/abs/2601.03973">Paper</a> • 📊 <a href="https://huggingface.co/datasets/bolshyC/Muse">Dataset</a> • 🤖 <a href="https://huggingface.co/bolshyC/models">Model</a> • 📚 <a href="#citation">Citation</a> </p>

This is the official repository for "Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control". It provides the Muse model, training and inference scripts, pretrained checkpoints, and evaluation pipelines.
## News and Updates
- 2026.02.11 🎵: Demo website is now publicly available!
- 2026.01.11 🔥: We are excited to announce that all datasets and models are now fully open-sourced! 🎶 The complete training dataset (116k songs), pretrained model weights, training and evaluation code, and data pipeline are publicly available.
## Installation
Requirements: Python 3.10.

To set up the environment for Muse:
- **For training**: Install the training framework:

  ```bash
  pip install ms-swift -U
  ```

- **For inference**: Install vLLM:

  ```bash
  pip install vllm
  ```

- **For audio encoding/decoding**: Some dependencies (e.g., `av`) require system-level packages. On Ubuntu/Debian, install FFmpeg 4.4+ first:

  ```bash
  sudo apt-get update
  sudo apt-get install -y software-properties-common
  sudo add-apt-repository ppa:savoury1/ffmpeg4 -y
  sudo apt-get update
  sudo apt-get install -y pkg-config ffmpeg libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
  ```

  We recommend creating a new conda environment with Python 3.10. Note: `omegaconf==2.0.6` is required and has compatibility issues with pip 24.1+, so downgrade pip first:

  ```bash
  pip install "pip<24.1"
  ```

  Then install the dependencies:

  ```bash
  pip install --default-timeout=1000 -r requirements_mucodec.txt
  ```

  For more details, please refer to the MuCodec official repository.
- **For data pipeline and evaluation**: To run the data processing scripts (lyrics generation, metadata processing) or the evaluation scripts, install the additional dependencies:

  ```bash
  pip install -r requirements_data_eval.txt
  ```
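The pip pin above is easy to forget after the environment is created. A small sketch for checking whether the active pip still satisfies the `pip<24.1` constraint before installing the MuCodec requirements (the helper below is our own illustration, not part of this repository):

```python
from importlib.metadata import version


def pip_compatible(pip_version: str) -> bool:
    """Return True if this pip version predates 24.1 (needed for omegaconf==2.0.6)."""
    major, minor = (int(part) for part in pip_version.split(".")[:2])
    return (major, minor) < (24, 1)


if __name__ == "__main__":
    v = version("pip")
    hint = "OK" if pip_compatible(v) else 'too new; run: pip install "pip<24.1"'
    print(f"pip {v}: {hint}")
```

Running this before `pip install -r requirements_mucodec.txt` gives a clearer error than the metadata failure pip 24.1+ produces for `omegaconf==2.0.6`.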
## Repository Structure
This repository contains the following main directories:
- `train/`: Training scripts and utilities for fine-tuning the Muse model. See `train/README.md` for details.
- `infer/`: Inference scripts for generating music with the Muse model. See `infer/README.md` for details.
- `eval_pipeline/`: Evaluation scripts for assessing model performance (Mulan-T, PER, AudioBox, SongEval, etc.).
- `data_pipeline/`: Scripts for building and processing training data, including lyrics generation, metadata processing, and music generation utilities.
## Model Architecture
<p align="center"> <img src="assets/intro.jpg" width="800"/> </p>

## Acknowledgments
We thank Qwen3 for providing the base language model, ms-swift for the training framework, and MuCodec for discrete audio tokenization.
## Citation
If you find our work useful, please cite our paper:
```bibtex
@article{jiang2026muse,
  title={Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control},
  author={Jiang, Changhao and Chen, Jiahao and Xiang, Zhenghao and Yang, Zhixiong and Wang, Hanchen and Zhuang, Jiabao and Che, Xinmeng and Sun, Jiajun and Li, Hui and Cao, Yifei and others},
  journal={arXiv preprint arXiv:2601.03973},
  year={2026}
}
```