FunMusic

A fundamental toolkit designed for music, song, and audio generation

Generate Convert Improve

Install / Use

/learn @FunAudioLLM/FunMusic

About this skill

Quality Score

0/100

README

GitHub Repo stars Please support our community by starring it 感谢大家支持

<a name="highlight"></a> InspireMusic focuses on music generation, song generation, and audio generation.

A unified toolkit designed for music, song, and audio generation.
Music generation tasks with high audio quality.
Long-form music generation.

Introduction

InspireMusic is a toolkit for music, song, and audio generation. It consists of an autoregressive transformer with a flow-matching based model. This toolkit is for users to generate music, song, and audio. InspireMusic can generate high-quality music in long-form with text-to-music and music continuation. InspireMusic incorporates audio tokenizers with autoregressive transformer and flow-matching modeling to generate music, song, and audio with text and music prompts. The toolkit currently supports music generation.

InspireMusic

<table><tr><td style="text-align:center;"><img alt="Light" src="asset/InspireMusic.png" width="100%" /></tr><tr><td style="text-align:center;"> Figure 1: An overview of the InspireMusic. We introduce InspireMusic, a toolkit designed for music, song, audio generation capable of producing high-quality long-form music. InspireMusic consists of the following three key components. Audio Tokenizers convert the raw audio waveform into discrete audio tokens that can be efficiently processed and trained by the autoregressive transformer model. Audio waveform of lower sampling rate has converted to discrete tokens via a high bitrate compression audio tokenizer<a href="https://openreview.net/forum?id=yBlVlS2Fd9" target="_blank">[1]</a>. Autoregressive Transformer model is based on Qwen2.5<a href="https://arxiv.org/abs/2412.15115" target="_blank">[2]</a> as the backbone model and is trained using a next-token prediction approach on both text and audio tokens, enabling it to generate coherent and contextually relevant token sequences. The audio and text tokens are the inputs of an autoregressive model with the next token prediction to generate tokens. Super-Resolution Flow-Matching Model based on flow modeling method, maps the generated tokens to latent features with high-resolution fine-grained acoustic details<a href="https://arxiv.org/abs/2305.02765" target="_blank">[3]</a> obtained from a higher sampling rate of audio to ensure the acoustic information flow connected with high fidelity through models. A vocoder then generates the final audio waveform from these enhanced latent features. InspireMusic supports a range of tasks including text-to-music, music continuation, music reconstruction and super resolution.. </td></tr></table>

Installation

Clone

Clone the repo

git clone --recursive https://github.com/FunAudioLLM/InspireMusic.git
# If you failed to clone submodule due to network failures, please run the following command until success
cd InspireMusic
git submodule update --recursive
# or you can download the third_party repo Matcha-TTS manually
cd third_party && git clone https://github.com/shivammehta25/Matcha-TTS.git

Install from Source

InspireMusic requires Python>=3.8, PyTorch>=2.0.1, flash attention==2.6.2/2.6.3, CUDA>=11.8. You can install the dependencies with the following commands:

Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
Create Conda env:

conda create -n inspiremusic python=3.8
conda activate inspiremusic
cd InspireMusic
# pynini is required by WeTextProcessing, use conda to install it as it can be executed on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
# install flash attention to speedup training
pip install flash-attn --no-build-isolation

Install within the package:

cd InspireMusic
# You can run to install the packages
python setup.py install
pip install flash-attn --no-build-isolation

We also recommend having sox or ffmpeg installed, either through your system or Anaconda:

# # Install sox
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel

# Install ffmpeg
# ubuntu
sudo apt-get install ffmpeg
# centos
sudo yum install ffmpeg

Use Docker

Example command to build a docker image from Dockerfile provided.

docker build -t inspiremusic .

Example command to start the docker container in interactive mode.

docker run -ti --gpus all -v .:/workspace/InspireMusic inspiremusic

Use Docker Compose

Example command to build a docker compose environment and docker image from the docker-compose.yml file.

docker compose up -d --build

Example command to attach to the docker container in interactive mode.

docker exec -ti inspire-music bash

Quick Start

an example command for music generation infer.

cd InspireMusic
mkdir -p pretrained_models

# Download models
# ModelScope
git clone https://www.modelscope.cn/iic/InspireMusic-1.5B-Long.git pretrained_models/InspireMusic-1.5B-Long
# HuggingFace
git clone https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long.git pretrained_models/InspireMusic-1.5B-Long

cd examples/music_generation
# run a quick inference example
sh infer_1.5b_long.sh

an example running script to run music generation task.

cd InspireMusic/examples/music_generation/
sh run.sh

Inference

Text-to-music Task

Example script for text-to-music task.

cd examples/music_generation
# with flow matching, use one-line command to get a quick try
python -m inspiremusic.cli.inference

# custom the config like the following one-line command
python -m inspiremusic.cli.inference --task text-to-music -m "InspireMusic-1.5B-Long" -g 0 -t "Experience soothing and sensual instrumental jazz with a touch of Bossa Nova, perfect for a relaxing restaurant or spa ambiance." -c intro -s 0.0 -e 30.0 -r "exp/inspiremusic" -o output -f wav 

# without flow matching, use one-line command to get a quick try
python -m inspiremusic.cli.inference --task text-to-music -g 0 -t "Experience soothing and sensual instrumental jazz with a touch of Bossa Nova, perfect for a relaxing restaurant or spa ambiance." --fast True

from inspiremusic.cli.inference import InspireMusicModel, env_variables
if __name__ == "__main__":
  env_variables()
  model = InspireMusicModel(model_name = "InspireMusic-Base")
  model.inference("text-to-music", "Experience soothing and sensual instrumental jazz with a touch of Bossa Nova, perfect for a relaxing restaurant or spa ambiance.")

Music Continuation Task

Example script for music continuation task.

cd examples/music_generation
# with flow matching
python -m inspiremusic.cli.inference --task continuation -g 0 -a audio_prompt.wav
# without flow matching
python -m inspiremusic.cli.inference --task continuation -g 0 -a audio_prompt.wav --fast True

from inspiremusic.cli.inference import InspireMusicModel
from inspiremusic.cli.inference import env_variables
if __name__ == "__main__":
  env_variables()
  model = InspireMusicModel(model_name = "InspireMusic-Base")
  # just use audio prompt
  model.inference("continuation", None, "audio_prompt.wav")
  # use both text prompt and audio prompt
  model.inference("continuation", "Continue to generate jazz music.", "audio_prompt.wav")

Models

You may download our pretrained InspireMusic models for music generation.

# use git to download models，please make sure git lfs is installed.
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/InspireMusic.git pretrained_models/InspireMusic

Available Models

Currently, we open source the music generation models support 24KHz mono and 48KHz stereo audio. The table below presents the links to the ModelScope and Huggingface model hub.

| Model name | Model Links | Remarks

Related Skills

clearshot

Structured screenshot analysis for UI implementation and critique. Analyzes every UI screenshot with a 5×5 spatial grid, full element inventory, and design system extraction — facts and taste together, every time. Escalates to full implementation blueprint when building. Trigger on any digital interface image file (png, jpg, gif, webp — websites, apps, dashboards, mockups, wireframes) or commands like 'analyse this screenshot,' 'rebuild this,' 'match this design,' 'clone this.' Skip for non-UI images (photos, memes, charts) unless the user explicitly wants to build a UI from them. Does NOT trigger on HTML source code, CSS, SVGs, or any code pasted as text.

openpencil

2.0k

The world's first open-source AI-native vector design tool and the first to feature concurrent Agent Teams. Design-as-Code. Turn prompts into UI directly on the live canvas. A modern alternative to Pencil.

ui-ux-designer

Use this agent when you need to design, implement, or improve user interface components and user experience flows. Examples include: creating new pages or components, improving existing UI layouts, implementing responsive designs, optimizing user interactions, building forms or dashboards, analyzing existing UI through browser snapshots, or when you need to ensure UI components follow design system standards and shadcn/ui best practices.\n\n<example>\nContext: User needs to create a new dashboard page for team management.\nuser: "I need to create a team management dashboard where users can view team members, invite new members, and manage roles"\nassistant: "I'll use the ui-ux-designer agent to design and implement this dashboard with proper UX considerations, using shadcn/ui components and our design system tokens."\n</example>\n\n<example>\nContext: User wants to improve the user experience of an existing form.\nuser: "The signup form feels clunky and users are dropping off. Can you improve it?"\nassistant: "Let me use the ui-ux-designer agent to analyze the current form UX and implement improvements using our design system and shadcn/ui components."\n</example>\n\n<example>\nContext: User wants to evaluate and improve existing UI.\nuser: "Can you take a look at our pricing page and see how we can make it more appealing and user-friendly?"\nassistant: "I'll use the ui-ux-designer agent to take a snapshot of the current pricing page, analyze the UX against Notion-inspired design principles, and implement improvements using our design tokens."\n</example>

HappyColorBlend

HappyColorBlendVibe Project Guidelines Project Overview HappyColorBlendVibe is a Figma plugin for color palette generation with advanced tint/shade blending capabilities. It allows designers to

FunAudioLLM

View profile

View on GitHub

GitHub Stars1.3k

CategoryDesign

Updated9h ago

Forks134

FunAudioLLM/FunMusic

Languages

Python

Security Score

100/100

Audited on Apr 3, 2026

No findings