MuseDiffusion
YAI 11 x @POZAlabs : Music generation & modification from Unclear midi SEquence with Diffusion model
https://user-images.githubusercontent.com/61076953/223916377-c4c317b5-66dc-49a0-b42a-20132f638128.mp4
<br><hr>
<h2>How to run</h2>

<h3>0. Clone repository and cd</h3>

```sh
git clone https://github.com/YAIxPOZAlabs/MuseDiffusion.git
cd MuseDiffusion
```
<br>
<h3>1. Prepare environment and data</h3>
<h4>Set environment with python 3.8 and install pytorch</h4>
```sh
python3 -m pip install virtualenv && \
python3 -m virtualenv venv --python=python3.8 && \
source venv/bin/activate && \
pip3 install -r requirements.txt
```
<details>
<summary>(Optional) If required, install python 3.8 for venv usage.</summary>
```sh
sudo apt update && \
sudo apt install -y software-properties-common && \
sudo add-apt-repository -y ppa:deadsnakes/ppa && \
sudo apt install -y python3.8 python3.8-distutils
```
</details>
<details>
<summary>(Optional) If anaconda is available, you can set up the environment with anaconda instead of the code above.</summary>
```sh
conda create -n MuseDiffusion python=3.8 pip wheel
conda activate MuseDiffusion
pip3 install -r requirements.txt
```
</details>
<details>
<summary>(Recommended) <b>If docker is available, use Dockerfile instead</b>.</summary>
```sh
docker build -f Dockerfile -t musediffusion:v1 .
```
</details>
<br>
<h3>2. Download and Preprocess dataset</h3>
```sh
python3 -m MuseDiffusion dataprep
```
- If you want to use a custom ComMU-like dataset, convert it to npy files (refer to this issue) and preprocess it with this command:

```sh
python3 -m MuseDiffusion dataprep --data_dir path/to/dataset
```
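The preprocessor expects ComMU-style npy token files. Below is a minimal sketch of packing variable-length token sequences into a single `.npy` file; the filename `input_train.npy` and the token values are assumptions, so check the linked issue for the actual schema the preprocessor expects.

```python
# Hypothetical sketch: packing a ComMU-like token dataset into a .npy file.
# Field layout and filename are assumptions - see the linked issue for the
# real schema expected by `python3 -m MuseDiffusion dataprep`.
import numpy as np

sequences = [
    [422, 2, 70, 130, 300],        # token ids of one MIDI sequence (example values)
    [422, 5, 64, 131, 305, 310],   # sequences may have different lengths
]

# An object-dtype array holds variable-length sequences in one .npy file.
arr = np.empty(len(sequences), dtype=object)
for i, seq in enumerate(sequences):
    arr[i] = np.asarray(seq, dtype=np.int64)

np.save("input_train.npy", arr, allow_pickle=True)

# Round-trip check: loading requires allow_pickle for object arrays.
loaded = np.load("input_train.npy", allow_pickle=True)
```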
<details>
<summary>After this step, your directory structure would be like:</summary>
```
MuseDiffusion
├── MuseDiffusion
│   ├── __init__.py
│   ├── config
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   └── base.py
│   ├── data
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── corruption.py
│   │   └── ...
│   ├── models
│   │   ├── __init__.py
│   │   ├── denoising_model.py
│   │   ├── gaussian_diffusion.py
│   │   ├── nn.py
│   │   └── ...
│   ├── run
│   │   ├── __init__.py
│   │   ├── sample_generation.py
│   │   ├── sample_seq2seq.py
│   │   └── train.py
│   └── utils
│       ├── __init__.py
│       ├── decode_util.py
│       ├── dist_util.py
│       ├── train_util.py
│       └── ...
├── assets
│   └── (files for readme...)
├── commu
│   └── (same code as https://github.com/POZAlabs/ComMU-code/blob/master/commu/)
├── datasets
│   └── ComMU-processed
│       └── (preprocessed commu dataset files...)
├── scripts
│   ├── run_train.sh
│   ├── sample_seq2seq.sh
│   └── sample_generation.sh
├── README.md
└── requirements.txt
```
</details>
<br>
<h3>3. Prepare model weight and configuration</h3>
<h4>With downloading pretrained one</h4>
```sh
mkdir diffusion_models
mkdir diffusion_models/pretrained_weights
cd diffusion_models/pretrained_weights
wget https://github.com/YAIxPOZAlabs/MuseDiffusion/releases/download/1.0.0/pretrained_weights.zip
unzip pretrained_weights.zip && rm pretrained_weights.zip
cd ../..
```
<h4>With Manual Training</h4>
```sh
python3 -m MuseDiffusion train --distributed
```
<details>
<summary>How to customize arguments</summary>
<h5>Method 1: Using JSON Config File</h5>

- With `--config_json train_cfg.json`, the required arguments above will be loaded automatically.

```sh
# Copy config file to root directory
python3 -c "from MuseDiffusion.config import TrainSettings as T; print(T().json(indent=2))" \
  >> train_cfg.json
# Customize config on your own
vi train_cfg.json
# Run training script
python3 -m MuseDiffusion train --distributed --config_json train_cfg.json
```
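The command above dumps the settings' defaults as JSON, lets you edit them, and feeds the file back to the trainer. A simplified sketch of that round trip using a plain dataclass (the field names mirror a few flags from Method 2 below; this is not the real `TrainSettings` class, which lives in `MuseDiffusion.config`):

```python
# Simplified sketch of the JSON-config round trip. TrainCfg is a stand-in
# for MuseDiffusion.config.TrainSettings; fields here are examples only.
import json
from dataclasses import dataclass, asdict

@dataclass
class TrainCfg:
    lr: float = 0.0001
    batch_size: int = 2048
    diffusion_steps: int = 2000

# 1) Dump defaults to a file (analogous to the `python3 -c ...` step above).
with open("train_cfg.json", "w") as f:
    json.dump(asdict(TrainCfg()), f, indent=2)

# 2) Edit train_cfg.json by hand, then load it back for training.
with open("train_cfg.json") as f:
    cfg = TrainCfg(**json.load(f))
```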
<h5>Method 2: Using Arguments</h5>

- Add your own arguments; refer to `python3 -m MuseDiffusion train --help`.

Refer to the example below:

```sh
python3 -m MuseDiffusion train --distributed \
  --lr 0.0001 \
  --batch_size 2048 \
  --microbatch 64 \
  --learning_steps 320000 \
  --log_interval 20 \
  --save_interval 1000 \
  --eval_interval 500 \
  --ema_rate 0.5,0.9,0.99 \
  --seed 102 \
  --diffusion_steps 2000 \
  --schedule_sampler lossaware \
  --noise_schedule sqrt \
  --seq_len 2096 \
  --pretrained_denoiser diffuseq.pt \
  --pretrained_embedding pozalabs_embedding.pt \
  --freeze_embedding false \
  --use_bucketing true \
  --dataset ComMU \
  --data_dir datasets/ComMU-processed \
  --data_loader_workers 4 \
  --use_corruption true \
  --corr_available mt,mn,rn,rr \
  --corr_max 4 \
  --corr_p 0.5 \
  --corr_kwargs "{'p':0.4}" \
  --hidden_t_dim 128 \
  --hidden_dim 128 \
  --dropout 0.4 \
  --weight_decay 0.1 \
  --gradient_clipping -1.0
```
</details>
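The corruption flags above can be read as: `--corr_available` lists the corruptions that may be used, `--corr_max` caps how many are applied per sample, and `--corr_p` is the chance each one fires. An illustrative sketch assuming that interpretation (the real logic lives in `MuseDiffusion/data/corruption.py`; the function name and mechanics here are hypothetical):

```python
# Hypothetical sketch of how the corruption flags might combine.
# The real implementation is in MuseDiffusion/data/corruption.py.
import random

def corrupt(tokens, corr_available=("mt", "mn", "rn", "rr"),
            corr_max=4, corr_p=0.5, seed=102):
    rng = random.Random(seed)
    applied = []
    # Consider up to corr_max corruptions from the available pool...
    for name in rng.sample(list(corr_available), k=min(corr_max, len(corr_available))):
        # ...and apply each one independently with probability corr_p.
        if rng.random() < corr_p:
            applied.append(name)  # a real implementation would mutate `tokens` here
    return tokens, applied

seq, used = corrupt(list(range(16)))
```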
<details>
<summary>With regard to <b><u>--distributed</u></b> argument (torch.distributed runner)</summary>
<h5>Arguments related to torch.distributed:</h5>

- Argument `--distributed` will run `python -m MuseDiffusion train` with the torch.distributed runner - you can customize options or environs.
- commandline option `--nproc_per_node` - number of training nodes (GPUs) to use.
  - default: number of GPUs in the `CUDA_VISIBLE_DEVICES` environ.
- commandline option `--master_port` - master port for distributed learning.
  - default: will automatically be found if available, otherwise `12233`.
- environ `CUDA_VISIBLE_DEVICES` - specific GPU indices, e.g. `CUDA_VISIBLE_DEVICES=4,5,6,7`.
  - default: not set - in this case, the trainer will use all available GPUs.
- environ `OPT_NUM_THREADS` - number of threads for each node.
  - default: will automatically be set to `$CPU_CORE` / `$TOTAL_GPU`.
- On Windows, torch.distributed is disabled by default. To enable it, edit the `USE_DIST_IN_WINDOWS` flag in `MuseDiffusion/utils/dist_util.py`.

Refer to the example below:

```sh
CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m MuseDiffusion train --distributed --master_port 12233
```
</details>
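The per-node thread default described above can be sketched as follows (assumption: the runner divides available CPU cores evenly across the GPUs in use, with a floor of one thread):

```python
# Sketch of the OPT_NUM_THREADS default: CPU cores split evenly per GPU.
# Assumption based on the "$CPU_CORE / $TOTAL_GPU" description above.
import os

def default_num_threads(total_gpu: int) -> int:
    cpu_cores = os.cpu_count() or 1
    return max(1, cpu_cores // max(1, total_gpu))
```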
After training, weights and configs will be saved into `./diffusion_models/{name-of-model-folder}/`.
<h3>4. Sampling - Modification</h3>

```sh
python3 -m MuseDiffusion modification --distributed \
  --use_corruption True \
  --corr_available rn,rr \
  --corr_max 2 \
  --corr_p 0.5 \
  --step 500 \
  --strength 0.75 \
  --model_path ./diffusion_models/{name-of-model-folder}/{weight-file}
```
- You can use arguments for `torch.distributed`, which are the same as in the training script.
- Type `python3 -m MuseDiffusion modification --help` for detailed usage.
- You can omit the `--model_path` argument if you want to use pretrained weights.
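A common SDEdit-style reading of `--strength` in diffusion-based editing is that it selects how far into the noise schedule denoising starts (1.0 resamples from pure noise, 0.0 leaves the input unchanged). Assuming MuseDiffusion follows that convention (not confirmed by this README), the relation to `--step` can be sketched as:

```python
# Sketch of a strength-to-timestep mapping, assuming an SDEdit-style scheme.
# This is an illustration, not MuseDiffusion's actual implementation.
def start_timestep(step: int, strength: float) -> int:
    # strength 1.0 -> denoise from the last timestep (pure noise);
    # strength 0.0 -> skip denoising entirely (input kept as-is).
    return min(step, int(step * strength))
```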
<h3>5. Sampling - Generation</h3>

```sh
python3 -m MuseDiffusion generation --distributed \
  --bpm {BPM} \
  --audio_key {AUDIO_KEY} \
  --time_signature {TIME_SIGNATURE} \
  --pitch_range {PITCH_RANGE} \
  --num_measures {NUM_MEASURES} \
  --inst {INST} \
  --genre {GENRE} \
  --min_velocity {MIN_VELOCITY} \
  --max_velocity {MAX_VELOCITY} \
  --track_role {TRACK_ROLE} \
  --rhythm {RHYTHM} \
  --chord_progression {CHORD_PROGRESSION} \
  --num_samples 1000 \
  --step 500 \
  --model_path diffusion_models/{name-of-model-folder}/{weight-file}
```
- In generation, MidiMeta arguments (bpm, audio_key, ..., chord_progression) are essential.
- You can use arguments for `torch.distributed`, which are the same as in the training script.
- Type `python3 -m MuseDiffusion generation --help` for detailed usage.
- **You can omit the `--model_path` argument if you want to use pretrained weights.**
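A hypothetical example of filling the MidiMeta placeholders with ComMU-style values (each field's vocabulary comes from the ComMU dataset; the values below are illustrative, not exhaustive, so consult the dataset documentation for the full set):

```python
# Hypothetical MidiMeta values in the ComMU style; values are examples only.
meta = {
    "bpm": 120,
    "audio_key": "aminor",
    "time_signature": "4/4",
    "pitch_range": "mid",
    "num_measures": 8,
    "inst": "acoustic_piano",
    "genre": "newage",
    "min_velocity": 49,
    "max_velocity": 79,
    "track_role": "main_melody",
    "rhythm": "standard",
}

# Assemble the generation command line from the metadata dict.
cmd = " \\\n  ".join(
    ["python3 -m MuseDiffusion generation --distributed"]
    + [f"--{k} {v}" for k, v in meta.items()]
)
print(cmd)
```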