Mozart's Touch: Multi-Modal Music Generation with Pre-Trained Models


This is the official implementation of Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models (accepted by AIGC 2024).

Package Description

This repository is structured as follows:

Diancai-Backend
├─MozartsTouch/: source code of the Mozart's Touch implementation
│  ├─model/: pre-trained models
│  ├─static/: static resources for test purposes
│  ├─utils/: source code for the modules
│  ├─download_model.py: downloads pre-trained models to ./model/
│  ├─config.yaml: configuration such as LLM model URLs and API keys
│  └─main.py: main program of Mozart's Touch
├─outputs/: directory storing generated music
├─backend_app.py: backend web application of Mozart's Touch
└─start_server.py: starts the backend server of Mozart's Touch

Setup

  1. Before running, configure config.yaml.
  2. Install dependencies with pip install -r requirements.txt.
  3. Run download_model.py to download the required model parameters.
  4. Call MozartsTouch.img_to_music_generate() to generate music from an image.

To test the code without importing large models, set TEST_MODE to True in config.yaml.
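For reference, a hypothetical config.yaml sketch is shown below; the actual key names, endpoints, and defaults in the repository may differ:

```yaml
# Hypothetical configuration sketch; key names are illustrative only.
TEST_MODE: false                          # set to true to skip loading large models
llm:
  api_url: "https://api.example.com/v1"   # placeholder LLM endpoint URL
  api_key: "YOUR_API_KEY"                 # placeholder API key
```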

Usage

Running as a Command Line Tool

With setup complete, run the following command to generate music:

python main.py

or debug without importing any models:

python main.py --test_mode
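The --test_mode flag can be wired up with argparse; the following is a minimal sketch under that assumption, not the project's actual main.py:

```python
import argparse

# Minimal sketch of a --test_mode flag; the real main.py may define more options.
parser = argparse.ArgumentParser(
    description="Mozart's Touch command-line entry (sketch)")
parser.add_argument("--test_mode", action="store_true",
                    help="run without importing large pre-trained models")

# Simulate `python main.py --test_mode` for demonstration.
args = parser.parse_args(["--test_mode"])
print(args.test_mode)  # prints True
```

With action="store_true", the flag defaults to False and flips to True when passed, which matches the debug invocation above.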

Running as a Web Backend Server

  1. Install dependencies with pip install -r requirements_for_server.txt.
  2. Configure the port number and other parameters in start_server.py.
  3. Run python start_server.py.
  4. Open http://localhost:3001/docs#/ to view the backend documentation and test the APIs.

The related frontend project is available at https://github.com/ScientificW/MozartFrontEndConnect.

TO-DO List

  • ~~Add support for user-input prompts~~
  • Remove mode from the API
  • ~~Update to the latest code and integrate Video-BLIP2 into our project~~
  • Integrate the evaluation code
  • ~~Use argparse to set and pass config~~
  • ~~Refactor the MusicGen part with the strategy pattern~~
  • ~~Use APIs instead of loading models manually~~
  • ~~Add support for other models as alternatives, e.g. LLaMA~~

Long-Term Tasks

  • Try the latest models, such as Florence-2
  • Optimize the MusicGen code in the music generation module (main goal: improve generation efficiency)

Citation

@inproceedings{10.1117/12.3067408,
  author    = {Jiajun Li and Tianze Xu and Xuesong Chen and Xinrui Yao and Jingchou Han and Shuchang Liu},
  title     = {{Mozart's Touch: a lightweight multimodal music generation framework based on pre-trained large models}},
  booktitle = {International Conference on AI-Generated Content (AIGC 2024)},
  editor    = {Feng Zhao and Duoqian Miao},
  volume    = {13649},
  pages     = {136490R},
  organization = {International Society for Optics and Photonics},
  publisher = {SPIE},
  keywords  = {AIGC, Multi-modal, Neural Network, Large Language Model, Music Generation},
  year      = {2025},
  doi       = {10.1117/12.3067408},
  url       = {https://doi.org/10.1117/12.3067408}
}