JamendoMaxCaps
JamendoMaxCaps is a large-scale dataset of 362,000 instrumental Creative Commons tracks
📌 Overview
JamendoMaxCaps is a large-scale dataset of 362,000 instrumental Creative Commons tracks sourced from the Jamendo platform. Each track comes with a generated music caption and with metadata in which missing fields have been imputed. We also introduce a retrieval system that combines musical features and metadata to find similar songs; a local large language model (LLLM) then uses those songs as context to fill in missing metadata. The dataset supports research in music-language understanding, retrieval, representation learning, and AI-generated music tasks.
✨ Features
✅ 362,000 Instrumental Tracks from Jamendo
✅ State-of-the-Art Music Captions generated using a cutting-edge model
✅ Metadata Imputation using a retrieval-enhanced LLM (Llama-2)
✅ Comprehensive Musical and Metadata Features:
- 🔍 Imputed metadata fields (genre, tempo, mood, instrumentation)
⚡ Installation Guide
git clone https://github.com/AMAAI-Lab/JamendoMaxCaps.git
cd JamendoMaxCaps
conda create -n jamendomaxcaps python=3.10
conda activate jamendomaxcaps
pip install -r requirements.txt
🚀 Usage
🎼 Extract MERT Features
python extract_mert.py
Ensure input and output folders are correctly configured.
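The exact output layout of extract_mert.py is not described here. As a rough illustration only: audio encoders such as MERT produce one feature vector per time frame, and a common way to get a single vector per track is to average over time. The sketch below assumes a (frames, dim) array and uses a hypothetical helper name, pool_frames; it is not the repository's implementation.

```python
import numpy as np


def pool_frames(frame_features: np.ndarray) -> np.ndarray:
    """Average frame-level features of shape (frames, dim) into one (dim,) track vector."""
    if frame_features.ndim != 2:
        raise ValueError("expected a 2-D array of shape (frames, dim)")
    return frame_features.mean(axis=0)


# Toy input: 4 frames with 3 feature dimensions (illustrative values, not real MERT output).
frames = np.arange(12, dtype=float).reshape(4, 3)
track_vector = pool_frames(frames)
```

Mean pooling discards temporal order, which is usually acceptable for track-level similarity search.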
📝 Get Metadata Features
python process_metadata.py
Adjust input and output folder paths accordingly.
🔍 Build Unified Retrieval System
python build_retrival_system.py --weight_audio <weight_audio> --weight_metadata <weight_metadata>
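The two weights control how much audio similarity and metadata similarity each contribute to the final score. One plausible reading, shown here as a minimal sketch, is a weighted sum of cosine similarities. The function names and the toy embeddings are assumptions for illustration; the script itself may combine the two modalities differently.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def fused_similarity(audio: np.ndarray, metadata: np.ndarray,
                     weight_audio: float, weight_metadata: float) -> np.ndarray:
    """Weighted sum of the audio and metadata cosine-similarity matrices."""
    audio_unit = l2_normalize(audio)
    meta_unit = l2_normalize(metadata)
    audio_sim = audio_unit @ audio_unit.T
    meta_sim = meta_unit @ meta_unit.T
    return weight_audio * audio_sim + weight_metadata * meta_sim


# Toy data: 4 tracks with random 8-dim audio and 5-dim metadata vectors.
rng = np.random.default_rng(0)
audio_emb = rng.normal(size=(4, 8))
meta_emb = rng.normal(size=(4, 5))
sim = fused_similarity(audio_emb, meta_emb, weight_audio=0.7, weight_metadata=0.3)
```

With weights that sum to 1, each track scores 1.0 against itself, which keeps scores comparable across different weight settings.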
🎶 Find Top Similar Songs
python retrieve_similar_entries.py --config <config_file_path>
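Conceptually, retrieval ranks all tracks by their similarity to a query track and keeps the best few. A minimal sketch follows, assuming one row of precomputed similarity scores. The helper name top_k_similar is hypothetical, and the format of the config file is not covered here.

```python
import numpy as np


def top_k_similar(sim_row: np.ndarray, query_index: int, k: int) -> list[int]:
    """Return indices of the k highest-scoring tracks, excluding the query itself."""
    scores = sim_row.astype(float).copy()
    scores[query_index] = -np.inf  # never return the query track as its own neighbour
    k = min(k, len(scores) - 1)
    order = np.argsort(-scores)  # indices sorted by descending score
    return order[:k].tolist()


# Toy scores of track 0 against five tracks (illustrative values).
scores = np.array([1.0, 0.2, 0.9, 0.5, 0.7])
neighbours = top_k_similar(scores, query_index=0, k=3)
```

For a collection of this size, a full sort per query is expensive; an approximate nearest-neighbour index is the usual replacement.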
🛠️ Run Metadata Imputation
python metadata_imputation.py
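Imputation passes the metadata of retrieved similar tracks to the LLM as context. The sketch below shows one way such a prompt could be assembled for the four imputed fields listed under Features. The prompt wording, the helper name build_imputation_prompt, and the example records are assumptions, not the repository's actual prompt.

```python
FIELDS = ["genre", "tempo", "mood", "instrumentation"]


def build_imputation_prompt(target: dict, neighbours: list[dict]) -> str:
    """Assemble a plain-text prompt asking an LLM to fill the target's missing fields."""
    missing = [f for f in FIELDS if not target.get(f)]
    lines = ["Similar tracks:"]
    for i, track in enumerate(neighbours, start=1):
        known = ", ".join(f"{f}={track[f]}" for f in FIELDS if track.get(f))
        lines.append(f"{i}. {known}")
    known_target = ", ".join(f"{f}={target[f]}" for f in FIELDS if target.get(f))
    lines.append(f"Target track: {known_target}")
    lines.append("Missing fields to predict: " + ", ".join(missing))
    return "\n".join(lines)


# Hypothetical records: the target lacks tempo, mood, and instrumentation.
target = {"genre": "ambient", "tempo": None}
neighbours = [
    {"genre": "ambient", "tempo": 70, "mood": "calm", "instrumentation": "synth pads"},
    {"genre": "downtempo", "tempo": 85, "mood": "relaxed"},
]
prompt = build_imputation_prompt(target, neighbours)
```

The resulting string would then be sent to the local model; parsing and validating the model's answer is a separate step not shown here.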
📖 Citation
If you use JamendoMaxCaps, please cite:
@article{royjamendomaxcaps2025,
author = {Abhinaba Roy and Renhang Liu and Tongyu Lu and Dorien Herremans},
title = {JamendoMaxCaps: A Large-Scale Music-Caption Dataset with Imputed Metadata},
year = {2025},
journal = {arXiv:2502.07461}
}
🤝 Acknowledgments
JamendoMaxCaps is built upon Creative Commons-licensed music from the Jamendo platform and leverages advanced AI models, including MERT, Flan-T5, and Llama-2. Special thanks to the research community for their invaluable contributions to open-source AI development!
