MusicBoundariesCNN
Code for the paper "Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features", implemented in PyTorch.
MUSIC BOUNDARIES DETECTION (STRUCTURE SEGMENTATION)
Code repository for the paper Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features. See the arXiv preprint. The paper is currently under review at the International Journal of Interactive Multimedia and Artificial Intelligence.
Introduction
Music Structure Segmentation is a research area within Music Information Retrieval (MIR). Since 2009, the MIREX campaigns have evaluated algorithms for this task, which include both unsupervised methods and supervised neural-network methods. These methods take audio features as input, such as MFCCs, chroma vectors or spectrograms, as well as the well-known self-similarity matrices (SSMs) and self-similarity lag matrices (SSLMs).
See the notebooks in the SelfSimilarityMatrices repository to follow the calculation of the inputs (SSLMs) step by step.
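To make the idea of a self-similarity lag matrix concrete, here is a minimal NumPy sketch. It is not the code used in this repository or in the SelfSimilarityMatrices notebooks: the function names, the choice of cosine similarity, and the toy "A B A" input are all assumptions made for illustration. The published methods typically add steps such as pooling and a different distance measure.

```python
import numpy as np


def self_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Cosine self-similarity between all pairs of frames.

    features: array of shape (n_frames, n_dims), e.g. MFCC or chroma frames.
    Returns an (n_frames, n_frames) matrix with values in [-1, 1].
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return unit @ unit.T


def self_similarity_lag_matrix(features: np.ndarray, max_lag: int) -> np.ndarray:
    """Similarity of each frame with the frames that precede it.

    Entry [lag - 1, i] compares frame i with frame i - lag, for lag = 1..max_lag.
    Positions where i - lag < 0 have no past frame and are left at zero.
    """
    ssm = self_similarity_matrix(features)
    n_frames = ssm.shape[0]
    sslm = np.zeros((max_lag, n_frames))
    for lag in range(1, max_lag + 1):
        # The lag-th lower diagonal of the SSM holds ssm[i, i - lag] for i >= lag.
        sslm[lag - 1, lag:] = np.diagonal(ssm, offset=-lag)
    return sslm


# Toy input (hypothetical): three 4-frame sections with the structure A B A.
rng = np.random.default_rng(0)
section_a = rng.normal(size=(1, 12)).repeat(4, axis=0)
section_b = rng.normal(size=(1, 12)).repeat(4, axis=0)
toy_features = np.vstack([section_a, section_b, section_a])

sslm = self_similarity_lag_matrix(toy_features, max_lag=8)
print(sslm.shape)  # (8, 12)
```

In this toy example the repeated section A shows up as high similarity at a lag of 8 frames, which is the kind of repetition cue that a boundary detector can pick up from an SSLM.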
<img src="results/song1076_mls.png" width="480">
TODOs
- [ ] Fix Training script
- [ ] Fix Evaluation script
- [ ] Make a test script where an audio file is taken and the prediction is given as the output
Package Structure
boundariesdetectioncnn/<br/> module containing the CNN architectures, training and data utilities.
boundariesdetectioncnn/models<br/> CNN models.
boundariesdetectioncnn/data<br/> data handling tools.
boundariesdetectioncnn/train<br/> model training tools.
boundariesdetectioncnn/evaluation<br/> model evaluation tools.
notebooks/<br/> tutorial notebooks.
tests/<br/> unit tests.
Installation
cd .path/to/timbre-vae
python setup.py install
Prerequisites
Python 3.5 or later and librosa. On Ubuntu, Mint and Debian, they can be installed like this:
sudo apt-get install python3 python3-pip
sudo pip install librosa
If you use conda/Anaconda environments, librosa can be installed from the conda-forge channel:
conda install -c conda-forge librosa
Databases
Here is a list of the databases used in Music Structural Analysis. This model has been trained, evaluated and tested with the SALAMI 2.0 dataset.
RWC Database
RWC Goto Annotations: http://staff.aist.go.jp/m.goto/RWC-MDB/AIST-Annotation
RWC Quaero Project Annotations (MIREX10): http://musicdata.gforge.inria.fr/
Beatles Database
Beatles-TUT Annotations: http://www.cs.tut.fi/sgn/arg/paulus/beatles_sections_TUT.zip
Isophonic Beatles or Beatles-ISO Annotations: http://isophonics.net/content/reference-annotations
SALAMI 2.0 Database
2.0 version: https://ddmal.music.mcgill.ca/research/SALAMI/
Webs of Interest
References
| | |
|---|---|
| [1] | Cohen-Hadria, A., & Peeters, G. (2017, June). Music structure boundaries estimation using multiple self-similarity matrices as input depth of convolutional neural networks. In Audio Engineering Society Conference: 2017 AES International Conference on Semantic Audio. Audio Engineering Society. |
| [2] | Grill, T., & Schlüter, J. (2015, October). Music Boundary Detection Using Neural Networks on Combined Features and Two-Level Annotations. In ISMIR (pp. 531-537). |
| [3] | Grill, T., & Schlüter, J. (2015, August). Music boundary detection using neural networks on spectrograms and self-similarity lag matrices. In 2015 23rd European Signal Processing Conference (EUSIPCO) (pp. 1296-1300). IEEE. |
| [4] | Serra, J., Müller, M., Grosche, P., & Arcos, J. L. (2012, July). Unsupervised detection of music boundaries by time series structure features. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 26, No. 1). |
Authors
- Carlos Hernández - carloshero@unizar.es
- David Díaz-Guerra - ddga@uniza.es
- José Ramón Beltrán - jrbelbla@unizar.es
