AudioTagging

Code and dataset for the paper "Region-Specific Audio Tagging for Spatial Sound".

Audio Tagging

Build the Environment

conda env create -f pl.yaml

Data Preparation

Download the dataset files here.

The dataset simulated with SpatialScaper can be downloaded here and here.

The SELD dataset can be downloaded here.

Tagging with the angular region training

python main.py

The default model is cnn14; you can switch to eff_net_attention or ast_model.

The default feature combination is LPS + IPD + df. To try other features, add flags such as --gccphat or --accumulated_df.

To train without df, using only spectral and spatial features, pass --dataset simulated_no_df.

To use the DCASE 2024 Task 3 (SELD) dataset, pass --dataset seld2024.

To use SALSA features, pass --dataset simulated_salsa.
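For readers unfamiliar with the first two parts of the default feature combination, the sketch below shows the textbook definitions of the log-power spectrum (LPS) and the inter-channel phase difference (IPD) on a single toy frame. It is only an illustration, not this repository's feature extractor: the function names, the naive DFT, the frame length, and the IPD sign convention are all assumptions made for the example, and df is not covered.

```python
import cmath
import math


def dft(frame):
    """Naive one-sided DFT of a real-valued frame (fine for a short illustration)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def lps(spectrum, eps=1e-10):
    """Log-power spectrum: log(|X[k]|^2) per frequency bin."""
    return [math.log(abs(x) ** 2 + eps) for x in spectrum]


def ipd(spec_a, spec_b):
    """Inter-channel phase difference per bin, angle(A * conj(B)), in [-pi, pi]."""
    return [cmath.phase(a * b.conjugate()) for a, b in zip(spec_a, spec_b)]


# Hypothetical two-channel frame: the same sinusoid, with channel 2 lagging by 0.5 rad.
N, BIN, LAG = 64, 4, 0.5
ch1 = [math.cos(2 * math.pi * BIN * t / N) for t in range(N)]
ch2 = [math.cos(2 * math.pi * BIN * t / N - LAG) for t in range(N)]

S1, S2 = dft(ch1), dft(ch2)

# Concatenate spectral (LPS) and spatial (IPD) features for this frame.
features = lps(S1) + ipd(S1, S2)
print(len(features), round(ipd(S1, S2)[BIN], 3))
```

At the bin that carries the sinusoid, the IPD recovers the phase lag between the two channels, which is the spatial cue this kind of feature contributes alongside the spectral LPS. Bins without signal energy hold only numerical noise, so their phase values are not meaningful.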

Tagging with the distance training

python main_distance.py

Test the model performance

python main_test.py

Set the checkpoint path with --load_path. Pretrained weights are available here.

To evaluate a regional tagging model, use the simulated dataset with MInterface.

To extend a regional tagging model to an omni tagging model, use the simulated_omni_evaluation dataset with MInterfaceOmni. This dataset can be created with data/create_gt_omni.py (fixed region) or data/create_gt_omni_v2.py (location-aware).

References

https://github.com/iranroman/SpatialScaper

https://github.com/funcwj/setk

https://github.com/partha2409/DCASE2024_seld_baseline

https://github.com/qiuqiangkong/audioset_tagging_cnn

https://github.com/YuanGongND/ast

https://github.com/YuanGongND/psla
