AudioTagging
Code and dataset for the paper "Region-Specific Audio Tagging for Spatial Sound".
Audio Tagging
Build the Environment
conda env create -f pl.yaml
Data Preparation
Download the dataset files here.
The simulated dataset generated with SpatialScaper can be downloaded here and here.
The SELD dataset can be downloaded here.
Training the angular-region tagging model
python main.py
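The paper's title describes tagging sounds per angular region. As a minimal sketch, assuming equal-width azimuth regions and a hypothetical class list (CLASSES, region_of and region_targets are illustrative names, not taken from this repository), region-specific training targets could be built as a multi-hot matrix with one row per region:

```python
import numpy as np

# Hypothetical class list for illustration only; the repository's labels differ.
CLASSES = ["speech", "dog", "music"]


def region_of(azimuth_deg: float, num_regions: int) -> int:
    """Map an azimuth in degrees to the index of an equal-width angular region."""
    width = 360.0 / num_regions
    return int((azimuth_deg % 360.0) // width)


def region_targets(events, num_regions: int) -> np.ndarray:
    """Build a (num_regions, num_classes) multi-hot target matrix.

    events: iterable of (class_name, azimuth_deg) pairs for one clip.
    Row r marks the classes with at least one source inside region r.
    """
    targets = np.zeros((num_regions, len(CLASSES)), dtype=np.float32)
    for name, azimuth in events:
        targets[region_of(azimuth, num_regions), CLASSES.index(name)] = 1.0
    return targets


# Example clip: speech at 30 deg, a dog at 200 deg, music at -20 deg (= 340 deg).
events = [("speech", 30.0), ("dog", 200.0), ("music", -20.0)]
y = region_targets(events, num_regions=4)
print(y)
```

With four 90-degree regions, y[0] contains speech, y[1] is empty, y[2] contains the dog and y[3] contains music. The repository's actual region definitions and label format may differ.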
The default model is cnn14; it can be changed to eff_net_attention or ast_model.
The default feature combination is LPS + IPD + df. Other features can be tried by adding flags such as --gccphat or --accumulated_df.
To use only spectral and spatial features without df, use --dataset simulated_no_df.
To use the DCASE 2024 SELD (Task 3) dataset, use --dataset seld2024.
To use SALSA features, use --dataset simulated_salsa.
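For readers unfamiliar with these features, the numpy sketch below computes textbook versions of LPS (log power spectrum), IPD (inter-channel phase difference, encoded as cos/sin) and frame-wise GCC-PHAT for one microphone pair. The STFT parameters and function names are assumptions for illustration; this is not the repository's feature extraction, and the df feature is not reproduced here.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Single-channel STFT with a Hann window; returns (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def log_power_spectrum(spec: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """LPS: log of the squared STFT magnitude."""
    return np.log(np.abs(spec) ** 2 + eps)


def ipd(spec_ref: np.ndarray, spec_other: np.ndarray) -> np.ndarray:
    """Inter-channel phase difference as (cos, sin) to avoid 2*pi wrap-around."""
    delta = np.angle(spec_other) - np.angle(spec_ref)
    return np.stack([np.cos(delta), np.sin(delta)])


def gcc_phat(spec_ref: np.ndarray, spec_other: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """Frame-wise GCC-PHAT: phase-whitened cross-spectrum mapped back to lags."""
    cross = spec_other * np.conj(spec_ref)
    cross /= np.abs(cross) + 1e-8  # PHAT weighting keeps only phase information
    return np.fft.irfft(cross, n=n_fft, axis=-1)


rng = np.random.default_rng(0)
mic1 = rng.standard_normal(16000)
mic2 = np.roll(mic1, 5)  # mic2 lags mic1 by 5 samples
S1, S2 = stft(mic1), stft(mic2)

# Stack LPS of the reference channel with the two IPD maps: (3, frames, bins).
features = np.concatenate([log_power_spectrum(S1)[None], ipd(S1, S2)])
# Averaging GCC-PHAT over frames, the peak index estimates the delay in samples.
lag = int(np.argmax(gcc_phat(S1, S2).mean(axis=0)))
print(features.shape, lag)
```

LPS carries spectral content, while IPD and GCC-PHAT carry the inter-channel timing cues that make direction-dependent (regional) tagging possible.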
Training the distance-based tagging model
python main_distance.py
Test the model performance
python main_test.py
The checkpoint path can be set via --load_path. Pretrained weights can be obtained here.
To evaluate the regional tagging model, use the simulated dataset with MInterface.
To extend the regional tagging model to an omni tagging model, use the simulated_omni_evaluation dataset with MInterfaceOmni. This dataset can be created with data/create_gt_omni.py for the fixed-region setting or data/create_gt_omni_v2.py for the location-aware setting.
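One plausible way to turn region-level outputs into clip-level omnidirectional tags is to max-pool class probabilities over regions that together cover all directions. This is an assumption for illustration only (omni_from_regions, the threshold and the example probabilities are hypothetical) and is not necessarily how MInterfaceOmni or the create_gt_omni scripts work.

```python
import numpy as np


def omni_from_regions(region_probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Collapse (num_regions, num_classes) probabilities into omni tags.

    Assumes the regions jointly cover every direction, so a class is present
    somewhere if it is detected in at least one region (max-pooling).
    """
    omni_probs = region_probs.max(axis=0)
    return (omni_probs >= threshold).astype(np.int64)


# Hypothetical predictions for 4 regions x 3 classes.
region_probs = np.array([
    [0.9, 0.1, 0.2],
    [0.2, 0.1, 0.1],
    [0.1, 0.7, 0.3],
    [0.0, 0.2, 0.4],
])
tags = omni_from_regions(region_probs)
print(tags)
```

Max-pooling keeps a class if any single region is confident about it; averaging over regions would instead penalize sources that occupy only one region.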
Reference
https://github.com/iranroman/SpatialScaper
https://github.com/funcwj/setk
https://github.com/partha2409/DCASE2024_seld_baseline
https://github.com/qiuqiangkong/audioset_tagging_cnn
https://github.com/YuanGongND/ast
https://github.com/YuanGongND/psla
