AudioTagging

Code and dataset for the paper "Region-Specific Audio Tagging for Spatial Sound".

Audio Tagging

Build the Environment

conda env create -f pl.yaml

Data Preparation

Download the dataset files here.

The simulated dataset generated with SpatialScaper can be downloaded here and here.

The SELD dataset can be downloaded here.

Tagging with angular region training

python main.py

The default model is cnn14; you can switch to eff_net_attention or ast_model.

The default feature combination is LPS + IPD + df. To try other features, add flags such as --gccphat or --accumulated_df.
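To make the feature names concrete, here is a minimal sketch of two of them. It assumes LPS means log power spectrum and IPD means inter-channel phase difference, which is the common reading of these abbreviations. The function names, the naive DFT, and the toy signal are illustrative only and are not taken from the repository, whose feature extraction may differ.

```python
import cmath
import math

def dft(frame):
    """Naive DFT of a real-valued frame; returns the non-negative frequency bins."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]

def lps(spectrum, eps=1e-10):
    """Log power spectrum: log(|X|^2 + eps) for each frequency bin."""
    return [math.log(abs(x) ** 2 + eps) for x in spectrum]

def ipd(spec_a, spec_b):
    """Inter-channel phase difference per bin, in the range [-pi, pi]."""
    return [cmath.phase(a * b.conjugate()) for a, b in zip(spec_a, spec_b)]

# Toy two-channel frame: channel 1 is channel 0 delayed by one sample.
n = 16
ch0 = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
ch1 = [math.sin(2 * math.pi * 2 * (t - 1) / n) for t in range(n)]

spec0, spec1 = dft(ch0), dft(ch1)
features_lps = lps(spec0)          # spectral feature of the reference channel
features_ipd = ipd(spec0, spec1)   # spatial feature between the two channels
```

The LPS describes what is sounding, while the IPD encodes the time delay between microphones and therefore carries direction information, which is why the two are typically combined for spatial tasks.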

To use only spectral and spatial features, without df, pass --dataset simulated_no_df.

To use the DCASE 2024 Task 3 SELD dataset, pass --dataset seld2024.

To use SALSA features, pass --dataset simulated_salsa.
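The options above can be collected into a small launcher, sketched below. The helper `build_train_command` is hypothetical and not part of the repository; it only assembles the dataset names and flags named in this README, and it does not check whether a particular combination of dataset and feature flag is supported by main.py.

```python
import shlex

# Dataset names and feature flags quoted from this README; nothing else is assumed to exist.
DATASETS = ("simulated", "simulated_no_df", "seld2024", "simulated_salsa")
FEATURE_FLAGS = ("--gccphat", "--accumulated_df")

def build_train_command(dataset=None, feature_flags=()):
    """Return the argument list for one training run of main.py (hypothetical helper)."""
    if dataset is not None and dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    for flag in feature_flags:
        if flag not in FEATURE_FLAGS:
            raise ValueError(f"unknown feature flag: {flag}")
    command = ["python", "main.py"]
    if dataset is not None:
        command += ["--dataset", dataset]
    command += list(feature_flags)
    return command

# Print one command line per dataset variant.
for name in DATASETS:
    print(shlex.join(build_train_command(dataset=name)))
```

To actually start a run, the returned list can be passed to `subprocess.run(command, check=True)` from the repository root.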

Tagging with distance training

python main_distance.py

Test the model performance

python main_test.py

Set the checkpoint path with --load_path. Pretrained weights are available here.

To evaluate the regional tagging model, use the simulated dataset with MInterface.

To extend the regional tagging model to an omni tagging model, use the simulated_omni_evaluation dataset with MInterfaceOmni. This dataset can be created with data/create_gt_omni.py for fixed regions or data/create_gt_omni_v2.py for the location-aware variant.
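The two evaluation setups can be summarised as a lookup table, as in the sketch below. `EvalSetup` and `build_test_command` are hypothetical names, and the checkpoint path is a placeholder. The sketch assumes main_test.py accepts --dataset in the same way as main.py. The README does not say how the interface class is selected, so it is recorded here but not passed on the command line.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalSetup:
    dataset: str    # dataset name from this README
    interface: str  # model interface class named in this README

# The two evaluation modes described above.
EVAL_SETUPS = {
    "regional": EvalSetup("simulated", "MInterface"),
    "omni": EvalSetup("simulated_omni_evaluation", "MInterfaceOmni"),
}

def build_test_command(mode, load_path):
    """Return the argument list for main_test.py in the given mode.

    Assumption: main_test.py takes --dataset like main.py does.
    """
    setup = EVAL_SETUPS[mode]
    return ["python", "main_test.py", "--load_path", load_path, "--dataset", setup.dataset]

command = build_test_command("omni", "checkpoints/example.ckpt")  # placeholder path
print(" ".join(command), "| interface:", EVAL_SETUPS["omni"].interface)
```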

Reference

https://github.com/iranroman/SpatialScaper

https://github.com/funcwj/setk

https://github.com/partha2409/DCASE2024_seld_baseline

https://github.com/qiuqiangkong/audioset_tagging_cnn

https://github.com/YuanGongND/ast

https://github.com/YuanGongND/psla

Repository: KawhiZhao/AudioTagging
