# $\texttt{BirdSet}$ - A Large-Scale Dataset for Audio Classification in Avian Bioacoustics 🤗
<a href="https://huggingface.co/datasets/DBD-research-group/BirdSet"><img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-ffcc00?logo=huggingface&logoColor=white"></a>
<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a>
<a href="https://www.pytorchlightning.ai/"><img alt="PyTorch Lightning" src="https://img.shields.io/badge/PyTorch_Lightning-792ee5?logo=pytorch-lightning&logoColor=white"></a>
<a href="https://hydra.cc/"><img alt="Config: Hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a>
Deep learning (DL) has greatly advanced audio classification, yet the field is limited by the scarcity of large-scale benchmark datasets that have propelled progress in other domains. While AudioSet aims to bridge this gap as a universal-domain dataset, its restricted accessibility and lack of diverse real-world evaluation use cases challenge its role as the only resource. Additionally, to maximize the potential of cost-effective and minimally invasive passive acoustic monitoring (PAM), models must analyze bird vocalizations across a wide range of species and environmental conditions. Therefore, we introduce $\texttt{BirdSet}$, a large-scale benchmark dataset for audio classification focusing on avian bioacoustics. $\texttt{BirdSet}$ surpasses AudioSet with over 6,800 recording hours ($\uparrow\!17\%$) from nearly 10,000 classes ($\uparrow\!18\times$) for training and more than 400 hours ($\uparrow\!7\times$) across eight strongly labeled evaluation datasets. It serves as a versatile resource for use cases such as multi-label classification, covariate shift, or self-supervised learning.
<br>
<div align="center">
  <img src="https://github.com/DBD-research-group/BirdSet/blob/main/resources/graphical_abstract.png" alt="logo" width="950">
</div>
<br>

## TL;DR
<div align="center">
- Explore our datasets shared on Hugging Face 🤗 in the BirdSet repository.
- BirdSet currently requires `datasets<=3.6.0`; we are looking into updates to support the newest version.
- The accompanying code provides comprehensive support tools for data preparation, model training, and evaluation.
- Participate in our Hugging Face leaderboard by submitting new results and comparing performance across models.
- Access our pre-trained model checkpoints on Hugging Face, ready to fine-tune or evaluate for various tasks.
- A Q&A section is included at the end of this README. If you have further questions or encounter any issues, please raise an issue. <br>
| | Task | Description | # Train Recordings | # Test_5s Segments | Pielou's evenness J | # Species |
|---|---|---|---|---|---|---|
| Large Train | XCL | Complete Xeno-Canto snapshot with focals for large (pre-)training. | 528,434 | - | - | 9,734 |
| | XCM | Smaller subset of XCL only containing focals of bird species available in the test datasets. | 89,798 | - | - | 409 |
| Auxiliary | POW | Powdermill Nature soundscape validation dataset and class-dedicated focal training subset of XCL. | 14,911 | 4,560 | 0.66 | 48 |
| | VOX | BirdVox-DCASE soundscape background dataset without bird vocalizations. | 20,331 | - | - | - |
| Test & Dedicated | PER | Amazon Basin soundscape test dataset and class-dedicated focal training subset. | 16,802 | 15,120 | 0.78 | 132 |
| Train Subsets ⊂ XCL | NES | Colombia Costa Rica soundscape test dataset and class-dedicated focal training subset. | 16,117 | 24,480 | 0.76 | 89 |
| | UHH | Hawaiian Islands soundscape test dataset and class-dedicated focal training subset. | 3,626 | 36,637 | 0.64 | 25 |
| | HSN | High Sierra Nevada soundscape test dataset and class-dedicated focal training subset. | 5,460 | 12,000 | 0.54 | 21 |
| | NBP | NIPS4BPlus test dataset and class-dedicated focal training subset. | 24,327 | 563 | 0.92 | 51 |
| | SSW | Sapsucker Woods soundscape test dataset and class-dedicated focal training subset. | 28,403 | 205,200 | 0.77 | 81 |
| | SNE | Sierra Nevada soundscape test dataset and class-dedicated focal training subset. | 19,390 | 23,756 | 0.70 | 56 |
</div>

## User Installation 🐣
The simplest way to install $\texttt{BirdSet}$ is to clone this repository and install it as an editable package using conda and pip:
```bash
git clone https://github.com/DBD-research-group/BirdSet.git
cd BirdSet
conda create -n birdset python=3.10
conda activate birdset
pip install -e .
```
Or install it as an editable dependency in your own repository:

```bash
pip install -e git+https://github.com/DBD-research-group/BirdSet.git#egg=birdset
```
<!--
You can also use the [devcontainer](https://code.visualstudio.com/docs/devcontainers/containers) configured as a git submodule:
```bash
git submodule update --init --recursive
```
Or [poetry](https://python-poetry.org/).
```
poetry install
poetry shell
```
-->
## Examples 🐤
We offer an in-depth tutorial notebook on how to use this repository. In the following, we provide simple code snippets:
### Manual Data Preparation
You can manually download the datasets from Hugging Face. We offer a uniform metadata format but also provide flexibility on how to prepare the data (e.g. you can manually decide which events to filter from the training data). The dataset dictionary comes with:
- `train`: Focal instances with variable lengths. Possible `detected_events` and corresponding event clusters are provided.
- `test_5s`: Processed test datasets where each soundscape instance corresponds to a 5-second clip in an `ebird_code_multilabel` format.
- `test`: Unprocessed test datasets where each soundscape instance points to the full soundscape recording and the corresponding `ebird_code` with ground-truth `start_time` and `end_time`.
```python
from datasets import load_dataset, Audio

# download the dataset
dataset = load_dataset("DBD-research-group/BirdSet", "HSN")

# set the HF decoder (decodes the complete file!)
dataset = dataset.cast_column("audio", Audio(sampling_rate=32_000))
```
The audio column natively contains only file paths. While automatic decoding via HF can be enabled (as shown above), decoding the entire audio files can introduce computational redundancies. This is because we provide flexible event decoding with varying file lengths that are often much longer than the targeted 5 seconds. To optimize, consider using a custom decoding scheme (e.g., with soundfile/BirdSet) or preprocessing the dataset with .map to include only the relevant audio segments.
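A custom decoder can seek directly to the annotated window instead of decoding the whole recording. The sketch below uses only the standard-library `wave` module and a hypothetical `read_segment` helper; compressed formats would need `soundfile` instead, with the same seek-and-read idea:

```python
import wave

def read_segment(path: str, start_s: float, end_s: float) -> bytes:
    """Return raw PCM bytes for the [start_s, end_s) window of a WAV file."""
    with wave.open(path, "rb") as f:
        sr = f.getframerate()
        f.setpos(int(start_s * sr))  # seek to the start frame instead of decoding from 0
        return f.readframes(int((end_s - start_s) * sr))
```

Applied to a 5-second annotated window, this reads only about 5 seconds of audio regardless of how long the underlying focal recording is.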
### BirdSet: Data Preparation :bird:
This code snippet utilizes the datamodule for an example dataset $\texttt{HSN}$.
`prepare_data`:
- downloads the data (or loads it from cache)
- preprocesses the data
  - event mapping (extracts up to `n` events from each sample; this can expand the training dataset and provides event timestamps for each sample)
  - one-hot encoding (classes for multi-label)
- creates splits
- saves the dataset to disk (the path can be accessed with `dm.disk_save_path` and the dataset loaded with `datasets.load_from_disk`)
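The one-hot encoding step for multi-label targets can be sketched as follows (a minimal illustration with a hypothetical helper, not BirdSet's actual implementation):

```python
def one_hot_multilabel(labels: list[int], num_classes: int) -> list[float]:
    """Map a list of class indices to a multi-hot target vector."""
    target = [0.0] * num_classes
    for label in labels:
        target[label] = 1.0  # mark every species present in the clip
    return target

# a clip containing classes 0 and 2 out of 4 classes
one_hot_multilabel([0, 2], 4)  # → [1.0, 0.0, 1.0, 0.0]
```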
```python
from birdset.configs.datamodule_configs import DatasetConfig, LoadersConfig
from birdset.datamodule.components.transforms import BirdSetTransformsWrapper
from birdset.datamodule.birdset_datamodule import BirdSetDataModule
from datasets import load_from_disk

# initialize the data module
dm = BirdSetDataModule(
    dataset=DatasetConfig(
        data_dir="data_birdset/HSN",  # specify your data directory!
        hf_path="DBD-research-group/BirdSet",
        hf_name="HSN",
        n_workers=3,
        val_split=0.2,
        task="multilabel",
        classlimit=500,  # limit of samples per class
        eventlimit=5,    # limit of events extracted from each sample
        sampling_rate=32000,
    ),
    loaders=LoadersConfig(),                # only utilized in setup; default settings
    transforms=BirdSetTransformsWrapper(),  # set_transform in setup; default settings
)

# download and preprocess the data, then load the saved splits from disk
dm.prepare_data()
dataset = load_from_disk(dm.disk_save_path)
```
