
BirdSet

A benchmark dataset collection for bird sound classification

Install / Use

/learn @DBD-research-group/BirdSet
README

<div align="center"> <img src="https://github.com/DBD-research-group/BirdSet/blob/main/resources/perch/birdsetsymbol.png" alt="logo" width="100"> </div>

$\texttt{BirdSet}$ - A Large-Scale Dataset for Audio Classification in Avian Bioacoustics 🤗

<a href="https://huggingface.co/datasets/DBD-research-group/BirdSet"><img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-ffcc00?logo=huggingface&logoColor=white"></a> <a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a> <a href="https://www.pytorchlightning.ai/"><img alt="PyTorch Lightning" src="https://img.shields.io/badge/PyTorch_Lightning-792ee5?logo=pytorch-lightning&logoColor=white"></a> <a href="https://hydra.cc/"><img alt="Config: Hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a> arXiv paper

Deep learning (DL) has greatly advanced audio classification, yet the field is limited by the scarcity of large-scale benchmark datasets that have propelled progress in other domains. While AudioSet aims to bridge this gap as a universal-domain dataset, its restricted accessibility and lack of diverse real-world evaluation use cases challenge its role as the only resource. Additionally, to maximize the potential of cost-effective and minimally invasive passive acoustic monitoring (PAM), models must analyze bird vocalizations across a wide range of species and environmental conditions. Therefore, we introduce $\texttt{BirdSet}$, a large-scale benchmark dataset for audio classification focusing on avian bioacoustics. $\texttt{BirdSet}$ surpasses AudioSet with over 6,800 recording hours ($\uparrow\!17\%$) from nearly 10,000 classes ($\uparrow\!18\times$) for training and more than 400 hours ($\uparrow\!7\times$) across eight strongly labeled evaluation datasets. It serves as a versatile resource for use cases such as multi-label classification, covariate shift, or self-supervised learning.

<br> <div align="center"> <img src="https://github.com/DBD-research-group/BirdSet/blob/main/resources/graphical_abstract.png" alt="graphical abstract" width="950"> </div> <br>

TL;DR

  • Explore our datasets shared on Hugging Face 🤗 in the BirdSet repository.
  • $\texttt{BirdSet}$ currently requires `datasets<=3.6.0`; we are looking into updates to support the newest version.
  • This accompanying code provides comprehensive support tools for data preparation, model training, and evaluation.
  • Participate in our Hugging Face leaderboard by submitting new results and comparing performance across models.
  • Access our pre-trained model checkpoints on Hugging Face, ready to fine-tune or evaluate for various tasks.
  • A Q&A section is included at the end of this README. If you have further questions or encounter any issues, please raise an issue. <br>
<div align="center">

| | Task | Description | # Train Recordings | # Test_5s Segments | Pielou's evenness J | # Species |
|---|------|-------------|--------------------|--------------------|---------------------|-----------|
| Large Train | XCL | Complete Xeno-Canto snapshot with focals for large (pre-)training. | 528,434 | - | - | 9,734 |
| | XCM | Smaller subset of XCL only containing focals of bird species available in the test datasets. | 89,798 | - | - | 409 |
| Auxiliary | POW | Powdermill Nature soundscape validation dataset and class-dedicated focal training subset of XCL. | 14,911 | 4,560 | 0.66 | 48 |
| | VOX | BirdVox-DCASE soundscape background dataset without bird vocalizations. | 20,331 | - | - | - |
| Test & Dedicated Train Subsets of XCL | PER | Amazon Basin soundscape test dataset and class-dedicated focal training subset. | 16,802 | 15,120 | 0.78 | 132 |
| | NES | Columbia Costa Rica soundscape test dataset and class-dedicated focal training subset. | 16,117 | 24,480 | 0.76 | 89 |
| | UHH | Hawaiian Islands soundscape test dataset and class-dedicated focal training subset. | 3,626 | 36,637 | 0.64 | 25 |
| | HSN | High Sierras Nevada soundscape test dataset and class-dedicated focal training subset. | 5,460 | 12,000 | 0.54 | 21 |
| | NBP | NIPS4BPlus test dataset and class-dedicated focal training subset. | 24,327 | 563 | 0.92 | 51 |
| | SSW | Sapsucker Woods soundscape test dataset and class-dedicated focal training subset. | 28,403 | 205,200 | 0.77 | 81 |
| | SNE | Sierra Nevada soundscape test dataset and class-dedicated focal training subset. | 19,390 | 23,756 | 0.70 | 56 |

</div>
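Pielou's evenness J reported in the table measures how uniformly the test segments are distributed across species (1.0 means all species occur equally often). A minimal sketch of the metric, not taken from BirdSet's own code:

```python
import math
from collections import Counter

def pielou_evenness(labels):
    """Pielou's J = H' / ln(S): the Shannon entropy H' of the label
    distribution, normalised by its maximum ln(S) for S observed classes."""
    counts = Counter(labels)
    n = sum(counts.values())
    s = len(counts)
    if s < 2:
        return 1.0  # a single class is trivially even
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(s)

# two species with equal segment counts -> perfectly even, J = 1.0
print(round(pielou_evenness(["a"] * 10 + ["b"] * 10), 3))  # 1.0
```

A skewed distribution (e.g. nine segments of one species, one of another) yields J well below 1, matching the lower values reported for imbalanced test sets such as HSN.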

User Installation 🐣

The simplest way to install $\texttt{BirdSet}$ is to clone this repository and install it as an editable package using conda and pip:

conda create -n birdset python=3.10
conda activate birdset
pip install -e .

or install it as an editable package directly from GitHub in your own repository:

pip install -e git+https://github.com/DBD-research-group/BirdSet.git#egg=birdset

Examples 🐤

We offer an in-depth tutorial notebook on how to use this repository. In the following, we provide simple code snippets:

Manual Data Preparation

You can manually download the datasets from Hugging Face. We offer a uniform metadata format but also provide flexibility on how to prepare the data (e.g. you can manually decide which events to filter from the training data). The dataset dictionary comes with:

  • train: Focal instances with variable lengths. Possible detected_events and corresponding event clusters are provided.
  • test_5s: Processed test datasets where each soundscape instance corresponds to a 5-second clip with an ebird_code_multilabel format.
  • test: Unprocessed test datasets where each soundscape instance points to the full soundscape recording and the corresponding ebird_code with ground-truth start_time and end_time.
from datasets import load_dataset, Audio

# download the dataset 
dataset = load_dataset("DBD-research-group/BirdSet","HSN")

# set HF decoder (decodes the complete file!)
dataset = dataset.cast_column("audio", Audio(sampling_rate=32_000))

The audio column natively contains only file paths. While automatic decoding via HF can be enabled (as shown above), decoding the entire audio files can introduce computational redundancies. This is because we provide flexible event decoding with varying file lengths that are often much longer than the targeted 5 seconds. To optimize, consider using a custom decoding scheme (e.g., with soundfile/BirdSet) or preprocessing the dataset with .map to include only the relevant audio segments.
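The segment-level decoding idea can be illustrated with a small sketch: given an already decoded waveform and an event's timestamps (as provided in detected_events), cut a fixed 5-second window around the event instead of processing the whole recording. The helper name `extract_clip` and the centering/zero-padding strategy are illustrative assumptions, not BirdSet's implementation:

```python
import numpy as np

SR = 32_000  # BirdSet audio is sampled at 32 kHz

def extract_clip(audio, event_start_s, event_end_s, clip_s=5.0, sr=SR):
    """Cut a fixed-length window centred on a detected event.
    `audio` is a 1-D waveform array; windows that extend past the
    recording's edges are zero-padded so the output length is constant."""
    center = (event_start_s + event_end_s) / 2
    start = int(round((center - clip_s / 2) * sr))
    length = int(round(clip_s * sr))
    stop = start + length
    clip = np.zeros(length, dtype=audio.dtype)
    src_lo, src_hi = max(start, 0), min(stop, len(audio))
    if src_lo < src_hi:
        clip[src_lo - start : src_hi - start] = audio[src_lo:src_hi]
    return clip

# 10 s dummy recording with an event between 2.0 s and 3.0 s
wave = np.random.randn(10 * SR).astype(np.float32)
clip = extract_clip(wave, 2.0, 3.0)
print(clip.shape)  # (160000,) i.e. exactly 5 s at 32 kHz
```

The same slicing logic can run inside a `.map` preprocessing step so that only the relevant segments are kept on disk.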

BirdSet: Data Preparation :bird:

This code snippet utilizes the datamodule for an example dataset $\texttt{HSN}$.

prepare_data

  • downloads the data (or loads it from cache)
  • preprocesses the data
    • event_mapping (extracts n events from each sample; this can expand the training dataset and provides event timestamps for each sample)
    • one-hot encoding (classes for multi-label)
    • creates splits
  • saves the dataset to disk (the path can be accessed with dm.disk_save_path and loaded with datasets.load_from_disk)
from birdset.configs.datamodule_configs import DatasetConfig, LoadersConfig
from birdset.datamodule.components.transforms import BirdSetTransformsWrapper
from birdset.datamodule.birdset_datamodule import BirdSetDataModule
from datasets import load_from_disk

# initiate the data module
dm = BirdSetDataModule(
    dataset= DatasetConfig(
        data_dir='data_birdset/HSN', # specify your data directory!
        hf_path='DBD-research-group/BirdSet',
        hf_name='HSN',
        n_workers=3,
        val_split=0.2,
        task="multilabel",
        classlimit=500, #limit of samples per class 
        eventlimit=5, #limit of events that are extracted for each sample
        sampling_rate=32000,
    ),
    loaders=LoadersConfig(), # only utilized in setup; default settings
    transforms=BirdSetTransformsWrapper(), # set_transform in setup; default settings
)

# download, preprocess, and save the dataset to disk
dm.prepare_data()
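The one-hot encoding step for the "multilabel" task listed above can be sketched in plain NumPy. The helper name `multi_hot` is hypothetical and not part of the BirdSet API:

```python
import numpy as np

def multi_hot(class_indices, num_classes):
    """Encode the set of species present in a clip as a multi-hot
    target vector, the label format used for multi-label training."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[list(class_indices)] = 1.0
    return vec

# e.g. a 5 s segment containing species 2 and 4 out of the 21 HSN classes
print(multi_hot([2, 4], 21))
```

Each test_5s segment thus becomes a length-`num_classes` vector with a 1 for every species annotated in that clip.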
