# $\texttt{BirdSet}$ - A Large-Scale Dataset for Audio Classification in Avian Bioacoustics 🤗
<a href="https://huggingface.co/datasets/DBD-research-group/BirdSet"><img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-ffcc00?logo=huggingface&logoColor=white"></a>
<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a>
<a href="https://www.pytorchlightning.ai/"><img alt="PyTorch Lightning" src="https://img.shields.io/badge/PyTorch_Lightning-792ee5?logo=pytorch-lightning&logoColor=white"></a>
<a href="https://hydra.cc/"><img alt="Config: Hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a>
Deep learning (DL) has greatly advanced audio classification, yet the field is limited by the scarcity of large-scale benchmark datasets that have propelled progress in other domains. While AudioSet aims to bridge this gap as a universal-domain dataset, its restricted accessibility and lack of diverse real-world evaluation use cases challenge its role as the only resource. Additionally, to maximize the potential of cost-effective and minimally invasive passive acoustic monitoring (PAM), models must analyze bird vocalizations across a wide range of species and environmental conditions. Therefore, we introduce $\texttt{BirdSet}$, a large-scale benchmark dataset for audio classification focusing on avian bioacoustics. $\texttt{BirdSet}$ surpasses AudioSet with over 6,800 recording hours ($\uparrow\!17\%$) from nearly 10,000 classes ($\uparrow\!18\times$) for training and more than 400 hours ($\uparrow\!7\times$) across eight strongly labeled evaluation datasets. It serves as a versatile resource for use cases such as multi-label classification, covariate shift, or self-supervised learning.
<br>
<div align="center">
  <img src="https://github.com/DBD-research-group/BirdSet/blob/main/resources/graphical_abstract.png" alt="logo" width="950">
</div>
<br>

## TL;DR
<div align="center">
- Explore our datasets shared on Hugging Face 🤗 in the BirdSet repository.
- BirdSet currently requires `datasets<=3.6.0`; we are looking into updates to support the newest version.
- The accompanying code provides comprehensive support tools for data preparation, model training, and evaluation.
- Participate in our Hugging Face leaderboard by submitting new results and comparing performance across models.
- Access our pre-trained model checkpoints on Hugging Face, ready to fine-tune or evaluate for various tasks.
- A Q&A section is included at the end of this README. If you have further questions or encounter any issues, please raise an issue. <br>
| | Task | Description | # Train Recordings | # Test_5s Segments | Pielou's evenness J | # Species |
|---|---|---|---|---|---|---|
| Large Train | XCL | Complete Xeno-Canto snapshot with focals for large (pre-)training. | 528,434 | - | - | 9,734 |
| | XCM | Smaller subset of XCL only containing focals of bird species available in the test datasets. | 89,798 | - | - | 409 |
| Auxiliary | POW | Powdermill Nature soundscape validation dataset and class-dedicated focal training subset of XCL. | 14,911 | 4,560 | 0.66 | 48 |
| | VOX | BirdVox-DCASE soundscape background dataset without bird vocalizations. | 20,331 | - | - | - |
| Test & Dedicated | PER | Amazon Basin soundscape test dataset and class-dedicated focal training subset. | 16,802 | 15,120 | 0.78 | 132 |
| Train Subsets ⊂ XCL | NES | Colombia Costa Rica soundscape test dataset and class-dedicated focal training subset. | 16,117 | 24,480 | 0.76 | 89 |
| | UHH | Hawaiian Islands soundscape test dataset and class-dedicated focal training subset. | 3,626 | 36,637 | 0.64 | 25 |
| | HSN | High Sierra Nevada soundscape test dataset and class-dedicated focal training subset. | 5,460 | 12,000 | 0.54 | 21 |
| | NBP | NIPS4BPlus test dataset and class-dedicated focal training subset. | 24,327 | 563 | 0.92 | 51 |
| | SSW | Sapsucker Woods soundscape test dataset and class-dedicated focal training subset. | 28,403 | 205,200 | 0.77 | 81 |
| | SNE | Sierra Nevada soundscape test dataset and class-dedicated focal training subset. | 19,390 | 23,756 | 0.70 | 56 |
</div>

## User Installation 🐣
The simplest way to install $\texttt{BirdSet}$ is to clone this repository and install it as an editable package using conda and pip:
```bash
git clone https://github.com/DBD-research-group/BirdSet.git
cd BirdSet
conda create -n birdset python=3.10
conda activate birdset
pip install -e .
```
Or install it as an editable dependency in your own repository:

```bash
pip install -e git+https://github.com/DBD-research-group/BirdSet.git#egg=birdset
```
<!--
You can also use the [devcontainer](https://code.visualstudio.com/docs/devcontainers/containers) configured as a git submodule:
```bash
git submodule update --init --recursive
```
Or [poetry](https://python-poetry.org/).
```
poetry install
poetry shell
```
-->
## Examples 🐤
We offer an in-depth tutorial notebook on how to use this repository. In the following, we provide simple code snippets:
### Manual Data Preparation
You can manually download the datasets from Hugging Face. We offer a uniform metadata format but also provide flexibility on how to prepare the data (e.g. you can manually decide which events to filter from the training data). The dataset dictionary comes with:
- `train`: Focal instances with variable lengths. Possible `detected_events` and corresponding event clusters are provided.
- `test_5s`: Processed test datasets where each soundscape instance corresponds to a 5-second clip in an `ebird_code_multilabel` format.
- `test`: Unprocessed test datasets where each soundscape instance points to the full soundscape recording and the corresponding `ebird_code` with ground-truth `start_time` and `end_time`.
```python
from datasets import load_dataset, Audio

# download the dataset
dataset = load_dataset("DBD-research-group/BirdSet", "HSN")

# set the HF decoder (decodes the complete file!)
dataset = dataset.cast_column("audio", Audio(sampling_rate=32_000))
```
The audio column natively contains only file paths. While automatic decoding via HF can be enabled (as shown above), decoding the entire audio files can introduce computational redundancies. This is because we provide flexible event decoding with varying file lengths that are often much longer than the targeted 5 seconds. To optimize, consider using a custom decoding scheme (e.g., with soundfile/BirdSet) or preprocessing the dataset with .map to include only the relevant audio segments.
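A custom decoder can seek directly to the annotated window instead of decoding the whole recording. The sketch below uses only the standard-library `wave` module and a hypothetical `read_segment` helper; compressed formats would need `soundfile` instead, with the same seek-and-read idea:

```python
import wave

def read_segment(path: str, start_s: float, end_s: float) -> bytes:
    """Return raw PCM bytes for the [start_s, end_s) window of a WAV file."""
    with wave.open(path, "rb") as f:
        sr = f.getframerate()
        f.setpos(int(start_s * sr))  # seek to the start frame instead of decoding from 0
        return f.readframes(int((end_s - start_s) * sr))
```

Applied to a 5-second annotated window, this reads only about 5 seconds of audio regardless of how long the underlying focal recording is.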
### BirdSet: Data Preparation :bird:
This code snippet utilizes the datamodule for an example dataset $\texttt{HSN}$.
`prepare_data`:
- downloads the data (or loads it from cache)
- preprocesses the data
  - event mapping (extracts up to `n` events from each sample; this can expand the training dataset and provides event timestamps for each sample)
  - one-hot encoding (classes for multi-label)
- creates splits
- saves the dataset to disk (the path can be accessed with `dm.disk_save_path` and the dataset loaded with `datasets.load_from_disk`)
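The one-hot encoding step for multi-label targets can be sketched as follows (a minimal illustration with a hypothetical helper, not BirdSet's actual implementation):

```python
def one_hot_multilabel(labels: list[int], num_classes: int) -> list[float]:
    """Map a list of class indices to a multi-hot target vector."""
    target = [0.0] * num_classes
    for label in labels:
        target[label] = 1.0  # mark every species present in the clip
    return target

# a clip containing classes 0 and 2 out of 4 classes
one_hot_multilabel([0, 2], 4)  # → [1.0, 0.0, 1.0, 0.0]
```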
```python
from birdset.configs.datamodule_configs import DatasetConfig, LoadersConfig
from birdset.datamodule.components.transforms import BirdSetTransformsWrapper
from birdset.datamodule.birdset_datamodule import BirdSetDataModule
from datasets import load_from_disk

# initialize the data module
dm = BirdSetDataModule(
    dataset=DatasetConfig(
        data_dir="data_birdset/HSN",  # specify your data directory!
        hf_path="DBD-research-group/BirdSet",
        hf_name="HSN",
        n_workers=3,
        val_split=0.2,
        task="multilabel",
        classlimit=500,  # limit of samples per class
        eventlimit=5,    # limit of events extracted from each sample
        sampling_rate=32000,
    ),
    loaders=LoadersConfig(),                # only utilized in setup; default settings
    transforms=BirdSetTransformsWrapper(),  # set_transform in setup; default settings
)

# download and preprocess the data, then load the saved splits from disk
dm.prepare_data()
dataset = load_from_disk(dm.disk_save_path)
```
