# PyHa
<!-- ## Automated Audio Labeling System -->
A tool designed to convert audio-based "weak" labels into "strong" moment-to-moment (intraclip) labels, with a pipeline for comparing automated labels against human labels. Techniques range from DSP-based foreground-background separation and cross-correlation-based template matching to deep learning models for bird sound event detection. Current proof-of-concept work targets bird audio clips using Microfaune predictions.
This package is being developed and maintained by the Engineers for Exploration Acoustic Species Identification Team in collaboration with the San Diego Zoo Wildlife Alliance.
PyHa = Python + Piha (referring to a bird species of interest to us, the Screaming Piha)
## Contents
## Installation and Setup
- Navigate to a desired folder and clone the repository onto your local machine:
  `git clone https://github.com/UCSD-E4E/PyHa.git`
- To reduce the size of the repository on your local machine, you can alternatively use
  `git clone https://github.com/UCSD-E4E/PyHa.git --depth 1`, which clones only the most recent version of the repo without its history.
- Install Python 3.8, Python 3.9, or Python 3.10.
- Create a venv by running `python3.x -m venv .venv`, where `python3.x` is the appropriate Python version.
- Activate the venv with the following command:
  - Windows: `.venv\Scripts\activate`
  - macOS/Linux: `source .venv/bin/activate`
- Install the build tools: `python -m pip install --upgrade pip poetry`
- Install the environment: `poetry install`
- Here you can download the Xeno-canto Screaming Piha test set used in our demos: https://drive.google.com/drive/u/0/folders/1lIweB8rF9JZhu6imkuTg_No0i04ClDh1
- Run `jupyter notebook` from the repository folder to open the `PyHa_Tutorial.ipynb` notebook and make sure PyHa is running properly. Make sure the paths in the notebook and in the `ScreamingPiha_Manual_Labels.csv` file point to the TEST folder.
## Functions

This image shows the design of the automated audio labeling system.
### isolation_parameters
Many of the functions take in the `isolation_parameters` argument, so it is defined globally here.
The `isolation_parameters` dictionary definition depends on the model used. The currently supported models are BirdNET-Lite, Microfaune, and TweetyNET.
The BirdNET-Lite `isolation_parameters` dictionary is as follows:
```python
isolation_parameters = {
    "model" : "birdnet",
    "output_path" : "",
    "lat" : 0.0,
    "lon" : 0.0,
    "week" : 0,
    "overlap" : 0.0,
    "sensitivity" : 0.0,
    "min_conf" : 0.0,
    "custom_list" : "",
    "filetype" : "",
    "num_predictions" : 0,
    "write_to_csv" : False,
    "verbose" : True
}
```
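As a concrete illustration, a filled-in BirdNET-Lite configuration might look like the following. All of the values below are illustrative placeholders chosen for demonstration, not tuned recommendations:

```python
# Illustrative BirdNET-Lite configuration; every value below is a placeholder
# for demonstration, not a recommended setting.
isolation_parameters = {
    "model": "birdnet",
    "output_path": "outputs",
    "lat": 35.4244,           # recording latitude
    "lon": -120.7463,         # recording longitude
    "week": 18,               # week of the year the clip was recorded
    "overlap": 0.5,           # overlap between analysis windows, in seconds
    "sensitivity": 1.0,
    "min_conf": 0.5,          # discard predictions below this confidence
    "custom_list": "",        # optional path to a custom species list
    "filetype": "wav",
    "num_predictions": 10,
    "write_to_csv": False,
    "verbose": True,
}
```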
<br>
The Microfaune `isolation_parameters` dictionary is as follows:

```python
isolation_parameters = {
    "model" : "microfaune",
    "technique" : "",
    "threshold_type" : "",
    "threshold_const" : 0.0,
    "threshold_min" : 0.0,
    "window_size" : 0.0,
    "chunk_size" : 0.0,
    "verbose" : True
}
```
The `technique` parameter can be one of `"simple"`, `"stack"`, `"steinberg"`, or `"chunk"`. The value must be an all-lowercase string. <br>
The `threshold_type` parameter can be one of `"median"`, `"mean"`, `"average"`, `"standard deviation"`, or `"pure"`. This must also be an all-lowercase string. <br>
The remaining parameters are floats representing their respective values.
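To make the threshold parameters concrete, here is a simplified sketch of how a detection threshold could be derived from a local score array given a `threshold_type` and `threshold_const`. This is an assumption about the general approach, not PyHa's exact implementation:

```python
import statistics

def compute_threshold(local_scores, threshold_type, threshold_const,
                      threshold_min=0.0):
    # Simplified sketch (not PyHa's exact logic): derive a base statistic from
    # the local score array, scale it by threshold_const, and never go below
    # threshold_min.
    if threshold_type == "median":
        base = statistics.median(local_scores)
    elif threshold_type in ("mean", "average"):
        base = statistics.mean(local_scores)
    elif threshold_type == "standard deviation":
        base = statistics.stdev(local_scores)
    elif threshold_type == "pure":
        base = 1.0  # threshold is just the constant itself
    else:
        raise ValueError(f"unknown threshold_type: {threshold_type}")
    return max(base * threshold_const, threshold_min)
```

Scores above the returned threshold would then be treated as detections by the chosen isolation technique.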
<br>

The TweetyNET `isolation_parameters` dictionary is as follows:

```python
isolation_parameters = {
    "model" : "tweetynet",
    "tweety_output": False,
    "technique" : "",
    "threshold_type" : "",
    "threshold_const" : 0.0,
    "threshold_min" : 0.0,
    "window_size" : 0.0,
    "chunk_size" : 0.0,
    "verbose" : True
}
```
The `tweety_output` parameter selects between TweetyNET's original output (`True`) and the isolation techniques (`False`). If set to `False`, TweetyNET will use the specified `technique` parameter.
The Foreground-Background Separation technique `isolation_parameters` dictionary is as follows:

```python
isolation_parameters = {
    "model" : "fg_bg_dsp_sep",
    "technique" : "",
    "threshold_type" : "",
    "threshold_const" : 0.0,
    "kernel_size" : 4,
    "power_threshold" : 0.0,
    "threshold_min" : 0.0,
    "verbose" : True
}
```
The `kernel_size` parameter is an integer n that specifies the size of the kernel used in the morphological opening process. For the opening of the binary mask, this is an n-by-n kernel; for the processing of the indicator vector, it is a 1-by-n kernel. <br>
The `power_threshold` parameter is a float that determines how many times larger a pixel's power must be than its row and column medians to be kept. For example, if this value is set to 3.0, a pixel must have a power of at least 3 times both its row median and its column median to be included in the binary mask.
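The row/column median thresholding rule can be sketched as follows with NumPy on a toy spectrogram. This is a minimal illustration of the rule described above, not PyHa's actual implementation:

```python
import numpy as np

# Toy spectrogram of power values: rows = frequency bins, columns = time frames.
spec = np.array([
    [1.0, 1.0, 9.0],
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
])

power_threshold = 3.0
row_medians = np.median(spec, axis=1, keepdims=True)  # per-frequency-bin medians
col_medians = np.median(spec, axis=0, keepdims=True)  # per-time-frame medians

# A pixel enters the binary mask only if its power exceeds power_threshold
# times BOTH its row median and its column median.
mask = (spec > power_threshold * row_medians) & (spec > power_threshold * col_medians)
```

Here only the loud pixel at row 0, column 2 (power 9.0, against a row median of 1.0 and a column median of 2.0) survives the thresholding.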
The Template Matching `isolation_parameters` dictionary is as follows:

```python
isolation_parameters = {
    "model" : "template_matching",
    "template_path" : "",
    "technique" : "",
    "window_size" : 0.0,
    "threshold_type" : "",
    "threshold_const" : 0.0,
    "cutoff_freq_low" : 0,
    "cutoff_freq_high" : 0,
    "verbose" : True,
    "write_confidence" : True
}
```
The `template_path` parameter should be set to the path of the template to use, stored as a `.wav` file. <br>
The `window_size` parameter should be a float corresponding to the length (in seconds) of the template, so that the Steinberg isolation can correctly convert the local score array into labels. <br>
`cutoff_freq_low` and `cutoff_freq_high` should be integer values. If both are defined, the signal and the template are each put through a Butterworth bandpass filter with those cutoff frequencies. This is recommended to ensure that the signal and template have the same shape along the frequency axis. <br>
`write_confidence` determines whether the confidence of each label, taken as the maximum value of the local score array within that label, is written to the output.
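To illustrate the bandpass step, here is a minimal sketch of a Butterworth bandpass filter using SciPy. The function name `bandpass` and the filter order are assumptions for demonstration; this is the general approach, not PyHa's actual implementation:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(signal, sample_rate, cutoff_freq_low, cutoff_freq_high, order=4):
    # Normalize the cutoffs to the Nyquist frequency and build a Butterworth
    # bandpass filter in second-order-sections form for numerical stability.
    nyquist = sample_rate / 2
    sos = butter(order,
                 [cutoff_freq_low / nyquist, cutoff_freq_high / nyquist],
                 btype="bandpass", output="sos")
    # Zero-phase filtering so the filtered signal stays time-aligned with
    # the original (important when comparing a template against a clip).
    return sosfiltfilt(sos, signal)
```

Applying the same filter to both the template and the clip keeps their frequency content comparable before cross-correlation.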
### annotation_chunker
Found in `annotation_post_processing.py`
This function converts a Kaleidoscope-formatted dataframe of annotations into uniform chunks of `chunk_length` seconds, dropping any annotation shorter than `chunk_length`.
| Parameter | Type | Description |
| ----------------- | --------- | ------------------------------------------------------------- |
| kaleidoscope_df | Dataframe | Dataframe of automated or human labels in Kaleidoscope format |
| chunk_length | int | Duration in seconds of each annotation chunk |
This function returns a dataframe with annotations converted to uniform second chunks.
Usage: `annotation_chunker(kaleidoscope_df, chunk_length)`
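The chunking idea can be sketched in pandas as follows. This is a simplified illustration assuming Kaleidoscope-style `OFFSET` and `DURATION` columns, not PyHa's actual implementation (which may handle remainders and other columns differently):

```python
import pandas as pd

def chunk_annotations(df, chunk_length):
    # Simplified sketch: split each annotation into consecutive chunks of
    # chunk_length seconds. Annotations shorter than chunk_length produce no
    # chunks and are effectively dropped; trailing remainders are discarded.
    rows = []
    for _, row in df.iterrows():
        n_chunks = int(row["DURATION"] // chunk_length)
        for i in range(n_chunks):
            chunk = row.copy()
            chunk["OFFSET"] = row["OFFSET"] + i * chunk_length
            chunk["DURATION"] = float(chunk_length)
            rows.append(chunk)
    return pd.DataFrame(rows).reset_index(drop=True)
```

For example, a 7-second annotation with `chunk_length=3` becomes two 3-second chunks, while a 2-second annotation is dropped entirely.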
### write_confidence
Found in `IsoAutio.py`
This function adds a confidence column to a dataframe of automated labels, assigning each annotation a confidence metric based on the maximum local score value within that annotation.
| Parameter | Type | Description |
| --------------------- | ---------------- | -------------------------------------------------------------------------------------- |
| local_score_arr | list of floats | Array of small predictions of bird presence. |
| automated_labels_df | Pandas Dataframe | Dataframe of labels derived from the local score array using the isolate() function. |
This function returns a Pandas Dataframe with an additional column of confidence scores from the local score array.
Usage: `write_confidence(local_score_arr, automated_labels_df)`
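The underlying idea can be sketched as follows. The helper name `add_confidence`, the plain-dict labels, and the proportional mapping from seconds to array indices are all assumptions for illustration; this is not PyHa's implementation:

```python
def add_confidence(local_score_arr, labels, clip_length):
    # Simplified sketch: map each label's [OFFSET, OFFSET + DURATION) interval
    # onto the local score array (assumed to span the whole clip uniformly)
    # and record the maximum score in that interval as the label's confidence.
    n = len(local_score_arr)
    out = []
    for label in labels:
        start = int(label["OFFSET"] / clip_length * n)
        end = int((label["OFFSET"] + label["DURATION"]) / clip_length * n)
        out.append({**label, "CONFIDENCE": max(local_score_arr[start:end])})
    return out
```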
### isolate
Found in `IsoAutio.py`
This function is the wrapper for the audio isolation techniques and calls the respective function based on the `isolation_parameters` `"technique"` key.
| Parameter | Type | Description |
| ---------------------- | -------------- | ------------------------------------------------------------------------------------ |
| local_scores | list of floats | Local scores of the audio clip as determined by Microfaune Recurrent Neural Network. |
| SIGNAL | list of ints | Samples that make up the audio signal. |
| SAMPLE_RATE | int | Sampling rate of the audio clip, usually 44100. |
| audio_dir | string | Directory of the audio clip. |
| filename | string | Name of the audio clip file. |
| isolation_parameters | dict | Python Dictionary that controls the various label creation techniques. |
This function returns a dataframe of automated labels for the audio clip.
Usage: `isolate(local_scores, SIGNAL, SAMPLE_RATE, audio_dir, filename, isolation_parameters)`
