Wavencoder

WavEncoder is a Python library for encoding audio signals, applying transforms for audio augmentation, and training audio classification models with a PyTorch backend.

Package Contents

<table class="tg"> <thead> <tr> <th class="tg-7btt">Layers</th> <th class="tg-7btt">Models</th> <th class="tg-7btt">Transforms</th> <th class="tg-7btt">Trainer and utils</th> </tr> </thead> <tbody> <tr> <td class="tg-0pky"> <ul> <li>Attention</li> <ul> <li>Dot</li> <li>Soft</li> <li>Additive</li> <li>Multiplicative</li> </ul> <li>SincNet layer</li> <li>Time Delay Neural Network (TDNN)</li> </ul> </td> <td class="tg-0pky"> <ul> <li>PreTrained</li> <ul> <li>wav2vec</li> <li>wav2vec2 (base, large, xlsr53)</li> <li>SincNet</li> <li>RawNet</li> </ul> <li>Baseline</li> <ul> <li>1DCNN</li> <li>LSTM Classifier</li> <li>LSTM Attention Classifier</li> </ul> </ul> </td> <td class="tg-0pky"> <ul> <li>Noise (Environment/Gaussian White Noise)</li> <li>Speed Change</li> <li>PadCrop</li> <li>Clip</li> <li>Reverberation</li> <li>TimeShift</li> <li>TimeMask</li> <li>FrequencyMask</li> </ul> </td> <td class="tg-0pky"> <ul> <li>Classification Trainer</li> <li>Classification Testing</li> <li>Download Noise Dataset</li> <li>Download Impulse Response Dataset</li> </ul> </td> </tr> </tbody> </table>

Wav Models to be added

[x] wav2vec [1]
[x] wav2vec2 [2]
[x] SincNet [3]
[ ] PASE [4]
[ ] MockingJay [5]
[x] RawNet [6]
[ ] GaborNet [7]
[ ] LEAF [8]
[x] CNN-1D
[x] CNN-LSTM
[x] CNN-LSTM-Attn

Check the Demo Colab Notebook.

Installation

Use the package manager pip to install wavencoder.

pip install wavencoder

Usage

Import pretrained encoder, baseline models and classifiers

import torch
import wavencoder

x = torch.randn(1, 16000) # [1, 16000]
encoder = wavencoder.models.Wav2Vec(pretrained=True)
z = encoder(x) # [1, 512, 98]

classifier = wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,                          
                                                    return_attn_weights=True, 
                                                    attn_type='soft')
y_hat, attn_weights = classifier(z) # [1, 2], [1, 98]
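
The 98 in the `[1, 512, 98]` shape above is the number of frames the encoder emits for 16000 input samples. The sketch below reproduces that count with plain arithmetic. It assumes the (kernel, stride) pairs of the convolutional feature encoder described in the wav2vec paper and no padding; these values and the `conv_output_length` helper are illustrative and are not read from wavencoder itself.

```python
# Conceptual sketch: how many frames a stack of strided 1-D convolutions emits.
# ASSUMPTION: these (kernel, stride) pairs follow the wav2vec feature encoder
# as described in the wav2vec paper; they are not taken from wavencoder's code.
WAV2VEC_CONV_LAYERS = [(10, 5), (8, 4), (4, 2), (4, 2), (4, 2)]


def conv_output_length(n_samples, layers=WAV2VEC_CONV_LAYERS):
    """Return the sequence length after unpadded strided convolutions."""
    length = n_samples
    for kernel, stride in layers:
        # Standard output-length formula for a convolution without padding.
        length = (length - kernel) // stride + 1
    return length


n_frames = conv_output_length(16000)
print(n_frames)  # 98
```

Under the same assumption the strides multiply to 160, so each frame advances 160 input samples, and longer inputs give proportionally more frames.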

Use wavencoder with PyTorch Sequential or class modules

import torch
import torch.nn as nn
import wavencoder

model = nn.Sequential(
        wavencoder.models.Wav2Vec(),
        wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,                          
                                               return_attn_weights=True, 
                                               attn_type='soft')
)

x = torch.randn(1, 16000) # [1, 16000]
y_hat, attn_weights = model(x) # [1, 2], [1, 98]

import torch
import torch.nn as nn
import wavencoder

class AudioClassifier(nn.Module):
    def __init__(self):
        super(AudioClassifier, self).__init__()
        self.encoder = wavencoder.models.Wav2Vec(pretrained=True)
        self.classifier = nn.Linear(512, 2)

    def forward(self, x):
        z = self.encoder(x)
        z = torch.mean(z, dim=2)
        out = self.classifier(z)
        return out

model = AudioClassifier()
x = torch.randn(1, 16000) # [1, 16000]
y_hat = model(x) # [1, 2]
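
The class above pools the encoder output with a plain mean over the time dimension, while the attention classifiers return one weight per frame. A minimal, framework-free sketch of how the two relate: attention pooling is a weighted mean, and equal scores reduce it to the plain mean. The `softmax` and `attention_pool` helpers and the hand-written scores are hypothetical; wavencoder's attention layers compute their scores from the features and may differ in detail.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, scores):
    """Weighted mean of per-frame feature vectors.

    frames: list of T feature vectors (each a list of floats)
    scores: list of T unnormalized relevance scores (hypothetical here;
            a real attention layer would compute them from the frames)
    """
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]  # T=3 frames, 2 features each

# Equal scores give uniform weights, which is the plain mean over time.
mean_pooled, uniform = attention_pool(frames, [0.0, 0.0, 0.0])

# A dominant score pulls the pooled vector towards that frame.
focused, weights = attention_pool(frames, [0.0, 0.0, 10.0])
```

The returned weights always sum to 1 and have one entry per frame, which matches the `[1, 98]` shape of `attn_weights` in the earlier examples.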

Train the encoder-classifier models

import torch.nn as nn
from wavencoder.models import Wav2Vec, LSTM_Attn_Classifier
from wavencoder.trainer import train, test_evaluate_classifier, test_predict_classifier

model = nn.Sequential(
    Wav2Vec(pretrained=False),
    LSTM_Attn_Classifier(512, 64, 2)
)

trainloader = ...
valloader = ...
testloader = ...

trained_model, train_dict = train(model, trainloader, valloader, n_epochs=20)
test_prediction_dict = test_predict_classifier(trained_model, testloader)

Add transforms to your DataLoader to augment or process the wav signal

import torchaudio
from wavencoder.transforms import Compose, AdditiveNoise, SpeedChange, Clipping, PadCrop, Reverberation

audio, _ = torchaudio.load('test.wav')

transforms = Compose([
    AdditiveNoise('path-to-noise-folder', snr_levels=[5, 10, 15], p=0.5),
    SpeedChange(factor_range=(-0.5, 0.0), p=0.5),
    Clipping(p=0.5),
    PadCrop(48000, crop_position='random', pad_position='random')
])

transformed_audio = transforms(audio)
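
To make the `snr_levels` and `p` arguments more concrete, here is a conceptual sketch of additive-noise augmentation in plain Python. It is not wavencoder's implementation. The `power`, `mix_at_snr` and `maybe_apply` helpers are hypothetical, and reading `p` as the probability of applying a transform is an assumption.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal, scaled so the mix has the requested SNR in dB."""
    target_noise_power = power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]


def maybe_apply(transform, samples, p, rng=random):
    """Apply transform with probability p, otherwise return samples unchanged.

    ASSUMPTION: this is one plausible reading of the `p` argument.
    """
    return transform(samples) if rng.random() < p else samples


rng = random.Random(0)
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(signal, noise, snr_db=10)

# Check the result: recover the added noise and measure the SNR actually achieved.
added = [y - s for y, s in zip(noisy, signal)]
measured_snr = 10 * math.log10(power(signal) / power(added))
```

A lower SNR value means louder noise relative to the signal, so a list such as `[5, 10, 15]` spans heavier to lighter corruption.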

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Reference
