AuxiliaryRawNet
Auxiliary Raw Net (ARawNet) is an ASVspoof detection model that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity.
Overview
This repository is an implementation of the Auxiliary Raw Net (ARawNet), an ASVspoof detection system that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity. The paper can be found here.
Model performance is evaluated on the ASVspoof 2019 dataset.

Setup
Environment
- speechbrain==0.5.7
- pandas
- torch==1.9.1
- torchaudio==0.9.1
- nnAudio==0.2.6
- ptflops==0.6.6

- Create a conda environment with `conda env create -f environment.yml`.
- Activate the conda environment with `conda activate`.
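To confirm that the active environment matches the pinned versions listed above, a short check along these lines may help. It is a sketch and not part of the repository: `PINNED` simply restates the list above, and `check_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions restated from the list above (pandas is unpinned).
PINNED = {
    "speechbrain": "0.5.7",
    "pandas": None,
    "torch": "1.9.1",
    "torchaudio": "0.9.1",
    "nnAudio": "0.2.6",
    "ptflops": "0.6.6",
}


def check_versions(pinned, get_version=version):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Local build tags such as "1.9.1+cu111" still count as a match.
        if wanted is None or found.split("+")[0] == wanted:
            report[name] = "ok"
        else:
            report[name] = f"found {found}, expected {wanted}"
    return report


if __name__ == "__main__":
    for name, status in check_versions(PINNED).items():
        print(f"{name}: {status}")
```

Running the script inside the activated environment prints one status line per package.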
Data preprocessing
- Download the dataset. Our experiments use the Logical Access (LA) scenario of the ASVspoof 2019 dataset. The dataset can be downloaded here.
- Unzip the data and save it to a folder named `data` in the same directory as `ARawNet`, as shown below:

      .
      ├── data
      │   ├── PA
      │   │   └── ...
      │   └── LA
      │       ├── ASVspoof2019_LA_asv_protocols
      │       ├── ASVspoof2019_LA_asv_scores
      │       ├── ASVspoof2019_LA_cm_protocols
      │       ├── ASVspoof2019_LA_train
      │       └── ASVspoof2019_LA_dev
      └── ARawNet

- Run `python preprocess.py`. Alternatively, use our processed data directly under "/processed_data".
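Before running the preprocessing step, it can be useful to verify that the LA folders from the layout above are in place. The following is a minimal sketch, not a script shipped with the repository; `missing_la_dirs` is a hypothetical helper, and the folder names are copied from the layout above.

```python
from pathlib import Path

# Sub-folders of data/LA listed in the directory layout above.
EXPECTED_LA_DIRS = [
    "ASVspoof2019_LA_asv_protocols",
    "ASVspoof2019_LA_asv_scores",
    "ASVspoof2019_LA_cm_protocols",
    "ASVspoof2019_LA_train",
    "ASVspoof2019_LA_dev",
]


def missing_la_dirs(root):
    """Return the expected LA sub-folders that are absent under root/data/LA."""
    la_root = Path(root) / "data" / "LA"
    return [name for name in EXPECTED_LA_DIRS if not (la_root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that holds data/ and ARawNet/.
    missing = missing_la_dirs(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("LA layout looks complete.")
```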
Train
`python train_raw_net.py yaml/RawSNet.yaml --data_parallel_backend --data_parallel_count=2`
Evaluate
`python eval.py`
Check Model Size and multiply-and-accumulates (MACs)
`python check_model_size.py yaml/RawSNet.yaml`
Model Performance
Accuracy metric
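$$\text{min t-DCF} = \min_{s} \left\{ \beta \, P_{\text{miss}}^{\text{cm}}(s) + P_{\text{fa}}^{\text{cm}}(s) \right\}$$

Here $s$ is the countermeasure (CM) decision threshold, and $P_{\text{miss}}^{\text{cm}}(s)$ and $P_{\text{fa}}^{\text{cm}}(s)$ are the CM miss and false-alarm rates at that threshold.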
Explanations can be found here: t-DCF
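To make the metric concrete, the sketch below sweeps a decision threshold over a handful of toy scores and reports the minimum of β·P_miss + P_fa together with an approximate EER, where P_miss is the fraction of bona fide trials rejected and P_fa the fraction of spoofed trials accepted. It is an illustration only and not the official scoring script: the scores are made up, `beta=2.0` is a hypothetical value (in practice β depends on the ASV system's error rates and the cost parameters), and the code assumes that higher scores indicate bona fide speech.

```python
def error_rates(bonafide, spoof, threshold):
    """CM miss and false-alarm rates when scores >= threshold are accepted."""
    p_miss = sum(s < threshold for s in bonafide) / len(bonafide)
    p_fa = sum(s >= threshold for s in spoof) / len(spoof)
    return p_miss, p_fa


def candidate_thresholds(bonafide, spoof):
    """Every distinct score, plus +inf so that "reject everything" is covered."""
    return sorted(set(bonafide) | set(spoof)) + [float("inf")]


def min_tdcf(bonafide, spoof, beta):
    """Minimum of beta * P_miss(s) + P_fa(s) over the candidate thresholds."""
    return min(
        beta * p_miss + p_fa
        for p_miss, p_fa in (
            error_rates(bonafide, spoof, s)
            for s in candidate_thresholds(bonafide, spoof)
        )
    )


def eer(bonafide, spoof):
    """Approximate equal error rate: the point where P_miss and P_fa are closest."""
    p_miss, p_fa = min(
        (error_rates(bonafide, spoof, s) for s in candidate_thresholds(bonafide, spoof)),
        key=lambda rates: abs(rates[0] - rates[1]),
    )
    return (p_miss + p_fa) / 2


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    bonafide_scores = [2.1, 1.7, 0.9, 0.2]
    spoof_scores = [-1.5, -0.3, 0.4, -2.0]
    print("min t-DCF:", min_tdcf(bonafide_scores, spoof_scores, beta=2.0))
    print("EER:", eer(bonafide_scores, spoof_scores))
```

With these toy scores, the threshold 0.2 accepts every bona fide trial and one of the four spoofed trials, so the minimum cost is 0.25; the EER is reached at threshold 0.4, where both error rates equal 0.25.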
Experiment Results
| Model | Front-end | Main Encoder | Auxiliary Encoder (E_A) | EER (%) | min-tDCF |
| --- | --- | --- | --- | --- | --- |
| Res2Net | Spec | Res2Net | - | 8.783 | 0.2237 |
| | LFCC | | - | 2.869 | 0.0786 |
| | CQT | | - | 2.502 | 0.0743 |
| Rawnet2 | Raw waveforms | Rawnet2 | - | 5.13 | 0.1175 |
| ARawNet | Mel-Spectrogram | XVector | :white_check_mark: | 1.32 | 0.03894 |
| | | | - | 2.39320 | 0.06875 |
| ARawNet | Mel-Spectrogram | ECAPA-TDNN | :white_check_mark: | 1.39 | 0.04316 |
| | | | - | 2.11 | 0.06425 |
| ARawNet | CQT | XVector | :white_check_mark: | 1.74 | 0.05194 |
| | | | - | 3.39875 | 0.09510 |
| ARawNet | CQT | ECAPA-TDNN | :white_check_mark: | 1.11 | 0.03645 |
| | | | - | 1.72667 | 0.05077 |

| Main Encoder | Auxiliary Encoder | Parameters | MACs |
| --- | --- | --- | --- |
| Rawnet2 | - | 25.43 M | 7.61 GMac |
| Res2Net | - | 0.92 M | 1.11 GMac |
| XVector | :white_check_mark: | 5.81 M | 2.71 GMac |
| XVector | - | 4.66 M | 1.88 GMac |
| ECAPA-TDNN | :white_check_mark: | 7.18 M | 3.19 GMac |
| ECAPA-TDNN | - | 6.03 M | 2.36 GMac |
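The cost of the auxiliary encoder can be read off the parameter and MAC figures above by subtracting the rows without it from the rows with it. The short sketch below does that arithmetic; the numbers are copied from the table, and `auxiliary_overhead` is a hypothetical helper name used only for this illustration.

```python
# (parameters in millions, MACs in GMac), copied from the table above.
COSTS = {
    "XVector": {"with_aux": (5.81, 2.71), "without_aux": (4.66, 1.88)},
    "ECAPA-TDNN": {"with_aux": (7.18, 3.19), "without_aux": (6.03, 2.36)},
}


def auxiliary_overhead(costs):
    """Return the extra parameters (M) and MACs (GMac) added by the auxiliary encoder."""
    overhead = {}
    for encoder, rows in costs.items():
        params_with, macs_with = rows["with_aux"]
        params_without, macs_without = rows["without_aux"]
        overhead[encoder] = (
            round(params_with - params_without, 2),
            round(macs_with - macs_without, 2),
        )
    return overhead


if __name__ == "__main__":
    for encoder, (extra_params, extra_macs) in auxiliary_overhead(COSTS).items():
        print(f"{encoder}: +{extra_params} M parameters, +{extra_macs} GMac")
```

For both main encoders the difference comes to 1.15 M parameters and 0.83 GMac.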
Cite Our Paper
If you use this repository, please consider citing:

    @inproceedings{Teng2021ComplementingHF,
      title={Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model},
      author={Zhongwei Teng and Quchen Fu and Jules White and M. Powell and Douglas C. Schmidt},
      year={2021}
    }

    @inproceedings{Fu2021FastAudioAL,
      title={FastAudio: A Learnable Audio Front-End for Spoof Speech Detection},
      author={Quchen Fu and Zhongwei Teng and Jules White and M. Powell and Douglas C. Schmidt},
      year={2021}
    }
