15 repositories found
FunAudioLLM / SenseVoice: Multilingual Voice Understanding Model
FireRedTeam / FireRedASR2: A SOTA industrial-grade all-in-one ASR system with ASR, VAD, LID, and punctuation modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ languages. FireRedLID supports 100+ languages and 20+ Chinese dialects. FireRedPunc supports Chinese and English.
FireRedTeam / FireRedVAD: A SOTA industrial-grade voice activity detection & audio event detection toolkit supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD.
Jungjee / DcaseNet: Author's repository for reproducing DcaseNet, an integrated pre-trained DNN that performs acoustic scene classification, audio tagging, and sound event detection. Implemented in PyTorch.
Hazrat-Ali9 / Urban Sound Classification With UrbanSound8K Dataset: A deep learning project that classifies urban sound events in the UrbanSound8K dataset, demonstrating how audio signal processing and neural networks can power sound recognition systems for smart cities and surveillance.
Hadryan / TFNet For Environmental Sound Classification: Learning discriminative and robust time-frequency representations for environmental sound classification. Convolutional neural networks (CNNs) are among the best-performing neural network architectures for environmental sound classification (ESC). Recently, attention mechanisms have been used in CNNs to capture the useful information in the audio signal for sound classification, especially for weakly labelled data, where the training data provides sound class labels but no timing information about the acoustic events. In these methods, however, the inherent time-frequency characteristics and variations are not explicitly exploited when obtaining the deep features. In this paper, we propose a new method, called the time-frequency enhancement block (TFBlock), in which temporal attention and frequency attention are employed to enhance the features from relevant frames and frequency bands. Compared with other attention mechanisms, our method constructs parallel branches that attend to the temporal and frequency features separately, in order to mitigate interference from sections of the acoustic environment where no sound events occur. Experiments on three benchmark ESC datasets show that our method improves classification performance and also exhibits robustness to noise.
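The parallel-branch idea described above can be illustrated with a minimal numpy sketch. This is a toy stand-in, not the paper's TFBlock: the real method uses learned convolutional attention, whereas here each branch simply scores its axis by average energy and re-weights the spectrogram.

```python
import numpy as np

def softmax(v):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(v - v.max())
    return e / e.sum()

def tf_attention(spec):
    """Toy parallel time/frequency attention over a (freq, time) spectrogram.

    One branch weights frequency bands, the other weights time frames;
    the two re-weighted views are computed independently (in parallel)
    and then averaged, so frames/bands with no event energy contribute
    little to the combined representation.
    """
    f_w = softmax(spec.mean(axis=1))   # frequency-attention weights, shape (F,)
    t_w = softmax(spec.mean(axis=0))   # temporal-attention weights, shape (T,)
    freq_branch = spec * f_w[:, None]  # emphasize informative bands
    time_branch = spec * t_w[None, :]  # emphasize frames containing events
    return 0.5 * (freq_branch + time_branch)
```

Feeding in a spectrogram where only a few cells carry energy, the output keeps those cells prominent while silent regions stay suppressed, which is the interference-mitigation effect the abstract describes.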
jaehwlee / Tf2 Harmonic Cnn: TensorFlow 2 implementation of "Data-Driven Harmonic Filters for Audio Representation Learning"
ta012 / MaxAST: [ICASSP 2024] Max-AST: Combining Convolution, Local and Global Self-Attentions for Audio Event Classification
ta012 / DTFAT: [AAAI 2024] DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification
Listening-Lab / Annotator: Listening Lab audio analysis and annotation tool for developing audio classification models that detect sparse audio events within field recordings.
WangHelin1997 / FPNet: A CNN-based signal segmentation method for audio event classification.
h-sami-ullah / Audio Event Analysis And Feature Extraction Using MATLAB: SVM-based gunshot detection and classification in MATLAB using hand-designed features. The repo contains SVM-based audio event detection and classification, with a dataset of positive and negative examples. The main goal is to detect gunshots in an audio signal and classify them into two gun types, i.e., sniper and rifle. To run the code, execute the **"classify.m"** file, which loads the precomputed feature matrix from the given files and produces the result; extracting the features from scratch is also possible. The features used to represent each audio bag are: (1) **Spectral Roll-Off**, (2) **Energy**, (3) **Zero Crossing Rate**, (4) **Spectral Roll-Off**, (5) **Spectral Centroid**, (6) **Spectral Spread**, (7) **Volume**, (8) **Spectral Flux**. Beyond these, it is easy to add **"IMFCC"** coefficients to improve the results, or other features carrying richer information about each signal. To see how the feature extraction works, go through the **"all_parameters.m"** file.
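Several of the hand-designed features in the list above have standard closed-form definitions. The following is a minimal Python/numpy sketch of how a few of them (energy, zero-crossing rate, spectral centroid, roll-off, spread) could be computed per frame; it is an illustration of the feature definitions, not a port of the repo's MATLAB code, and the FFT size and roll-off percentage are assumed values.

```python
import numpy as np

def frame_features(x, sr, n_fft=1024, rolloff_pct=0.85):
    """Compute a few classic hand-designed audio features for one frame."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x, n_fft))            # magnitude spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)     # bin center frequencies

    energy = float(np.sum(x ** 2))                 # frame energy
    # zero-crossing rate: fraction of sample pairs whose sign changes
    zcr = float(np.mean(np.abs(np.diff(np.sign(x))) > 0))
    # spectral centroid: magnitude-weighted mean frequency
    centroid = float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    # spectral roll-off: frequency below which rolloff_pct of magnitude lies
    cum = np.cumsum(mag)
    rolloff = float(freqs[np.searchsorted(cum, rolloff_pct * cum[-1])])
    # spectral spread: magnitude-weighted std deviation around the centroid
    spread = float(np.sqrt(np.sum(((freqs - centroid) ** 2) * mag)
                           / (np.sum(mag) + 1e-12)))
    return np.array([energy, zcr, centroid, rolloff, spread])
```

Stacking such vectors over all frames of the positive and negative clips yields the kind of feature matrix an SVM would then be trained on.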
thtran97 / Deep Learning For Audio: Audio classification & sound event detection in PyTorch
hjleed / Acoustic Environment Classification Using Discrete Hartley Transform Features: This paper presents a new approach to acoustic environment classification based on the discrete Hartley transform. The approach applies a Hidden Markov Model classifier to test data composed of audio clips, in order to determine which environment surrounds each clip. It uses features obtained from the discrete Hartley transform, yielding a feature set that requires only real arithmetic, which can make the technique advantageous in terms of simplicity and/or computational speed. The performance of the proposed approach is evaluated on benchmark datasets from the 2013 and 2016 Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. Experiments show that the proposed method is competitive with other recently proposed methods, and that the use of the discrete Hartley transform improves classification performance.
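The real-arithmetic claim rests on the definition of the discrete Hartley transform. A short sketch, assuming nothing about the paper's downstream feature pipeline: the DHT can be computed directly with only real cosines and sines, or via the FFT identity H[k] = Re(X[k]) - Im(X[k]).

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*k*n/N),
    where cas(t) = cos(t) + sin(t).

    Computed here via the FFT for brevity; a direct implementation of
    the cas-kernel sum uses only real arithmetic, which is the
    simplicity/speed argument the paper makes.
    """
    X = np.fft.fft(x)
    return X.real - X.imag
```

A useful property for checking an implementation: the DHT is its own inverse up to a factor of 1/N, i.e. x = DHT(DHT(x)) / N.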
vijay-2012 / High Spot Cricket Highlights Generator: A Flask application that automates extraction of cricket highlights from match videos. It uses a deep neural network model to perform audio-based classification of events, and a video editing module extracts highlight moments from the video based on those classifications.