43 skills found · Page 1 of 2
mravanelli / Pytorch Kaldi: pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
jcvasquezc / DisVoice: Feature extraction from speech signals.
gionanide / Speech Signal Processing And Classification: Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, i.e., developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the human speech production system suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system contribution (e.g., the vocal tract) and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope; the perceptual linear prediction coefficients (PLPs) can be derived similarly. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs), e.g., auto-encoders [4]. The pattern recognition step will be based on Gaussian mixture model classifiers, k-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as Kaldi, will be used toward achieving our goal. Comparisons will be made against [6-8].
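As background to the LPC features the entry above describes: the autocorrelation method fits an all-pole model with the Levinson-Durbin recursion. Below is a minimal sketch (illustrative only, not code from the listed project) that recovers the coefficients of a synthetic second-order autoregressive signal; the AR parameters 0.75 and -0.5 are invented for the demonstration.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values
    r[0..order] via the Levinson-Durbin recursion.
    Returns coefficients a (a[0] = 1) and the prediction error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # update previous coefficients
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

# Synthetic "all-pole" source: x[n] = 0.75 x[n-1] - 0.5 x[n-2] + e[n]
rng = np.random.default_rng(0)
e = rng.standard_normal(50_000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + e[n]

order = 2
# Biased autocorrelation estimates r[0..order]
r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
a, err = levinson_durbin(r, order)
# LPC convention: x[n] + a[1] x[n-1] + a[2] x[n-2] ~ e[n],
# so a[1] should be near -0.75 and a[2] near 0.5 here.
```

Cepstral coefficients (and hence the MFCC/PLP features mentioned above) are then derived from such spectral models rather than from the raw waveform.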
aishoot / Speech Feature Extraction: Feature extraction from the speech signal is the initial stage of any speech recognition system.
abhishek305 / PyBot A ChatBot For Answering Python Queries Using NLP: PyBot can change the way learners approach the Python programming language by making it more interactive. This chatbot tries to answer almost every Python-related question or query the user asks. We are implementing NLP to improve the efficiency of the chatbot, and we will include a voice feature for more interactivity. By utilizing NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation. NLTK has been called "a wonderful tool for teaching, and working in, computational linguistics using Python" and "an amazing library to play with natural language."
The main issue with text data is that it is all in text format (strings), whereas machine learning algorithms need some sort of numerical feature vector to perform their task. So before starting any NLP project, we need to pre-process the text. Typical steps are: converting the entire text to uppercase or lowercase, so that the algorithm does not treat the same word in different cases as different words; tokenization, the process of converting normal text strings into a list of tokens, i.e., the words we actually want (a sentence tokenizer finds the list of sentences, and a word tokenizer finds the list of words in a string); removing noise, i.e., everything that is not a standard number or letter; and removing stop words, extremely common words which appear to be of little value in helping select documents matching a user's need and are therefore excluded from the vocabulary entirely.
Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form. For example, stemming the words "Stems", "Stemming", and "Stemmed" yields the single word "stem". A slight variant of stemming is lemmatization. The major difference is that stemming can often create non-existent words, whereas lemmas are actual words: the root stem you end up with is not necessarily something you can look up in a dictionary, but you can always look up a lemma. For example, "run" is the base form of words like "running" and "ran", and "better" and "good" share the same lemma, so they are considered the same.
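The stemming-versus-lemmatization distinction above can be illustrated with a toy example (pure Python, not the NLTK calls the repository itself would use; the suffix list and lemma table are illustrative assumptions):

```python
def toy_stem(word):
    """Crude suffix-stripping stemmer: chops common endings,
    so it can produce non-words, as the description notes."""
    word = word.lower()
    for suffix in ("ization", "tion", "ming", "med", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Tiny illustrative lemma table: lemmatization maps to real dictionary words.
LEMMAS = {"running": "run", "ran": "run", "better": "good",
          "stems": "stem", "stemming": "stem", "stemmed": "stem"}

def toy_lemmatize(word):
    return LEMMAS.get(word.lower(), word.lower())

print([toy_stem(w) for w in ["Stems", "Stemming", "Stemmed"]])  # all "stem"
print(toy_lemmatize("better"))  # "good": lemmas capture meaning, not spelling
```

A real system would use a proper stemmer and a dictionary-backed lemmatizer; the point here is only that the stemmer operates on surface strings while the lemmatizer consults known word forms.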
a-n-rose / Python Sound Tool: SoundPy (alpha stage) is a research-based Python package for speech and sound. Applications include deep learning, filtering, speech enhancement, audio augmentation, feature extraction and visualization, dataset and audio file conversion, and beyond.
jameslyons / Matlab Speech Features: A set of speech feature extraction functions for ASR and speaker identification, written in MATLAB.
mravanelli / Pytorch MLP For ASR: This code implements a basic MLP for speech recognition. The MLP is trained with PyTorch, while feature extraction, alignments, and decoding are performed with Kaldi. The current implementation supports dropout and batch normalization. An example for phoneme recognition using the standard TIMIT dataset is provided.
manthanthakker / SpeakerIdentificationNeuralNetworks: The speaker recognition system consists of two phases, feature extraction and recognition. In the extraction phase, the speaker's voice is recorded and a number of features are extracted to form a model. During the recognition phase, a speech sample is compared against a previously created voice print stored in the database. The highlight of the system is that it can identify the speaker's voice in a multi-speaker environment too. A multi-layer perceptron (MLP) neural network trained with the error back-propagation algorithm was used to train and test the system. The system response time was 74 µs, with an average efficiency of 95%.
EmergenceAI / Kotlin Speech Features: This library provides common speech features for ASR, including MFCCs and filterbank energies, for Android and iOS.
tabahi / Formantfeatures: Extracts the frequency, power, width, and dissonance of formants from WAV files.
vaibhavsundharam / Speech Emotion Analysis: Human emotions are one of the strongest ways of communicating. Even if a person does not understand a language, he or she can still understand the emotions delivered by an individual; in other words, emotions are universal. The idea behind the project is to develop a speech emotion analyzer using deep learning to correctly classify a human's different emotions, such as neutral, angry, or surprised speech. We have deployed three different network architectures, namely a 1-D CNN, LSTMs, and Transformers, to carry out the classification task. We have also used two different feature extraction methodologies (MFCCs and mel spectrograms) to capture the features in a given voice signal, and compared the two in their ability to produce high-quality results, especially in deep-learning models.
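Both of the feature types mentioned in the entry above rest on the mel frequency warp. A minimal sketch of the standard conversion and of how mel filter centres are spaced (the constants are the widely used HTK-style values, not taken from this repository; the filter count and band edges are arbitrary examples):

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Mel filter centre frequencies are spaced linearly in mel, not in Hz,
# so they crowd together at low frequencies, mimicking the human ear.
n_filters, f_min, f_max = 10, 0.0, 8000.0
lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
mel_points = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
centres_hz = [mel_to_hz(m) for m in mel_points]
```

An MFCC pipeline applies triangular filters at these centres to a power spectrum and then takes a DCT of the log energies; a mel spectrogram stops at the filterbank output.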
flyingshan / Chinese Speech Feature Extraction: Splits the ASR probability distribution results into Chinese pinyin, so as to extract more effective features for Chinese speech.
vmarpadge / Parkinsons Detection Using Machine Learning: Parkinson's disease can be detected using speech. Phonation, i.e., the sound produced when pronouncing vowels, is the aspect of speech most affected. We have used a database of speech samples containing phonations from affected and healthy people. Various databases of speech samples are available from JASA (the Journal of the Acoustical Society of America) and UCI. The voice samples were taken from the standard UCI voice dataset, and samples from healthy people were also collected for a comparative study. The test data belongs to 56 subjects; during collection, each of the 56 people was asked to say only the sustained vowels 'a' and 'o' three times, for a total of 336 recordings from the repository. In the training phase, these signals are pre-processed for feature extraction with the PRAAT software. The features extracted are jitter, shimmer, NHR, HNR, mean and median pitch, number of pulses and periods, minimum and maximum period, SD, SD of period, and the number and degree of voice breaks. All these features differ from patient to patient, depending on how far Parkinson's disease has progressed. After extracting the features, we reduce their dimensionality using particle swarm optimization (PSO), which works like a swarm of particles to reduce the feature selection to a minimum; the optimization achieves better results with less computation. The selected features are then used to train an SVM classifier.
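Jitter and shimmer, the first two features listed above, are simple cycle-to-cycle perturbation measures. A minimal sketch using the common "local" definition (mean absolute difference between consecutive cycles, normalised by the mean), applied to invented period and amplitude values rather than real PRAAT output:

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycles,
    divided by the mean value (the 'local' jitter/shimmer definition)."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Hypothetical glottal cycle data: pitch periods in seconds, peak amplitudes.
periods = [0.0100, 0.0110, 0.0100, 0.0110]   # perturbation in time -> jitter
amplitudes = [1.00, 0.95, 1.00, 0.95]        # perturbation in level -> shimmer
jitter = local_perturbation(periods)
shimmer = local_perturbation(amplitudes)
```

In voice-disorder work these ratios are usually reported as percentages; larger values indicate a less stable phonation.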
jcvasquezc / AEspeech: Feature extraction from speech signals based on representation learning strategies using pre-trained autoencoders.
tbright17 / Accent Feat: Feature extraction for accented or pathological speech.
yueqiusun / Twitter Hate Speech Classifier: A project that aims to detect and classify hate speech and offensive speech on Twitter using a bag-of-words model. The IPython notebook covers data cleaning, feature extraction, and SVM model building.
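The bag-of-words representation mentioned above can be sketched in a few lines (a pure-Python illustration, not the notebook's actual pipeline; the example documents are invented):

```python
from collections import Counter

def build_vocab(docs):
    """Assign each distinct token a column index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(doc, vocab):
    """Map a document to a vector of token counts over the vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

docs = ["free speech matters", "free free offers"]   # invented examples
vocab = build_vocab(docs)
vectors = [vectorize(d, vocab) for d in docs]
# vocab:   {'free': 0, 'speech': 1, 'matters': 2, 'offers': 3}
# vectors: [[1, 1, 1, 0], [2, 0, 0, 1]]
```

These count vectors are exactly the numerical features a classifier such as an SVM consumes; word order is discarded, which is what "bag" of words means.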
bootphon / Sustained Phonation Features: Python package for the extraction of speech features for sustained phonation.
AlinaBaber / Arabic Speech Recognition By Machine Learning And Feature Extraction: This project implements an Arabic speech recognition system using an ensemble voting classifier. The model is built with Python and utilizes the librosa library for preprocessing and feature extraction.
hyyoka / Acoustic Features: Audio/speech feature extraction using Parselmouth, librosa, and DisVoice.