570 skills found · Page 1 of 19
cjpais / Handy: A free, open source, and extensible speech-to-text application that works completely offline.
google / Live Transcribe Speech Engine: Live Transcribe is an Android application that provides real-time captioning for people who are deaf or hard of hearing. This repository contains the Android client libraries for communicating with Google's Cloud Speech API that are used in Live Transcribe.
watson-developer-cloud / Speech To Text Nodejs: :microphone: Sample Node.js application for the IBM Watson Speech to Text service
PromtEngineer / Verbi: A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech models. Supports OpenAI, Groq, ElevenLabs, CartesiaAI, and Deepgram APIs, plus local models via Ollama. Ideal for research and development in voice technology.
Yuan-ManX / AI Audio Datasets: AI Audio Datasets (AI-ADS) 🎵, covering Speech, Music, and Sound Effects, providing training data for generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
igorantun / Node Chat: :speech_balloon: Chat application built with NodeJS and Material Design
jsxc / Jsxc: :speech_balloon: Real-time XMPP chat application with video calls, file transfer and encrypted communication.
Dadangdut33 / Speech Translate: A realtime speech transcription and translation application using OpenAI Whisper and a free translation API. Interface made with Tkinter; written entirely in Python.
drowe67 / Codec2 Dev: Open source speech codec designed for communications-quality speech between 450 and 3200 bit/s. The main application is low bandwidth HF/VHF digital radio.
adrianhajdin / Project News Alan AI: In this video, we build a conversational, voice-controlled React news application using Alan AI, a speech recognition platform that lets you add voice capabilities to your applications.
Femoon / Tts Azure Web: TTS Azure Web is an Azure Text-to-Speech (TTS) web application. It can run locally or be deployed with a single click using your Azure Key.
modelscope / FunCodec: FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.
petewarden / Open Speech Recording: Web application to record speech for an open dataset
dectalk / Dectalk: Modern builds for the 90s/00s DECtalk text-to-speech application.
drowe67 / Codec2: Open source speech codec designed for communications-quality speech between 700 and 3200 bit/s. The main application is low bandwidth HF/VHF digital radio.
WangHelin1997 / CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
danidee10 / Chatire: :speech_balloon: Real-time chat application built with Vue, Django, RabbitMQ and uWSGI WebSockets.
Devansh-47 / Sign Language To Text And Speech Conversion: A Python application that converts American Sign Language into text and speech, helping deaf and non-verbal people start conversations with people who don't understand sign language.
intel-iot-devkit / Smart Video Workshop: Learn about the workflow using the Intel® Distribution of OpenVINO™ toolkit to accelerate vision, automatic speech recognition, natural language processing, recommendation systems and many other applications.
gionanide / Speech Signal Processing And Classification: Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification: developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the human speech production system suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) are a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the contribution of the system (e.g., the vocal tract) and that of the excitation. Taking the characteristics of the human ear into account, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope; the perceptual linear prediction coefficients (PLPs) can be derived similarly. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs), e.g., auto-encoders [4]. The pattern recognition step will be based on Gaussian mixture model classifiers, k-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary dataset (MEEI) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
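The LPC features described in the entry above are classically computed with the autocorrelation method followed by the Levinson-Durbin recursion. The following is a minimal NumPy sketch of that idea, not code from the repository; the function name and the AR(2) usage below are illustrative assumptions.

```python
import numpy as np

def lpc(frame, order):
    """Estimate linear prediction coefficients (LPCs) for one speech frame
    using the autocorrelation method and the Levinson-Durbin recursion.
    Returns the prediction-error filter coefficients
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order and the residual energy."""
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction error
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update the coefficient vector in place (Levinson-Durbin step)
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

On a synthetic AR(2) signal x[t] = 0.75·x[t-1] - 0.5·x[t-2] + e[t], the recursion recovers coefficients close to (1, -0.75, 0.5). LPC-derived cepstral coefficients then follow from a standard recursion on `a`, whereas MFCCs instead pass the short-term spectrum through a mel filter bank.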