570 skills found · Page 1 of 19
cjpais / Handy: A free, open source, and extensible speech-to-text application that works completely offline.
google / Live Transcribe Speech Engine: Live Transcribe is an Android application that provides real-time captioning for people who are deaf or hard of hearing. This repository contains the Android client libraries for communicating with Google's Cloud Speech API that are used in Live Transcribe.
watson-developer-cloud / Speech To Text Nodejs: :microphone: Sample Node.js application for the IBM Watson Speech to Text service
PromtEngineer / Verbi: A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech models. Supports OpenAI, Groq, ElevenLabs, CartesiaAI, and Deepgram APIs, plus local models via Ollama. Ideal for research and development in voice technology.
Yuan-ManX / AI Audio Datasets: AI Audio Datasets (AI-ADS) 🎵, covering Speech, Music, and Sound Effects, providing training data for generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
igorantun / Node Chat: :speech_balloon: Chat application built with NodeJS and Material Design
jsxc / Jsxc: :speech_balloon: Real-time XMPP chat application with video calls, file transfer and encrypted communication.
Dadangdut33 / Speech Translate: A realtime speech transcription and translation application using OpenAI Whisper and a free translation API. Interface made with Tkinter; written entirely in Python.
drowe67 / Codec2 Dev: Open source speech codec designed for communications-quality speech between 450 and 3200 bit/s. The main application is low bandwidth HF/VHF digital radio.
adrianhajdin / Project News Alan AI: In this video, we build a conversational, voice-controlled React news application using Alan AI, a speech recognition platform that lets you add voice capabilities to your applications.
Femoon / Tts Azure Web: TTS Azure Web is an Azure Text-to-Speech (TTS) web application. It can run locally or be deployed with a single click using your Azure Key.
modelscope / FunCodec: FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.
petewarden / Open Speech Recording: Web application to record speech for an open dataset
dectalk / Dectalk: Modern builds for the 90s/00s DECtalk text-to-speech application.
drowe67 / Codec2: Open source speech codec designed for communications-quality speech between 700 and 3200 bit/s. The main application is low bandwidth HF/VHF digital radio.
WangHelin1997 / CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
danidee10 / Chatire: :speech_balloon: Real-time chat application built with Vue, Django, RabbitMQ and uWSGI WebSockets.
Devansh-47 / Sign Language To Text And Speech Conversion: A Python application that converts American Sign Language into text and speech, helping deaf and non-verbal people start conversations with people who don't understand sign language.
intel-iot-devkit / Smart Video Workshop: Learn about the workflow using the Intel® Distribution of OpenVINO™ toolkit to accelerate vision, automatic speech recognition, natural language processing, recommendation systems and many other applications.
gionanide / Speech Signal Processing And Classification: Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification: developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the human speech production system suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) are a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the contribution of the system (e.g., the vocal tract) and that of the excitation. Taking the characteristics of the human ear into account, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope; the perceptual linear prediction coefficients (PLPs) can be derived similarly. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs), e.g., auto-encoders [4]. The pattern recognition step will be based on Gaussian mixture model classifiers, k-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary dataset (MEEI) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
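The LPC features described in the entry above are classically computed with the autocorrelation method followed by the Levinson-Durbin recursion. The following is a minimal NumPy sketch of that idea, not code from the repository; the function name and the AR(2) usage below are illustrative assumptions.

```python
import numpy as np

def lpc(frame, order):
    """Estimate linear prediction coefficients (LPCs) for one speech frame
    using the autocorrelation method and the Levinson-Durbin recursion.
    Returns the prediction-error filter coefficients
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order and the residual energy."""
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction error
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update the coefficient vector in place (Levinson-Durbin step)
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

On a synthetic AR(2) signal x[t] = 0.75·x[t-1] - 0.5·x[t-2] + e[t], the recursion recovers coefficients close to (1, -0.75, 0.5). LPC-derived cepstral coefficients then follow from a standard recursion on `a`, whereas MFCCs instead pass the short-term spectrum through a mel filter bank.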