SWivid / F5 TTS - Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Rudrabha / Wav2Lip - This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, try Sync Labs.
xorbitsai / Inference - Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop, all through one unified, production-ready inference API.
Azure-Samples / Cognitive Services Speech SDK - Sample code for the Microsoft Cognitive Services Speech SDK
openai / Openai Fm - Code for openai.fm, a demo for the OpenAI Speech API
Robitx / Gp.nvim - Gp.nvim (GPT prompt) Neovim AI plugin: ChatGPT sessions, instructable text/code operations, and speech to text [OpenAI, Ollama, Anthropic, ..]
ddlBoJack / Emotion2vec - [ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Azure-Samples / Cognitive Speech TTS - Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
yiranran / Audio Driven TalkingFace HeadPose - Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose" (arXiv 2020) and "Predicting Personalized Head Movement From Short Video and Speech Signal" (TMM 2022)
Rudrabha / Lip2Wav - This repository contains the code for our CVPR 2020 paper "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
DmitryRyumin / INTERSPEECH 2023 24 Papers - INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
ZhangXInFD / SpeechTokenizer - This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Dadangdut33 / Speech Translate - A real-time speech transcription and translation application using OpenAI Whisper and a free translation API. Interface built with Tkinter. Written entirely in Python.
DmitryRyumin / ICASSP 2023 24 Papers - ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
gemengtju / Tutorial Separation - This repo collects tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.
YuanGongND / Ltu - Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
facebookresearch / Brainmagick - Training and evaluation pipeline for MEG and EEG brain signal encoding and decoding using deep learning. Code for our paper "Decoding speech perception from non-invasive brain recordings" published in Nature Machine Intelligence, 2023.
FireRedTeam / FireRedASR2S - A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ languages. FireRedLID supports 100+ languages and 20+ Chinese dialects. FireRedPunc supports Chinese and English.
YuanGongND / Whisper At - Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
facebookresearch / Meshtalk - Code for MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement