381 skills found · Page 1 of 13
deezer / Spleeter: Deezer source separation library including pretrained models.
Anjok07 / Ultimatevocalremovergui: GUI for a vocal remover that uses deep neural networks.
abus-aikorea / Voice Pro: Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot voice cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
stemrollerapp / Stemroller: Isolate vocals, drums, bass, and other instrumental stems from any song.
Music-and-Culture-Technology-Lab / Omnizart: Omniscient Mozart, able to transcribe everything in the music, including vocals, drums, chords, beats, instruments, and more.
jianchang512 / Vocal Separate: An extremely simple tool for separating vocals and background music, operated entirely from a local web page with no external network connection required, using 2stems/4stems/5stems models.
tsurumeso / Vocal Remover: Vocal remover using deep neural networks.
ardha27 / AI Song Cover RVC: All-in-one version: YouTube WAV download, vocal separation, audio splitting, training, and inference using Google Colab.
nomadkaraoke / Python Audio Separator: Easy-to-use stem separation (e.g. instrumentals/vocals) from the CLI or as a Python package, using a variety of pre-trained models (primarily from UVR).
YARC-Official / YARG: YARG (a.k.a. Yet Another Rhythm Game) is a free, open-source plastic-guitar game that is still in development. It supports guitar (five-fret), drums (plastic or e-kit), vocals, pro keys, and more!
Eddycrack864 / UVR5 UI: Ultimate Vocal Remover 5 with a Gradio UI. Separate an audio file into various stems using multiple models.
JeffreyCA / Spleeter Web: Self-hostable web app for isolating the vocals, accompaniment, bass, and drums of any song. Supports Spleeter, Demucs, and BS-RoFormer. Built with React and Django.
rakuri255 / UltraSinger: AI-based tool that extracts vocals, lyrics, and pitch from music to auto-generate UltraStar Deluxe, MIDI, and note files. It automatically taps notes, adds text, pitches vocals, and creates karaoke files.
christian-byrne / Audio Separation Nodes Comfyui: Separate stems (vocals, bass, drums, other) from audio; recombine, tempo-match, and slice/crop audio.
gabolsgabs / DALI: DALI: a large Dataset of synchronised Audio, LyrIcs and vocal notes.
EtienneAb3d / WhisperHallu: Experimental code: sound-file preprocessing to optimize Whisper transcriptions without hallucinated text.
alexcrist / Autotone: A vocal pitch-correction web application (like Autotune).
KakaruHayate / ColorSplitter: A CLI tool for splitting vocal timbre.
ardha27 / AI Song Cover SOVITS: All-in-one version: YouTube WAV download, vocal separation, audio splitting, training, and inference using Google Colab.
gionanide / Speech Signal Processing And Classification: Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification: developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) can also be derived. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian mixture model classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources will be used toward achieving our goal, such as KALDI. Comparisons will be made against [6-8].
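The short-term MFCC pipeline that the entry above describes (frame the signal, take the power spectrum, apply a mel-spaced filterbank, then decorrelate with a DCT) can be sketched in plain NumPy. This is a minimal illustration, not the repository's code; the function names, frame size (25 ms at 16 kHz), hop, and filterbank parameters are assumed typical values.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular mel-spaced filterbank, shape (n_filters, n_fft//2 + 1)."""
    fmax = fmax or sr / 2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = inv_mel(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):            # rising slope
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):            # falling slope
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Frame -> Hamming window -> |FFT|^2 -> mel filterbank -> log -> DCT-II."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_e = np.log(np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T,
                              1e-10))
    # DCT-II over the filterbank axis, keeping the first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                   / (2 * n_filters))
    return log_e @ basis.T

# Example: one second of a synthetic 440 Hz tone at 16 kHz
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # → (98, 13): one 13-coefficient vector per frame
```

The resulting per-frame feature matrix is what would feed the GMM, K-NN, or DNN classifiers mentioned in the description; production systems would typically use an established extractor such as KALDI or librosa rather than hand-rolled code like this.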