supercollider / Supercollider - An audio server, programming language, and IDE for sound synthesis and algorithmic composition.
gpt-omni / Mini Omni - Open-source multimodal large language model that can hear and talk while thinking. Features real-time end-to-end speech input and streaming audio output for conversation.
DAMO-NLP-SG / Video LLaMA - [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
chaosprint / Glicol - Graph-oriented live coding language and music/audio DSP library written in Rust
lucidrains / Audiolm Pytorch - Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
LAION-AI / CLAP - Contrastive Language-Audio Pretraining
QwenLM / Qwen2 Audio - The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
PortAudio / Portaudio - PortAudio is a cross-platform, open-source C language library for real-time audio input and output.
QwenLM / Qwen Audio - The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
lyuchenyang / Macaw LLM - Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
vkohaupt / VokoscreenNG - vokoscreenNG is a powerful screencast creator in many languages to record the screen, an area or a window (Linux only). Recording of audio from multiple sources is supported. With the built-in camera support, you can make your video more personal. Other tools such as systray, magnifying glass, countdown, timer, Showclick and Halo support will help
stepfun-ai / Step Audio2 - Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
jishengpeng / WavTokenizer - [ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
kylestetz / Slang - 🎤 a simple audio programming language implemented in JS
thestk / Stk - The Synthesis ToolKit in C++ (STK) is a set of open source audio signal processing and algorithmic synthesis classes written in the C++ programming language.
YouG-o / YouTube No Translation - Web browser add-on that prevents YouTube's automatic translations! It keeps titles, descriptions, and audio in their original language.
xid32 / SoundMind - We introduce the Audio Logical Reasoning (ALR) dataset, consisting of 6,446 text-audio annotated samples specifically designed for complex reasoning tasks. Building on this resource, we propose SoundMind, a rule-based reinforcement learning (RL) algorithm tailored to endow audio language models (ALMs) with deep bimodal reasoning abilities.
OFA-Sys / ONE PEACE - A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
NVIDIA / Audio Flamingo - PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
X-LANCE / SLAM LLM - A Framework for Speech, Language, Audio, Music Processing with Large Language Model