3 skills found
jishengpeng / WavTokenizer: [ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
jishengpeng / WavChat: A Survey of Spoken Dialogue Models (60 pages)
mbzuai-oryx / LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM