Oobleck

Open SoundStream-ish VAE codecs for downstream neural audio synthesis

"The more you compress it, the harder it gets"

Possible backronyms:

"Out-of-the-box Latent Encoder Construction Kit"
"Over-optimized Latent Encoder Construction Kit"

(Open to other names, we can vote at the end.)

What is it?

MIT-licensed SoundStream-ish VAE audio codecs for downstream neural audio synthesis.

We will be creating at least:

a continuous VAE (for downstream audio diffusion, etc.)
a vector-quantized VAE (for downstream MusicLM, etc.)
a spherical VAE (for quantum circuits, etc.)

We will validate them using:

the MUSHRA testing protocol with expert listeners for a subjective performance measure
ViSQOL and SI-SNR for an objective performance measure
FAD?
...
(waiting for the first models to be trained)
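
As a rough illustration of the SI-SNR measure listed above, here is a minimal sketch following the commonly used definition: zero-mean both signals, project the estimate onto the target, and compare the energy of the projection with the energy of the residual. The function name `si_snr` and the `eps` parameter are hypothetical and are not part of Oobleck; the sketch is kept dependency-free for readability, and a real evaluation would more likely use a vectorized torch implementation.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB for two equal-length sequences of samples.

    Hypothetical helper for illustration only; not an Oobleck API.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the scaled reference.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t) + eps
    scale = dot / target_energy
    s_target = [scale * b for b in t]

    # Whatever the projection does not explain is treated as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]

    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


# Toy signals: a sine "waveform" and two synthetic "reconstructions".
target = [math.sin(0.1 * i) for i in range(1000)]
noise = [math.cos(0.37 * i) for i in range(1000)]
close = [t + 0.01 * v for t, v in zip(target, noise)]
rough = [t + 0.5 * v for t, v in zip(target, noise)]

print(si_snr(close, target), si_snr(rough, target))
```

Because the estimate is projected onto the target before the energies are compared, rescaling a reconstruction by a constant gain leaves the score essentially unchanged, which is why the measure is useful for codecs that do not preserve absolute level.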

We will be experimenting with different loss functions:

perceptual loss, etc.
place the loss functions into auraloss and import them from there

Installation

OOBLECK_VERSION=develop pip install git+https://github.com/Harmonai-org/oobleck.git

Usage

To instantiate an Oobleck autoencoder from a given .gin config file:

import gin
import torch

from oobleck import AudioAutoEncoder

gin.parse_config_file("base/base.gin")
model = AudioAutoEncoder()

inputs = {"waveform": torch.randn(1, 1, 2**16)}
outputs = model.loss(inputs)

print(outputs.keys())

# >>> dict_keys(['waveform', 'latent', 'reconstruction', \
# >>> 'score_waveform', 'score_reconstruction', 'features_reconstruction', \
# >>> 'features_waveform', 'generator_loss', 'MultiResolutionSTFTLoss', \
# >>> 'discriminator_loss'])
