Oobleck
open soundstream-ish VAE codecs for downstream neural audio synthesis
"The more you compress it, the harder it gets"
Possible backronyms:
- "Out-of-the-box Latent Encoder Construction Kit"
- "Over-optimized Latent Encoder Construction Kit"
(We're open to other names; we can vote at the end.)
What is it?
MIT-licensed, SoundStream-ish VAE audio codecs for downstream neural audio synthesis.
We plan to create at least:
- a continuous VAE (for downstream audio diffusion, etc.)
- a vector-quantized VAE (for downstream MusicLM-style models, etc.)
- a spherical VAE (for quantum circuits, etc.)
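To make the three bottleneck types concrete, here is a minimal NumPy sketch of what each one does to an encoder output of shape (latent_dim, frames). It is an illustration under assumptions, not Oobleck's implementation: the function names are hypothetical, nothing is differentiated (so the VQ case has no straight-through estimator or commitment loss), and the spherical variant is assumed to be a plain L2 projection of each latent frame onto the unit hypersphere.

```python
import numpy as np


def continuous_bottleneck(mean, log_var, rng):
    """Gaussian VAE bottleneck: reparameterized sample and KL divergence to N(0, I).

    mean, log_var: arrays of shape (latent_dim, frames) predicted by the encoder.
    """
    eps = rng.standard_normal(mean.shape)
    z = mean + np.exp(0.5 * log_var) * eps
    # KL(N(mean, var) || N(0, 1)) summed over latent dims, averaged over frames
    kl = 0.5 * np.sum(mean**2 + np.exp(log_var) - log_var - 1.0, axis=0).mean()
    return z, kl


def vector_quantize(z, codebook):
    """VQ bottleneck: replace each latent frame with its nearest codebook entry.

    z: (latent_dim, frames); codebook: (num_codes, latent_dim).
    Returns the quantized latents and the discrete token indices.
    """
    # Squared Euclidean distance between every frame and every code: (frames, num_codes)
    dists = ((z.T[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    quantized = codebook[indices].T
    return quantized, indices


def spherical_bottleneck(z, eps=1e-8):
    """Assumed spherical bottleneck: L2-normalize each latent frame onto the unit sphere."""
    return z / (np.linalg.norm(z, axis=0, keepdims=True) + eps)


rng = np.random.default_rng(0)
encoder_out = rng.standard_normal((8, 32))  # stand-in for a real encoder output

z_cont, kl = continuous_bottleneck(encoder_out, np.zeros_like(encoder_out), rng)
z_vq, tokens = vector_quantize(encoder_out, rng.standard_normal((16, 8)))
z_sph = spherical_bottleneck(encoder_out)
```

The continuous latents feed a diffusion model directly, the integer `tokens` are what a MusicLM-style language model would consume, and the spherical latents all have unit norm per frame.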
We will validate them using:
- the MUSHRA testing protocol with expert listeners, as a subjective performance measure
- ViSQOL and SI-SNR, as objective performance measures
- FAD (Fréchet Audio Distance)?
- ...
- (waiting for the first models to be trained)
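SI-SNR is simple enough to write down directly. Below is a minimal NumPy sketch for single-channel signals using the common zero-mean definition: the estimate is projected onto the reference, and the energy of that scaled target is compared with the energy of the residual. The function name and the `eps` stabilizer are illustrative choices, not part of any Oobleck API; ViSQOL is a considerably more involved model and is not sketched here.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove DC offsets so constant shifts do not affect the score
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target component
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
print(f"SI-SNR: {si_snr(noisy, clean):.1f} dB")
```

Because of the projection step, rescaling the estimate (for example, a codec that changes loudness) leaves the score unchanged, which is why SI-SNR is preferred over plain SNR for comparing reconstructions.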
We will be experimenting with different loss functions:
- perceptual losses, etc.
- contributing the loss functions to auraloss and importing them from there
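The `MultiResolutionSTFTLoss` entry in the model outputs suggests a multi-resolution STFT loss of the kind auraloss provides. The idea can be sketched in NumPy: compare magnitude spectrograms at several FFT sizes using spectral convergence plus an L1 distance between log magnitudes, then average across resolutions. This is a simplified illustration, not auraloss's code; the resolutions, the Hann window, and the helper names are arbitrary assumptions.

```python
import numpy as np


def stft_magnitude(x, fft_size, hop, eps=1e-7):
    """Magnitude spectrogram of a 1-D signal with a Hann window (no padding)."""
    window = np.hanning(fft_size)
    num_frames = 1 + (len(x) - fft_size) // hop
    frames = np.stack([x[i * hop : i * hop + fft_size] * window for i in range(num_frames)])
    # eps keeps the log well defined for silent bins
    return np.abs(np.fft.rfft(frames, axis=-1)) + eps


def stft_loss(estimate, reference, fft_size, hop):
    """Spectral convergence + log-magnitude L1 at a single resolution."""
    est = stft_magnitude(estimate, fft_size, hop)
    ref = stft_magnitude(reference, fft_size, hop)
    spectral_convergence = np.linalg.norm(ref - est) / np.linalg.norm(ref)
    log_magnitude = np.mean(np.abs(np.log(ref) - np.log(est)))
    return spectral_convergence + log_magnitude


def multi_resolution_stft_loss(estimate, reference,
                               resolutions=((512, 128), (1024, 256), (2048, 512))):
    """Average the single-resolution loss over several (fft_size, hop) pairs."""
    return float(np.mean([stft_loss(estimate, reference, n, h) for n, h in resolutions]))


rng = np.random.default_rng(0)
waveform = rng.standard_normal(2**14)
reconstruction = waveform + 0.05 * rng.standard_normal(2**14)
print(multi_resolution_stft_loss(reconstruction, waveform))
```

Small FFT sizes penalize timing errors and large ones penalize pitch and harmonic errors, so averaging over several resolutions avoids favoring one failure mode. Inputs must be at least as long as the largest FFT size.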
Installation
OOBLECK_VERSION=develop pip install git+https://github.com/Harmonai-org/oobleck.git
Usage
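An Oobleck autoencoder corresponding to a given .gin file can be instantiated as follows:
import gin
import torch
from oobleck import AudioAutoEncoder

# Bind the hyperparameters defined in the .gin file before building the model
gin.parse_config_file("base/base.gin")
model = AudioAutoEncoder()

# Dummy input: one random mono clip of 2**16 samples
inputs = {"waveform": torch.randn(1, 1, 2**16)}
# Returns a dict with the latent, the reconstruction and the loss terms
outputs = model.loss(inputs)
print(outputs.keys())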
# >>> dict_keys(['waveform', 'latent', 'reconstruction', \
# >>> 'score_waveform', 'score_reconstruction', 'features_reconstruction', \
# >>> 'features_waveform', 'generator_loss', 'MultiResolutionSTFTLoss', \
# >>> 'discriminator_loss'])