StemForge
Open-source, GPU-accelerated AI audio workstation for stem separation, MIDI extraction, audio generation, and song composition — running locally in your browser.
StemForge is a local, GPU-accelerated web application that chains multiple AI music pipelines into a single creative workflow. Upload a song, separate it into stems, extract MIDI, generate new audio or compose entirely new songs, transform vocals with AI voice conversion, generate or load special effect sound tracks, mix everything together, and export — all from one interface, all running on your own hardware. No cloud uploads, no subscriptions, no per-track fees.
Why StemForge?
Most AI audio tools do one thing — separate stems, or generate music, or extract MIDI. StemForge connects them all. Outputs from one pipeline flow directly into the next: separate a track into stems, extract MIDI from any stem, use those stems or MIDI as conditioning for new audio generation, compose an entirely new song with AI lyrics, transform any vocal with AI voice conversion, or generate literally any sound effect with Stable Audio, then mix and export the result. It is an open-source alternative to cloud-based stem separation services like LALAL.ai or iZotope RX, with the added ability to generate, compose, transform, and remix — not just separate.
StemForge runs entirely on your local machine with no internet connection required after initial model downloads. Your audio never leaves your computer.
Features
- Demucs — stem separation (vocals, drums, bass, other) — 4 models including fine-tuned and MDX variants
- BS-Roformer — high-quality AI stem separation with 2-stem vocal, 4-stem, and 6-stem (guitar + piano) models
- MIDI extraction — polyphonic BasicPitch for instruments, faster-whisper + pitch tracking for vocals; per-stem MIDI preview via FluidSynth
- Enhance — three-mode vocal enhancement: Clean Up (8 UVR denoise/dereverb presets), Tune (CREPE + Praat PSOLA auto-tune with key/scale snapping), Effects (planned)
- Stable Audio Open — text-conditioned audio generation up to 600 s, with optional audio and MIDI conditioning (Synth tab)
- SFX Stem Builder — DAW-style timeline canvas for placing audio clips with per-clip fades and volume, aligned to a reference stem (Synth tab)
- AceStep — full AI song generation from style descriptions + lyrics, with Create, Rework, Lego, and Complete modes (Compose tab); LoRA/LoKR adapter training pipeline with live loss chart, snapshots, and export
- RVC Voice Conversion — AI voice transformation via vendored Applio inference; 14 built-in voices, searchable HuggingFace model browser, pitch shift, F0 method selection (Compose tab → Voice mode)
- Mix — multi-track mixer combining audio stems, MIDI-rendered tracks, synth outputs, and composed songs; per-track instrument, volume, multi-track preview, and FLAC render
- Export — transcode any pipeline output (stems, MIDI, mix, generated audio, composed songs) to wav / flac / mp3 / ogg
Everything runs locally with deterministic environments via uv.
Tab bar: Separate · Enhance · MIDI · Synth · Compose · Mix · Export
See INSTRUCTIONS.md for a guide to every tab, or Future Plans for the roadmap.
Quick Start
git clone --recursive git@github.com:tsondo/StemForge.git
cd StemForge
uv sync
uv run python run.py
# Open http://localhost:8765
See Requirements and Install & Run below for full details including system dependencies.
Requirements
uv
StemForge uses uv to manage the Python version and all dependencies.
Install it once and uv sync takes care of the rest.
Ubuntu / Debian:
curl -LsSf https://astral.sh/uv/install.sh | sh
Fedora / RHEL / CentOS:
curl -LsSf https://astral.sh/uv/install.sh | sh
Arch / Manjaro:
sudo pacman -S uv
openSUSE:
curl -LsSf https://astral.sh/uv/install.sh | sh
Any distro (pipx fallback):
pipx install uv
After installing, open a new terminal (or run source $HOME/.local/bin/env) so the uv
command is on your PATH.
FFmpeg >= 5.1 (with development headers)
Required for audio decoding.
Ubuntu 22.04:
sudo add-apt-repository -y ppa:ubuntuhandbook1/ffmpeg7
sudo apt update
sudo apt install ffmpeg libavcodec-dev libavformat-dev libavdevice-dev \
libavfilter-dev libavutil-dev libswscale-dev libswresample-dev
Ubuntu 24.04+:
sudo apt install ffmpeg libavcodec-dev libavformat-dev
Fedora:
sudo dnf install ffmpeg-free ffmpeg-free-devel
Arch / Manjaro:
sudo pacman -S ffmpeg
Other distros:
- Ensure ffmpeg >= 5.1
- Ensure development headers are installed
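If you are unsure whether your distro's FFmpeg meets the 5.1 minimum, a small helper can compare the version string. This is an illustrative sketch, not part of StemForge; it assumes the usual x.y.z version format printed by ffmpeg -version:

```shell
# Hedged sketch: check an FFmpeg version string against the 5.1 minimum.
ffmpeg_ok() {
  major=${1%%.*}           # text before the first dot
  rest=${1#*.}
  minor=${rest%%.*}        # text between the first and second dots
  { [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }; } 2>/dev/null
}

# Usage: feed it the version reported by your installed ffmpeg.
ffmpeg_ok "$(ffmpeg -version 2>/dev/null | head -n1 | awk '{print $3}')" \
  && echo "FFmpeg is new enough" \
  || echo "FFmpeg older than 5.1 (or not installed)"
```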
FluidSynth + GM Soundfont (required for MIDI preview and Mix tab)
Fedora:
sudo dnf install fluidsynth fluidsynth-devel fluid-soundfont-gm
Ubuntu / Debian:
sudo apt install libfluidsynth3 libfluidsynth-dev fluid-soundfont-gm
Arch / Manjaro:
sudo pacman -S fluidsynth soundfont-fluid
The GM soundfont is auto-discovered at startup.
On Fedora it installs to
/usr/share/soundfonts/FluidR3_GM.sf2; use the Browse button on the Mix tab
to point StemForge at a different .sf2 file if needed.
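Conceptually, auto-discovery amounts to probing a list of common .sf2 locations and taking the first that exists. The sketch below is illustrative only — the paths are typical distro defaults, not StemForge's exact search list:

```shell
# Sketch of GM soundfont auto-discovery: return the first existing .sf2
# from a list of common install locations plus any extra candidates.
find_soundfont() {
  for sf in \
      /usr/share/soundfonts/FluidR3_GM.sf2 \
      /usr/share/sounds/sf2/FluidR3_GM.sf2 \
      /usr/share/soundfonts/default.sf2 \
      "$@"; do
    if [ -f "$sf" ]; then
      printf '%s\n' "$sf"
      return 0
    fi
  done
  return 1
}

# Usage: extra candidates (e.g. a custom .sf2) can be passed as arguments.
find_soundfont "$HOME/soundfonts/custom.sf2" || echo "no GM soundfont found"
```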
jemalloc (optional, recommended for multi-user / long-running deployments)
jemalloc is a memory allocator that reduces heap fragmentation and malloc lock
contention under concurrent workloads. StemForge detects and uses it automatically
at startup — no configuration needed. Particularly beneficial when running with
multiple users (--max-users) or leaving the server up for extended periods.
Fedora:
sudo dnf install jemalloc
Ubuntu / Debian:
sudo apt install libjemalloc-dev
Arch / Manjaro:
sudo pacman -S jemalloc
Set STEMFORGE_NO_JEMALLOC=1 in the environment to disable even when installed.
macOS is not affected (jemalloc injection is Linux-only).
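On Linux, the automatic detection boils down to locating libjemalloc and preloading it. A minimal sketch (not StemForge's actual startup code) of how such injection can work:

```shell
# Illustrative sketch of automatic jemalloc injection on Linux: find the
# library via the dynamic linker cache, then preload it unless disabled.
if [ -z "$STEMFORGE_NO_JEMALLOC" ] && [ "$(uname -s)" = "Linux" ]; then
  jemalloc_lib=$(ldconfig -p 2>/dev/null | awk '/libjemalloc\.so/ {print $NF; exit}')
  if [ -n "$jemalloc_lib" ]; then
    export LD_PRELOAD="$jemalloc_lib${LD_PRELOAD:+:$LD_PRELOAD}"
    echo "using jemalloc: $jemalloc_lib"
  fi
fi
```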
GPU (recommended)
- NVIDIA GPU with driver 580+ (required for CUDA 13.0 runtime)
- Check your driver version: nvidia-smi → top-right shows "Driver Version"
- PyTorch 2.10.0+cu130 (pinned) will use the GPU automatically — no CUDA toolkit install needed
- CPU-only works but is significantly slower for all pipelines
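The driver requirement can be sanity-checked by comparing the major version that nvidia-smi reports. A small hedged helper (the 580 minimum is taken from the list above; the nvidia-smi query flag is standard):

```shell
# Compare an NVIDIA driver version string against the 580 minimum
# required for the CUDA 13.0 runtime.
driver_ok() {
  major=${1%%.*}
  [ "$major" -ge 580 ] 2>/dev/null
}

# Usage: feed it the version from
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
if driver_ok "580.65.06"; then echo "driver OK"; fi
```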
WSL (Windows Subsystem for Linux)
StemForge is a web application — audio playback happens in the browser, so no PulseAudio or sounddevice setup is needed. Install FluidSynth for MIDI preview:
sudo apt install libfluidsynth3 libfluidsynth-dev fluid-soundfont-gm
Then follow the standard Install & Run steps below.
macOS Support
macOS on Apple Silicon (M1/M2/M3) is supported via MPS acceleration. Intel Macs will run CPU-only.
Setup
Step 1 — Copy the macOS pyproject file before installing:
cp pyproject.toml.MAC pyproject.toml
uv sync
Step 2 — Install FluidSynth:
brew install fluid-synth
Step 3 — Set the library path so pyfluidsynth can find it:
export DYLD_LIBRARY_PATH="$(brew --prefix fluid-synth)/lib:$DYLD_LIBRARY_PATH"
Add the export line to your ~/.zshrc so it persists across sessions.
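To keep repeated setup runs from duplicating that line in ~/.zshrc, a small guard works. persist_line below is a hypothetical helper for illustration, not part of StemForge:

```shell
# Hypothetical helper: append a line to a file only if it is not already
# present verbatim (grep -qxF = quiet, exact whole-line, fixed-string).
persist_line() {
  grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage for the FluidSynth export above:
# persist_line ~/.zshrc 'export DYLD_LIBRARY_PATH="$(brew --prefix fluid-synth)/lib:$DYLD_LIBRARY_PATH"'
```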
macOS limitations
- The mdx_extra_q Demucs model is not available on macOS (requires diffq, which does not build on macOS). The model is automatically hidden from the UI.
- BasicPitch MIDI extraction may have limited functionality on macOS — ai-edge-litert (the TFLite runtime) is a Linux-only package. The MIDI tab will surface a clear error if this is attempted.
- Vocal MIDI (faster-whisper) works on macOS.
- Stable Audio Open generation works on macOS via MPS.
- AceStep (Compose tab) works on macOS — the subprocess handles MPS detection independently.
Performance
MPS acceleration is used automatically when available (Apple Silicon). Expect significantly faster inference than CPU-only, but slower than CUDA on a discrete GPU.
HuggingFace Authentication (required for the Synth tab)
The Synth tab uses Stable Audio Open 1.0, a gated model. You must accept its license and authenticate before StemForge can download it. See the Synth section in INSTRUCTIONS.md for usage details.
Step 1 — Accept the license
Visit https://huggingface.co/stabilityai/stable-audio-open-1.0, sign in with a free HuggingFace account, and click Agree and access repository.
Step 2 — Create a token
Go to https://huggingface.co/settings/tokens and create a token with Read access.
Step 3 — Log in locally
huggingface-cli login
Paste your token when prompted. It is saved to ~/.cache/huggingface/token and
picked up automatically by StemForge on every subsequent run — you only need to do
this once.
The model weights (~2 GB) are downloaded on the first Synth run and cached under
~/.cache/stemforge/musicgen/.
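You can confirm the one-time login succeeded by checking for the saved token file (the path is the huggingface-cli default noted above):

```shell
# Check whether a HuggingFace token is already saved locally.
hf_token_file="$HOME/.cache/huggingface/token"
if [ -s "$hf_token_file" ]; then
  echo "HF token found: Synth downloads will authenticate automatically"
else
  echo "no token yet, run: huggingface-cli login"
fi
```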
Install & Run
Step 1 — Install system dependencies (see Requirements above): uv, FFmpeg, FluidSynth + GM soundfont.
Step 2 — Clone (use --recursive to pull the AceStep submodule and its nested vendor):
git clone --recursive git@github.com:tsondo/StemForge.git
