
StemForge

Music: separate into stems, modify, convert to MIDI, add synth sounds, remix.

Install / Use

/learn @tsondo/StemForge
README

<p align="center"> <img src="StemForgeLogo.png" alt="StemForge — AI-powered stem separation, MIDI extraction, audio generation, and music production tool" width="260"/> </p>

StemForge

Open-source, GPU-accelerated AI audio workstation for stem separation, MIDI extraction, audio generation, and song composition — running locally in your browser.


StemForge is a local, GPU-accelerated web application that chains multiple AI music pipelines into a single creative workflow. Upload a song, separate it into stems, extract MIDI, generate new audio or compose entirely new songs, transform vocals with AI voice conversion, generate or load special effect sound tracks, mix everything together, and export — all from one interface, all running on your own hardware. No cloud uploads, no subscriptions, no per-track fees.

Why StemForge?

Most AI audio tools do one thing — separate stems, or generate music, or extract MIDI. StemForge connects them all. Outputs from one pipeline flow directly into the next: separate a track into stems, extract MIDI from any stem, use those stems or MIDI as conditioning for new audio generation, compose an entirely new song with AI lyrics, transform any vocal with AI voice conversion, or generate literally any sound effect with Stable Audio, then mix and export the result. It is an open-source alternative to cloud-based stem separation services like LALAL.ai or iZotope RX, with the added ability to generate, compose, transform, and remix — not just separate.

StemForge runs entirely on your local machine with no internet connection required after initial model downloads. Your audio never leaves your computer.

Features

  • Demucs — stem separation (vocals, drums, bass, other) — 4 models including fine-tuned and MDX variants
  • BS-Roformer — high-quality AI stem separation with 2-stem vocal, 4-stem, and 6-stem (guitar + piano) models
  • MIDI extraction — polyphonic BasicPitch for instruments, faster-whisper + pitch tracking for vocals; per-stem MIDI preview via FluidSynth
  • Enhance — three-mode vocal enhancement: Clean Up (8 UVR denoise/dereverb presets), Tune (CREPE + Praat PSOLA auto-tune with key/scale snapping), Effects (planned)
  • Stable Audio Open — text-conditioned audio generation up to 600 s, with optional audio and MIDI conditioning (Synth tab)
  • SFX Stem Builder — DAW-style timeline canvas for placing audio clips with per-clip fades and volume, aligned to a reference stem (Synth tab)
  • AceStep — full AI song generation from style descriptions + lyrics, with Create, Rework, Lego, and Complete modes (Compose tab); LoRA/LoKR adapter training pipeline with live loss chart, snapshots, and export
  • RVC Voice Conversion — AI voice transformation via vendored Applio inference; 14 built-in voices, searchable HuggingFace model browser, pitch shift, F0 method selection (Compose tab → Voice mode)
  • Mix — multi-track mixer combining audio stems, MIDI-rendered tracks, synth outputs, and composed songs; per-track instrument, volume, multi-track preview, and FLAC render
  • Export — transcode any pipeline output (stems, MIDI, mix, generated audio, composed songs) to wav / flac / mp3 / ogg

Everything runs locally with deterministic environments via uv.

Tab bar: Separate · Enhance · MIDI · Synth · Compose · Mix · Export

See INSTRUCTIONS.md for a guide to every tab, or Future Plans for the roadmap.


Quick Start

git clone --recursive git@github.com:tsondo/StemForge.git
cd StemForge
uv sync
uv run python run.py
# Open http://localhost:8765

See Requirements and Install & Run below for full details including system dependencies.


Requirements

uv

StemForge uses uv to manage the Python version and all dependencies. Install it once and uv sync takes care of the rest.

Ubuntu / Debian:

curl -LsSf https://astral.sh/uv/install.sh | sh

Fedora / RHEL / CentOS:

curl -LsSf https://astral.sh/uv/install.sh | sh

Arch / Manjaro:

sudo pacman -S uv

openSUSE:

curl -LsSf https://astral.sh/uv/install.sh | sh

Any distro (pipx fallback):

pipx install uv

After installing, open a new terminal (or run source $HOME/.local/bin/env) so the uv command is on your PATH.
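To confirm the install worked, a quick check along these lines can be run in a new terminal (a sketch; the `source` hint mirrors the note above):

```shell
# Record whether uv is reachable on PATH; print the version if so.
if command -v uv >/dev/null 2>&1; then
  UV_STATUS="$(uv --version)"
else
  UV_STATUS="missing -- run: source \$HOME/.local/bin/env"
fi
echo "uv: $UV_STATUS"
```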

FFmpeg >= 5.1 (with development headers)

Required for audio decoding.

Ubuntu 22.04:

sudo add-apt-repository -y ppa:ubuntuhandbook1/ffmpeg7
sudo apt update
sudo apt install ffmpeg libavcodec-dev libavformat-dev libavdevice-dev \
    libavfilter-dev libavutil-dev libswscale-dev libswresample-dev

Ubuntu 24.04+:

sudo apt install ffmpeg libavcodec-dev libavformat-dev

Fedora:

sudo dnf install ffmpeg-free ffmpeg-free-devel

Arch / Manjaro:

sudo pacman -S ffmpeg

Other distros:

  • Ensure ffmpeg >= 5.1
  • Ensure development headers are installed
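For distros not listed above, a small helper like this (hypothetical; GNU `sort -V` does the version comparison) can confirm the installed ffmpeg meets the 5.1 minimum:

```shell
# Return success if the given version string is >= 5.1 (uses GNU sort -V).
ffmpeg_ok() {
  min="5.1"
  [ "$(printf '%s\n%s\n' "$min" "$1" | sort -V | head -n1)" = "$min" ]
}

# Against a real install (ffmpeg must be on PATH for this line):
# ffmpeg_ok "$(ffmpeg -version | head -n1 | awk '{print $3}')" && echo "ffmpeg ok"
ffmpeg_ok "6.1.1" && echo "6.1.1 meets the minimum"
```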

FluidSynth + GM Soundfont (required for MIDI preview and Mix tab)

Fedora:

sudo dnf install fluidsynth fluidsynth-devel fluid-soundfont-gm

Ubuntu / Debian:

sudo apt install libfluidsynth3 libfluidsynth-dev fluid-soundfont-gm

Arch / Manjaro:

sudo pacman -S fluidsynth soundfont-fluid

The GM soundfont is auto-discovered at startup. On Fedora it installs to /usr/share/soundfonts/FluidR3_GM.sf2; use the Browse button on the Mix tab to point StemForge at a different .sf2 file if needed.
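Auto-discovery can be sanity-checked by hand; the candidate paths below are common distro locations (an assumption — StemForge's actual search list may differ):

```shell
# Print the first existing soundfont from a list of candidate paths.
find_soundfont() {
  for sf in "$@"; do
    if [ -f "$sf" ]; then
      echo "$sf"
      return 0
    fi
  done
  return 1
}

find_soundfont \
  /usr/share/soundfonts/FluidR3_GM.sf2 \
  /usr/share/sounds/sf2/FluidR3_GM.sf2 \
  || echo "no GM soundfont found -- use Browse on the Mix tab"
```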

jemalloc (optional, recommended for multi-user / long-running deployments)

jemalloc is a memory allocator that reduces heap fragmentation and malloc lock contention under concurrent workloads. StemForge detects and uses it automatically at startup — no configuration needed. Particularly beneficial when running with multiple users (--max-users) or leaving the server up for extended periods.

Fedora:

sudo dnf install jemalloc

Ubuntu / Debian:

sudo apt install libjemalloc-dev

Arch / Manjaro:

sudo pacman -S jemalloc

Set STEMFORGE_NO_JEMALLOC=1 in the environment to disable even when installed. macOS is not affected (jemalloc injection is Linux-only).
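A rough way to see what the startup probe will find (the `ldconfig` grep is an assumption about how detection could work, not StemForge's exact logic):

```shell
# Report jemalloc availability, honoring the opt-out variable described above.
if [ "${STEMFORGE_NO_JEMALLOC:-0}" = "1" ]; then
  JEMALLOC_STATE="disabled via STEMFORGE_NO_JEMALLOC"
elif ldconfig -p 2>/dev/null | grep -q libjemalloc; then
  JEMALLOC_STATE="available"
else
  JEMALLOC_STATE="not installed (optional)"
fi
echo "jemalloc: $JEMALLOC_STATE"
```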

GPU (recommended)

  • NVIDIA GPU with driver 580+ (required for CUDA 13.0 runtime)
  • Check your driver version: nvidia-smi → top-right shows "Driver Version"
  • PyTorch 2.10.0+cu130 (pinned) will use the GPU automatically — no CUDA toolkit install needed
  • CPU-only works but is significantly slower for all pipelines
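A quick pre-flight check (assumes an NVIDIA system; the commented `uv run` line verifies that the pinned PyTorch actually sees the GPU):

```shell
# Read the driver version if nvidia-smi is present; otherwise note CPU fallback.
if command -v nvidia-smi >/dev/null 2>&1; then
  DRIVER="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
else
  DRIVER="none found (CPU-only fallback)"
fi
echo "NVIDIA driver: $DRIVER"

# From inside the project, check PyTorch's view:
# uv run python -c "import torch; print(torch.cuda.is_available())"
```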

WSL (Windows Subsystem for Linux)

StemForge is a web application — audio playback happens in the browser, so no PulseAudio or sounddevice setup is needed. Install FluidSynth for MIDI preview:

sudo apt install libfluidsynth3 libfluidsynth-dev fluid-soundfont-gm

Then follow the standard Install & Run steps below.


macOS Support

macOS on Apple Silicon (M1/M2/M3) is supported via MPS acceleration. Intel Macs will run CPU-only.

Setup

Step 1 — Copy the macOS pyproject file before installing:

cp pyproject.toml.MAC pyproject.toml
uv sync

Step 2 — Install FluidSynth:

brew install fluid-synth

Step 3 — Set the library path so pyfluidsynth can find it:

export DYLD_LIBRARY_PATH="$(brew --prefix fluid-synth)/lib:$DYLD_LIBRARY_PATH"

Add the export line to your ~/.zshrc so it persists across sessions.

macOS limitations

  • mdx_extra_q Demucs model is not available on macOS (requires diffq, which does not build on macOS). The model is automatically hidden from the UI.
  • BasicPitch MIDI extraction may have limited functionality on macOS — ai-edge-litert (the TFLite runtime) is a Linux-only package. The MIDI tab will surface a clear error if this is attempted.
  • Vocal MIDI (faster-whisper) works on macOS.
  • Stable Audio Open generation works on macOS via MPS.
  • AceStep (Compose tab) works on macOS — the subprocess handles MPS detection independently.

Performance

MPS acceleration is used automatically when available (Apple Silicon). Expect significantly faster inference than CPU-only, but slower than CUDA on a discrete GPU.


HuggingFace Authentication (required for the Synth tab)

The Synth tab uses Stable Audio Open 1.0, a gated model. You must accept its license and authenticate before StemForge can download it. See the Synth section in INSTRUCTIONS.md for usage details.

Step 1 — Accept the license

Visit https://huggingface.co/stabilityai/stable-audio-open-1.0, sign in with a free HuggingFace account, and click Agree and access repository.

Step 2 — Create a token

Go to https://huggingface.co/settings/tokens and create a token with Read access.

Step 3 — Log in locally

huggingface-cli login

Paste your token when prompted. It is saved to ~/.cache/huggingface/token and picked up automatically by StemForge on every subsequent run — you only need to do this once.
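Whether a token is already cached can be checked without re-running login (the path is the one given above; `huggingface-cli whoami` also reports the logged-in account):

```shell
# Look for a cached HuggingFace token at the documented location.
TOKEN_FILE="$HOME/.cache/huggingface/token"
if [ -s "$TOKEN_FILE" ]; then
  HF_STATE="token cached at $TOKEN_FILE"
else
  HF_STATE="no token -- run: huggingface-cli login"
fi
echo "$HF_STATE"
```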

The model weights (~2 GB) are downloaded on the first Synth run and cached under ~/.cache/stemforge/musicgen/.


Install & Run

Step 1 — Install system dependencies (see Requirements above): uv, FFmpeg, FluidSynth + GM soundfont.

Step 2 — Clone (use --recursive to pull the AceStep submodule and its nested vendor):

git clone --recursive git@github.com:tsondo/StemForge.git
cd StemForge

Step 3 — Install dependencies and start the server:

uv sync
uv run python run.py
# Open http://localhost:8765