HeartMuse
Local AI music generator with smart lyrics: Gradio web UI for HeartMuLa + Ollama/OpenAI, tags, history, and high-fidelity audio.
HeartMuse - AI Music Generator with Smart Lyrics
HeartMuse is a web-based interface for creating high-quality AI-generated music entirely on your own machine. It combines HeartMuLa (a state-of-the-art open-source music generation model) with lyrics generation by local LLMs, giving you full creative control without relying on cloud services.

Listen to an Example
"Firewall Warrior" — a funny song about IT security, fully generated by HeartMuse (HeartMuLa 3B RL)
https://github.com/strnad/heartmuse/raw/master/docs/examples/firewall_warrior.mp3
Get Started in 2 Minutes
# 1. Clone and install
git clone https://github.com/strnad/heartmuse.git
cd heartmuse
./install.sh # Linux/macOS
# or: install.bat # Windows
# 2. Run
./run.sh # Linux/macOS
# or: run.bat # Windows
Open http://localhost:7860 and start creating!
That's it! The installer automatically creates a virtual environment, clones the HeartMuLa library, and installs all dependencies. AI models download automatically on first generation.
Features at a Glance
- Smart Text Generation - AI writes lyrics, titles, tags, and descriptions via Ollama (local) or OpenAI
- HeartMuLa Music Generation - two model variants (RL and Base), up to 240s songs
- Style Transfer - upload reference audio to influence the musical style (MuQ-MuLan)
- Audio Transcription - extract lyrics from existing audio (HeartTranscriptor)
- Batch Variants - generate 1-10 variations of the same song in one go
- Edit Instructions - refine generated text with natural language commands
- History & Playlist - browse, replay, and manage all past generations
- Live Memory Monitor - real-time GPU/RAM usage tracking
- Seed Control - reproduce exact results with fixed seeds
- 100% Local - everything runs on your machine with Ollama (or use OpenAI API)
Why HeartMuse?
Flexible Text Generation - Total Creative Control
HeartMuse features a modular, field-by-field generation system that adapts to your creative workflow. Every field has its own "Generate/Enhance" checkbox, giving you granular control over what AI generates and what you write yourself.
Four Independent Fields, Endless Combinations
| Field | What It Does | Generate/Enhance Checkbox |
|-------|-------------|---------------------------|
| Description | Your creative brief for the song | AI can expand vague ideas into detailed descriptions |
| Title | Song name | AI suggests catchy, thematic titles |
| Lyrics | Full song text with sections | AI writes/extends verses, choruses, bridges |
| Tags | Music production tags for HeartMuLa | AI suggests genre, mood, instruments, tempo |
Each checkbox is independent - mix and match any combination:
- All four checked - AI generates everything from scratch
- Only lyrics checked - AI writes lyrics, you control title and tags
- Title + tags checked - You write lyrics, AI handles metadata
- Nothing checked - Use exactly what you entered, no AI changes
Context-Aware Intelligence
When you provide partial content, the AI treats it as context for whatever it generates:
+----------------------------------+--------------------------------+
| You provide:                     | AI understands:                |
+----------------------------------+--------------------------------+
| Description: "upbeat summer"     | -> Uses as creative direction  |
| Title: (empty)                   | -> Will generate               |
| Lyrics: "Feel the sunshine"      | -> Context for generation      |
| Tags: (empty)                    | -> Will generate               |
+----------------------------------+--------------------------------+
| Checkboxes: [x] Title [x] Tags [ ] Lyrics                         |
| Result: AI generates title and tags that match YOUR lyrics        |
|         and description. Your lyrics stay untouched.              |
+-------------------------------------------------------------------+
Unchecked fields with values become context - AI reads them but won't modify them. This ensures coherent results that respect your creative input.
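The split between generated fields and context fields can be pictured with a small sketch. This is not HeartMuse's actual code: the helper name `plan_generation`, the field keys, and the dict layout are hypothetical and chosen only to illustrate the checkbox logic described above.

```python
FIELDS = ("description", "title", "lyrics", "tags")


def plan_generation(values, checked):
    """Split fields into generation targets and read-only context.

    values:  dict mapping field name -> user text (may be empty or missing)
    checked: set of field names whose Generate/Enhance box is ticked
    Hypothetical helper for illustration, not part of HeartMuse.
    """
    # Checked fields are the ones the AI is allowed to write or extend.
    targets = [f for f in FIELDS if f in checked]
    # Unchecked fields that contain text are passed along as context only.
    context = {
        f: values[f]
        for f in FIELDS
        if f not in checked and values.get(f, "").strip()
    }
    return targets, context


targets, context = plan_generation(
    {"description": "upbeat summer", "title": "", "lyrics": "Feel the sunshine", "tags": ""},
    checked={"title", "tags"},
)
print(targets)  # ['title', 'tags']
print(context)  # {'description': 'upbeat summer', 'lyrics': 'Feel the sunshine'}
```

The example mirrors the box above: title and tags are generated, while the description and lyrics are read but left untouched.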
Smart Lyrics Preservation
When extending existing lyrics, HeartMuse offers two levels of protection:
| Syntax | Protection Level | Use Case |
|--------|-----------------|----------|
| "exact text" | 100% locked - never changed, character-for-character | Your signature lines, hooks |
| regular text | Preserved - meaning kept, minor polish allowed | Draft verses you want improved |
Example:
[chorus]
"This exact line will never change!"
This line might get slight improvements
Result: The quoted line is untouchable. The unquoted line may be refined while keeping its meaning.
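If you want to verify after a generation pass that quoted lines survived unchanged, a check along these lines may help. It is a minimal sketch that assumes a protected line is a whole line wrapped in double quotes; the function names are hypothetical, and how HeartMuse itself enforces the lock is not described here.

```python
def locked_lines(lyrics):
    """Return lines that are fully wrapped in double quotes (treated as locked)."""
    locked = []
    for line in lyrics.splitlines():
        stripped = line.strip()
        if len(stripped) >= 2 and stripped.startswith('"') and stripped.endswith('"'):
            locked.append(stripped)
    return locked


def missing_locked_lines(original, revised):
    """List locked lines from `original` that do not appear verbatim in `revised`."""
    revised_lines = {line.strip() for line in revised.splitlines()}
    return [line for line in locked_lines(original) if line not in revised_lines]


draft = '[chorus]\n"This exact line will never change!"\nThis line might get slight improvements'
revised = '[chorus]\n"This exact line will never change!"\nThis line may shine a little brighter'
print(missing_locked_lines(draft, revised))  # []
```

An empty list means every quoted line is still present character-for-character; any returned line was altered or dropped.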
Extend, Don't Replace
When "Generate/Enhance" is checked for lyrics that already have content:
- AI adds new sections (verses, bridge, outro)
- AI completes unfinished sections
- AI preserves everything you wrote
- AI never deletes your content
- AI never rewrites existing sections from scratch
Edit Instructions
Use the Edit Instructions field to give natural language commands for refining generated content:
Examples:
- "Change the name Eva to Ela"
- "Add two more verses"
- "Rework the chorus to be more upbeat"
- "Replace all references to rain with sunshine"
The AI applies your edits while preserving the rest of the content. Quoted text remains protected even during edits.
Duration-Aware Lyrics
Set your target song length, and AI adjusts lyrics accordingly:
| Duration | AI Behavior |
|----------|-------------|
| Under 60s | Short and punchy - 1-2 sections |
| 60-120s | Standard structure - verse, chorus, verse |
| Over 120s | Extended - multiple verses, bridge, outro |
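The duration thresholds can be read as a simple lookup. The sketch below is illustrative only: `lyric_structure_hint` is a hypothetical name, the returned strings paraphrase the table, and the 240-second ceiling is taken from the feature list rather than from any known validation in HeartMuse.

```python
MAX_DURATION_SECONDS = 240  # "up to 240s songs" per the feature list


def lyric_structure_hint(duration_seconds):
    """Map a target song length to the lyric structure described in the table."""
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError("duration must be between 1 and 240 seconds")
    if duration_seconds < 60:
        return "short and punchy: 1-2 sections"
    if duration_seconds <= 120:
        return "standard structure: verse, chorus, verse"
    return "extended: multiple verses, bridge, outro"


for seconds in (45, 90, 180):
    print(seconds, "->", lyric_structure_hint(seconds))
```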
From-Scratch Creative Mode
Leave all fields empty and click "Generate Text" - AI creates a completely original song concept:
- Invents a unique theme and story
- Writes complete lyrics with proper structure
- Suggests a fitting title
- Recommends appropriate musical tags
Each generation is different - use this for inspiration or when you want to be surprised!
Real-World Workflow Examples
Example 1: Full AI Generation
Input: (everything empty, all checkboxes checked)
Output: Complete original song - title, lyrics, tags, ready for music generation
Example 2: Your Lyrics, AI Metadata
Input: Your complete lyrics in the Lyrics field
[ ] Description [ ] Lyrics [x] Title [x] Tags
Output: AI suggests a perfect title and musical tags that match YOUR lyrics
Example 3: Expand a Hook
Input: Lyrics: "[chorus]\nDancing in the moonlight"
[x] Lyrics checked
Output: AI keeps your chorus and adds verses, bridge, outro around it
Example 4: Protected Refrain + AI Verses
Input: Lyrics:
[chorus]
"We are the champions, my friend"
"And we'll keep on fighting till the end"
[verse]
(something about struggle and victory)
[x] Lyrics checked
Output: Quoted chorus: 100% unchanged
Verse placeholder: AI writes full lyrics about struggle and victory
Example 5: Enhance Vague Description
Input: Description: "sad piano song"
[x] Description checked
Output: Description expanded to: "A melancholic piano ballad with introspective
lyrics about lost love, featuring gentle arpeggios and emotional vocal
delivery in a minor key"
Example 6: Iterative Refinement
Round 1: Generate title + lyrics from description
Round 2: Use Edit Instructions: "Make the chorus more energetic"
Round 3: Tweak tags manually, generate music
Round 4: Generate 3 batch variants, pick the best one
Style Transfer (Experimental)
Upload a reference audio track to influence the musical style of your generation. HeartMuse uses MuQ-MuLan to extract a style embedding from the reference and inject it into the HeartMuLa generation process.
- Style Strength slider (0-10x) - control how strongly the reference influences output
- Runs on CPU - no GPU memory impact, works alongside HeartMuLa
- Supports common audio formats (MP3, WAV, FLAC, etc.)
Works best with clear, well-produced reference tracks. The model captures high-level style characteristics (genre, mood, instrumentation) rather than copying melodies.
Audio Transcription
The Transcribe tab lets you extract lyrics from existing audio recordings using HeartTranscriptor (a Whisper-based model).
- Upload any audio file and get transcribed lyrics
- Click "Send to Generator" to use transcribed lyrics as a starting point
- For best results, use source-separated vocal tracks (e.g., via Demucs)
Toggle with TRANSCRIPTION=true/false in .env
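For readers scripting around the app, a boolean flag like this is typically interpreted as follows. This is a sketch under assumptions: the function name is hypothetical, the value is read from an environment-style mapping rather than by parsing .env directly, and the fallback used when the variable is unset is a guess, not HeartMuse's documented default.

```python
import os


def transcription_enabled(env=None, default=True):
    """Interpret TRANSCRIPTION=true/false from an environment-style mapping.

    `default` applies when the variable is unset; its value here is an
    assumption, not taken from HeartMuse.
    """
    env = os.environ if env is None else env
    raw = env.get("TRANSCRIPTION")
    if raw is None:
        return default
    return raw.strip().lower() == "true"


print(transcription_enabled({"TRANSCRIPTION": "false"}))  # False
print(transcription_enabled({"TRANSCRIPTION": "true"}))   # True
```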
Batch Generation & Reproducibility
- Batch Variants (1-10) - generate multiple versions from the same lyrics/tags in one run
- Seed Control - set a specific seed to reproduce exact results, or use -1 for random
- Post-generation Statistics - view timing breakdown (text gen, style extraction, music gen per variant), GPU peak VRAM, model variant, and seed value
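The seed convention can be illustrated with Python's standard `random` module standing in for the music model. Everything here is a sketch: `resolve_seed` and `sample` are hypothetical names, and the 32-bit seed range is an assumption rather than a documented HeartMuse limit.

```python
import random


def resolve_seed(seed):
    """Return the seed to use: a fresh random one for -1, otherwise the given value."""
    if seed == -1:
        # Assumed range; the real application may draw seeds differently.
        return random.randrange(2**32)
    return seed


def sample(seed, n=3):
    """Stand-in for a generator: the same seed always yields the same numbers."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


fixed = resolve_seed(1234)
print(sample(fixed) == sample(fixed))  # True: a fixed seed reproduces the result
print(resolve_seed(-1))                # a different value on most runs
```

Recording the resolved seed alongside each output, as the post-generation statistics do, is what makes a random run reproducible later.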
History & Playlist
The History tab keeps all your generations organized:
- Playlist Player - sequential or shuffle playback with next/prev controls and seeking
- History Cards - title, description, tags, audio player, and generation parameters for each song
- Actions - Load to Generator (reuse settings), Load for Edit (reuse settings + seed), Delete
- Pagination - browse through all past generations, 10 per page
All generations are stored as MP3 + JSON metadata in the output/ directory.
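Because generations are plain files, they can also be browsed outside the UI. The sketch below pages through them ten at a time. It assumes each song is saved as NAME.mp3 next to NAME.json and that newest-first ordering is wanted; both the pairing scheme and the function name `history_page` are guesses for illustration, and the metadata keys are left unspecified because the README does not list them.

```python
import json
from pathlib import Path


def history_page(output_dir, page, per_page=10):
    """Return one page of (mp3_path, metadata) pairs, newest first.

    Assumes NAME.mp3 sits beside NAME.json; this naming scheme is a guess
    for illustration, not confirmed by the README.
    """
    entries = []
    for meta_path in Path(output_dir).glob("*.json"):
        audio_path = meta_path.with_suffix(".mp3")
        if audio_path.exists():
            metadata = json.loads(meta_path.read_text(encoding="utf-8"))
            entries.append((audio_path, metadata))
    # Sort by file modification time so the most recent generation comes first.
    entries.sort(key=lambda entry: entry[0].stat().st_mtime, reverse=True)
    start = (page - 1) * per_page
    return entries[start:start + per_page]


if __name__ == "__main__":
    for audio_path, metadata in history_page("output", page=1):
        print(audio_path.name, sorted(metadata))
```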
