Milimo Music

A music generation application powered by HeartMuLa models.

Note: Cross-platform support: optimized for both macOS (Apple Silicon/MPS) and Windows (CUDA).
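
To check which of these accelerators is visible on your machine, a minimal sketch like the one below can help. It assumes the backend uses PyTorch, which this README does not state explicitly, and `pick_device` is a hypothetical helper name, not part of Milimo Music.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch  # assumption: PyTorch is the backend's tensor library
    except ImportError:
        return "cpu"  # without PyTorch installed there is nothing to probe
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in PyTorch builds with MPS support
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

If this prints `cpu` on a machine with a supported GPU, the PyTorch build in the active environment is the first thing to check.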

🎵 Listen to a Sample Track generated with Milimo Music

Features & Architecture Deep Dive

Core Technology

Milimo Music combines a music generation model, a neural audio codec, and LLM-based writing tools in a single application.

  • HeartMuLa-3B Model: The core of the audio generation engine. HeartMuLa (Hear the Music Language) is a 3B-parameter transformer model that generates high-fidelity music conditioned on lyrics and stylistic tags.
    • Multilingual Support: Generates music with lyrics in multiple languages, including English, Chinese, Japanese, Korean, and Spanish.
  • Audio Codec: Uses HeartCodec, which operates at a 12.5 Hz frame rate, for efficient, high-quality audio reconstruction.
  • Ollama Integration: Uses local LLMs (such as Llama 3) for:
    • Lyrics Generation: Automatically writes structured lyrics (Verse, Chorus, Bridge) based on a topic.
    • Prompt Enhancement: Expands simple concepts into detailed musical descriptors.
    • Auto-Titling: Generates creative titles based on the song content.
    • Inspiration Mode: Brainstorms unique song concepts and style combinations for you.
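
As an illustration of the kind of request the lyrics generation feature could send, here is a minimal sketch against Ollama's local REST endpoint `/api/generate`. The prompt wording, the bracketed section headers, and the function names are assumptions made for this example; they are not taken from the Milimo Music source.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_lyrics_request(topic: str, model: str = "llama3.2:3b-instruct-fp16") -> dict:
    """Build a non-streaming /api/generate payload asking for structured lyrics."""
    # Hypothetical prompt: the app's real prompt is not shown in this README.
    prompt = (
        f"Write song lyrics about: {topic}\n"
        "Use the section headers [Verse], [Chorus], and [Bridge]."
    )
    return {"model": model, "prompt": prompt, "stream": False}


def generate_lyrics(topic: str) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    data = json.dumps(build_lyrics_request(topic)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate_lyrics("late night drive")` requires `ollama serve` to be running and the model to have been pulled beforehand.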

Key Capabilities

  1. Text-to-Music Generation: Create full 48 kHz stereo tracks by describing a mood and style.

    Note: Music style integration is currently in beta. To get the best results, use only the following supported tags (HeartMuLa-3B):
    Warm, Reflection, Pop, Cafe, R&B, Keyboard, Regret, Drum machine, Electric guitar, Synthesizer, Soft, Energetic, Electronic, Self-discovery, Sad, Ballad, Longing, Meditation, Faith, Acoustic, Peaceful, Wedding, Piano, Strings, Acoustic guitar, Romantic, Drums, Emotional, Walking, Hope, Hopeful, Powerful, Epic, Driving, Rock.

  2. Lyrics-Conditioned Synthesis: The model aligns generated audio with provided lyrics, respecting prosody and structure.
  3. Track Extension: Continue generating from where a previous track left off, allowing for the creation of longer compositions segment by segment.
  4. Repair Segment (Beta): Fix specific parts of your generated track without regenerating the entire song. Select a time range and let the AI rewrite just that segment while preserving the surrounding context.
  5. Training Studio (Beta): Fine-tune the HeartMuLa model on your own audio datasets directly within the app.
    • Custom Styles: Train the model to understand specific genres or artist styles (e.g., 'Afrobeat', 'MyVoice').
    • LoRA Training: Efficient low-rank adaptation training that runs locally.
    • Global Monitoring: Track training progress from anywhere in the app with the floating status widget.
  6. Real-Time Progress: Server-Sent Events (SSE) provide live feedback on the generation steps, from token inference to decoding.
  7. Smart History: Automatically saves all generated tracks, lyrics, and metadata (seed, cfg, temperature) for easy retrieval and playback.
  8. AI Co-Writer (Multi-Agent System):
    • Agentic Workflow: The Co-Writer is not a simple chatbot. It uses a graph of specialized Pydantic Agents working in tandem:
      • Coordinator Agent: Analyzes your request and routes it to the correct workflow (Creation vs. Editing).
      • Lyricist Agent: The creative engine that drafts content and executes complex editing operations (Update, Insert, Append).
      • StructureGuard Agent: A dedicated QA agent that validates every output against strict schemas. If the Lyricist makes a mistake, the Guard catches it and forces a retry automatically.
    • Pydantic-Native: Lyrics are treated as code artifacts with schemas, so malformed formatting is rejected before it reaches the music generation engine.
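
The validate-and-retry loop described for the Lyricist and StructureGuard agents can be sketched as follows. This is not the project's implementation: it uses standard-library dataclasses in place of Pydantic so that it runs without extra dependencies, and every name in it (`Section`, `validate`, `draft_with_guard`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

ALLOWED_SECTIONS = {"Verse", "Chorus", "Bridge"}  # section names mentioned in this README


@dataclass
class Section:
    kind: str
    lines: List[str]


def validate(sections: List[Section]) -> List[str]:
    """Return a list of problems; an empty list means the draft passes."""
    problems = []
    if not sections:
        problems.append("draft has no sections")
    for i, s in enumerate(sections):
        if s.kind not in ALLOWED_SECTIONS:
            problems.append(f"section {i}: unknown kind {s.kind!r}")
        if not any(line.strip() for line in s.lines):
            problems.append(f"section {i}: no lyric lines")
    return problems


def draft_with_guard(
    lyricist: Callable[[List[str]], List[Section]], max_attempts: int = 3
) -> List[Section]:
    """Call the lyricist until the guard accepts the draft or attempts run out."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        draft = lyricist(feedback)  # pass along problems from the previous attempt
        feedback = validate(draft)
        if not feedback:
            return draft
    raise ValueError(f"no valid draft after {max_attempts} attempts: {feedback}")
```

Passing the guard's findings back into the next attempt is one plausible way to make a retry more likely to succeed than a blind re-run.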

Prerequisites

  • Conda (Anaconda or Miniconda)
  • Python 3.10+
  • Node.js 18+
  • npm or yarn
  • Ollama (required for lyrics generation with the default local provider)

Setup Instructions

1. Environment Setup

It is highly recommended to use a Conda environment to manage dependencies.

conda create -n milimo python=3.12
conda activate milimo

2. Ollama Setup

  1. Download and install Ollama for your operating system.
  2. Pull a compatible model (e.g., Llama 3.2):
    ollama pull llama3.2:3b-instruct-fp16
    
  3. Ensure Ollama is running in the background:
    ollama serve 
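
To confirm that the server is reachable and the model was pulled, you can query Ollama's `/api/tags` endpoint, which lists locally available models. The sketch below assumes the default port 11434; the helper names are illustrative.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

TAGS_URL = "http://localhost:11434/api/tags"  # assumes Ollama's default port


def parse_model_names(payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def list_local_models(url: str = TAGS_URL, timeout: float = 3.0) -> Optional[List[str]]:
    """Return the pulled model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    models = list_local_models()
    print("Ollama is not reachable" if models is None else models)
```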
    

3. LLM Configuration & Providers

Milimo Music supports multiple LLM providers for lyrics generation and creative prompting.

Supported Providers:

  • Ollama (Local, Default): Uses your local models via ollama serve.
  • OpenAI: Connects to OpenAI GPT models (requires an API key).
  • Google Gemini: Uses Gemini models (requires an API key).
  • OpenRouter: Provides access to models such as Claude, Mistral, and Llama through a unified API (requires an API key).
  • DeepSeek: Direct integration with the DeepSeek API.
  • LM Studio: Connects to LM Studio or other local inference servers compatible with the OpenAI API.

Configuration:

  1. Click the Settings (Gear) icon in the sidebar.
  2. Select your desired provider tab.
  3. Enter your API Key or Base URL.
  4. Click "Save & Set Active". The app will automatically fetch available models for you.

4. HeartLib & Model Weights

Crucial Step: The model weights are large and excluded from the repository, so you must download them manually.

  1. Download Pretrained Models: Download the checkpoints into the heartlib/ckpt directory from either Hugging Face or ModelScope. The commands below use relative paths, so run them from inside the heartlib directory.

    Using Hugging Face CLI:

    # Install the Hugging Face CLI if not present: pip install "huggingface_hub[cli]"
    
    hf download --local-dir './ckpt' 'HeartMuLa/HeartMuLaGen'
    hf download --local-dir './ckpt/HeartMuLa-oss-3B' 'HeartMuLa/HeartMuLa-oss-3B'
    hf download --local-dir './ckpt/HeartCodec-oss' 'HeartMuLa/HeartCodec-oss'
    

    Directory Structure Verification: After downloading, ensure your heartlib/ckpt folder looks like this:

    heartlib/ckpt/
    ├── HeartCodec-oss/
    ├── HeartMuLa-oss-3B/
    ├── gen_config.json
    └── tokenizer.json
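
The same check can be scripted. The sketch below only tests for the four entries listed above; it does not inspect the contents of the model directories, and the function name is illustrative.

```python
from pathlib import Path
from typing import List

EXPECTED_DIRS = ["HeartCodec-oss", "HeartMuLa-oss-3B"]
EXPECTED_FILES = ["gen_config.json", "tokenizer.json"]


def missing_checkpoint_entries(ckpt_dir: Path) -> List[str]:
    """Return the expected entries that are absent from ckpt_dir."""
    missing = [d + "/" for d in EXPECTED_DIRS if not (ckpt_dir / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (ckpt_dir / f).is_file()]
    return missing


if __name__ == "__main__":
    # Run from the repository root so the relative path resolves.
    problems = missing_checkpoint_entries(Path("heartlib/ckpt"))
    print("OK" if not problems else f"Missing: {', '.join(problems)}")
```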
    

5. Backend

  1. Navigate to the backend directory:

    cd ../backend
    
  2. Install dependencies (this will automatically install heartlib):

    pip install -r requirements.txt
    
  3. Run the server:

    python -m app.main
    

    The backend will start at http://localhost:8000.
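
    With the backend running, generation progress is delivered as Server-Sent Events (see Real-Time Progress under Key Capabilities). If you want to consume that stream outside the bundled frontend, a minimal parser for the SSE wire format might look like the sketch below. The endpoint path is not documented in this README, and the `progress` event name and JSON payload in the usage note are invented for illustration.

```python
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from an iterable of SSE text lines."""
    data_parts = []
    event_name = "message"  # default event type in the SSE format
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield {"event": event_name, "data": "\n".join(data_parts)}
            data_parts, event_name = [], "message"
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data_parts.append(value)
        elif field == "event":
            event_name = value
```

    For example, feeding it the lines `event: progress`, `data: {"step": 1}`, and an empty line yields a single event named `progress` whose data is the JSON string.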

6. Frontend

  1. Navigate to the frontend directory:

    cd ../frontend
    
  2. Install dependencies:

    npm install
    
  3. Start the development server:

    npm run dev
    

    The frontend will be available at http://localhost:5173.
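
    To confirm that both servers are listening, a small port check such as the following can be run from any terminal. It only tests whether a TCP connection succeeds on the ports named above, not whether the application behind the port is healthy.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in [("backend", 8000), ("frontend", 5173)]:
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {state}")
```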

Author

Mainza Kangombe
LinkedIn Profile