VoiceCloner

A desktop application for voice design and voice cloning powered by Qwen3-TTS.

Features

  • Voice Design - Create custom voices from natural language descriptions
  • Voice Cloning - Clone any voice from just 3 seconds of audio
  • Built-in Recording - Record your voice directly in the app
  • 10 Languages - English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
  • Local Processing - All AI processing happens on your machine
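
To illustrate the voice cloning input described above, here is a minimal sketch that checks whether a recorded WAV file is long enough before it is used as a reference. It assumes the "3 seconds" figure is treated as a lower bound and that the recording is an uncompressed PCM WAV file; the names `MIN_REFERENCE_SECONDS`, `wav_duration_seconds`, and `is_long_enough` are hypothetical and are not taken from the VoiceCloner code.

```python
import wave

# Assumption: the "3 seconds" from the feature list is used as a minimum length.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Return True if the recording meets the assumed minimum length."""
    return wav_duration_seconds(path) >= minimum
```

A check like this could run right after recording, so that a too-short sample is rejected before any model is loaded.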

Quick Start

Prerequisites

  • macOS 11+, Windows 10+, or Linux
  • Python 3.11+
  • Rust (for building from source)
  • NVIDIA GPU with 8GB+ VRAM (recommended) or CPU (slower)

Development Setup

# Clone the repository
git clone https://github.com/adibhanna/voicecloner
cd voicecloner

# Run setup script (installs Python deps, builds Rust)
./scripts/setup.sh

# Run the app
cargo run

Build Release App (macOS)

# Build the app bundle
./scripts/build-macos.sh

# The app will be at: target/bundle/VoiceCloner.app
open target/bundle/VoiceCloner.app

# Optionally create a DMG for distribution
./scripts/create-dmg.sh

How It Works

When you launch VoiceCloner:

  1. The app automatically starts a local Python backend server
  2. The backend loads Qwen3-TTS models (auto-downloaded on first use)
  3. You can design voices, clone voices, or generate speech
  4. All processing happens locally on your machine
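
The startup sequence above can be sketched as "spawn the backend, then poll until it accepts connections". In VoiceCloner this job belongs to the Rust process manager under src/backend/; the Python version below is only an illustration of the idea, not the project's implementation. The function names, the command, the host, and the port in the usage note are all assumptions.

```python
import socket
import subprocess
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Server not listening yet; wait briefly and retry.
            time.sleep(interval)
    return False


def start_backend(command: list[str], host: str, port: int, timeout: float = 30.0) -> subprocess.Popen:
    """Spawn the backend process and block until it is reachable (hypothetical helper)."""
    proc = subprocess.Popen(command)
    if not wait_for_port(host, port, timeout=timeout):
        proc.terminate()
        raise RuntimeError("backend did not become reachable in time")
    return proc
```

A call might look like `start_backend([sys.executable, "backend/main.py"], "127.0.0.1", 8000)`, where port 8000 is a placeholder rather than a value documented by the project. Because models are downloaded on first use, a real launcher would likely need a much longer timeout on the first run.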

System Requirements

Minimum (CPU mode)

  • 16GB RAM (32GB recommended for 1.7B models)
  • 15GB free disk space
  • Microphone for voice cloning

Recommended (GPU mode)

  • NVIDIA GPU with 16GB+ VRAM (for 1.7B models)
  • NVIDIA GPU with 8GB+ VRAM (for 0.6B models)
  • 32GB RAM
  • 20GB free disk space
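
The requirements above can be read as a simple decision rule for choosing a model size. The sketch below encodes one possible reading: GPU thresholds of 16GB and 8GB of VRAM, and a CPU fallback that uses the 1.7B model only when the recommended 32GB of RAM is available. The function `pick_model_size` is hypothetical, and the app may select models differently.

```python
from typing import Optional


def pick_model_size(ram_gb: float, vram_gb: Optional[float] = None) -> Optional[str]:
    """Suggest a model size from available memory, or None if below the minimum.

    Thresholds follow the System Requirements list; the mapping itself is an assumption.
    """
    if vram_gb is not None:
        if vram_gb >= 16:
            return "1.7B"
        if vram_gb >= 8:
            return "0.6B"
    # No GPU, or a GPU with too little VRAM: fall back to CPU mode.
    if ram_gb >= 32:
        return "1.7B"
    if ram_gb >= 16:
        return "0.6B"
    return None
```

For example, `pick_model_size(ram_gb=32, vram_gb=8)` suggests the 0.6B model on the GPU, while a machine with 8GB of RAM and no GPU gets `None`.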

Project Structure

voicecloner/
├── src/                    # Rust frontend (iced GUI)
│   ├── main.rs
│   ├── app.rs              # Main application
│   ├── ui/                 # UI panels
│   ├── audio/              # Recording & playback
│   ├── backend/            # Backend client & process manager
│   └── state/              # App state & persistence
├── backend/                # Python backend (FastAPI + Qwen3-TTS)
│   ├── main.py             # API server
│   ├── tts_engine.py       # TTS model wrapper
│   └── requirements.txt
└── scripts/                # Build scripts
    ├── setup.sh
    ├── build-macos.sh
    └── create-dmg.sh

Acknowledgments

This project is powered by Qwen3-TTS from the Qwen Team. Qwen3-TTS provides:

  • High-quality text-to-speech synthesis
  • Voice cloning from short audio samples
  • Voice design from natural language descriptions
  • Support for 10+ languages

License

MIT
