VoiceCloner

A desktop application for voice design and voice cloning powered by Qwen3-TTS.

Features

Voice Design - Create custom voices from natural language descriptions
Voice Cloning - Clone a voice from a reference clip as short as 3 seconds
Built-in Recording - Record your voice directly in the app
10 Languages - English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
Local Processing - All AI processing happens on your machine

Quick Start

Prerequisites

macOS 11+, Windows 10+, or Linux
Python 3.11+
Rust (for building from source)
NVIDIA GPU with 8GB+ VRAM (recommended; 16GB+ for 1.7B models) or CPU (slower)
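
Before running the setup script, it can help to confirm the two toolchain prerequisites listed above. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes the Rust toolchain is detected by looking for `cargo` on the PATH.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper: the checks mirror the Prerequisites list
    (Python 3.11+, Rust for building from source).
    """
    problems = []
    # Compare only (major, minor) against the documented minimum of 3.11.
    if tuple(version_info[:2]) < (3, 11):
        problems.append("Python 3.11+ is required")
    # Assumption: a usable Rust toolchain exposes `cargo` on the PATH.
    if which("cargo") is None:
        problems.append("Rust toolchain (cargo) not found on PATH")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and PATH. The parameters exist only so the checks can be exercised with fake values.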

Development Setup

# Clone the repository
git clone https://github.com/adibhanna/voicecloner
cd voicecloner

# Run setup script (installs Python deps, builds Rust)
./scripts/setup.sh

# Run the app
cargo run

Build Release App (macOS)

# Build the app bundle
./scripts/build-macos.sh

# The app will be at: target/bundle/VoiceCloner.app
open target/bundle/VoiceCloner.app

# Optionally create a DMG for distribution
./scripts/create-dmg.sh

How It Works

When you launch VoiceCloner:

The app automatically starts a local Python backend server
The backend loads Qwen3-TTS models (auto-downloaded on first use)
You can design voices, clone voices, or generate speech
All processing happens locally on your machine
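
The startup sequence above (spawn a local backend, then wait until it is ready) can be sketched roughly as follows. This is an illustration in Python, not the app's actual process manager, which lives in the Rust frontend under src/backend/. The function names are hypothetical, and the host, port, and launch command are assumptions to be replaced with the project's real values.

```python
import socket
import subprocess
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or refused): back off briefly and retry.
            time.sleep(0.2)
    return False


def start_backend(cmd: list[str], host: str, port: int,
                  timeout_s: float = 30.0) -> subprocess.Popen:
    """Spawn the backend process and block until it accepts connections.

    Hypothetical usage (command and port are assumptions):
        start_backend([sys.executable, "backend/main.py"], "127.0.0.1", 8000)
    """
    proc = subprocess.Popen(cmd)
    if not wait_for_port(host, port, timeout_s):
        # Do not leave a half-started backend running in the background.
        proc.terminate()
        raise RuntimeError(f"backend did not start listening on {host}:{port}")
    return proc
```

An open port only shows that the server process is up. Because the models are downloaded on first use, the first request may still take much longer than later ones.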

System Requirements

Minimum (CPU mode)

16GB RAM (32GB recommended for 1.7B models)
15GB free disk space
Microphone for voice cloning

Recommended (GPU mode)

NVIDIA GPU with 16GB+ VRAM (for 1.7B models)
NVIDIA GPU with 8GB+ VRAM (for 0.6B models)
32GB RAM
20GB free disk space
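
The thresholds above can be read as a simple decision rule for which model size to load. The sketch below is one interpretation, not the app's actual selection logic: the function name `pick_model_size` is hypothetical, and it assumes that a GPU below 8GB VRAM falls back to the CPU-mode rules and that 16GB RAM in CPU mode implies the 0.6B models.

```python
from typing import Optional


def pick_model_size(vram_gb: Optional[float], ram_gb: float) -> str:
    """Return "1.7B" or "0.6B" based on the listed System Requirements.

    vram_gb is None when no NVIDIA GPU is available (CPU mode).
    """
    if vram_gb is not None:
        if vram_gb >= 16:
            return "1.7B"  # 16GB+ VRAM is listed for the 1.7B models
        if vram_gb >= 8:
            return "0.6B"  # 8GB+ VRAM is listed for the 0.6B models
        # Assumption: below 8GB VRAM, fall through to the CPU-mode rules.
    if ram_gb >= 32:
        return "1.7B"  # 32GB RAM is recommended for 1.7B in CPU mode
    if ram_gb >= 16:
        return "0.6B"  # Assumption: the 16GB minimum maps to the 0.6B models
    raise ValueError("below the listed minimum of 16GB RAM")
```

For example, `pick_model_size(8, 32)` selects the 0.6B models, while `pick_model_size(None, 32)` selects the 1.7B models in CPU mode.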

Project Structure

voicecloner/
├── src/                    # Rust frontend (iced GUI)
│   ├── main.rs
│   ├── app.rs              # Main application
│   ├── ui/                 # UI panels
│   ├── audio/              # Recording & playback
│   ├── backend/            # Backend client & process manager
│   └── state/              # App state & persistence
├── backend/                # Python backend (FastAPI + Qwen3-TTS)
│   ├── main.py             # API server
│   ├── tts_engine.py       # TTS model wrapper
│   └── requirements.txt
└── scripts/                # Build scripts
    ├── setup.sh
    ├── build-macos.sh
    └── create-dmg.sh

Acknowledgments

This project is powered by Qwen3-TTS from the Qwen Team. Qwen3-TTS provides:

High-quality text-to-speech synthesis
Voice cloning from short audio samples
Voice design from natural language descriptions
Support for 10+ languages

License

MIT
