babylon.cpp

Babylon.cpp is a C and C++ library for grapheme-to-phoneme (G2P) conversion and neural text-to-speech (TTS) synthesis. All inference runs locally using ONNX Runtime — no internet connection is required and no data leaves the host machine.

It supports two TTS engines:

Kokoro — High-quality multi-voice neural TTS at 24 kHz with 54+ voices across multiple languages.
VITS — End-to-end neural TTS; compatible with Piper models.

Phonemization is handled by Open Phonemizer backed by a ~130,000-entry pronunciation dictionary.

Platforms

| Platform | Architecture               | Library          |
|----------|----------------------------|------------------|
| Linux    | x86_64                     | libbabylon.so    |
| macOS    | Universal (x86_64 + arm64) | libbabylon.dylib |
| Windows  | x86_64                     | babylon.dll      |
| Android  | arm64-v8a, x86_64          | libbabylon.so    |

Building

Requires CMake 3.18+, a C++17 compiler, and Git.

git clone --recursive https://github.com/Mobile-Artificial-Intelligence/babylon.cpp.git
cd babylon.cpp
make cli

This builds the library, CLI binary, and ONNX Runtime from source. All output goes to bin/.

| Target       | Description                              |
|--------------|------------------------------------------|
| make lib     | Library only                             |
| make cli     | Library + CLI binary + runtime files     |
| make debug   | CLI build in Debug mode                  |
| make android | Cross-compile for Android (requires NDK) |

CLI

The babylon binary provides three subcommands: phonemize, tts, and serve. On startup it automatically loads config.json from the directory that contains the executable.

# Phonemize text to IPA
babylon phonemize "Hello world"
# → hɛloʊ wɜːld

# Synthesise speech (Kokoro, default)
babylon tts "Hello world" -o hello.wav
babylon tts --voice en-US-nova --speed 1.2 "Hello world"

# Synthesise speech (VITS)
babylon tts --vits "Hello world" -o hello.wav

# Start the REST API server and web frontend
babylon serve
babylon serve --host 0.0.0.0 --port 9000

Global flags (apply to all subcommands):

--config <path>           Load a JSON config file
--phonemizer-model <path> Phonemizer ONNX model
--dictionary <path>       Pronunciation dictionary JSON
--kokoro-model <path>     Kokoro ONNX model
--kokoro-voice <name>     Default Kokoro voice
--kokoro-voices <dir>     Directory of voice .bin files
--vits-model <path>       VITS ONNX model

REST API

When running babylon serve, the following endpoints are available:

| Method | Path       | Description                          |
|--------|------------|--------------------------------------|
| GET    | /          | Web frontend (HTML)                  |
| GET    | /status    | Engine availability and voice count  |
| GET    | /voices    | List of available Kokoro voice names |
| POST   | /phonemize | Convert text to IPA or token IDs     |
| POST   | /tts       | Synthesise speech, returns audio/wav |

POST /tts body:

{
  "text":   "Hello world",
  "engine": "kokoro",
  "voice":  "en-US-heart",
  "speed":  1.0
}

POST /phonemize body:

{ "text": "Hello world", "tokens": false }
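
A client has to send these bodies as valid JSON, so quotes, backslashes, and control characters in user text need escaping. The sketch below shows one way to build both bodies in C++17 without a JSON library. The helper names `json_escape`, `make_tts_body`, and `make_phonemize_body` are hypothetical and not part of babylon.h; the field names are taken from the two request bodies above. Sending the result over HTTP is left to whichever client library you prefer.

```cpp
#include <cstdio>
#include <locale>
#include <sstream>
#include <string>

// Escape a UTF-8 string for use inside a JSON string literal.
// Bytes >= 0x80 are passed through unchanged, so the input must be valid UTF-8.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Hypothetical helper: body for POST /tts, using the fields shown above.
std::string make_tts_body(const std::string& text, const std::string& engine,
                          const std::string& voice, double speed) {
    std::ostringstream os;
    os.imbue(std::locale::classic());  // always use '.' as the decimal separator
    os << "{\"text\":\"" << json_escape(text)
       << "\",\"engine\":\"" << json_escape(engine)
       << "\",\"voice\":\"" << json_escape(voice)
       << "\",\"speed\":" << speed << "}";
    return os.str();
}

// Hypothetical helper: body for POST /phonemize.
std::string make_phonemize_body(const std::string& text, bool tokens) {
    return "{\"text\":\"" + json_escape(text) + "\",\"tokens\":" +
           (tokens ? "true" : "false") + "}";
}
```

For instance, `make_tts_body("Hello world", "kokoro", "en-US-heart", 1.2)` produces a compact, single-line equivalent of the /tts body shown above, with `speed` set to 1.2.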

C API

#include "babylon.h"

int main(void) {
    babylon_g2p_options_t opts = {
        .dictionary_path = "models/dictionary.json",
        .use_punctuation = 1,
    };

    babylon_g2p_init("models/open-phonemizer.onnx", opts);
    babylon_kokoro_init("models/kokoro-quantized.onnx");

    babylon_kokoro_tts(
        "Hello world",
        "models/voices/en-US-heart.bin",
        1.0f,
        "output.wav"
    );

    babylon_kokoro_free();
    babylon_g2p_free();
    return 0;
}

C++ API

#include <string>

#include "babylon.h"

int main() {
    OpenPhonemizer::Session phonemizer(
        "models/open-phonemizer.onnx",
        "models/dictionary.json",
        /* use_punctuation = */ true
    );

    Kokoro::Session kokoro("models/kokoro-quantized.onnx");

    std::string phonemes = phonemizer.phonemize("Hello world");
    kokoro.tts(phonemes, "models/voices/en-US-heart.bin", 1.0f, "output.wav");

    return 0;
}

Documentation

A full manual is available in docs/manual.tex, covering the complete C and C++ API reference, CLI options, REST API, build instructions, and model configuration.
