Orca

Made in Vancouver, Canada by Picovoice

Orca is an on-device streaming text-to-speech engine that is designed for use with LLMs, enabling zero-latency voice assistants. Orca is:

Private: All speech synthesis runs locally.
Cross-Platform:
- Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64, arm64)
- Android and iOS
- Raspberry Pi (3, 4, 5)
- Chrome, Safari, Firefox, and Edge

Orca
- Table of Contents
- Overview
- AccessKey
- Demos
- SDKs
  - Python
  - .NET
  - iOS
  - C
  - Web
  - Android
  - Node.js
- Releases
- FAQ

Language Support

Orca Streaming Text-to-Speech currently supports the following languages:
- English
- French
- German
- Japanese
- Korean
- Spanish
- Italian
- Portuguese
Support for additional languages is available for commercial customers on a case-by-case basis.

Overview

Orca input and output streaming synthesis

Orca is a streaming text-to-speech engine designed specifically for LLMs. It processes incoming text streams in real time and generates audio continuously: as the LLM produces tokens, Orca generates speech in parallel. This enables seamless conversations with voice assistants because playback does not have to wait for the complete LLM response.
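The pattern described above can be sketched without the Orca SDK. In the sketch below, `StubStream` is a hypothetical stand-in for a streaming synthesizer (its `synthesize` and `flush` methods are assumptions made for illustration, not the confirmed Orca API), and `fake_llm_tokens` simulates an LLM emitting tokens. The stub emits one placeholder "audio chunk" per completed word, only to show that audio can become available before the text stream ends.

```python
from typing import Iterator, List, Optional


def fake_llm_tokens() -> Iterator[str]:
    """Simulates an LLM that yields text a few characters at a time."""
    for token in ["Hel", "lo ", "wor", "ld, ", "how ", "are ", "you?"]:
        yield token


class StubStream:
    """Hypothetical stand-in for a streaming synthesizer (not the Orca API)."""

    def __init__(self) -> None:
        self._buffer = ""

    def synthesize(self, text: str) -> Optional[List[str]]:
        # Buffer incoming text; emit "audio" only for words completed by a space.
        self._buffer += text
        if " " not in self._buffer:
            return None
        ready, self._buffer = self._buffer.rsplit(" ", 1)
        return [f"<audio:{word}>" for word in ready.split()]

    def flush(self) -> Optional[List[str]]:
        # Emit whatever text is still buffered at the end of the stream.
        remaining, self._buffer = self._buffer, ""
        return [f"<audio:{word}>" for word in remaining.split()] or None


def speak_while_generating(tokens: Iterator[str]) -> List[str]:
    """Feed tokens to the stream as they arrive and collect audio chunks."""
    stream = StubStream()
    played: List[str] = []
    for token in tokens:
        chunk = stream.synthesize(token)
        if chunk is not None:
            played.extend(chunk)  # audio is available mid-stream
    tail = stream.flush()
    if tail is not None:
        played.extend(tail)
    return played
```

In this sketch the first chunk is produced after the second token, while the remaining tokens are still being generated; the final `flush` call handles text that never received a trailing space.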

Orca also supports single synthesis mode, where a complete text is synthesized in a single call to the Orca engine.

Text input

Orca supports a wide range of English characters, including letters, numbers, symbols, and punctuation marks. You can get a list of all supported characters by calling the valid_characters() method provided in the Orca SDK you are using. Characters or words not covered by this list can be pronounced using custom pronunciations.
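A pre-check of input text against the supported characters might look like the sketch below. The function takes the supported character set as an argument; in practice it would come from the SDK's valid_characters() call, while `example_valid` here is a made-up set used only for illustration.

```python
from typing import Iterable, List


def find_unsupported(text: str, valid_characters: Iterable[str]) -> List[str]:
    """Return characters in `text` that are not in `valid_characters`,
    in order of first appearance and without duplicates."""
    valid = set(valid_characters)
    seen = set()
    unsupported = []
    for ch in text:
        if ch not in valid and ch not in seen:
            seen.add(ch)
            unsupported.append(ch)
    return unsupported


# Illustrative character set only; the real list comes from the Orca SDK.
example_valid = list(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789 .,!?'"
)
```

An empty result means every character in the text is in the given set; anything returned is a candidate for rewording or for a custom pronunciation.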

Custom pronunciations

Orca supports custom pronunciations via a specific syntax embedded within the input text. This feature allows users to define unique pronunciations for words using the following format: {word|pronunciation}. The pronunciation is expressed in ARPAbet phonemes. The following are examples of sentences using custom pronunciations:

"This is a {custom|K AH S T AH M} pronunciation"
"{read|R IY D} this as {read|R EH D}, please."
"I {live|L IH V} in {Sevilla|S EH V IY Y AH}. We have great {live|L AY V} sports!"
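To make the syntax concrete, the sketch below extracts {word|pronunciation} pairs from input text with a regular expression. It is an illustrative parser, not Orca's own, and the pattern assumes that neither the word nor the pronunciation contains braces or a pipe character.

```python
import re
from typing import List, Tuple

# Matches {word|PHONEME PHONEME ...}; assumes no nested braces or extra pipes.
_CUSTOM = re.compile(r"\{([^{}|]+)\|([^{}|]+)\}")


def extract_pronunciations(text: str) -> List[Tuple[str, List[str]]]:
    """Return (word, phonemes) pairs in the order they appear in `text`."""
    return [(m.group(1), m.group(2).split()) for m in _CUSTOM.finditer(text)]


def strip_pronunciations(text: str) -> str:
    """Replace each {word|pronunciation} with the plain word."""
    return _CUSTOM.sub(lambda m: m.group(1), text)
```

For the second example sentence above, `extract_pronunciations` yields two entries for "read" with different phoneme lists, which is how the same spelling can receive two pronunciations in one sentence.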

Language and Voice

Orca Streaming Text-to-Speech can synthesize speech in different languages and with a variety of voices, each of which is characterized by a model file (.pv) located in lib/common. The language and gender of the speaker are indicated in the file name.

To synthesize speech with a specific language and voice, provide the associated model file as an argument to the Orca init function.

Speech control

Orca provides a set of parameters to control the synthesized speech. The following table lists the available parameters:

| Parameter | Default | Description |
|:------------:|:-------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------:|
| speech rate | 1.0 | Speed of generated speech. Valid values are within [0.7, 1.3]. <br/>Higher (lower) values generate faster (slower) speech. |
| random state | random | Sets the random state for sampling during synthesis. <br/>Valid values are all non-negative integers. <br/>If not provided, a random seed will be chosen. |
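The constraints in the table can be checked before calling the engine. The helper below is a sketch: the function name and the keyword names `speech_rate` and `random_state` are chosen for illustration and may not match the parameter names in a given SDK.

```python
from typing import Optional

SPEECH_RATE_MIN = 0.7
SPEECH_RATE_MAX = 1.3


def validate_speech_params(speech_rate: float = 1.0,
                           random_state: Optional[int] = None) -> dict:
    """Validate values against the ranges listed in the table above."""
    if not SPEECH_RATE_MIN <= speech_rate <= SPEECH_RATE_MAX:
        raise ValueError(
            f"speech_rate must be within [{SPEECH_RATE_MIN}, {SPEECH_RATE_MAX}], "
            f"got {speech_rate}"
        )
    if random_state is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(random_state, bool) or not isinstance(random_state, int):
            raise ValueError("random_state must be an integer")
        if random_state < 0:
            raise ValueError("random_state must be non-negative")
    # None means "let the engine choose a random seed".
    return {"speech_rate": speech_rate, "random_state": random_state}
```

Fixing `random_state` to a specific integer is the way to request repeatable sampling; leaving it unset keeps the default random behavior.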

Audio output

Orca's synthesized speech is delivered as either raw audio data or a WAV file. Output audio will be in single-channel 16-bit PCM format and can be directly fed into a playback audio system.
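Because the output is single-channel 16-bit PCM, raw samples can be wrapped in a WAV container with Python's standard `wave` module. In the sketch below, `pcm` is assumed to be a sequence of signed 16-bit integers, and the sample rate is a parameter because this section does not state it; query it from the SDK rather than hard-coding a value.

```python
import struct
import wave
from typing import Sequence


def write_wav(path: str, pcm: Sequence[int], sample_rate: int) -> None:
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # single channel
        wav_file.setsampwidth(2)   # 16 bits = 2 bytes per sample
        wav_file.setframerate(sample_rate)
        # Pack samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

The same three settings (one channel, two bytes per sample, the engine's sample rate) are what a playback library needs when the raw samples are fed to it directly instead of being saved to a file.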

AccessKey

AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including Orca. Anyone who is using Picovoice needs to have a valid AccessKey. You must keep your AccessKey secret. You will need internet connectivity to validate your AccessKey with Picovoice license servers even though the text-to-speech engine is running 100% offline.

AccessKey also verifies that your usage is within the limits of your account. You can see your usage limits and real-time usage on your Picovoice Console Profile. To continue using Picovoice after your trial, or to renew or adjust your usage limits, please reach out to our Enterprise Sales Team or your existing Picovoice contact.

Demos

Python Demos

Install the demo package:

pip3 install pvorcademo

Run the streaming demo:

orca_demo_streaming --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH} --text_to_stream ${TEXT}

Run the single synthesis demo:

orca_demo --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH} --text ${TEXT} --output_path ${WAV_OUTPUT_PATH}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console, ${MODEL_PATH} with a path to any of the model files available under lib/common, ${TEXT} with the text to be synthesized, and ${WAV_OUTPUT_PATH} with a path to an output WAV file.
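When the demo is launched from a script, passing the arguments as a list avoids shell-quoting problems with text that contains spaces or punctuation. The sketch below builds the single synthesis command using only the flags shown above; the function name and all argument values are placeholders.

```python
import shlex
from typing import List


def build_orca_demo_command(access_key: str, model_path: str,
                            text: str, output_path: str) -> List[str]:
    """Build the argument list for the single synthesis demo shown above."""
    return [
        "orca_demo",
        "--access_key", access_key,
        "--model_path", model_path,
        "--text", text,
        "--output_path", output_path,
    ]


command = build_orca_demo_command(
    access_key="YOUR_ACCESS_KEY",    # placeholder
    model_path="path/to/model.pv",   # placeholder
    text="Hello from Orca, streaming text-to-speech!",
    output_path="output.wav",
)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(command))
# To actually run it (requires pvorcademo to be installed):
#   import subprocess; subprocess.run(command, check=True)
```

The text stays a single list element, so it reaches the demo as one argument regardless of the spaces it contains.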

For more information about Python demos go to demo/python.

.NET Demos

From demo/dotnet/OrcaDemo build the demo:

dotnet build -c StreamingDemo.Release

Run the streaming demo:

dotnet run -c StreamingDemo.Release -- --access_key ${ACCESS_KEY} --language ${LANGUAGE} --gender ${GENDER} --text_to_stream ${TEXT}

Run the single synthesis demo:

dotnet run -c FileDemo.Release -- --access_key ${ACCESS_KEY} --language ${LANGUAGE} --gender ${GENDER} --text ${TEXT} --output_path ${WAV_OUTPUT_PATH}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console, ${LANGUAGE} and ${GENDER} with the language and gender of the voice to use, ${TEXT} with the text to be synthesized, and ${WAV_OUTPUT_PATH} with a path to an output WAV file.

For more information about .NET demos go to demo/dotnet.

iOS Demo

Open OrcaDemo.xcodeproj in Xcode.
Replace ${YOUR_ACCESS_KEY_HERE} in the file ViewModel.swift with your AccessKey.
Go to Product > Scheme and select the scheme for the language and gender you would like to run the demo in (e.g. enFemaleDemo -> English demo with a female voice, deMaleDemo -> German demo with a male voice).
Run the demo with a simulator or connected iOS device.
Once the demo app has started, enter the text you wish to synthesize in the text box area, and press the Synthesize button to synthesize the text and play audio.

For more information about iOS demos go to demo/ios.

C Demos

Build the streaming demo:

cmake -S demo/c/ -B demo/c/build && cmake --build demo/c/build --target orca_demo_streaming

Run the demo:

./demo/c/build/orca_demo_streaming -l ${LIBRARY_PATH} -m ${MODEL_PATH} -a ${ACCESS_KEY} -t ${TEXT} -o ${OUTPUT_PATH}

For more information about C demos go to demo/c.

Web Demos

From demo/web run the following in the terminal:

yarn
yarn start ${LANGUAGE} ${GENDER}

(or)

npm install
npm run start ${LANGUAGE} ${GENDER}

Replace ${LANGUAGE} and ${GENDER} with the language and gender you would like to run the demo in. Available languages are en, es, de, fr, ko, ja, it, pt, and available genders are male and female.

Open http://localhost:5000 in your browser to try the demo.
