Voot – LLM-powered Live Translation for HarmonyOS
Voot, standing for "Voice On Top," is an intelligent simultaneous-interpretation & text translation app for HarmonyOS, powered by your own LLM / translation APIs.
It is designed with three core principles: security, privacy, and simplicity.
[!NOTE] Voot does not provide or resell any LLM/translation service.
You bring your own API keys (OpenAI, DeepL, Ollama, 豆包, etc.).
Huawei AppGallery
Voot is available on Huawei AppGallery (Overseas) (note: you need an overseas network environment to access it). Releases will also remain available on GitHub for sideloading, but we strongly recommend following the AppGallery listing for the latest version.
Table of Contents
- Features
- Architecture
- Screenshots
- Getting Started
- Install Hap
- Configuration
- Usage
- Security & Privacy
- Roadmap
- Blueprints
- Contributing
- Model Performance
- Known Issues
- Acknowledgements
- License
- Disclaimer
Features
- 🔐 Secure by design
- No built-in or hosted model – you must configure your own API keys.
- API keys are stored only in the HarmonyOS sandbox, protected by face / biometric unlock.
- No third-party analytics SDKs.
- 🕵️ Privacy-first
- Audio is processed locally on-device for capture & pre-processing.
- Recorded audio for translation is not uploaded and is destroyed after processing.
- Only the minimal text required for translation is sent directly to the provider you configure.
- 🧩 Multi-provider support
- OpenAI (GPT-style chat / translation)
- DeepL
- Ollama (local LLM gateway)
- 豆包 / other custom endpoints (via configurable URL & API key)
- 🗣️ Simultaneous interpretation
- One-tap start/stop of “live” translation.
- Clear split between original text and translated text.
- 🔄 Device Continuation
- Seamlessly transfer your active translation session to another HarmonyOS device (e.g., from Phone to Tablet).
- Keeps your current transcription and translation context intact.
- 🖼️ Subtitles
- Floating subtitle window that works over other apps.
- Resizable and movable overlay for seamless multitasking.
- 📱 Desktop Widgets
- Control Card: Start/stop subtitle and interpretation directly from the home screen.
- Token Card: Monitor your API token usage without opening the app.
- 💨 Air Gestures
- Control translation start/stop without touching the screen.
- Ideal for hands-free operation during presentations or cooking.
- ✨ Text Polishing
- Improve the quality and tone of translated text.
- Refine rough translations into more natural and professional language.
- 📷 Scan & Translate
- Scan text from physical documents or screens using the camera.
- Instantly translate scanned text with save functionality.
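The Token Card above only needs a small amount of per-provider bookkeeping. A minimal TypeScript sketch of such an accumulator (the names `TokenUsage`, `recordUsage`, and `totalTokens` are illustrative, not Voot's actual store API):

```typescript
// Illustrative sketch of the Token Card's usage accounting; not Voot's real code.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

// Per-provider running totals, as a desktop widget might read them.
const usageByProvider = new Map<string, TokenUsage>();

function recordUsage(provider: string, usage: TokenUsage): void {
  const prev = usageByProvider.get(provider) ?? { promptTokens: 0, completionTokens: 0 };
  usageByProvider.set(provider, {
    promptTokens: prev.promptTokens + usage.promptTokens,
    completionTokens: prev.completionTokens + usage.completionTokens,
  });
}

function totalTokens(provider: string): number {
  const u = usageByProvider.get(provider);
  return u ? u.promptTokens + u.completionTokens : 0;
}

// Example: two API calls against the same provider accumulate.
recordUsage("openai", { promptTokens: 120, completionTokens: 45 });
recordUsage("openai", { promptTokens: 80, completionTokens: 30 });
```

In the real app the numbers would come from each provider's usage field in its API response.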
Architecture
Voot/
├─ entry/
│ ├─ src/main/ets/
│ │ ├─ pages/ # ArkUI pages (Index, Configuration, Translation, Settings, etc.)
│ │ ├─ services/ # Mic + ASR services (SherpaWhisperMicService, PipSubtitleManager)
│ │ ├─ storage/ # Preference-backed stores (API config, TokenUsage, etc.)
│ │ ├─ components/ # Shared UI builders (PolicySheet, TokenUsageChart, etc.)
│ │ ├─ widget/ # Service Cards (Desktop Widgets)
│ │ ├─ entryformability/ # Widget lifecycle management
│ │ └─ workers/ # Background ASR workers for long-running capture
│ ├─ src/main/resources/ # Raw HTML, media assets, Sherpa models
│ ├─ oh-package*.json5 # Module package definitions
│ └─ build-profile.json5 # Entry module build settings
├─ AppScope/ # Application-level configuration and assets
├─ hvigorfile.ts # Workspace hvigor build script
└─ build-profile.json5 # Global build profile
Screenshots
<div align="center"> <img src="https://github.com/YANGZX22/Voot/blob/main/entry/src/main/resources/base/media/screenshot.jpg"> </div>

Getting Started
Prerequisites
- HarmonyOS toolchain:
  - DevEco Studio with ArkTS support
  - HarmonyOS SDK (version matching the project, current: 6.0.1(21))
- A HarmonyOS device or emulator
- One or more API keys, for example:
  - OpenAI API key
  - DeepL API key
  - Ollama endpoint running locally or on LAN
  - 豆包 (Doubao) / other compatible HTTP API
Clone
git clone https://github.com/YANGZX22/Voot.git
cd Voot
Open the project in DevEco Studio.
Run
- Connect a HarmonyOS device or start an emulator.
- In DevEco Studio, select the run configuration corresponding to the app.
- Click Run to build and deploy.
Install Hap
Download the HAP from GitHub Releases and sideload it. Alternatively, you can use Auto-installer or DevEco Testing for installation.
[!IMPORTANT] Huawei's signing servers block IP addresses outside mainland China, so sideloading software for HarmonyOS NEXT from countries/regions outside mainland China requires a mainland-China network environment during signing.
[!NOTE] Apps sideloaded via self-signing on HarmonyOS NEXT have a default validity period of 14 days. Completing Developer Real-Name Authentication extends this period to 180 days.
Configuration
API Providers
In the “配置 API / Configure API” tab:

- Choose the current provider (e.g. OpenAI, DeepL, Ollama, 豆包).
- Tap “配置 API / Configure API”.
- For each provider, fill in:
  - API URL (e.g. https://api.openai.com/v1/chat/completions, https://api-free.deepl.com/v2/translate, or your Ollama endpoint)
  - API Key / Token
  - Optional: a custom prompt / system message used for translation.

The configuration is stored locally in the app sandbox; accessing or modifying it requires face / biometric verification.
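The stored configuration is essentially everything needed to assemble a translation request to an OpenAI-style chat endpoint. A minimal sketch, assuming a simple config record (the `ProviderConfig` shape and `buildChatRequest` are illustrative, not Voot's real code):

```typescript
// Illustrative only: turning a configured provider entry into an HTTP request.
// Field names (apiUrl, apiKey, systemPrompt) are assumptions, not Voot's schema.
interface ProviderConfig {
  apiUrl: string;        // e.g. https://api.openai.com/v1/chat/completions
  apiKey: string;        // kept in the HarmonyOS sandbox in the real app
  systemPrompt?: string; // optional custom translation instruction
}

function buildChatRequest(cfg: ProviderConfig, text: string, targetLang: string) {
  return {
    url: cfg.apiUrl,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.apiKey}`,
    },
    body: JSON.stringify({
      messages: [
        {
          role: "system",
          content: cfg.systemPrompt ?? `Translate the user's text into ${targetLang}.`,
        },
        { role: "user", content: text },
      ],
    }),
  };
}

const req = buildChatRequest(
  { apiUrl: "https://api.openai.com/v1/chat/completions", apiKey: "sk-..." },
  "Hello",
  "中文",
);
```

Note that the key leaves the device only inside this Authorization header, sent directly to the provider you configured.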
Target Languages
In the “目标语言 / Target language” section:
- Select your default output language (e.g. 中文, English, etc.).
- The chosen target language is used for all translation APIs by default.
Glossary / Terminology
In the “术语库 / Glossary” menu:

- Enter term pairs in the format Original = Translation (one per line). Example:
  HarmonyOS = 鸿蒙
  AI = 人工智能
- These terms are automatically appended to the system prompt, instructing the LLM to strictly follow your terminology.
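A minimal sketch of how such term pairs could be parsed and appended to the system prompt (`parseGlossary` and `applyGlossary` are hypothetical names; Voot's actual implementation may differ):

```typescript
// Parse "Original = Translation" lines into term pairs, skipping malformed lines.
function parseGlossary(raw: string): Array<[string, string]> {
  return raw
    .split("\n")
    .map(line => line.split("="))
    .filter(parts => parts.length === 2)
    .map(([src, dst]) => [src.trim(), dst.trim()] as [string, string]);
}

// Append the terminology rules to an existing system prompt.
function applyGlossary(systemPrompt: string, raw: string): string {
  const pairs = parseGlossary(raw);
  if (pairs.length === 0) return systemPrompt;
  const rules = pairs.map(([s, d]) => `"${s}" must be translated as "${d}"`).join("; ");
  return `${systemPrompt} Strictly follow this terminology: ${rules}.`;
}

const prompt = applyGlossary(
  "Translate into Chinese.",
  "HarmonyOS = 鸿蒙\nAI = 人工智能",
);
```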
Usage
1. Launch Voot on your HarmonyOS device.
2. Configure API:
   - Go to the first tab, Configuration.
   - Select an API provider (OpenAI, DeepL, etc.) and enter your API Key/URL.
   - Set your Target Language.
3. Live Translation (翻译):
   - Switch to the Translation tab.
   - Tap “开启麦克风 / Enable microphone” to start capturing audio.
   - Speak in the source language; the app transcribes and translates in real time.
   - Air Gestures: wave your hand above the front camera to start/stop translation without touching the screen.
   - Device Continuation: tap the Transfer (流转) icon to move the session to another HarmonyOS device.
4. Text Polishing (润色):
   - Switch to the Polishing tab.
   - Input or paste text that needs refinement.
   - The AI improves the tone, grammar, and clarity of the text.
5. Scan & Translate (扫描):
   - Switch to the Scan tab.
   - Point the camera at a document or screen.
   - The app recognizes the text and provides an instant translation.
   - You can save the scanned results to History.
Security & Privacy
Short summary (see in-app privacy policy / privacy.html for details):
- Audio:
  - Recorded only on-device for the current translation session.
  - Not uploaded to our servers (we have none).
  - Discarded after processing.
- API Keys:
  - Stored in the app sandbox.
  - Protected with HarmonyOS face/biometric mechanisms.
  - Never transmitted to any server except the provider you configured.
- Data Flow:
  - Text is sent only to your chosen provider (OpenAI / DeepL / etc.).
  - No central logging, analytics, or telemetry from the developer.
Roadmap
Finished / planned / possible steps:
- Subtitle (Done ✅)
- Live Window on HarmonyOS (Done ✅)
- Desktop Widgets (Done ✅)
- Token usage analytics (Done ✅)
- Glossary / Terminology Support (Done ✅)
- Device Continuation (Done ✅)
- History & Favorites (Done ✅)
- Air Gestures (Done ✅)
- Text Polishing (Done ✅)
- Scan & Translate (Done ✅)
- Pose Detection Button Dialog (Done ✅)
- Support for more LLM / translation APIs (e.g. Google Translate)
- Enhanced ASR and cutoff logic
- More supported original languages
Feel free to open issues or PRs with feature requests.
Blueprints
1. Audio-Direct Multimodal
Moving beyond the lossy "speech → text → translation" pipeline:
- Direct Audio Input: sending VAD-filtered audio segments directly to multimodal models (e.g., GPT-4o Audio, Gemini 1.5 Pro).
- Nuance Capture: preserving tone, emotion (sarcasm, urgency), and speaker identity, which are often lost in ASR.
- Feedback Loop: using the rich understanding from the multimodal engine to feed back into the frontend, correcting previous ASR errors or updating the context for the Fast Track.
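To make the "VAD-filtered audio segments" idea concrete, here is a toy energy-threshold VAD over PCM frames (the threshold and frame size are arbitrary illustration values, not Voot's actual parameters; production VADs are far more robust):

```typescript
// A frame is voiced if its mean-square energy exceeds a threshold.
function isVoiced(frame: number[], threshold = 0.01): boolean {
  const energy = frame.reduce((sum, s) => sum + s * s, 0) / frame.length;
  return energy >= threshold;
}

// Group a frame stream into contiguous voiced segments; each segment is
// what would be forwarded to a multimodal model as one audio chunk.
function voicedSegments(frames: number[][], threshold = 0.01): number[][][] {
  const segments: number[][][] = [];
  let current: number[][] = [];
  for (const frame of frames) {
    if (isVoiced(frame, threshold)) {
      current.push(frame);
    } else if (current.length > 0) {
      segments.push(current);
      current = [];
    }
  }
  if (current.length > 0) segments.push(current);
  return segments;
}

// Synthetic example: silence, two speech frames, silence, one speech frame.
const silence = new Array(160).fill(0);
const speech = new Array(160).fill(0.5);
const segs = voicedSegments([silence, speech, speech, silence, speech]);
```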
2. Confidence Scoring & Visual Feedback
- Implement a confidence scoring system that visually highlights translation segments the model is less certain about, so users know which parts to double-check.
