<p align="center">
  <img src="docs/logo.png" alt="Yap" width="120" />
</p>

<h1 align="center">Yap</h1>

<p align="center">
  <strong>The voice input layer for agentic coding.</strong><br/>
  Speak in any language. It transcribes, corrects, translates, and types, right where your cursor is.
</p>

<p align="center">
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-CC%20BY--NC%204.0-blue.svg" alt="License" /></a>
  <img src="https://img.shields.io/badge/platform-macOS%20(Apple%20Silicon)-black?logo=apple" alt="Platform" />
  <img src="https://img.shields.io/badge/runtime-100%25%20local-brightgreen" alt="Local" />
</p>

## 🎬 Demo

🎥 Demo video coming soon, stay tuned!

💡 Inspired by the agentic coding movement, like OpenClaw's founder voice-chatting with 10+ agents to build software. Yap is the missing input layer that makes talking to your dev tools feel native.

<!-- Optional: embed a video showing Yap + Claude Code / Cursor workflow -->
<!-- https://github.com/user-attachments/assets/agentic-workflow.mp4 -->

## 🤔 Why Yap?

The agentic coding era is here. You're talking to Claude Code, Cursor, and Copilot, but you're still typing every prompt with your fingers.

Your voice is roughly 3x faster than your keyboard. Yap bridges the gap.

- 🗣️ **Voice-first workflow**: talk to your agents, your terminal, your browser; Yap types it out.
- 🔒 **100% local**: on-device VAD + ASR via MLX. No cloud; no data leaves your machine.
- 🌍 **Multilingual**: speak Chinese, English, Japanese, Korean, and more, with real-time translation built in.
- ✨ **Smart correction**: LLM-powered spoken-to-written style conversion. Your voice, but polished.

## ⚡ How It Works

Yap lives as a floating ball on your screen. Toggle input mode, and it listens:

```
🎙️ Voice ──→ 🔇 VAD ──→ 🧠 ASR ──→ 💬 LLM ──→ ⌨️ Input
             Silero      MLX          Correct      Types into
             detects     on-device    & translate  the active
             speech      transcribes  (optional)   app
```

Models auto-download from HuggingFace on first launch. Zero config to get started.
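The pipeline above can be sketched in Python. This is an illustrative outline only, not Yap's actual code: the real app uses Silero VAD and an MLX ASR model on-device, which are stubbed here with placeholder functions.

```python
# Illustrative sketch of the Voice -> VAD -> ASR -> LLM -> Input pipeline.
# The VAD, ASR, and LLM stages are placeholders; Yap itself uses Silero VAD,
# an MLX ASR model, and an OpenAI-compatible LLM endpoint.

from dataclasses import dataclass


@dataclass
class Segment:
    """A chunk of audio that VAD decided contains speech."""
    samples: list[float]


def detect_speech(audio: list[float], threshold: float = 0.1) -> list[Segment]:
    """Placeholder VAD: keep the whole buffer if its peak amplitude exceeds
    a threshold. Yap uses Silero VAD instead of this naive energy check."""
    if any(abs(s) > threshold for s in audio):
        return [Segment(audio)]
    return []


def transcribe(segment: Segment) -> str:
    """Placeholder ASR: Yap runs an MLX model on-device here."""
    return "<transcript of %d samples>" % len(segment.samples)


def correct(text: str) -> str:
    """Optional LLM step (spoken -> written style); identity in this sketch."""
    return text


def type_into_active_app(text: str) -> None:
    """Yap simulates keystrokes via its Rust core; we just print."""
    print(text)


def run_pipeline(audio: list[float]) -> list[str]:
    """Run each detected speech segment through ASR, correction, and typing."""
    outputs = []
    for seg in detect_speech(audio):
        text = correct(transcribe(seg))
        type_into_active_app(text)
        outputs.append(text)
    return outputs
```

Swapping the stubs for the real Silero VAD and MLX models gives the overall shape of the backend's processing loop.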


## ✨ Features

|  | Feature | Description |
|---|---------|-------------|
| 🎙️ | Multilingual Voice Input | Chinese, English, Japanese, and more; switch on the fly |
| 🌐 | Real-time Translation | Speak in one language, type in another |
| ✍️ | Formal Correction | Spoken-to-written style, powered by any LLM |
| 🖥️ | Universal Input | Works with any app: Claude Code, Cursor, VS Code, Terminal, browser, Slack... |
| 🫧 | Floating Ball UI | Always-on-top, draggable, with live waveform visualization |
| 🔒 | Fully Local | On-device ASR, no cloud dependency, your data stays yours |
| 🌐 | i18n Menu | Chinese / English interface |


## 🚀 Quick Start

### Prerequisites

- macOS with Apple Silicon (M1/M2/M3/M4)
- Node.js 18+
- Python 3.10–3.12

Rust and uv are installed automatically by the setup script if missing.

### Development

```bash
git clone https://github.com/TorchFun-AI/Yap.git && cd Yap

# One-click setup (installs all dependencies + the dev environment)
./setup.sh

# Terminal 1: Python AI backend
cd src-backend && uv run python main.py

# Terminal 2: Tauri + Vue dev server
make dev
```

### Production Build

```bash
# Build the .app bundle (compiles the backend + the Tauri app)
./build.sh
```

The bundle is written to `src-tauri/target/release/bundle/`.


๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Vue 3 UI      โ”‚โ—„โ”€โ”€โ”€โ–บโ”‚   Tauri Core    โ”‚โ—„โ”€โ”€โ”€โ–บโ”‚  Python AI      โ”‚
โ”‚   (Webview)     โ”‚ IPC โ”‚   (Rust)        โ”‚ WS  โ”‚  (FastAPI)      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                              โ”‚                        โ”‚
                              โ–ผ                        โ–ผ
                        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”           โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                        โ”‚ Keyboard  โ”‚           โ”‚ VAD + ASR โ”‚
                        โ”‚ Simulationโ”‚           โ”‚   + LLM   โ”‚
                        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜           โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

| Layer | Stack | |-------|-------| | Frontend | Vue 3 + TypeScript + Ant Design Vue + Pinia | | Core | Tauri 2 (Rust) | | Backend | Python + FastAPI + Silero VAD + MLX Audio |
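The Tauri core and the Python backend talk over a local WebSocket (the "WS" link above). Yap's actual message schema isn't documented in this README, so the sketch below shows one plausible JSON envelope for streaming transcription results; all field names are assumptions, not Yap's protocol.

```python
# Hypothetical WebSocket message framing between the Rust core and the
# Python backend. Field names are illustrative assumptions.

import json
from dataclasses import dataclass, asdict


@dataclass
class TranscriptMessage:
    type: str      # message kind, e.g. "transcript"
    text: str      # recognized (and optionally corrected/translated) text
    language: str  # detected source language
    final: bool    # False for a partial result, True when the utterance ends


def encode(msg: TranscriptMessage) -> str:
    """Serialize a message for the WebSocket wire."""
    return json.dumps(asdict(msg))


def decode(raw: str) -> TranscriptMessage:
    """Parse a message received from the backend."""
    return TranscriptMessage(**json.loads(raw))
```

Partial results (`final=False`) would let the UI show live text in the floating ball while the user is still speaking.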


## 🔧 LLM Configuration

Yap works with any OpenAI-compatible API for text correction and translation. Configure it in Settings:

- API key
- Base URL (e.g. `https://api.openai.com/v1`, or a local Ollama endpoint)
- Model name

This step is optional: without it, Yap still does plain voice-to-text.
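As a sketch of what "any OpenAI-compatible API" means in practice, the following stdlib-only Python builds a `/chat/completions` request that asks a model to rewrite spoken text in written style. The function name and system prompt are illustrative assumptions, not Yap's actual implementation.

```python
# Build an OpenAI-compatible chat-completions request for text correction.
# Hypothetical helper; the prompt and naming are not Yap's real code.

import json
import urllib.request


def build_correction_request(base_url: str, api_key: str,
                             model: str, text: str) -> urllib.request.Request:
    """Return a ready-to-send POST request against {base_url}/chat/completions."""
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Rewrite the user's spoken text in clean written style."},
            {"role": "user", "content": text},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Sending it (requires a reachable endpoint and a valid key):
# req = build_correction_request("https://api.openai.com/v1", "YOUR_KEY",
#                                "gpt-4o-mini", "um so like run the tests")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the shape is the standard chat-completions payload, pointing `base_url` at a local Ollama endpoint works the same way.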


## 📄 License

CC BY-NC 4.0: free to use, modify, and share, but not for commercial use.
