Yapper

AI dictation for macOS. Speech-to-text runs locally on your Mac, nothing leaves your machine. Optional AI cleanup via OpenAI, Anthropic, or Ollama.

Generate Convert Improve

Install / Use

/learn @ahmedlhanafy/Yapper

About this skill

Quality Score

0/100

README

<p align="center"> <img src="yapper-icon.png" width="128" height="128" alt="Yapper icon"> <br> <h1 align="center">Yapper</h1> </p>

Dictation app for macOS. Talk, and it types. Whisper runs locally on your Mac. If you want an LLM to clean up what you said, plug in OpenAI, Anthropic, or run Ollama on your own hardware.

How it works

Press a hotkey, talk, let go. Text shows up wherever your cursor is.

Whisper handles the transcription right on your machine (defaults to the large-v3-turbo model). If you want the output cleaned up, polished, or reformatted, an LLM does that as a second pass. You pick which one.

There are seven built-in modes, or make your own:

| Mode | What it does | |------|-------------| | Voice to Text | Raw transcription, nothing added or removed | | Email | Turns your rambling into something you'd actually send | | Message | Casual cleanup, just the rough edges | | Note | Bullet points from a stream of consciousness | | Meeting | Pulls out action items, decisions, who said what | | Smart | Looks at what app you're in and writes accordingly |

Email, Message, Note, Meeting, and Smart send the transcript through an LLM (OpenAI, Anthropic, or Ollama). Voice to Text stays entirely local.

The AI modes also read your clipboard, selected text, and active app for context. So if you're in a code editor, it formats differently than if you're in Mail.

Get it running

macOS 13+ and Xcode Command Line Tools (xcode-select --install).

./scripts/setup-whisper.sh   # builds whisper.cpp, downloads the large-v3-turbo model (~1.5GB)
swift build
.build/debug/Yapper

macOS will ask for microphone and accessibility permissions. Grant both.

Left-click the menubar waveform icon to start recording (right-click for the menu). Or press Option+Space from anywhere. Talk, then press it again. Text lands at your cursor.

For a release .app bundle:

./build.sh                   # builds + signs -> dist/Yapper.app
YAPPER_DIST=1 ./build.sh     # ad-hoc signed for distribution

Intel build:

./build-intel.sh             # cross-compiles whisper.cpp for x86_64

DMG installer:

./create-dmg.sh              # creates dist/Yapper-0.1.0-apple-silicon.dmg

AI providers

You choose what processes your text. Or nothing at all.

| Provider | Where it runs | Setup | |----------|--------------|-------| | None (Voice to Text mode) | Your Mac | Nothing needed | | OpenAI | Cloud | Add API key in Settings | | Anthropic | Cloud | Add API key in Settings | | Ollama | Your Mac | Install Ollama, pull a model |

Settings detects if Ollama is running and shows your local models in a dropdown. The base URL is configurable if you're running it on another machine.

API keys are stored in a local file at ~/Library/Application Support/Yapper/.

The cycle-to-record thing

Hit the cycle modes hotkey. The recording window appears with the next mode name. Keep hitting it to flip through modes. Once you stop, recording kicks in after about a second and a half.

You can also use the Fn (globe) key as a modifier in your hotkey combos.

Project layout

Sources/Yapper/
├── YapperApp.swift              # entry point, menubar setup
├── Core/
│   ├── Audio/AudioEngine        # mic recording (AVFoundation)
│   ├── ASR/WhisperService       # whisper.cpp transcription
│   ├── AI/AIProcessor           # OpenAI, Anthropic API calls
│   ├── AI/OllamaService         # local Ollama model discovery + chat
│   ├── Context/ContextCapture   # clipboard, selection, active app
│   ├── Output/TextInserter      # pastes text via Accessibility APIs
│   ├── Storage/StorageManager   # JSON settings, API keys, history
│   ├── RecordingCoordinator     # record -> transcribe -> AI -> insert
│   └── HotkeyManager           # global hotkeys + Fn key support
├── Models/                      # Mode, Session, Settings
├── Views/                       # SwiftUI (Settings, Recording, History)
└── Resources/                   # icon assets, Info.plist

Swift Package Manager. Whisper.cpp linked as a C library through Vendor/CWhisper.

Working on it

Add a mode: Sources/Yapper/Models/Mode.swift, define a static Mode, append to allBuiltIn.

Change what the AI does with your text: edit the instructions field on any mode definition.

Add a new AI provider: new case in AIProvider, implement the API call in AIProcessor.swift.

Hotkeys: Settings > Shortcuts in the app, or edit Settings.swift for defaults.

Tests: swift test. Quick manual check: record something, does it transcribe, does the text end up in TextEdit, do settings stick after a restart.

Troubleshooting

| Problem | Fix | |---------|-----| | Mic not working | System Settings > Privacy & Security > Microphone, enable Yapper | | Text not inserting | Same but Accessibility instead of Microphone | | Model not found | Run ./scripts/setup-whisper.sh or download from Settings > Advanced | | Build cache weirdness | swift package clean && swift build | | AI not responding | Check your key in Settings > API Keys. Ollama running? | | AI taking forever | It'll bail after 15 seconds and paste whatever Whisper gave it |

FAQ

Do I need internet? No. Voice to Text is completely offline. The AI modes need a connection unless you're using Ollama locally.

Which Whisper model? Large-v3-turbo is the default and the sweet spot. Tiny if you're on an older machine. Large if you want maximum accuracy and don't mind waiting.

Works everywhere? Most apps. Password fields and a few sandboxed apps won't let it paste. macOS security restriction, not a bug.

Why does macOS block the app? We don't have an Apple Developer ID ($99/year). Run xattr -cr /Applications/Yapper.app after installing, or use the curl install method on the website.

Docs

Thanks

Whisper.cpp by Georgi Gerganov
OpenAI Whisper
Anthropic Claude
The Ollama project

MIT License. Created by Ahmed Elhanafy.

Related Skills

node-connect

344.4k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

99.2k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

344.4k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

344.4k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。