🗣️ yap

A CLI for on-device speech transcription using Speech.framework on macOS 26.

Usage

USAGE: yap transcribe [--locale <locale>] [--censor] <input-file> [--txt] [--srt] [--vtt] [--json] [--output-file <output-file>] [--max-length <max-length>] [--word-timestamps]

ARGUMENTS:
  <input-file>            Path to an audio or video file to transcribe.

OPTIONS:
  -l, --locale <locale>   (default: current)
  --censor                Replaces certain words and phrases with a redacted form.
  --txt/--srt/--vtt/--json
                          Output format for the transcription. (default: --txt)
  -o, --output-file <output-file>
                          Path to save the transcription output. If not provided,
                          output will be printed to stdout.
  -m, --max-length <max-length>
                          Maximum sentence length in characters. (default: 40)
  --word-timestamps       Include word-level timestamps in JSON output.
  -h, --help              Show help information.

Installation

Homebrew

brew install yap

Mint

mint install finnvoor/yap

Examples

Transcribe a YouTube video using yap and yt-dlp

yt-dlp "https://www.youtube.com/watch?v=ydejkIvyrJA" -x --exec yap

Summarize a video using yap and llm

yap video.mp4 | uvx llm -m mlx-community/Llama-3.2-1B-Instruct-4bit 'Summarize this transcript:'

Create SRT captions for a video

yap video.mp4 --srt -o captions.srt

Generate WebVTT subtitles

yap video.mp4 --vtt -o subtitles.vtt

Export JSON with word-level timestamps

yap video.mp4 --json --word-timestamps -o transcript.json
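
Batch-caption a folder of videos

The SRT example above can be wrapped in a loop to caption several files at once. The following is a minimal sketch rather than a feature of yap: transcribe_dir and the YAP override variable are hypothetical names, and the .mp4 extension is an assumption about your input files. It only uses the transcribe flags listed under Usage.

```shell
# Hypothetical helper (not part of yap): write a sibling .srt for every .mp4
# in a directory. Set YAP to another command (e.g. YAP=echo) for a dry run.
transcribe_dir() {
  dir="${1:-.}"
  for f in "$dir"/*.mp4; do
    [ -e "$f" ] || continue            # skip when the glob matches nothing
    out="${f%.mp4}.srt"                # video.mp4 -> video.srt
    "${YAP:-yap}" transcribe "$f" --srt -o "$out" || return 1
  done
}
```

Running transcribe_dir ~/Movies would then produce one caption file per video, while YAP=echo transcribe_dir ~/Movies only prints the arguments that would be passed.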

Live System Audio

yap listen transcribes system audio in real time, covering anything playing on your computer.

USAGE: yap listen [--locale <locale>] [--censor] [--txt] [--srt] [--vtt] [--json] [--max-length <max-length>] [--word-timestamps]

OPTIONS:
  -l, --locale <locale>   (default: current)
  --censor                Replaces certain words and phrases with a redacted form.
  --txt/--srt/--vtt/--json
                          Output format for the transcription. (default: --txt)
  -m, --max-length <max-length>
                          Maximum sentence length in characters for timed output
                          formats. (default: 40)
  --word-timestamps       Include word-level timestamps in JSON output.
  -h, --help              Show help information.

Screen Recording permission is required. Grant it to your terminal app in System Settings > Privacy & Security > Screen Recording.

Examples

# Transcribe system audio live
yap listen

# Pipe live transcription to another tool
yap listen | uvx llm 'Translate this to French:'

# Save system audio as VTT subtitles
yap listen --vtt > captions.vtt
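
# Prefix each transcribed line with the wall-clock time
Because yap listen prints to stdout, its plain-text output can be passed through any line filter. The sketch below is one such filter; stamp_lines is a hypothetical helper name, and it assumes the transcript arrives one line at a time when piped, which this README does not state.

```shell
# Hypothetical filter (not part of yap): prepend the current time to each
# line read from stdin.
stamp_lines() {
  while IFS= read -r line; do
    printf '[%s] %s\n' "$(date +%H:%M:%S)" "$line"
  done
}
```

It would be used as yap listen | stamp_lines > log.txt. The timestamp reflects when the line reached the filter, not when the words were spoken.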

Listen and Dictate

yap listen-and-dictate transcribes system audio and microphone input simultaneously, which makes it well suited to meeting transcription.

USAGE: yap listen-and-dictate [--locale <locale>] [--censor] [--txt] [--srt] [--vtt] [--json] [--max-length <max-length>] [--mic-label <mic-label>] [--system-label <system-label>] [--word-timestamps]

OPTIONS:
  -l, --locale <locale>   (default: current)
  --censor                Replaces certain words and phrases with a redacted form.
  --txt/--srt/--vtt/--json
                          Output format for the transcription. (default: --txt)
  -m, --max-length <max-length>
                          Maximum sentence length in characters for timed output
                          formats. (default: 40)
  --mic-label <mic-label> Speaker label for microphone audio in timed output
                          formats. (default: Mic)
  --system-label <system-label>
                          Speaker label for system audio in timed output
                          formats. (default: System)
  --word-timestamps       Include word-level timestamps in JSON output.
  -h, --help              Show help information.

Both Screen Recording and Microphone permissions are required. Grant them to your terminal app in System Settings > Privacy & Security.

Examples

# Transcribe a video call (both sides)
yap listen-and-dictate

# Save a meeting transcript
yap listen-and-dictate > meeting.txt

# Save a meeting transcript as VTT with speaker labels
yap listen-and-dictate --vtt > meeting.vtt

# Use custom speaker labels
yap listen-and-dictate --vtt --mic-label Alice --system-label Bob > meeting.vtt
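
# Save each meeting to a dated VTT file
For recurring meetings, the custom-label example above can be wrapped in a small function that names the output after the current date. This is a sketch under assumptions: record_meeting and the YAP override variable are hypothetical, and the meeting-YYYY-MM-DD.vtt naming scheme is only an illustration.

```shell
# Hypothetical wrapper (not part of yap): record a labeled VTT transcript
# into a file named after today's date. Arguments: mic label, system label.
record_meeting() {
  mic="${1:-Mic}"                      # same defaults as yap's own labels
  sys="${2:-System}"
  file="meeting-$(date +%Y-%m-%d).vtt"
  "${YAP:-yap}" listen-and-dictate --vtt \
    --mic-label "$mic" --system-label "$sys" > "$file"
}
```

Calling record_meeting Alice Bob writes to a file such as meeting-2025-01-31.vtt in the current directory and keeps writing until the command is stopped.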

Dictation

yap dictate transcribes microphone input in real time.

USAGE: yap dictate [--locale <locale>] [--censor] [--txt] [--srt] [--vtt] [--json] [--max-length <max-length>] [--word-timestamps]

OPTIONS:
  -l, --locale <locale>   (default: current)
  --censor                Replaces certain words and phrases with a redacted form.
  --txt/--srt/--vtt/--json
                          Output format for the transcription. (default: --txt)
  -m, --max-length <max-length>
                          Maximum sentence length in characters for timed output
                          formats. (default: 40)
  --word-timestamps       Include word-level timestamps in JSON output.
  -h, --help              Show help information.

Microphone permission is required. Grant it to your terminal app in System Settings > Privacy & Security > Microphone.

Examples

# Dictate from your microphone
yap dictate

# Dictate and save to a file
yap dictate > notes.txt

MCP Server

yap includes an MCP server that exposes a transcribe tool, allowing any MCP-compatible agent to transcribe audio and video files.

Claude Code

claude mcp add yap -- yap mcp

Codex

codex mcp add yap -- yap mcp
