Summarize 📝 — Chrome Side Panel + CLI

Fast summaries from URLs, files, and media. Works in the terminal, the Chrome Side Panel, and the Firefox Sidebar.

Highlights

Chrome Side Panel chat (streaming agent + history) inside the sidebar.
YouTube slides: screenshots + OCR + transcript cards, timestamped seek, OCR/Transcript toggle.
Media-aware summaries: auto‑detect video/audio vs page content.
Streaming Markdown + metrics + cache‑aware status.
CLI supports URLs, files, podcasts, YouTube, audio/video, PDFs.

Feature overview

URLs, files, and media: web pages, PDFs, images, audio/video, YouTube, podcasts, RSS.
Slide extraction for video sources (YouTube/direct media) with OCR + timestamped cards.
Transcript-first media flow: uses published transcripts when available, otherwise falls back to transcription via Groq, ONNX, whisper.cpp, AssemblyAI, Gemini, OpenAI, or FAL.
Streaming output with Markdown rendering, metrics, and cache-aware status.
Local, paid, and free models: OpenAI‑compatible local endpoints, paid providers, plus an OpenRouter free preset.
Output modes: Markdown/text, JSON diagnostics, extract-only, metrics, timing, and cost estimates.
Smart default: if content is shorter than the requested length, we return it as-is (use --force-summary to override).

Get the extension (recommended)

One‑click summarizer for the current tab. Chrome Side Panel + Firefox Sidebar + local daemon for streaming Markdown.

Chrome Web Store: Summarize Side Panel

Beginner quickstart (extension)

Install the CLI (choose one):
- npm (cross‑platform): npm i -g @steipete/summarize
- Homebrew (macOS arm64): brew install steipete/tap/summarize
Install the extension (Chrome Web Store link above) and open the Side Panel.
The panel shows a token + install command. Run it in Terminal:
- summarize daemon install --token <TOKEN>

Why a daemon/service?

The extension can’t run heavy extraction inside the browser. It talks to a local background service on 127.0.0.1 for fast streaming and media tools (yt‑dlp, ffmpeg, OCR, transcription).
The service autostarts (launchd/systemd/Scheduled Task) so the Side Panel is always ready.

If you only want the CLI, you can skip the daemon install entirely.

Notes:

Summarization only runs when the Side Panel is open.
Auto mode summarizes on navigation (incl. SPAs); otherwise use the button.
Daemon is localhost-only and requires a shared token; rerunning summarize daemon install --token <TOKEN> adds another paired browser token instead of invalidating the old one.
Autostart: macOS (launchd), Linux (systemd user), Windows (Scheduled Task).
Tip: configure the free preset with summarize refresh-free (needs OPENROUTER_API_KEY). Add --set-default to set model=free.
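
The pairing rule in the notes above (a new token is added, earlier ones stay valid) can be pictured with a small in-memory store. This is only a sketch: createTokenStore, pair, and isAuthorized are hypothetical names, and the daemon's real storage and transport are not described in this README.

```typescript
// Hypothetical in-memory store for paired browser tokens (illustration only).
function createTokenStore() {
  const tokens: string[] = [];
  return {
    // Pairing again appends a token and leaves earlier tokens valid.
    pair(token: string): void {
      if (token.length > 0 && tokens.indexOf(token) === -1) {
        tokens.push(token);
      }
    },
    // A request is accepted only if it presents one of the paired tokens.
    isAuthorized(presented: string | undefined): boolean {
      return presented !== undefined && tokens.indexOf(presented) !== -1;
    },
  };
}
```

With this shape, pairing a second browser does not lock out the first one, which matches the behavior described for rerunning the install command.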

Step-by-step install: apps/chrome-extension/README.md
Architecture + troubleshooting: docs/chrome-extension.md
Firefox compatibility notes: apps/chrome-extension/docs/firefox.md

Slides (extension)

Select Video + Slides in the Summarize picker.
Slides render at the top; expand to full‑width cards with timestamps.
Click a slide to seek the video; toggle Transcript/OCR when OCR is significant.
Requirements: yt-dlp + ffmpeg for extraction; tesseract for OCR. Missing tools show an in‑panel notice.

Advanced (unpacked / dev)

Build + load the extension (unpacked):
- Chrome: pnpm -C apps/chrome-extension build
  - chrome://extensions → Developer mode → Load unpacked
  - Pick: apps/chrome-extension/.output/chrome-mv3
- Firefox: pnpm -C apps/chrome-extension build:firefox
  - about:debugging#/runtime/this-firefox → Load Temporary Add-on
  - Pick: apps/chrome-extension/.output/firefox-mv3/manifest.json
Open Side Panel/Sidebar → copy token.
Install daemon in dev mode:
- pnpm summarize daemon install --token <TOKEN> --dev

CLI

Install

Requires Node 22+.

npx (no install):

npx -y @steipete/summarize "https://example.com"

npm (global):

npm i -g @steipete/summarize

npm (library / minimal deps):

npm i @steipete/summarize-core

import { createLinkPreviewClient } from "@steipete/summarize-core/content";

Homebrew (custom tap):

brew install steipete/tap/summarize

Homebrew availability depends on the current tap formula for your architecture. If the Homebrew install fails on Intel/x64, use the npm global install above.

Optional local dependencies

Install these if you want media-heavy features:

ffmpeg: required for --slides and many local media/transcription flows
yt-dlp: required for YouTube slide extraction and some remote media flows
tesseract: optional OCR for --slides-ocr
Optional cloud transcription providers:
- GROQ_API_KEY
- ASSEMBLYAI_API_KEY
- GEMINI_API_KEY / GOOGLE_GENERATIVE_AI_API_KEY / GOOGLE_API_KEY
- OPENAI_API_KEY
- FAL_KEY
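
To check which cloud transcription providers an environment would enable, something like the following could be used. It is a rough sketch: the environment variable names come from the list above, while the provider labels and the configuredCloudProviders helper are hypothetical. The array order simply mirrors the list and says nothing about the CLI's actual fallback priority.

```typescript
// Env var names are from the list above; provider labels are illustrative.
const CLOUD_TRANSCRIPTION_KEYS: { provider: string; envVars: string[] }[] = [
  { provider: "groq", envVars: ["GROQ_API_KEY"] },
  { provider: "assemblyai", envVars: ["ASSEMBLYAI_API_KEY"] },
  {
    provider: "gemini",
    envVars: ["GEMINI_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY", "GOOGLE_API_KEY"],
  },
  { provider: "openai", envVars: ["OPENAI_API_KEY"] },
  { provider: "fal", envVars: ["FAL_KEY"] },
];

// Returns the providers for which at least one key is set to a non-blank value.
function configuredCloudProviders(env: Record<string, string | undefined>): string[] {
  return CLOUD_TRANSCRIPTION_KEYS
    .filter((entry) => entry.envVars.some((name) => (env[name] || "").trim() !== ""))
    .map((entry) => entry.provider);
}
```

In a Node script the argument would typically be process.env.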

macOS (Homebrew):

brew install ffmpeg yt-dlp
brew install tesseract # optional, for --slides-ocr

If --slides is enabled and these tools are missing, Summarize warns and continues without slides.

CLI vs extension

CLI only: just install via npm/Homebrew and run summarize ... (no daemon needed).
Chrome/Firefox extension: install the CLI and run summarize daemon install --token <TOKEN> so the Side Panel can stream results and use local tools.

Quickstart

summarize "https://example.com"

Inputs

URLs or local paths:

summarize "/path/to/file.pdf" --model google/gemini-3-flash
summarize "https://example.com/report.pdf" --model google/gemini-3-flash
summarize "/path/to/audio.mp3"
summarize "/path/to/video.mp4"

Stdin (pipe content using -):

echo "content" | summarize -
pbpaste | summarize -
# binary stdin also works (PDF/image/audio/video bytes)
cat /path/to/file.pdf | summarize -

Notes:

Stdin has a 50MB size limit
The - argument tells summarize to read from standard input
Text stdin is treated as UTF-8 text (whitespace-only input is rejected as empty)
Binary stdin is preserved as raw bytes and file type is auto-detected when possible
Useful for piping clipboard content or command output
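
The stdin rules above can be sketched as a small classifier. Everything here is an assumption made for illustration: the classifyStdin name, reading "50MB" as 50 * 1024 * 1024 bytes, and the heuristic that NUL bytes or invalid UTF-8 mean binary data. The CLI's real detection may differ.

```typescript
// Assumption: "50MB" is read as 50 * 1024 * 1024 bytes.
const MAX_STDIN_BYTES = 50 * 1024 * 1024;

type StdinInput =
  | { kind: "text"; text: string }
  | { kind: "binary"; bytes: Uint8Array };

function classifyStdin(bytes: Uint8Array): StdinInput {
  if (bytes.length > MAX_STDIN_BYTES) {
    throw new Error("stdin exceeds the 50MB limit");
  }
  // Heuristic (an assumption): a NUL byte marks the input as binary.
  let hasNul = false;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0) {
      hasNul = true;
      break;
    }
  }
  let text: string | null = null;
  if (!hasNul) {
    try {
      // fatal: true makes decode() throw on invalid UTF-8 instead of substituting.
      text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    } catch (error) {
      text = null; // not valid UTF-8, so keep the raw bytes
    }
  }
  if (text === null) {
    return { kind: "binary", bytes };
  }
  if (text.trim().length === 0) {
    throw new Error("stdin is empty"); // whitespace-only text is rejected
  }
  return { kind: "text", text };
}
```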

YouTube (supports youtube.com and youtu.be):

summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto
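
For readers wondering how the two URL shapes map to the same video, here is a minimal sketch that pulls the video id out of either form. The extractYouTubeId helper is hypothetical and covers only youtu.be/<id> and youtube.com/watch?v=<id>; other URL variants are outside its scope.

```typescript
// Hypothetical helper: returns the video id, or null if the URL is not recognized.
function extractYouTubeId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch (error) {
    return null; // not a parseable absolute URL
  }
  const host = url.hostname.replace(/^www\./, "");
  if (host === "youtu.be") {
    // Short links carry the id as the first path segment.
    const id = url.pathname.split("/")[1];
    return id ? id : null;
  }
  if (host === "youtube.com" && url.pathname === "/watch") {
    // Watch links carry the id in the "v" query parameter.
    return url.searchParams.get("v");
  }
  return null;
}
```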

Podcast RSS (transcribes latest enclosure):

summarize "https://feeds.npr.org/500005/podcast.xml"

Apple Podcasts episode page:

summarize "https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432"

Spotify episode page (best-effort; may fail for exclusives):

summarize "https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY"

Output length

--length controls how much output we ask for (guideline), not a hard cap.

summarize "https://example.com" --length long
summarize "https://example.com" --length 20k

Presets: short|medium|long|xl|xxl
Character targets: 1500, 20k, 20000
Optional hard cap: --max-output-tokens <count> (e.g. 2000, 2k)
- Provider/model APIs still enforce their own maximum output limits.
- If omitted, no max token parameter is sent (provider default).
- Prefer --length unless you need a hard cap.
Short content: when extracted content is shorter than the requested length, the CLI returns the content as-is.
- Override with --force-summary to always run the LLM.
Minimums: --length numeric values must be >= 50 chars; --max-output-tokens must be >= 16.
Preset targets (source of truth: packages/core/src/prompts/summary-lengths.ts):
- short: target ~900 chars (range 600-1,200)
- medium: target ~1,800 chars (range 1,200-2,500)
- long: target ~4,200 chars (range 2,500-6,000)
- xl: target ~9,000 chars (range 6,000-14,000)
- xxl: target ~17,000 chars (range 14,000-22,000)
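
The length rules above can be sketched as follows. This is an illustration, not the CLI's code: resolveLengthTarget and shouldSummarize are hypothetical names, the preset numbers are the approximate targets copied from the list above, and reading "k" as a multiplier of 1,000 is inferred from the "20k, 20000" example.

```typescript
type PresetName = "short" | "medium" | "long" | "xl" | "xxl";

// Approximate preset targets in characters, copied from the list above.
const PRESET_TARGETS: Record<PresetName, number> = {
  short: 900,
  medium: 1800,
  long: 4200,
  xl: 9000,
  xxl: 17000,
};

const MIN_LENGTH_CHARS = 50;

// Resolve a --length value ("long", "1500", "20k") to a target character count.
function resolveLengthTarget(raw: string): number {
  const value = raw.trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(PRESET_TARGETS, value)) {
    return PRESET_TARGETS[value as PresetName];
  }
  const match = /^(\d+)(k?)$/.exec(value);
  if (!match) {
    throw new Error(`Unsupported --length value: ${raw}`);
  }
  const chars = Number(match[1]) * (match[2] === "k" ? 1000 : 1);
  if (chars < MIN_LENGTH_CHARS) {
    throw new Error(`--length must be >= ${MIN_LENGTH_CHARS} chars`);
  }
  return chars;
}

// Short-content rule: content shorter than the target is returned as-is,
// unless the caller forces a summary (the --force-summary flag).
function shouldSummarize(contentChars: number, targetChars: number, forceSummary: boolean): boolean {
  return forceSummary || contentChars >= targetChars;
}
```

For example, a 400-character page with --length short (target about 900) would be returned unchanged unless --force-summary is set.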

What file types work?

Best effort and provider-dependent. These usually work well:

text/* and common structured text (.txt, .md, .json, .yaml, .xml, ...)
- Text-like files are inlined into the prompt for better provider compatibility.
PDFs: application/pdf (provider support varies; Google is the most reliable here)
Images: image/jpeg, image/png, image/webp, image/gif
Audio/Video: audio/*, video/* (local audio/video files in MP3/WAV/M4A/OGG/FLAC/MP4/MOV/WEBM format are transcribed automatically when the model supports it)

Notes:

If a provider rejects a media type, the CLI fails fast with a friendly message.
xAI models do not support attaching generic files (like PDFs) via the AI SDK; use Google/OpenAI/Anthropic for those.
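
One way to picture the categories above is a small media-type classifier. This is a sketch under assumptions: classifyMediaType and the InputKind labels are hypothetical, and the set of structured-text types beyond text/* is a guess at what "common structured text" covers.

```typescript
type InputKind = "text" | "pdf" | "image" | "media" | "unknown";

// Structured-text types beyond text/*; this particular set is an assumption.
const STRUCTURED_TEXT_TYPES = ["application/json", "application/xml", "application/yaml"];
// Image types as listed above.
const IMAGE_TYPES = ["image/jpeg", "image/png", "image/webp", "image/gif"];

function classifyMediaType(mediaType: string): InputKind {
  // Drop parameters such as "; charset=utf-8" before matching.
  const type = (mediaType.split(";")[0] || "").trim().toLowerCase();
  if (type.indexOf("text/") === 0 || STRUCTURED_TEXT_TYPES.indexOf(type) !== -1) return "text";
  if (type === "application/pdf") return "pdf";
  if (IMAGE_TYPES.indexOf(type) !== -1) return "image";
  if (type.indexOf("audio/") === 0 || type.indexOf("video/") === 0) return "media";
  return "unknown";
}
```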

Model ids

Use gateway-style ids: <provider>/<model>.

Examples:

openai/gpt-5-mini
anthropic/claude-sonnet-4-5
xai/grok-4-fast-non-reasoning
google/gemini-3-flash
zai/glm-4.7
openrouter/openai/gpt-5-mini (force OpenRouter)
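
A gateway-style id can be split on its first slash, which is why openrouter/openai/gpt-5-mini keeps openai/gpt-5-mini as the model part. The sketch below assumes that reading; parseModelId is a hypothetical helper, and bare values such as auto are outside its scope.

```typescript
interface ParsedModelId {
  provider: string; // text before the first "/"
  model: string; // everything after it, which may itself contain "/"
}

function parseModelId(id: string): ParsedModelId {
  const slash = id.indexOf("/");
  // Reject ids with no slash, an empty provider, or an empty model part.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected <provider>/<model>, got: ${id}`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```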

Note: some models/providers do not support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider).

Limits

Text inputs over 10 MB are rejected before tokenization.
Text prompts are preflighted against the model input limit (LiteLLM catalog), using a GPT tokenizer.
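
The ordering of the two checks matters: the size check runs first so that oversized input never reaches the tokenizer. A minimal sketch of that ordering follows. The preflightText name, the countTokens callback, and reading "10 MB" as 10 * 1024 * 1024 bytes are assumptions; the real CLI looks up the model limit in the LiteLLM catalog and uses a GPT tokenizer.

```typescript
// Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes of UTF-8.
const MAX_TEXT_BYTES = 10 * 1024 * 1024;

type PreflightResult = { ok: true } | { ok: false; reason: string };

function preflightText(
  text: string,
  modelInputLimit: number,
  countTokens: (text: string) => number, // stand-in for a real tokenizer
  maxBytes: number = MAX_TEXT_BYTES,
): PreflightResult {
  // Size check first, so oversized input is rejected before tokenization.
  const byteLength = new TextEncoder().encode(text).length;
  if (byteLength > maxBytes) {
    return { ok: false, reason: `text input is ${byteLength} bytes; limit is ${maxBytes}` };
  }
  const tokens = countTokens(text);
  if (tokens > modelInputLimit) {
    return { ok: false, reason: `prompt needs ${tokens} tokens; model accepts ${modelInputLimit}` };
  }
  return { ok: true };
}
```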

Common flags

summarize <input> [flags]

Use summarize --help or summarize help for the full help text.

--model <provider/model>: which model to use (defaults to auto)
--model auto: a
