Video Transcript Summarizer
Transcribe and summarize videos from YouTube, Instagram, TikTok, Twitter, Reddit, Facebook, Google Drive, Dropbox, and local files.
Works with any OpenAI-compatible LLM provider, including locally hosted endpoints.
Interfaces
| Interface | Command |
|-----------|---------|
| CLI | python -m summarizer --source <source> |
| Streamlit GUI | python -m streamlit run app.py |
| Docker | docker compose up -d -> http://localhost:8501 |
| Agent skill | .agent/skills/summarize/SKILL.md for agent access to the CLI |
How It Works
+--------------------+
| Video URL/Path |
+---------+----------+
|
v
+---------+----------+
| Source Type? |
+---------+----------+
|
+-----------------+-------------+
| | |
| X.com/IG Local File
YouTube TikTok Google Drive
| etc. Dropbox
| | |
v +----+-----+ |
+------+----------+ | Cobalt | |
| Captions Exist? | +----+-----+ |
+----+----+-------+ | |
Yes No | |
| +--------------+--------+----+
| |
| v
| +--------+--------+
| | Whisper |
| | endpoint? |
| +--------+--------+
| |
| +-----------+-----------+
| | |
| Cloud Whisper Local Whisper
| | |
| +----------+------------+
| |
+---------------------------+
|
Transcript
|
v
+------------+----------+
summarizer.yaml -> | Prompt + LLM |
prompts.json -> | Merge |
.env -> +------------+----------+
|
v
+------+-------+
| Output |
+--------------+
- summarizer.yaml: provider settings (base_url, model, chunk-size) and defaults
- .env: API keys, matched by URL keyword
- prompts.json: summary style templates
Notes:
- Cloud Whisper uses Groq Cloud API and requires a Groq API key
- The Docker image does not include Local Whisper and is aimed at lightweight VPS deployment
Installation and Usage
Step 0 - CLI installation:
git clone https://github.com/martinopiaggi/summarize.git
cd summarize
pip install -e .
Step 1 - Run the CLI:
python -m summarizer --source "https://youtube.com/watch?v=VIDEO_ID"
The summary is saved to summaries/watch_YYYYMMDD_HHMMSS.md.
Streamlit GUI
python -m streamlit run app.py
Then open http://localhost:8501 in your browser.
Docker
git clone https://github.com/martinopiaggi/summarize.git
cd summarize
# Create .env with your API keys, then:
docker compose up -d
Open http://localhost:8501 for the GUI. Summaries are saved to ./summaries/.
CLI via Docker: docker compose run --rm summarizer python -m summarizer --source "URL"
Cobalt standalone: docker compose -f docker-compose.cobalt.yml up -d
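For orientation, the compose service behind these commands can be sketched as follows. This is an illustrative fragment with assumed names and paths; the repository's actual docker-compose.yml is authoritative:

```yaml
services:
  summarizer:
    build: .
    ports:
      - "8501:8501"                    # Streamlit GUI
    env_file: .env                     # API keys, matched by URL keyword
    volumes:
      - ./summaries:/app/summaries     # persist summaries on the host
```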
Configuration
Providers (summarizer.yaml)
Define your LLM providers and defaults. CLI flags override everything.
default_provider: gemini
providers:
gemini:
base_url: https://generativelanguage.googleapis.com/v1beta/openai
model: gemini-2.5-flash-lite
chunk-size: 128000
groq:
base_url: https://api.groq.com/openai/v1
model: openai/gpt-oss-20b
ollama:
base_url: http://localhost:11434/v1
model: qwen3:8b
openrouter:
base_url: https://openrouter.ai/api/v1
model: google/gemini-2.0-flash-001
defaults:
prompt-type: Questions and answers
chunk-size: 10000
parallel-calls: 30
max-tokens: 4096
audio-speed: 1.0
use-proxy: false
output-dir: summaries
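To make the chunk-size and parallel-calls defaults concrete: a long transcript is split into fixed-size character windows, and the windows are summarized with a bounded number of concurrent API requests. A rough sketch of that idea, with illustrative function names rather than the project's actual internals:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_text(text: str, chunk_size: int = 10000) -> list[str]:
    # Split the transcript into fixed-size character windows.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize_chunk(chunk: str) -> str:
    # Stand-in for one OpenAI-compatible chat-completion request.
    return f"summary of {len(chunk)} chars"

def summarize(text: str, chunk_size: int = 10000, parallel_calls: int = 30) -> list[str]:
    chunks = chunk_text(text, chunk_size)
    # parallel-calls caps how many requests are in flight at once.
    with ThreadPoolExecutor(max_workers=parallel_calls) as pool:
        return list(pool.map(summarize_chunk, chunks))
```

With the defaults above, a 25,000-character transcript becomes three chunks (10,000 + 10,000 + 5,000), all summarized concurrently.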
API Keys (.env)
# Required for Cloud Whisper transcription
groq = gsk_YOUR_KEY
# LLM providers (choose one or more)
openai = sk-proj-YOUR_KEY
generativelanguage = YOUR_GOOGLE_KEY
deepseek = YOUR_DEEPSEEK_KEY
openrouter = YOUR_OPENROUTER_KEY
perplexity = YOUR_PERPLEXITY_KEY
hyperbolic = YOUR_HYPERBOLIC_KEY
# Optional: Webshare credentials
# - YouTube transcript fetching uses them automatically when present
# - pytubefix audio downloads also require `defaults.use-proxy: true`
WEBSHARE_PROXY_USERNAME = YOUR_WEBSHARE_USERNAME
WEBSHARE_PROXY_PASSWORD = YOUR_WEBSHARE_PASSWORD
If you pass an endpoint URL with --base-url, the API key is matched from .env by URL keyword. For example, https://generativelanguage.googleapis.com/... matches generativelanguage.
Prompts (prompts.json)
Use with --prompt-type in the CLI or select it from the dropdown in the web interface.
Add custom styles by editing prompts.json. Use {text} as the transcript placeholder.
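Assuming prompts.json maps style names to template strings, a custom entry might look like this (the style name and wording here are invented examples, not templates shipped with the project):

```json
{
  "Action Items": "List every concrete action item from this transcript, one per line:\n\n{text}"
}
```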
CLI Examples
With a configured summarizer.yaml, the CLI is simple:
# Uses the default provider from YAML
python -m summarizer --source "https://youtube.com/watch?v=VIDEO_ID"
# Specify a provider
python -m summarizer --source "https://youtube.com/watch?v=VIDEO_ID" --provider groq
# Fact-check claims with Perplexity
python -m summarizer \
--source "https://youtube.com/watch?v=VIDEO_ID" \
--base-url "https://api.perplexity.ai" \
--model "sonar-pro" \
--prompt-type "Fact Checker"
# Extract key insights
python -m summarizer \
--source "https://youtube.com/watch?v=VIDEO_ID" \
--provider gemini \
--prompt-type "Distill Wisdom"
# Generate a Mermaid diagram
python -m summarizer \
--source "https://youtube.com/watch?v=VIDEO_ID" \
--provider openrouter \
--prompt-type "Mermaid Diagram"
# Multiple videos
python -m summarizer --source "URL1" "URL2" "URL3"
# Local files
python -m summarizer --type "Local File" --source "./lecture.mp4"
# Speed up audio before Whisper (faster, may reduce accuracy)
python -m summarizer --source "URL" --force-download --audio-speed 2.0
# Aggressive speed-up (supported, at a further cost in accuracy)
python -m summarizer --source "URL" --force-download --audio-speed 5.0
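Audio speed-up of this kind typically maps onto ffmpeg's atempo filter, whose individual instances are only guaranteed for factors up to 2.0 on older builds. A hypothetical helper (not the project's actual code) shows how a 5.0x request could be expressed as a filter chain:

```python
def atempo_chain(speed: float) -> str:
    # Older ffmpeg builds cap atempo at 2.0 per instance, so larger
    # factors are expressed by chaining: 5.0 -> 2.0 * 2.0 * 1.25.
    parts = []
    while speed > 2.0:
        parts.append("atempo=2.0")
        speed /= 2.0
    parts.append(f"atempo={speed:g}")
    return ",".join(parts)

atempo_chain(5.0)  # → "atempo=2.0,atempo=2.0,atempo=1.25"
```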
# Force YouTube audio download and show detailed progress
python -m summarizer \
--source "https://youtube.com/watch?v=VIDEO_ID" \
--force-download \
-v
# Non-YouTube URL (requires Cobalt)
python -m summarizer --type "Video URL" --source "https://www.instagram.com/reel/..."
# Specify a language for YouTube captions
python -m summarizer --source "URL" --prompt-type "Distill Wisdom" --language "it"
Without YAML, pass --base-url and --model explicitly:
python -m summarizer \
--source "https://youtube.com/watch?v=VIDEO_ID" \
--base-url "https://generativelanguage.googleapis.com/v1beta/openai" \
--model "gemini-2.5-flash-lite"
CLI Reference
| Flag | Description | Default |
|------|-------------|---------|
| --source | Video URLs or file paths (multiple allowed) | Required |
| --provider | Provider name from YAML | default_provider |
| --base-url | API endpoint (overrides provider) | From YAML |
| --model | Model identifier (overrides provider) | From YAML |
| --api-key | API key (overrides .env) | - |
| --type | YouTube Video, Video URL, Local File, Google Drive Video Link, Dropbox Video Link, TXT | YouTube Video |
| --prompt-type | Summary style | Questions and answers |
| --chunk-size | Input text chunk size in characters | 10000 |
| --force-download | Skip captions and download audio instead | False |
| --transcription | Cloud Whisper (Groq API) or Local Whisper (local) | Cloud Whisper |
| --whisper-model | tiny, base, small, medium, large | tiny |
| --audio-speed | Pre-transcription playback speed | 1.0 |
| --language | Language code for YouTube captions; useful when auto-detection picks the wrong track | auto |
| --parallel-calls | Concurrent API requests | 30 |
| --max-tokens | Max output tokens per chunk | 4096 |
| --cobalt-url | Cobalt base URL for non-YouTube platforms and fallback downloads | http://localhost:9000 |
| --output-dir | Output directory | summaries |
| --no-save | Print only, no file output | False |
| --verbose, -v | Detailed output | False |
Use --verbose to see detailed status output during config loading, downloads, transcription, and summarization.
Extra
Local Whisper
Runs transcription on your machine instead of using Groq Cloud Whisper. This removes the Groq API requirement, but CPU-only runs are much slower.
# Add Local Whisper support
pip install -e .[whisper]
# Optional: install CUDA-enabled PyTorch for GPU acceleration
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# Use it
python -m summarizer --source "URL" --force-download --transcription "Local Whisper" --whisper-model "small"
If you only need CPU transcription, pip install -e .[whisper] is enough.
Why not in Docker? The Docker image installs the core app only. It does not include openai-whisper or GPU-oriented PyTorch because this project targets lightweight VPS deployments, where GPUs are usually unavailable. In Docker, Cloud Whisper is the preferred transcription option.
