# Clawvid

An AI skill for short-form video generation with OpenClaw.

## Install / Use

```
/learn @neur0map/Clawvid
```

---
# ClawVid
**Status: Beta** — the end-to-end pipeline works: TTS-driven timing (narration drives scene duration), voice consistency (voice cloning across scenes), subtitle sync, aspect-ratio-safe encoding, Kling 2.6 video, Remotion rendering, word-level subtitles, and multi-track audio. 97 tests passing. Edge cases and hardening are still in progress.
AI-powered short-form video generation CLI for OpenClaw.
Generate YouTube Shorts, TikToks, and Instagram Reels from text prompts. The OpenClaw agent orchestrates the entire pipeline — planning scenes, writing prompts, and generating a workflow JSON that ClawVid executes end-to-end.
## How It Works
```
User: "Make a horror video about a haunted library"
        |
        v
OpenClaw Agent (reads SKILL.md)
  - Asks clarifying questions
  - Plans scenes with timing
  - Writes image/video prompts
  - Plans sound effects + music
  - Creates workflow.json
  - Runs: clawvid generate --workflow workflow.json
        |
        v
ClawVid Pipeline (TTS-first, 6 phases)
  Phase 1. Generate TTS narration (qwen-3-tts, voice cloning for consistency)
  Phase 2. Compute timing (scene durations derived from actual TTS length)
  Phase 3. Generate images via fal.ai (kling-image/v3 or nano-banana-pro)
  Phase 4. Generate video clips via fal.ai (Kling 2.6 Pro)
  Phase 5. Generate sound effects via fal.ai (beatoven)
  Phase 6. Generate background music via fal.ai (beatoven)
  + Position narration at scene starts (adelay, not sequential concat)
  + Process audio (trim, normalize, multi-track mix)
  + Generate subtitles (Whisper word-level, offset by scene start)
  + Render compositions (Remotion: 16:9 + 9:16 with effects)
  + Post-process (FFmpeg: encode with aspect-ratio-safe scaling, thumbnails)
        |
        v
output/2026-02-11-haunted-library/
  youtube/    haunted-library.mp4  (1920x1080)
  tiktok/     haunted-library.mp4  (1080x1920)
  instagram/  haunted-library.mp4  (1080x1920)
```
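The TTS-first timing of Phase 2 can be sketched as a pure function: each scene lasts as long as its narration plus the workflow's `scene_padding_seconds`, and scene starts are cumulative. This is an illustrative helper, not the actual `computeTiming` in `src/core/workflow-runner.ts`.

```typescript
// Hypothetical sketch of TTS-driven timing: scene duration is derived from
// the measured narration length plus a fixed padding; starts are cumulative.
export interface SceneTiming {
  start: number;    // seconds from video start
  duration: number; // seconds
}

export function computeTiming(
  narrationSeconds: number[],
  paddingSeconds: number,
): SceneTiming[] {
  const timings: SceneTiming[] = [];
  let cursor = 0;
  for (const narration of narrationSeconds) {
    const duration = narration + paddingSeconds;
    timings.push({ start: cursor, duration });
    cursor += duration;
  }
  return timings;
}

// computeTiming([3, 5], 0.5)
// -> [{ start: 0, duration: 3.5 }, { start: 3.5, duration: 5.5 }]
```

Because narration is generated first, no scene can cut off mid-sentence: the visuals stretch to fit the audio rather than the other way around.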
All AI generation flows through fal.ai — one API, one key.
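The narration-positioning step above (adelay rather than sequential concat) amounts to building an FFmpeg filtergraph that delays each narration clip to its scene start and mixes the results. A minimal sketch, with input labels and the final mix as illustrative assumptions — the real pipeline's audio processing is more involved:

```typescript
// Hypothetical sketch: delay each narration input to its scene start using
// FFmpeg's adelay filter (delays are in milliseconds; ":all=1" applies the
// delay to every channel), then mix the delayed streams with amix.
export function buildNarrationFilter(sceneStartsSeconds: number[]): string {
  const delayed = sceneStartsSeconds.map((start, i) => {
    const ms = Math.round(start * 1000);
    // Input 0 is assumed to be the video; narration clips start at input 1.
    return `[${i + 1}:a]adelay=${ms}:all=1[n${i}]`;
  });
  const labels = sceneStartsSeconds.map((_, i) => `[n${i}]`).join("");
  const mix = `${labels}amix=inputs=${sceneStartsSeconds.length}[narration]`;
  return [...delayed, mix].join(";");
}

// buildNarrationFilter([0, 3.5]) ->
// "[1:a]adelay=0:all=1[n0];[2:a]adelay=3500:all=1[n1];[n0][n1]amix=inputs=2[narration]"
```

Delaying in place instead of concatenating means a long pause between scenes stays a pause, and a late narration clip never shifts every clip after it.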
## Prerequisites

- Node.js >= 18
- FFmpeg installed and on PATH (`brew install ffmpeg` on macOS)
- fal.ai API key (get one here)
## Installation

```bash
git clone https://github.com/neur0map/clawvid
cd clawvid
npm install
cp .env.example .env
# Edit .env and add your FAL_KEY
```
### Link as a global CLI (optional)

```bash
npm run build
npm link
# Now "clawvid" is available globally
```
### Development mode

```bash
npm run dev -- generate --workflow workflow.json
```
## Configuration

### Environment

```bash
FAL_KEY=your_fal_ai_key_here
```
That's the only required environment variable.
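Since every generation call depends on that key, a fail-fast check at startup might look like this (a hypothetical helper, not the actual CLI code, which may load the key differently):

```typescript
// Hypothetical sketch: read FAL_KEY and fail fast with a helpful message
// instead of letting the first fal.ai request die with an auth error.
export function requireFalKey(
  env: Record<string, string | undefined> = process.env,
): string {
  const key = env.FAL_KEY;
  if (!key) {
    throw new Error(
      "FAL_KEY is not set. Copy .env.example to .env and add your fal.ai key.",
    );
  }
  return key;
}
```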
### config.json

A central settings file checked into git. It controls:
| Section | What it configures |
|---------|-------------------|
| fal.image | Image generation model (kling-image/v3, nano-banana-pro) |
| fal.video | Video generation model (Kling 2.6 Pro) |
| fal.audio | TTS (qwen-3-tts), transcription (whisper), sound effects (beatoven), music (beatoven) |
| fal.analysis | Image/video analysis models for quality verification |
| defaults | Aspect ratio, resolution, FPS, duration, max video clips |
| templates | 4 built-in templates (horror, motivation, quiz, reddit) |
| quality | 3 presets (max_quality, balanced, budget) |
| platforms | YouTube, TikTok, Instagram Reels specs |
| output | Output directory, format, naming pattern |
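For orientation, here is a fragment of what `config.json` might contain. The section names come from the table above; the keys inside each section are illustrative assumptions (only the model ids appear elsewhere in this README):

```json
{
  "fal": {
    "image": { "model": "fal-ai/kling-image/v3/text-to-image" },
    "video": { "model": "fal-ai/kling-video/v2.6/pro/image-to-video" }
  },
  "defaults": { "aspect_ratio": "9:16", "fps": 30 },
  "quality": { "balanced": { "max_video_clips": 6 } }
}
```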
### preferences.json

Per-user defaults created by `clawvid setup`. Gitignored. Stores platform selection, default template, quality mode, voice, and visual style.
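For example (field names are illustrative — run `clawvid setup` to generate the real file):

```json
{
  "platforms": ["tiktok", "youtube"],
  "default_template": "horror",
  "quality": "balanced",
  "voice": "A deep male voice with creepy undertones",
  "visual_style": "dark cinematic"
}
```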
## CLI Commands

```bash
# Full pipeline: generate assets + render video
clawvid generate --workflow <path>   # Required: workflow JSON
                 --template <name>   # Override template
                 --quality <mode>    # max_quality | balanced | budget
                 --skip-cache        # Regenerate all assets

# Re-render from existing assets
clawvid render --run <path>          # Required: output run directory
               --platform <name>     # youtube | tiktok | instagram_reels
               --all-platforms       # Render all platforms

# Preview in Remotion
clawvid preview --workflow <path>    # Required: workflow JSON
                --platform <name>    # Preview as platform (default: tiktok)

# Remotion visual editor
clawvid studio

# User preferences
clawvid setup          # Interactive setup
clawvid setup --reset  # Reset to defaults
```
## Workflow JSON
The agent generates a workflow JSON file that describes every scene, prompt, model, timing, sound effects, and music. See SKILL.md for the complete schema reference.
Minimal example:
```json
{
  "name": "Quick Horror",
  "template": "horror",
  "timing_mode": "tts_driven",
  "scene_padding_seconds": 0.5,
  "scenes": [
    {
      "id": "scene_1",
      "type": "video",
      "timing": {},
      "narration": "The door opened by itself.",
      "image_generation": {
        "model": "fal-ai/kling-image/v3/text-to-image",
        "input": {
          "prompt": "Dark hallway, door slightly ajar, light from behind, horror atmosphere",
          "aspect_ratio": "9:16"
        }
      },
      "video_generation": {
        "model": "fal-ai/kling-video/v2.6/pro/image-to-video",
        "input": {
          "prompt": "Slow push into dark hallway, door creaks open, light flickers",
          "duration": "5",
          "negative_prompt": "blur, low quality, bright"
        }
      },
      "sound_effects": [
        {
          "prompt": "Creaky door opening slowly",
          "timing_offset": 1,
          "duration": 3,
          "volume": 0.7
        }
      ],
      "effects": ["vignette", "kenburns_slow_zoom", "grain"]
    }
  ],
  "audio": {
    "tts": {
      "model": "fal-ai/qwen-3-tts/voice-design/1.7b",
      "voice_prompt": "A deep male voice with creepy undertones",
      "speed": 0.9
    },
    "music": {
      "generate": true,
      "prompt": "Dark ambient drone, horror atmosphere",
      "duration": 30,
      "volume": 0.2
    }
  }
}
```
### Scene consistency
For visually consistent characters/environments across scenes, add a consistency block:
```json
{
  "consistency": {
    "reference_prompt": "A dark animated shadow creature with glowing white eyes...",
    "seed": 666
  }
}
```
This generates a reference image first, then edits it for each scene using `nano-banana-pro/edit` with the same seed, ensuring visual consistency without relying on the fal.ai workflow platform.
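In other words, the flow reduces to one edit request per scene that reuses the reference image and seed. A minimal sketch — the request shape, field names, and endpoint id are assumptions, not the actual fal.ai payload:

```typescript
// Hypothetical sketch of the reference-then-edit orchestration: every scene
// edits the same reference image with the same seed, so characters and
// environments stay visually consistent across scenes.
export interface EditRequest {
  model: string;
  input: { image_url: string; prompt: string; seed: number };
}

export function buildConsistencyRequests(
  referenceImageUrl: string,
  seed: number,
  scenePrompts: string[],
): EditRequest[] {
  return scenePrompts.map((prompt) => ({
    model: "fal-ai/nano-banana-pro/edit", // endpoint id assumed from the model name above
    input: { image_url: referenceImageUrl, prompt, seed },
  }));
}
```

Pinning the seed keeps the edit model's sampling deterministic, so the only thing that varies between scenes is the prompt.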
Full examples:

- `workflows/horror-story-example.json` — standard workflow
- `workflows/production-horror-frames.json` — production 30s with consistency
## Project Structure

```
clawvid/
  SKILL.md                            # Agent instructions (the brain)
  config.json                         # All tuneable settings
  preferences.json                    # Per-user defaults (gitignored)
  .env                                # FAL_KEY (gitignored)
  workflows/                          # Example workflow JSONs
    horror-story-example.json         # Standard horror (7 scenes, 60s)
    production-horror-frames.json     # Production horror (6 frames, 30s, consistency)
    test-motivation-consistency.json  # Test with scene consistency
    test-workflow-consistency.json    # Minimal consistency test
    test-minimal.json                 # Minimal 2-scene test workflow
  src/
    index.ts                          # CLI entry point
    cli/                              # Command definitions
      program.ts                      # Commander setup (5 commands)
      generate.ts                     # clawvid generate
      render.ts                       # clawvid render
      preview.ts                      # clawvid preview
      studio.ts                       # clawvid studio
      setup.ts                        # clawvid setup
    core/                             # Pipeline orchestration
      pipeline.ts                     # Main pipeline (generate/render/preview/studio/setup)
      workflow-runner.ts              # TTS-first workflow execution (6 phases + computeTiming)
      scene-planner.ts                # Validate scene plans
      asset-manager.ts                # Track assets per run
    fal/                              # fal.ai API layer
      client.ts                       # Shared client (auth, queue, retry)
      image.ts                        # Image generation (kling-image/v3)
      video.ts                        # Image-to-video (Kling 2.6 Pro, kandinsky5-pro)
      audio.ts                        # TTS (qwen-3-tts) and transcription (whisper)
      sound.ts                        # Sound effect generation (beatoven)
      music.ts                        # Music generation (beatoven)
      workflow.ts                     # Scene consistency (reference + edit orchestration)
      analysis.ts                     # Image/video analysis (got-ocr, video-understanding)
      cost.ts                         # Cost tracking per run
      queue.ts                        # Concurrency control (p-queue)
      types.ts                        # Shared response types
    render/                           # Remotion video composition
      root.tsx                        # Remotion entry point
      renderer.ts                     # Programmatic render (bundle + render)
      compositions/
        landscape.tsx                 # 16:9 YouTube composition
        portrait.tsx                  # 9:16 social media composition
        scene-renderer.tsx            # Shared scene render logic
        types.ts                      # SceneProps, TemplateStyle, CompositionProps
      templates/
        horror.tsx
```
