SkillAgentSearch skills...

RotoAI

An open-source studio for prompt-driven video segmentation. Powered by SAM2 & Grounding DINO with a hybrid Cloud-Local architecture.

Install / Use

/learn @sPappalard/RotoAI
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

🟣 RotoAI - Intelligent Video Rotoscoping

<a name="readme-top"></a>

<div align="center">

RotoAI Banner

License Python 3.10+ React 18 FastAPI Powered By SAM2

Automated zero-shot video segmentation powered by SAM2 & Grounding DINO

🌟 Overview🎨 Visual Effects Showcase✨ Key Features📸 UI Screenshots🔄 Pipeline & Architecture
🛠 Tech Stack & Repository Structure💾 Memory Management🚀 Getting Started🐛 Troubleshooting
📜 Credits👨‍💻 Author <br><br>

If you find RotoAI useful, please consider supporting the development!

<a href="https://www.buymeacoffee.com/sPappalard"> <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="180" /> </a> </div>

🌟 Overview

RotoAI is an advanced open-source studio for prompt-driven video segmentation. It leverages a Hybrid Cloud-Local Architecture: a responsive React frontend runs locally, while the heavy inference is offloaded to Google Colab GPUs (T4) via a secure Ngrok tunnel. Powered by state-of-the-art foundation models (SAM2 & Grounding DINO), RotoAI introduces intelligent VRAM management and chunked processing, enabling high-resolution rotoscoping on free cloud tier hardware without memory bottlenecks.

What Makes RotoAI Special?

  • Semantic Understanding: Select objects using natural language prompts (e.g., "person in red shirt") via Grounding DINO.
  • Hybrid Architecture: Combines the responsiveness of a local UI with the raw power of Cloud GPUs (Google Colab).
  • Production Resilience: Handles long videos via Smart Chunking (5s segments) and Auto-Resolution Scaling to prevent OOM errors.
  • Dual Detection Modes: Supports both generic Zero-Shot detection and Custom YOLO Models for specialized tasks.
  • 6 Professional Effects: From cinematic B&W pop to neon glow overlays

🎬 Demo

<div align="center">

Watch RotoAI in Action

RotoAI Demo

Click to watch the full demonstration on YouTube

</div>

🎨 Visual Effects Showcase

Discover the cinematic effects you can create in seconds.

Bokeh Blur

Simulates a high-end camera lens by applying a realistic Gaussian blur to the background, creating a shallow depth-of-field effect.

<div align="center"> <img src="public/boke.gif" alt="Bokeh Blur Effect" width="800"> <p>Prompt used: <em>"Running man"</em></p> </div>

Chroma Key (Green Screen)

Replaces the background with a solid green (or custom hex) color, perfect for compositing in post-production tools like After Effects or Premiere.

<div align="center"> <img src="public/greenk.gif" alt="Chroma Key Effect" width="800"> <p>Prompt used: <em>"Boys dressed in red"</em></p> </div>

B&W Color Pop

Isolates the subject by keeping them in full color while instantly desaturating the background to grayscale.

<div align="center"> <img src="public/bew.gif" alt="B&W Effect" width="800"> <p>Prompt used: <em>"Boy with orange backpack"</em></p> </div>

Neon Glow

Adds a futuristic glowing outline around the detected subject. You can choose between a sharp border or a diffuse glow.

<div align="center"> <img src="public/neonEffect.gif" alt="Neon Effect" width="800"> <p>Prompt used: <em>"Dancing man"</em></p> </div>

Configuration Options:

  • ✅ With Border: Colored neon outline with edge detection
  • ❌ No Border: Soft glow with adjustable blur radius (1-15)

Color Pop

Applies a cinematic desaturation filter to the background, creating a moody, vintage aesthetic while keeping the subject vivid.

<div align="center"> <img src="public/colorPop.gif" alt="Color Pop Effect" width="800"> <p>Prompt used: <em>"Man with glasses"</em></p> </div>

Luminous Edge

Highlights the contours of the subject with a radiant light effect, creating a sketched light-painting look.

<div align="center"> <img src="public/doctors.gif" alt="Edge Light Effect" width="800"> <p>Prompt used: <em>"Doctors"</em></p> </div>

✨ Key Features

🤖 AI-Powered Detection

  • Open-Vocabulary Detection: Zero-shot capabilities via Grounding DINO allows finding any object using natural language prompts.
  • BYO Model (Bring Your Own): Support for custom trained YOLO (.pt) weights for specialized industrial or specific object detection.
  • Interactive Calibration: Built-in Test Mode to validate detection accuracy on individual frames before committing to full GPU rendering.

🎨 Professional Visual Effects

<table> <tr> <td width="50%">

Cinematic Effects

  • 🔳 B&W Color Pop: Isolate subjects in vibrant color against grayscale backgrounds
  • 🟩 Chroma Key: Green screen-style background replacement with custom colors
  • 🧪 Neon Glow: Cyberpunk-inspired luminous edge effects with configurable colors
</td> <td width="50%">

Advanced Filters

  • 💧 Bokeh Blur: Professional depth-of-field simulation
  • 🎞️ Color Pop: Cinematic desaturation for mood creation
  • 💡 Luminous Edge: Highlight subject contours with glowing borders
</td> </tr> </table>

✨Effect Previews

<div align="center"> <video src="https://github.com/user-attachments/assets/53be5717-2a86-4521-a576-b98eee2e7100" controls="controls" style="max-width: 100%; border-radius: 10px;" muted="muted" autoplay="autoplay" loop="loop"> </video> <p><i>▶️ Click play to watch the Effect Previews</i></p> </div>

⚙️ Advanced Configuration

  • Dual Output Modes: Side-by-side comparison or processed-only
  • Smart Scanning: Configurable detection window (1-10 seconds)
  • Precision Tuning: Adjustable confidence thresholds (0.01-0.80)
  • Memory Management: Automatic resolution scaling for optimal VRAM usage

🚀 Performance & Optimization

  • Chunked Processing: Handles videos of any length without OOM errors
  • GPU Acceleration: FP16 precision for 2x speed on modern GPUs
  • Real-Time Progress: Frame-by-frame statistics with ETA
  • Smart Caching: Efficient frame storage and cleanup

📸 UI Screenshots

A glance at the RotoAI interface and its capabilities.

Main interface

<div align="center"> <img src="public/main.jpg" alt="Main Interface" width="100%"> </div>

Hybrid Cloud Connection

The entry point connecting the local React UI with the Colab GPU backend via Ngrok.

<div align="center"> <img src="public/connect.jpg" alt="Connection Screen" width="100%"> </div>

Interactive test detection

Test Detection Module: Validate prompts and confidence thresholds frame-by-frame before processing.

<div align="center"> <img src="public/test.jpg" alt="Test Detection UI" width="100%"> </div> <div align="center"> <img src="public/test2.jpg" alt="Test Detection result" width="100%"> </div>

Visual Effects Engine

Select from cinematic effects like Neon Glow, Chroma Key, Bokeh Blur, and B&W Pop.

<div align="center"> <img src="public/effects.jpg" alt="Effects Selection" width="100%"> </div>

Advanced Configuration

Fine-tune scan duration, confidence thresholds, and output formats (Comparison vs. Processed Only).

<div align="center"> <img src="public/advanced.jpg" alt="Advanced Settings" width="100%"> </div>

Results & Player

Built-in video player with loop functionality and instant download for the final rendered MP4.

<div align="center"> <img src="public/res.jpg" alt="Results" width="100%"> </div>

🔄 How It Works

AI Processing Pipeline

<div align="center">

RotoAI Pipeline

End-to-End Flow: Visual step-by-step of the rendering process

</div>

Hybrid Architecture

<div align="center">

RotoAI Architecture

Hybrid Infrastructure: Local Frontend connected to Remote Backend via Ngrok

</div>

🛠 Tech Stack

Core AI Models

🔍 Detection (The "Eyes")

Grounding DINO (SwinB) The prompt-master. Allows you to select objects using natural language queries (e.g., "black cat").

  • Size: ~600MB
  • Type: Zero-shot Object Detection

YOLO v8/v11 The specialist. Supports user-uploaded .pt weights for fine-tuned tasks.

  • Size: <100MB (Typical)
  • Type: Custom Object Detection

✂️ Segmentation (The "Hands")

SAM 2 (Segment Anything 2) The tracker. Handles the heavy lifting of propagating masks across video frames.

  • Architecture: Hiera Small
  • Size: ~180MB
  • Performance: Real-time propagation

Backend Stack

# Core Dependencies
PyTorch 2.0+          # Deep learning framework
FastAPI 0.104+        # Async web framework
Uvicorn               # ASGI server
OpenCV (cv2)          # Video processing
NumPy                 # Matrix operations
Pillow (PIL)          # Image manipulation
FFmpeg                # Video encoding

Infrastructure:

  • 🌐 Google Colab: Cloud GPU environment (T4/P100/V100)
  • 🔗 Ngrok: Secure tunnel for public API access
  • 🔧 Nest AsyncIO: Event loop management for Co
View on GitHub
GitHub Stars49
CategoryContent
Updated1d ago
Forks12

Languages

TypeScript

Security Score

95/100

Audited on Mar 20, 2026

No findings