🟣 RotoAI - Intelligent Video Rotoscoping

Automated zero-shot video segmentation powered by SAM2 & Grounding DINO

🌟 Overview • 🎨 Visual Effects Showcase • ✨ Key Features • 📸 UI Screenshots • 🔄 Pipeline & Architecture •
🛠 Tech Stack & Repository Structure • 💾 Memory Management • 🚀 Getting Started • 🐛 Troubleshooting •
📜 Credits • 👨‍💻 Author <br><br>

If you find RotoAI useful, please consider supporting the development!

🌟 Overview

RotoAI is an open-source studio for prompt-driven video segmentation. It uses a hybrid cloud-local architecture: a responsive React frontend runs locally, while heavy inference is offloaded to Google Colab GPUs (T4) through a secure Ngrok tunnel. Built on the SAM2 and Grounding DINO foundation models, RotoAI adds VRAM management and chunked processing so that high-resolution rotoscoping can run on free-tier cloud hardware without exhausting GPU memory.

What Makes RotoAI Special?

Semantic Understanding: Select objects using natural language prompts (e.g., "person in red shirt") via Grounding DINO.
Hybrid Architecture: Combines the responsiveness of a local UI with the raw power of Cloud GPUs (Google Colab).
Production Resilience: Handles long videos via Smart Chunking (5s segments) and Auto-Resolution Scaling to prevent OOM errors.
Dual Detection Modes: Supports both generic Zero-Shot detection and Custom YOLO Models for specialized tasks.
Six Professional Effects: From cinematic B&W pop to neon glow overlays.
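
The Smart Chunking idea mentioned above can be sketched as follows. This is a minimal illustration, not RotoAI's actual code: the function name `iter_chunks` and its parameters are hypothetical, and only the 5-second segment length comes from the description above.

```python
from typing import Iterator, Tuple


def iter_chunks(
    total_frames: int, fps: float, chunk_seconds: float = 5.0
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame ranges, end exclusive, that cover the video.

    Hypothetical helper: each range holds at most `chunk_seconds` of footage,
    so only one chunk's frames need to be resident in memory at a time.
    """
    if total_frames < 0 or fps <= 0 or chunk_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and chunk_seconds must be > 0")
    frames_per_chunk = max(1, round(fps * chunk_seconds))
    for start in range(0, total_frames, frames_per_chunk):
        # The last chunk is shorter when the video length is not a multiple.
        yield start, min(start + frames_per_chunk, total_frames)
```

For a 400-frame clip at 30 fps this yields three ranges: two full 150-frame chunks and a final 100-frame remainder.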

🎬 Demo

Watch RotoAI in Action

Click to watch the full demonstration on YouTube

</div>

🎨 Visual Effects Showcase

Discover the cinematic effects you can create in seconds.

Bokeh Blur

Simulates a high-end camera lens by applying a realistic Gaussian blur to the background, creating a shallow depth-of-field effect.

<div align="center"> <img src="public/boke.gif" alt="Bokeh Blur Effect" width="800"> <p>Prompt used: <em>"Running man"</em></p> </div>

Chroma Key (Green Screen)

Replaces the background with a solid green (or custom hex) color, perfect for compositing in post-production tools like After Effects or Premiere.

<div align="center"> <img src="public/greenk.gif" alt="Chroma Key Effect" width="800"> <p>Prompt used: <em>"Boys dressed in red"</em></p> </div>
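
As a rough sketch of how such a background replacement could work, the snippet below composites a frame against a solid color using a boolean subject mask. The names `hex_to_bgr` and `chroma_key` are hypothetical, and the BGR channel order is an assumption (it matches OpenCV's default); RotoAI's own implementation may differ.

```python
import numpy as np


def hex_to_bgr(hex_color: str) -> tuple:
    """Convert '#RRGGBB' to a (B, G, R) tuple (assumed OpenCV channel order)."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a 6-digit hex color such as '#00FF00'")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return (b, g, r)


def chroma_key(frame: np.ndarray, mask: np.ndarray, hex_color: str = "#00FF00") -> np.ndarray:
    """Keep subject pixels and fill everything else with a solid color.

    frame: H x W x 3 uint8 image; mask: H x W boolean array, True = subject.
    """
    background = np.empty_like(frame)
    background[:] = hex_to_bgr(hex_color)  # broadcast the color over all pixels
    # Expand the mask to H x W x 1 so it selects whole pixels, not channels.
    return np.where(mask[..., None], frame, background)
```

A hard boolean mask gives crisp edges; a soft (floating-point) mask with alpha blending would be the natural extension if feathered edges are needed.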

B&W Color Pop

Isolates the subject by keeping them in full color while instantly desaturating the background to grayscale.

<div align="center"> <img src="public/bew.gif" alt="B&W Effect" width="800"> <p>Prompt used: <em>"Boy with orange backpack"</em></p> </div>
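
One way to picture this effect is as a per-pixel choice between the original frame and a grayscale copy. The sketch below is illustrative only: `bw_color_pop` is a hypothetical name, the BGR channel order is an assumption, and the Rec. 601 luma weights are a common convention rather than something the project documents.

```python
import numpy as np


def bw_color_pop(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the subject in color and convert the background to grayscale.

    frame: H x W x 3 uint8 image (assumed BGR); mask: H x W bool, True = subject.
    """
    # Rec. 601 luma weights, listed in B, G, R order to match the assumed layout.
    weights = np.array([0.114, 0.587, 0.299], dtype=np.float32)
    gray = (frame.astype(np.float32) @ weights).round().clip(0, 255).astype(np.uint8)
    gray3 = np.repeat(gray[..., None], 3, axis=2)  # back to three channels
    return np.where(mask[..., None], frame, gray3)
```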

Neon Glow

Adds a futuristic glowing outline around the detected subject. You can choose between a sharp border or a diffuse glow.

<div align="center"> <img src="public/neonEffect.gif" alt="Neon Effect" width="800"> <p>Prompt used: <em>"Dancing man"</em></p> </div>

Configuration Options:

✅ With Border: Colored neon outline with edge detection
❌ No Border: Soft glow with adjustable blur radius (1-15)

Color Pop

Applies a cinematic desaturation filter to the background, creating a moody, vintage aesthetic while keeping the subject vivid.

<div align="center"> <img src="public/colorPop.gif" alt="Color Pop Effect" width="800"> <p>Prompt used: <em>"Man with glasses"</em></p> </div>

Luminous Edge

Highlights the contours of the subject with a radiant light effect, creating a sketched light-painting look.

<div align="center"> <img src="public/doctors.gif" alt="Edge Light Effect" width="800"> <p>Prompt used: <em>"Doctors"</em></p> </div>

✨ Key Features

🤖 AI-Powered Detection

Open-Vocabulary Detection: Zero-shot capabilities via Grounding DINO allow finding any object using natural language prompts.
BYO Model (Bring Your Own): Supports custom-trained YOLO (.pt) weights for specialized tasks such as industrial or domain-specific object detection.
Interactive Calibration: Built-in Test Mode to validate detection accuracy on individual frames before committing to full GPU rendering.

🎨 Professional Visual Effects

Cinematic Effects

🔳 B&W Color Pop: Isolate subjects in vibrant color against grayscale backgrounds
🟩 Chroma Key: Green screen-style background replacement with custom colors
🧪 Neon Glow: Cyberpunk-inspired luminous edge effects with configurable colors

</td> <td width="50%">

Advanced Filters

💧 Bokeh Blur: Professional depth-of-field simulation
🎞️ Color Pop: Cinematic desaturation for mood creation
💡 Luminous Edge: Highlight subject contours with glowing borders

</td> </tr> </table>

✨ Effect Previews

<div align="center"> <video src="https://github.com/user-attachments/assets/53be5717-2a86-4521-a576-b98eee2e7100" controls="controls" style="max-width: 100%; border-radius: 10px;" muted="muted" autoplay="autoplay" loop="loop"> </video> <p><i>▶️ Click play to watch the Effect Previews</i></p> </div>

⚙️ Advanced Configuration

Dual Output Modes: Side-by-side comparison or processed-only
Smart Scanning: Configurable detection window (1-10 seconds)
Precision Tuning: Adjustable confidence thresholds (0.01-0.80)
Memory Management: Automatic resolution scaling for optimal VRAM usage
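
To illustrate the automatic resolution scaling item, here is a minimal sketch that shrinks a frame size to fit a pixel budget while preserving aspect ratio. The function name `fit_resolution` and the `max_pixels` budget are assumptions made for illustration; the README does not specify how RotoAI chooses its target resolution.

```python
import math
from typing import Tuple


def fit_resolution(width: int, height: int, max_pixels: int) -> Tuple[int, int]:
    """Return a (width, height) with at most `max_pixels` pixels.

    Hypothetical helper: frames already within budget are returned unchanged;
    larger frames are scaled down uniformly. Dimensions are rounded down to
    even numbers because common H.264 settings (for example yuv420p) expect
    even width and height.
    """
    if width <= 0 or height <= 0 or max_pixels <= 0:
        raise ValueError("width, height and max_pixels must be positive")
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    new_w = max(2, int(width * scale) // 2 * 2)
    new_h = max(2, int(height * scale) // 2 * 2)
    return new_w, new_h
```

For example, `fit_resolution(3840, 2160, 1280 * 720)` returns a size close to 1280x720; the exact values can be a couple of pixels smaller because of the round-down to even numbers.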

🚀 Performance & Optimization

Chunked Processing: Handles videos of any length without OOM errors
GPU Acceleration: FP16 precision for up to 2x speed on modern GPUs
Real-Time Progress: Frame-by-frame statistics with ETA
Smart Caching: Efficient frame storage and cleanup
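
The ETA shown with the real-time progress statistics can be derived from the average throughput so far. The sketch below shows one simple way to do that; `estimate_eta` and `format_eta` are hypothetical names, and RotoAI may use a different estimator (for example a moving average).

```python
from typing import Optional


def estimate_eta(frames_done: int, total_frames: int, elapsed_seconds: float) -> Optional[float]:
    """Estimate remaining seconds from average throughput; None if unknown."""
    if frames_done <= 0 or elapsed_seconds <= 0:
        return None  # no throughput measurement yet
    frames_per_second = frames_done / elapsed_seconds
    remaining = max(0, total_frames - frames_done)
    return remaining / frames_per_second


def format_eta(seconds: Optional[float]) -> str:
    """Render an ETA as MM:SS, or a placeholder when it is not yet known."""
    if seconds is None:
        return "--:--"
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes:02d}:{secs:02d}"
```

With 50 of 200 frames processed in 10 seconds, the average rate is 5 frames per second, so the remaining 150 frames give an ETA of 30 seconds.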

📸 UI Screenshots

A glance at the RotoAI interface and its capabilities.

Main interface

Hybrid Cloud Connection

The entry point connecting the local React UI with the Colab GPU backend via Ngrok.

Interactive test detection

Test Detection Module: Validate prompts and confidence thresholds frame-by-frame before processing.
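
Conceptually, testing a confidence threshold on a single frame amounts to filtering the detector's candidates by score. The sketch below uses a hypothetical `Detection` record and `filter_detections` helper; it does not reflect the actual output types of Grounding DINO or YOLO.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection record: label, confidence, and an x1, y1, x2, y2 box."""
    label: str
    score: float
    box: Tuple[float, float, float, float]


def filter_detections(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep detections at or above `threshold`, highest confidence first."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)
```

Lowering the threshold admits more candidates (and more false positives); raising it keeps only the most confident boxes, which is the trade-off the test mode lets you check before a full render.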

Visual Effects Engine

Select from cinematic effects like Neon Glow, Chroma Key, Bokeh Blur, and B&W Pop.

Advanced Configuration

Fine-tune scan duration, confidence thresholds, and output formats (Comparison vs. Processed Only).
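
These settings could be modeled as a small validated configuration object. The sketch below is an assumption-laden illustration: the class and field names and the default values are hypothetical, while the allowed ranges (1-10 seconds, 0.01-0.80 confidence) are the ones listed under Advanced Configuration.

```python
from dataclasses import dataclass
from enum import Enum


class OutputMode(Enum):
    COMPARISON = "comparison"          # side-by-side original and processed
    PROCESSED_ONLY = "processed_only"  # processed video only


@dataclass(frozen=True)
class RenderSettings:
    """Hypothetical settings container; defaults are illustrative, not RotoAI's."""
    scan_seconds: int = 5
    confidence: float = 0.30
    output_mode: OutputMode = OutputMode.PROCESSED_ONLY

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges before any GPU work starts.
        if not 1 <= self.scan_seconds <= 10:
            raise ValueError("scan_seconds must be between 1 and 10")
        if not 0.01 <= self.confidence <= 0.80:
            raise ValueError("confidence must be between 0.01 and 0.80")
```

Validating up front means a bad value fails immediately on the local side instead of after a video has been uploaded to the GPU backend.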

Results & Player

Built-in video player with loop functionality and instant download for the final rendered MP4.

🔄 How It Works

AI Processing Pipeline

RotoAI Pipeline

End-to-End Flow: Visual step-by-step of the rendering process

</div>

Hybrid Architecture

RotoAI Architecture

Hybrid Infrastructure: Local Frontend connected to Remote Backend via Ngrok

</div>

🛠 Tech Stack

Core AI Models

🔍 Detection (The "Eyes")

Grounding DINO (SwinB): The prompt-master. Allows you to select objects using natural language queries (e.g., "black cat").

Size: ~600MB

Type: Zero-shot Object Detection

YOLO v8/v11: The specialist. Supports user-uploaded .pt weights for fine-tuned tasks.

Size: <100MB (Typical)

Type: Custom Object Detection

✂️ Segmentation (The "Hands")

SAM 2 (Segment Anything 2): The tracker. Handles the heavy lifting of propagating masks across video frames.

Architecture: Hiera Small

Size: ~180MB

Performance: Real-time propagation

Backend Stack

# Core Dependencies
PyTorch 2.0+          # Deep learning framework
FastAPI 0.104+        # Async web framework
Uvicorn               # ASGI server
OpenCV (cv2)          # Video processing
NumPy                 # Matrix operations
Pillow (PIL)          # Image manipulation
FFmpeg                # Video encoding

Infrastructure:

🌐 Google Colab: Cloud GPU environment (T4/P100/V100)
🔗 Ngrok: Secure tunnel for public API access
🔧 Nest AsyncIO: Event loop management for Co
