# ReStraV: AI-Generated Video Detection via Perceptual Straightening
Official implementation of the paper "AI-Generated Video Detection via Perceptual Straightening", accepted at NeurIPS 2025.
*Figure 1: The ReStraV method. Video frames are processed by a self-supervised encoder (DINOv2) to get embeddings. In this representation space, natural videos trace "straighter" paths than AI-generated ones. The trajectory's geometry, especially its curvature, serves as a powerful signal for a lightweight classifier to distinguish real from fake.*
> **Important (local setup knobs):** several scripts include hard-coded values for `device` (e.g. `cuda:1`), `batch_size`, `num_workers`, paths, and download worker counts. You will likely need to open the files and change these values to match your machine (GPU index, RAM/VRAM, CPU cores, filesystem layout).
## What this repo does
Core idea:

- Sample a short clip from each video (default: ~2 seconds, 24 frames).
- Encode frames with a pretrained vision backbone (DINOv2 ViT-S/14 via `torch.hub`).
- Treat the per-frame embeddings as a trajectory in representation space.
- Compute temporal geometry features: stepwise distances and curvature/turning angles across time.
- Train a lightweight classifier (an MLP) on a 21-D feature vector per video.
- Use the trained model to predict whether a new video is REAL or FAKE.
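The curvature signal behind these steps can be illustrated on toy trajectories. This is a minimal numpy sketch under our own function name, not the repository's implementation:

```python
import numpy as np

def turning_angles(traj):
    """Angles (radians) between consecutive displacement vectors of a (T, D) trajectory."""
    steps = np.diff(traj, axis=0)                # (T-1, D) frame-to-frame displacements
    a, b = steps[:-1], steps[1:]
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))    # (T-2,) turning angles

# A straight path has (near-)zero turning angles; a zig-zag path turns at every step.
t = np.linspace(0.0, 1.0, 24)[:, None]
straight = t * np.array([[1.0, 2.0]])                                  # 24 points on a line
zigzag = np.stack([np.arange(24.0), np.tile([0.0, 1.0], 12)], axis=1)  # alternating path
print(turning_angles(straight).max())   # ≈ 0
print(turning_angles(zigzag).mean())    # ≈ pi/2: the path keeps turning
```

The intuition is the same in DINOv2 space: "perceptually straightened" natural footage yields small angles, while generated footage tends to wiggle.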
## Repository layout (high level)

- `dinov2_features.py` — video decoding + DINOv2 embedding extraction + 21-D feature computation
- `train.py` — trains the MLP classifier; saves `model.pt`, `mean.npy`, `std.npy`, `best_tau.npy`
- `demo.py` — Gradio demo (upload video or paste URL; uses `yt-dlp` to download)
- `DATA/` — data + helper scripts (download/extract features) and generated artifacts
## Method details (the 21-D feature vector)

The feature builder in `dinov2_features.py` computes:

- 7 early stepwise distances: `d[0:7]`
- 6 early turning angles: `theta[0:6]`
- 8 summary statistics (mean/min/max/variance) of the distances and angles: `μ_d`, `min_d`, `max_d`, `var_d`, `μ_θ`, `min_θ`, `max_θ`, `var_θ`

Total: 7 + 6 + 8 = 21 features per video.
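Under this description, the feature vector can be sketched as follows. This is a hedged numpy reconstruction; the actual `dinov2_features.py` may compute or order things differently:

```python
import numpy as np

def restrav_features(emb):
    """emb: (T, D) per-frame embeddings -> 21-D geometry feature vector (illustrative)."""
    steps = np.diff(emb, axis=0)                   # (T-1, D) displacements
    d = np.linalg.norm(steps, axis=1)              # stepwise distances
    a, b = steps[:-1], steps[1:]
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    theta = np.arccos(np.clip(cos, -1.0, 1.0))     # turning angles
    stats = lambda x: [x.mean(), x.min(), x.max(), x.var()]
    return np.concatenate([d[:7], theta[:6], stats(d), stats(theta)])  # 7 + 6 + 8 = 21

emb = np.random.default_rng(0).normal(size=(24, 384))  # e.g. 24 frames, ViT-S/14 dim 384
print(restrav_features(emb).shape)  # (21,)
```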
## Setup

### 1) Clone

```bash
git clone https://github.com/ChristianInterno/ReStraV.git
cd ReStraV
```

### 2) Install dependencies

```bash
pip install -r requirements.txt
```
## Data (training)

- REAL videos: pulled from the Video Similarity Challenge URL list, filtered by a local reference list file
- FAKE videos: pulled from the VidProM dataset (often the `example/` subset from Hugging Face)
## Step-by-step pipeline

### Step A — Download training videos

```bash
python DATA/download_training_data.py
```

- Downloads a subset of REAL mp4s by matching filenames from a `ref_file_paths.txt` list
- Downloads FAKE examples from the VidProM dataset and extracts `.tar` files into `FAKE/`

Things you may need to edit inside the script:

- `MAX_WORKERS` (the default may be too high for your network / OS)
- `TIMEOUT`
### Step B — Extract DINOv2 geometry features into an HDF5

```bash
python DATA/extract_training_features.py
```

This writes an HDF5 file with three datasets:

- `path` (string)
- `label` (int; 1 = real, 0 = fake)
- `features` (float; shape `[N, 21]`)
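A sketch of that layout using `h5py` (dataset names as listed above; the toy file name and contents are ours, for illustration only):

```python
import numpy as np
import h5py

# Create a tiny file with the same three-dataset layout, then read it back.
with h5py.File("toy_features.h5", "w") as f:
    f.create_dataset("path", data=np.array([b"real/a.mp4", b"fake/b.mp4"]))
    f.create_dataset("label", data=np.array([1, 0]))            # 1 = real, 0 = fake
    f.create_dataset("features", data=np.zeros((2, 21), dtype=np.float32))

with h5py.File("toy_features.h5", "r") as f:
    print(f["features"].shape, f["label"][:])  # (2, 21) [1 0]
```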
Things you may need to edit inside this script:

- `batch_size`
- `device`
### Step C — Train the classifier

```bash
python train.py
```

- Loads all samples from the HDF5
- Balances classes by subsampling to equal priors
- Normalizes features (saves `mean.npy` and `std.npy`)
- Splits 50/50 train/test with stratification
- Trains a small MLP for a fixed number of epochs
- Picks an operating threshold `τ*` that maximizes F1 on the training set
- Evaluates on the test set; writes `test_predictions_all.csv`
- Saves model weights to `model.pt`
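The threshold step can be sketched as a grid search over candidate thresholds. This numpy-only snippet is a hypothetical stand-in for what `train.py` does, not its actual code:

```python
import numpy as np

def best_threshold(scores, labels):
    """Pick tau maximizing F1 of the rule (score >= tau) against binary labels."""
    best_tau, best_f1 = 0.0, -1.0
    for tau in np.unique(scores):                 # each observed score is a candidate
        pred = (scores >= tau).astype(int)
        tp = np.sum((pred == 1) & (labels == 1))
        fp = np.sum((pred == 1) & (labels == 0))
        fn = np.sum((pred == 0) & (labels == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_tau, best_f1 = tau, f1
    return best_tau, best_f1

scores = np.array([0.1, 0.4, 0.6, 0.9])
labels = np.array([0, 0, 1, 1])
print(best_threshold(scores, labels))  # tau=0.6 separates the classes perfectly here
```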
Things you may need to edit inside `train.py`:

- `device`
- DataLoader `batch_size` and `num_workers`
- `epochs`, learning rate, hidden sizes

Outputs written to the working directory by default: `model.pt`, `mean.npy`, `std.npy`, `best_tau.npy`, `test_predictions_all.csv`
## Demo (Gradio)

Once you have `model.pt`, `mean.npy`, `std.npy`, and `best_tau.npy` in the repo root:

```bash
python demo.py
```

The demo supports:

- Uploading a video file, or
- Pasting a URL; the demo downloads the video via `yt-dlp` into a temp folder
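Conceptually, inference normalizes the 21-D features with the saved statistics and thresholds the classifier's score. A numpy-only sketch with stand-in values (the real `demo.py` loads a torch model from `model.pt`, and `toy_score` below is a hypothetical scorer):

```python
import numpy as np

# Stand-ins for artifacts saved by train.py (mean.npy, std.npy, best_tau.npy).
mean, std, tau = np.zeros(21), np.ones(21), 0.5

def predict(features, score_fn):
    """Normalize a 21-D feature vector and threshold the classifier score."""
    x = (features - mean) / std
    score = score_fn(x)                    # e.g. an MLP's probability of "real"
    return "REAL" if score >= tau else "FAKE"

toy_score = lambda x: 1 / (1 + np.exp(-x.mean()))   # hypothetical scorer
print(predict(np.full(21, 2.0), toy_score))   # REAL  (sigmoid(2.0) > 0.5)
print(predict(np.full(21, -2.0), toy_score))  # FAKE  (sigmoid(-2.0) < 0.5)
```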
## Citation

If you find our work useful in your research, please consider citing our paper:

```bibtex
@misc{internò2025aigeneratedvideodetectionperceptual,
  title={AI-Generated Video Detection via Perceptual Straightening},
  author={Christian Internò and Robert Geirhos and Markus Olhofer and Sunny Liu and Barbara Hammer and David Klindt},
  year={2025},
  eprint={2507.00583},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.00583},
}
```
## Acknowledgements

This research was partly funded by Honda Research Institute Europe and Cold Spring Harbor Laboratory. We thank Eero Simoncelli for insightful discussions and feedback, as well as our colleagues at Google DeepMind, the Machine Learning Group at Bielefeld University, and Honda Research Institute.

All code in this repository was contributed by Sam Pagon (@sampagon).