IDEA-Research / Grounded SAM 2 · Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
TheShadow29 / Awesome Grounding · awesome grounding: A curated list of research papers in visual grounding
showlab / UniVTG · [ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding
ttengwang / Awesome Long Form Video Understanding · Awesome papers & datasets specifically focused on long-term videos.
facebookresearch / Grounded Video Description · Video Grounding and Captioning
mbzuai-oryx / Video LLaVA · PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
antoyang / TubeDETR · [CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
Soldelli / MAD · MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
sutdcv / Animal Kingdom · [CVPR2022] Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
wjun0830 / CGDETR · Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
gyxxyg / TRACE · [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
yongliang-wu / NumPro · [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga
WHB139426 / Grounded Video LLM · [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
iSEE-Laboratory / ReferDINO · (ICCV 2025) ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
www-Ye / Time R1 · R1-like Video-LLM for Temporal Grounding
jayleicn / TVQAplus · [ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
JonghwanMun / LGI4temporalgrounding · Repository for the CVPR-20 paper "Local-Global Video-Text Interactions for Temporal Grounding"
fletcherjiang / LLMEPET · [MM'24 Oral] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
zhongyingji / Guidedvd 3dgs · Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs (CVPR2025 Highlight)
gyxxyg / VTG LLM · [AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding