NVlabs / Describe Anything: [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
SteveSandersonMS / CarChecker: A sample Blazor WebAssembly application that includes authentication, in-browser data storage, offline support, localization, responsive layouts, and more. For a video walkthrough, see this link:
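agan-j / Xiaoniu: Xiaoniu Video Translation (小牛视频翻译) is an AI tool that supports local video translation, subtitle translation, and YouTube video translation and download. It integrates automatic speech recognition and multilingual translation to help creators translate videos efficiently, and is used for video localization and for bringing video content to overseas markets.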
zhengshou / Scnn: Segment-CNN: A Framework for Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
bmartacho / UniPose: We propose UniPose, a unified framework for human pose estimation, based on our “Waterfall” Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics. Current pose estimation methods utilizing standard CNN architectures heavily rely on statistical postprocessing or predefined anchor poses for joint localization. UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage, with high accuracy, without relying on statistical postprocessing methods. The Waterfall module in UniPose leverages the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, our method is extended to UniPose-LSTM for multi-frame processing and achieves state-of-the-art results for temporal pose estimation in video. Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation, obtaining state-of-the-art results in single-person pose detection for both single images and videos.
YapengTian / AVE ECCV18: Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
Yui010206 / SeViLA: [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
LisaAnne / LocalizingMoments: GitHub for my ICCV 2017 paper: "Localizing Moments in Video with Natural Language"
jayleicn / TVQA: [EMNLP 2018] PyTorch code for TVQA: Localized, Compositional Video Question Answering
alipay / VCSL: Video Copy Segment Localization (VCSL) dataset and benchmark [CVPR 2022]
MCG-NJU / MultiSports: [ICCV 2021] MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
HumamAlwassel / TSP: TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks (ICCVW 2021)
26hzhang / VSLNet: Span-based Localizing Network for Natural Language Video Localization (ACL 2020)
TimeMarker-LLM / TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
Nicous20 / FunQA: FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, and beyond.
SCZwangxiao / Temporal Language Grounding In Videos: Temporal Moment (Action) Localization via Language / Temporal Language Grounding / Video Moment Retrieval
ktr-hubrt / WSAL: Official code for the paper "Localizing Anomalies from Weakly-Labeled Videos"
QuanjianSong / UniVST: [TPAMI 2025] Official PyTorch code for the paper "UniVST: A Unified Framework for Training-free Localized Video Style Transfer"
fengyang0317 / Video Reloc: Code for "Video Re-localization" in ECCV 2018
zhengshou / AutoLoc: AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos. ECCV'18.