# OpenTAD: An Open-Source Temporal Action Detection Toolbox

<p align="left"> <a href="https://arxiv.org/abs/2502.20361" alt="arXiv"> <img src="https://img.shields.io/badge/arXiv-2502.20361-b31b1b.svg?style=flat" /></a> <a href="https://github.com/sming256/opentad/blob/main/LICENSE" alt="license"> <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" /></a> <a href="https://github.com/sming256/OpenTAD/issues" alt="docs"> <img src="https://img.shields.io/github/issues-raw/sming256/OpenTAD?color=%23FF9600" /></a> <a href="https://img.shields.io/github/stars/sming256/opentad" alt="arXiv"> <img src="https://img.shields.io/github/stars/sming256/opentad" /></a> </p>

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
## 🥳 What's New
- [2024/07/25] 🔥 We rank 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tasks of the EPIC-KITCHENS-100 2024 Challenge, as well as 1st in the Moment Queries task of the Ego4D 2024 Challenge! Code is released as CausalTAD (arXiv'24).
- [2024/07/07] 🔥 We support DyFADet (ECCV'24). Thanks to the authors' efforts!
- [2024/06/14] We release version v0.3, which brings many new features and improvements.
- [2024/04/17] We release AdaTAD (CVPR'24), which achieves an average mAP of 42.90% on ActivityNet and 77.07% on THUMOS14.
## 📖 Major Features
- Support SoTA TAD methods with modular design. We decompose the TAD pipeline into different components and implement them in a modular way, which makes it easy to implement new methods and reproduce existing ones.
- Support multiple TAD datasets. We support 9 TAD datasets, including ActivityNet-1.3, THUMOS-14, HACS, Ego4D-MQ, EPIC-Kitchens-100, FineAction, Multi-THUMOS, Charades, and EPIC-Sounds Detection datasets.
- Support feature-based training and end-to-end training. The feature-based training can easily be extended to end-to-end training with raw video input, and the video backbone can be easily replaced.
- Release various pre-extracted features. We release the feature extraction code, as well as many pre-extracted features on each dataset.
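In this modular spirit, a detector is usually described by a config that names each pipeline component. The sketch below is purely illustrative — the keys and component names are hypothetical, not OpenTAD's actual config schema:

```python
# Illustrative only: a hypothetical config in the spirit of a modular TAD
# toolbox, where each pipeline stage is a swappable, registered component.
config = {
    "dataset": {"type": "ThumosFeatureDataset", "feature_dim": 2304},
    "model": {
        "backbone": {"type": "IdentityBackbone"},  # swap in a video backbone for end-to-end training
        "neck": {"type": "FPN", "num_levels": 6},
        "head": {"type": "ActionFormerHead", "num_classes": 20},
    },
    "solver": {"lr": 1e-4, "epochs": 50},
}

def build(cfg):
    """Toy 'registry' lookup: a real toolbox would instantiate the
    registered class named by cfg['type'] with the remaining kwargs."""
    return cfg["type"]

# Swapping a component (e.g. the neck or the backbone) is a one-line config change.
assert build(config["model"]["neck"]) == "FPN"
```

The point of this design is that reproducing an existing method or adding a new one only touches the component being changed, not the rest of the pipeline.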
## 🌟 Model Zoo
<table align="center">
<tbody>
<tr align="center" valign="bottom">
<td> <b>One Stage</b> </td>
<td> <b>Two Stage</b> </td>
<td> <b>DETR</b> </td>
<td> <b>End-to-End Training</b> </td>
</tr>
<tr valign="top">
<td>
<ul>
<li><a href="configs/actionformer">ActionFormer (ECCV'22)</a></li>
<li><a href="configs/tridet">TriDet (CVPR'23)</a></li>
<li><a href="configs/temporalmaxer">TemporalMaxer (arXiv'23)</a></li>
<li><a href="configs/videomambasuite">VideoMambaSuite (arXiv'24)</a></li>
<li><a href="configs/dyfadet">DyFADet (ECCV'24)</a></li>
<li><a href="configs/causaltad">CausalTAD (arXiv'24)</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="configs/bmn">BMN (ICCV'19)</a></li>
<li><a href="configs/gtad">GTAD (CVPR'20)</a></li>
<li><a href="configs/tsi">TSI (ACCV'20)</a></li>
<li><a href="configs/vsgn">VSGN (ICCV'21)</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="configs/tadtr">TadTR (TIP'22)</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="configs/afsd">AFSD (CVPR'21)</a></li>
<li><a href="configs/tadtr">E2E-TAD (CVPR'22)</a></li>
<li><a href="configs/etad">ETAD (CVPRW'23)</a></li>
<li><a href="configs/re2tal">Re2TAL (CVPR'23)</a></li>
<li><a href="configs/adatad">AdaTAD (CVPR'24)</a></li>
</ul>
</td>
</tr>
</tbody>
</table>

The detailed configs, results, and pretrained models of each method can be found in the folders above.
## 🛠️ Installation
Please refer to install.md for installation.
## 📝 Data Preparation
Please refer to data.md for data preparation.
## 🚀 Usage
Please refer to usage.md for details of training and evaluation scripts.
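Whatever the training recipe, every detector in the zoo ultimately emits scored temporal segments, which evaluation deduplicates with 1D non-maximum suppression. A minimal, self-contained sketch of that post-processing step (not OpenTAD's actual implementation):

```python
def temporal_iou(a, b):
    """IoU of two 1D segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_nms(segments, iou_thresh=0.5):
    """Greedy NMS over (start, end, score) tuples, highest score first:
    keep a segment only if it overlaps every kept one below the threshold."""
    kept = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if all(temporal_iou(seg[:2], k[:2]) < iou_thresh for k in kept):
            kept.append(seg)
    return kept

preds = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
print(temporal_nms(preds))  # the overlapping 0.8-scored segment is suppressed
```

Real evaluation additionally runs this per action class and sweeps IoU thresholds to compute the average mAP reported above.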
## 📄 Updates
Please refer to changelog.md for update details.
## 🤝 Roadmap
Planned features and future work are tracked in roadmap.md.
## 🖊️ Citation
[Acknowledgement] This repo is inspired by the OpenMMLab project, and we give our thanks to its contributors.
If you think this repo is helpful, please cite us:
```BibTeX
@article{liu2025opentad,
  title={OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection},
  author={Liu, Shuming and Zhao, Chen and Zohra, Fatimah and Soldan, Mattia and Pardo, Alejandro and Xu, Mengmeng and Alssum, Lama and Ramazanova, Merey and Alcázar, Juan León and Cioppa, Anthony and Giancola, Silvio and Hinojosa, Carlos and Ghanem, Bernard},
  journal={arXiv preprint arXiv:2502.20361},
  year={2025}
}
```
If you have any questions, please contact: shuming.liu@kaust.edu.sa.
