AlphaVideo

Vision toolbox for video-related tasks, including action recognition and multi-object tracking.

Introduction

AlphaVideo is an open-source video understanding toolbox based on PyTorch that covers multi-object tracking and action detection. In AlphaVideo, we released TubeTK, the first one-stage multi-object tracking (MOT) system, which achieves 66.9 MOTA on the MOT-16 dataset and 63 MOTA on the MOT-17 dataset. For action detection, we released AlphAction, an efficient model and the first open-source project to achieve 30+ mAP (32.4 mAP) with a single model on the AVA dataset.
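
For readers unfamiliar with the tracking metric quoted above: MOTA (Multiple Object Tracking Accuracy) is commonly defined as one minus the ratio of summed errors (missed objects, false positives, identity switches) to the number of ground-truth objects. The sketch below computes it from aggregate counts. The function name and the example counts are illustrative only; they are not part of AlphaVideo and are not taken from any benchmark result.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """Return MOTA as a fraction (multiply by 100 for the usual percentage)."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Made-up counts summed over all frames of a hypothetical sequence.
score = mota(false_negatives=200, false_positives=100,
             id_switches=50, num_ground_truth=1000)
print(f"MOTA = {100 * score:.1f}")
```

Because the errors are summed rather than averaged, MOTA can become negative when a tracker produces more errors than there are ground-truth objects.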

Quick Start

pip

Run this command:

pip install alphavideo

from source

Clone the repository from GitHub:

git clone https://github.com/Alpha-Video/AlphaVideo.git alphaVideo
cd alphaVideo

Set up and install AlphaVideo:

pip install .
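
After either install route, a quick check that the package is visible to the current Python environment can save debugging time. The sketch below uses only the standard library. It assumes the importable module and the installed distribution are both named `alphavideo` (matching the pip package name above); the helper `describe_install` is hypothetical and not part of AlphaVideo.

```python
import importlib.util
from importlib import metadata


def describe_install(module_name: str = "alphavideo",
                     dist_name: str = "alphavideo") -> str:
    """Report whether a top-level module is importable and, if known, its version."""
    # find_spec locates the module without importing it.
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name} is not importable"
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata under this name.
        version = "unknown"
    return f"{module_name} is importable (version: {version})"


print(describe_install())
```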

Features & Capabilities

Multi-Object Tracking

For this task, we provide the TubeTK model, which is the official implementation of the paper "TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model" (CVPR 2020, oral). Detailed training and testing scripts for the MOT-Challenge datasets can be found here.
<img src="https://github.com/BoPang1996/TubeTK/raw/master/assets/demo.gif" width = "600" align=center />
- Accurate end-to-end multi-object tracking.
- Does not need any ready-made image-level object detection models.
- Pre-trained model for pedestrian tracking.
- Input: Frame list; video.
- Output: Videos decorated by colored bounding-box; Btube lists.
- For detailed usage, see our docs.
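
To make the "Btube lists" output more concrete: as we understand the TubeTK paper, a Btube (bounding-tube) is described by three bounding boxes at a start, middle, and end time (15 values in total), with boxes in between obtained by linear interpolation. The sketch below models that idea. The `Btube` class, its field names, and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration; they are not the data format that AlphaVideo actually returns.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


@dataclass
class Btube:
    """Hypothetical container: three boxes at three time positions."""
    t_start: float
    box_start: Box
    t_mid: float
    box_mid: Box
    t_end: float
    box_end: Box

    def box_at(self, t: float) -> Box:
        """Linearly interpolate the box at time t inside the tube."""
        if not self.t_start <= t <= self.t_end:
            raise ValueError("t lies outside the tube")
        # Pick the segment (start to mid, or mid to end) that contains t.
        if t <= self.t_mid:
            t0, b0, t1, b1 = self.t_start, self.box_start, self.t_mid, self.box_mid
        else:
            t0, b0, t1, b1 = self.t_mid, self.box_mid, self.t_end, self.box_end
        if t1 == t0:  # degenerate segment of zero length
            return b1
        w = (t - t0) / (t1 - t0)
        return tuple(a + w * (b - a) for a, b in zip(b0, b1))


tube = Btube(0, (0, 0, 10, 10), 4, (4, 0, 14, 10), 8, (8, 4, 18, 14))
print(tube.box_at(2))  # halfway between the start and middle boxes
print(tube.box_at(6))  # halfway between the middle and end boxes
```

Using two linear segments lets a single tube represent a target that changes direction once, which a single straight segment could not.
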

Action Recognition

For this task, we provide the AlphAction model as an implementation of the paper "Asynchronous Interaction Aggregation for Action Detection", which was accepted by ECCV 2020.
<img src="https://github.com/MVIG-SJTU/AlphAction/raw/master/gifs/demo2.gif" width = "600" align=center />
- Accurate and efficient action detection.
- Pre-trained model for 80 atomic action categories defined in AVA.
- Input: Video; camera.
- Output: Videos decorated by human boxes, attached with corresponding action predictions.
- For detailed usage, see our docs.
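
The output described above pairs each human box with action predictions. Since AVA annotations are multi-label (one person can perform several atomic actions at once), a consumer of such output typically keeps every action whose score passes a threshold rather than a single top class. The sketch below illustrates that step. The `PersonPrediction` class, the score dictionary, the shortened label names, and the threshold value are all hypothetical; they do not reflect AlphAction's real output format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PersonPrediction:
    """Hypothetical record: one detected person and per-action scores."""
    box: Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)
    action_scores: Dict[str, float]         # action label -> score in [0, 1]


def actions_above(pred: PersonPrediction, threshold: float = 0.5) -> List[str]:
    """Return all labels scoring at least `threshold`, highest score first."""
    kept = [name for name, s in pred.action_scores.items() if s >= threshold]
    return sorted(kept, key=lambda name: pred.action_scores[name], reverse=True)


person = PersonPrediction(
    box=(120.0, 40.0, 260.0, 380.0),
    action_scores={"talk to": 0.71, "stand": 0.92, "carry/hold": 0.18},
)
print(actions_above(person))
```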

Paper and Citations

@inproceedings{pang2020tubeTK,
  title={TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model},
  author={Pang, Bo and Li, Yizhuo and Zhang, Yifan and Li, Muchen and Lu, Cewu},
  booktitle={CVPR},
  year={2020}
}

@inproceedings{tang2020asynchronous,
  title={Asynchronous Interaction Aggregation for Action Detection},
  author={Tang, Jiajun and Xia, Jin and Mu, Xinzhi and Pang, Bo and Lu, Cewu},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Maintainers

This project is open source and maintained by the Machine Vision and Intelligence Group (MVIG) at Shanghai Jiao Tong University.
