AutoVideo: An Automated Video Action Recognition System

AutoVideo is a system for automated video analysis. It is developed based on D3M infrastructure, which describes machine learning with generic pipeline languages. Currently, it focuses on video action recognition, supporting a complete training pipeline consisting of data processing, video processing, video transformation, and action recognition. It also supports automated tuners for pipeline search. AutoVideo is developed by DATA Lab at Rice University.

Paper: https://arxiv.org/abs/2108.04212
Demo Video: https://youtu.be/BEInjBjeIuo
Tutorial: [Towards Data Science] AutoVideo: An Automated Video Action Recognition System
Related Project: TODS: Automated Time-series Outlier Detection System
Do you want to learn more about data pipeline search? Please check out our data-centric AI survey and data-centric AI resources!

AutoVideo is designed to be highly modular and extensible. Thanks to the pipeline language, each module is wrapped as a primitive with its own hyperparameters, which makes it easy to develop new modules and convenient to perform pipeline search. We welcome contributions that enrich AutoVideo with more primitives. You can find instructions in the Contributing Guide.

Cite this work

If you find this repo useful, you may cite:

Zha, Daochen, et al. "AutoVideo: An Automated Video Action Recognition System." arXiv preprint arXiv:2108.04212 (2021).

@inproceedings{zha2021autovideo,
  title={Autovideo: An automated video action recognition system},
  author={Zha, Daochen and Bhat, Zaid Pervaiz and Chen, Yi-Wei and Wang, Yicheng and Ding, Sirui and Chen, Jiaben and Lai, Kwei-Herng and Bhat, Mohammad Qazim and Jain, Anmoll Kumar and Reyes, Alfredo Costilla and Zou, Na and Hu, Xia},
  booktitle={IJCAI},
  year={2022}
}

Installation

Make sure that you have Python 3.6+ and pip installed. Currently, the code has only been tested on Linux. First, install torch and torchvision with

pip3 install torch
pip3 install torchvision

To use the automated searching, you need to install ray-tune and hyperopt with

pip3 install 'ray[tune]' hyperopt

We recommend installing the stable version of autovideo with pip:

pip3 install autovideo

Alternatively, you can clone the latest version with

git clone https://github.com/datamllab/autovideo.git

Then install with

cd autovideo
pip3 install -e .
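
After installation, it can be handy to confirm that the packages named above are visible to the interpreter. The following is a minimal sketch using only the standard library; the split into required and optional packages follows the text above, and the helper name check_packages is hypothetical (it is not part of AutoVideo).

```python
import importlib.util

# Top-level import names for the packages mentioned in the installation steps.
REQUIRED = ["torch", "torchvision", "autovideo"]
OPTIONAL = ["ray", "hyperopt"]  # only needed for automated searching


def check_packages(names):
    """Return {name: True/False} depending on whether each package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED + OPTIONAL).items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

The check only looks the packages up; it does not import them, so it runs quickly and does not initialize any GPU context.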

Quick Start

To try the examples, you may download the hmdb6 dataset, which is a subset of hmdb51 with only 6 classes. All the datasets can be downloaded from Google Drive. Then, unzip a dataset and put it in the datasets directory. You may also try STGCN for skeleton-based action recognition on kinetics36, which is a subset of the Kinetics dataset with 36 classes.

Fitting and saving a pipeline

python3 examples/fit.py

Some important hyperparameters are as follows.

--alg: the supported algorithm. Currently we support tsn, tsm, i3d, eco, eco_full, c3d, r2p1d, r3d, stgcn.
--pretrained: whether to load pre-trained weights and fine-tune.
--gpu: which gpu device to use. Empty string for CPU.
--data_dir: the directory of the dataset
--log_dir: the path for saving the log
--save_path: the path for saving the fitted pipeline
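
The --gpu convention above (a device string, or an empty string for CPU) is commonly implemented through the CUDA_VISIBLE_DEVICES environment variable. The sketch below shows one way such a flag could be handled; it is an assumption for illustration, not necessarily what examples/fit.py does, and select_device is a hypothetical helper name.

```python
import os


def select_device(gpu: str) -> str:
    """Map a --gpu style string to a device name.

    An empty string means CPU; anything else is treated as a comma-separated
    list of GPU indices and exposed through CUDA_VISIBLE_DEVICES.
    """
    gpu = gpu.strip()
    if gpu == "":
        # Hide all GPUs so CUDA-aware libraries fall back to the CPU.
        os.environ["CUDA_VISIBLE_DEVICES"] = ""
        return "cpu"
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu
    return "cuda"
```

For the variable to take effect, it has to be set before any library initializes CUDA, so a call like this would normally happen at the very start of a script.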

In AutoVideo, all the pipelines can be described as Python dictionaries. In examples/fit.py, the default pipeline is defined as follows.

config = {
	"transformation":[
		("RandomCrop", {"size": (128,128)}),
		("Scale", {"size": (128,128)}),
	],
	"augmentation": [
		("meta_ChannelShuffle", {"p": 0.5} ),
		("blur_GaussianBlur",),
		("flip_Fliplr", ),
		("imgcorruptlike_GaussianNoise", ),
	],
	"multi_aug": "meta_Sometimes",
	"algorithm": "tsn",
	"load_pretrained": False,
	"epochs": 50,
}

This pipeline describes which transformation and augmentation primitives will be used, and how the multiple augmentation primitives are combined. It also specifies training TSN from scratch for 50 epochs. The hyperparameters can be flexibly configured based on the hyperparameters defined in each primitive.
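
Because the pipeline is a plain dictionary, variants of it can be derived with ordinary Python. The sketch below is illustrative only: it uses a shortened copy of the config above, and the helpers normalize_primitives and with_overrides are hypothetical names, not AutoVideo functions. It shows how the two tuple shapes in the config, ("name",) and ("name", {...}), can be read uniformly, and how a fine-tuning variant might be derived without mutating the default.

```python
import copy

# Shortened copy of the default pipeline shown above.
config = {
    "transformation": [
        ("RandomCrop", {"size": (128, 128)}),
        ("Scale", {"size": (128, 128)}),
    ],
    "augmentation": [
        ("meta_ChannelShuffle", {"p": 0.5}),
        ("blur_GaussianBlur",),
    ],
    "multi_aug": "meta_Sometimes",
    "algorithm": "tsn",
    "load_pretrained": False,
    "epochs": 50,
}


def normalize_primitives(entries):
    """Turn ("name",) or ("name", {...}) into a uniform (name, hyperparams) pair."""
    normalized = []
    for entry in entries:
        name = entry[0]
        hyperparams = dict(entry[1]) if len(entry) > 1 else {}
        normalized.append((name, hyperparams))
    return normalized


def with_overrides(base, **overrides):
    """Return a deep copy of base with the given top-level keys replaced."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    new_config = copy.deepcopy(base)
    new_config.update(overrides)
    return new_config


# A variant that fine-tunes TSM for fewer epochs; the default stays untouched.
fine_tune = with_overrides(config, algorithm="tsm", load_pretrained=True, epochs=10)
print(normalize_primitives(fine_tune["augmentation"]))
```

Rejecting unknown keys in with_overrides is a design choice for this sketch: it catches misspelled keys early instead of silently adding them to the dictionary.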

Loading a fitted pipeline and producing predictions

After fitting a pipeline, you can load a pipeline and make predictions.

python3 examples/produce.py

Some important hyperparameters are as follows.

--gpu: which gpu device to use. Empty string for CPU.
--data_dir: the directory of the dataset
--log_dir: the path for saving the log
--load_path: the path for loading the fitted pipeline

Loading a fitted pipeline and recognizing actions

After fitting a pipeline, you can also make predictions on a single video. As a demo, you may download the fitted pipeline and the demo video from Google Drive. Then, you can use the following command to recognize the action in the video:

python3 examples/recogonize.py

Some important hyperparameters are as follows.

--gpu: which gpu device to use. Empty string for CPU.
--video_path: the path of video file
--log_dir: the path for saving the log
--load_path: the path for loading the fitted pipeline
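
The fit and recognize steps above can also be driven from Python. The following is a minimal sketch that only assembles the command lines from the flags listed in this README; every path and file name in it (datasets/hmdb6, log, fitted_pipeline, demo.avi) is a placeholder assumption, and build_command is a hypothetical helper.

```python
import shlex
import sys


def build_command(script, **flags):
    """Build an argument list such as [python, script, "--alg", "tsn", ...]."""
    cmd = [sys.executable, script]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# All paths below are placeholders; adjust them to your own setup.
fit_cmd = build_command(
    "examples/fit.py",
    alg="tsn",
    gpu="",  # empty string selects the CPU
    data_dir="datasets/hmdb6",
    log_dir="log",
    save_path="fitted_pipeline",
)
recognize_cmd = build_command(
    "examples/recogonize.py",  # file name spelled as in the command above
    gpu="",
    video_path="demo.avi",
    log_dir="log",
    load_path="fitted_pipeline",
)

for cmd in (fit_cmd, recognize_cmd):
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems, which matters here because the CPU setting is an empty string.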

Fitting and producing a pipeline

Alternatively, you can fit a pipeline and produce predictions without saving the model with

python3 examples/fit_produce.py

Some important hyperparameters are as follows.

--alg: the supported algorithm.
--pretrained: whether to load pre-trained weights and fine-tune.
--gpu: which gpu device to use. Empty string for CPU.
--data_dir: the directory of the dataset
--log_dir: the path for saving the log

Automated searching

In addition to running pipelines manually, we also support automated model selection and hyperparameter tuning:

python3 examples/search.py

Some important hyperparameters are as follows.

--alg: the searching algorithm. Currently, we support random and hyperopt.
--num_samples: the number of samples to be tried
--gpu: which gpu device to use. Empty string for CPU.
--data_dir: the directory of the dataset

The search space can also be specified as a Python dictionary. An example:

from ray import tune

search_space = {
	"augmentation": {
		"aug_0": tune.choice([
			("arithmetic_AdditiveGaussianNoise",),
			("arithmetic_AdditiveLaplaceNoise",),
		]),
		"aug_1": tune.choice([
			("geometric_Rotate",),
			("geometric_Jigsaw",),
		]),
	},
	"multi_aug": tune.choice([
		"meta_Sometimes",
		"meta_Sequential",
	]),
	"algorithm": tune.choice(["tsn"]),
	"learning_rate": tune.uniform(0.0001, 0.001),
	"momentum": tune.uniform(0.9,0.99),
	"weight_decay": tune.uniform(5e-4,1e-3),
	"num_segments": tune.choice([8,16,32]),
}
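
To make the idea of sampling from such a search space concrete, here is a small random-search sketch written with the standard library only. It does not use Ray Tune or AutoVideo: the ("choice", ...) and ("uniform", ...) tuples are a hypothetical stand-in for tune.choice and tune.uniform, and toy_evaluate is a made-up scoring function standing in for fitting and validating a pipeline.

```python
import random

# Each entry is either ("choice", [options]) or ("uniform", low, high).
space = {
    "multi_aug": ("choice", ["meta_Sometimes", "meta_Sequential"]),
    "learning_rate": ("uniform", 0.0001, 0.001),
    "momentum": ("uniform", 0.9, 0.99),
    "num_segments": ("choice", [8, 16, 32]),
}


def sample(space, rng):
    """Draw one configuration from the search space."""
    config = {}
    for key, spec in space.items():
        if spec[0] == "choice":
            config[key] = rng.choice(spec[1])
        elif spec[0] == "uniform":
            config[key] = rng.uniform(spec[1], spec[2])
        else:
            raise ValueError(f"unknown spec type: {spec[0]}")
    return config


def random_search(space, evaluate, num_samples, seed=0):
    """Try num_samples random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_samples):
        candidate = sample(space, rng)
        score = evaluate(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
    return best_config, best_score


def toy_evaluate(config):
    # Made-up objective: prefers learning rates close to 0.0005.
    return -abs(config["learning_rate"] - 0.0005)


best, score = random_search(space, toy_evaluate, num_samples=20)
print(best, score)
```

Here num_samples plays the same role as the --num_samples flag above: it bounds how many configurations are tried. A model-based searcher such as hyperopt would replace the independent draws in sample with proposals informed by earlier scores.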

Supported Action Recognition Algorithms
