VideoCapsuleNet
Code for VideoCapsuleNet: A Simplified Network for Action Detection
This is the code for the NeurIPS 2018 paper VideoCapsuleNet: A Simplified Network for Action Detection.
The paper can be found here: http://papers.nips.cc/paper/7988-videocapsulenet-a-simplified-network-for-action-detection
The network is implemented using TensorFlow 1.4.1.
Python packages used: numpy, scipy, scikit-video
Files and their use
- caps_layers.py: Contains the functions required to construct capsule layers (primary, convolutional, and fully-connected).
- caps_network.py: Contains the VideoCapsuleNet model.
- caps_main.py: Contains the main function, which is called to train the network.
- config.py: Contains several different hyperparameters used for the network, training, or inference.
- get_iou.py: Contains the function used to evaluate the network.
- inference.py: Contains the inference code.
- load_ucf101_data.py: Contains the data-generator for UCF-101.
- output2.txt: A sample output file from training and testing.
Data Used
We have supplied the code for training and testing the model on the UCF-101 dataset. The file <code>load_ucf101_data.py</code> creates two DataLoaders - one for training and one for testing. The <code>dataset_dir</code> variable at the top of the file should be set to the base directory which contains the frames and annotations.
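A video data loader of this kind typically samples a fixed-length window of consecutive frames from each video. The sketch below illustrates that sampling step only; the helper name and the clip length of 8 are assumptions for illustration, not taken from <code>load_ucf101_data.py</code>:

```python
import random

def sample_clip_indices(num_frames, clip_len=8, rng=None):
    """Pick clip_len consecutive frame indices from a video of num_frames frames.

    Hypothetical helper for illustration; the repo's loader may differ.
    """
    rng = rng or random.Random()
    # Clamp the start so the clip never runs past the last frame.
    start = rng.randint(0, max(0, num_frames - clip_len))
    return list(range(start, start + clip_len))
```

For a video with exactly 8 frames the only valid clip is frames 0 through 7.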
To run this code, you need to do the following:
- Download the UCF-101 dataset at http://crcv.ucf.edu/data/UCF101.php
- Extract the frames from each video (downsized to 160x120) and store them as .jpg files named "frame_K.jpg", where K is the frame number from 0 to T-1. The path to the frames should be: <code>[dataset_dir]/UCF101_Frames/[Video Name]/frame_K.jpg</code>.
- Download the trainAnnot.mat and testAnnot.mat annotations from https://github.com/gurkirt/corrected-UCF101-Annots. The path to the annotations should be <code>[dataset_dir]/UCF101_Annotations/*.mat</code>.
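The directory layout above can be encoded in small path helpers; these function names are hypothetical and shown only to make the expected layout concrete:

```python
import os

def frame_path(dataset_dir, video_name, k):
    # Frames live at [dataset_dir]/UCF101_Frames/[Video Name]/frame_K.jpg
    return os.path.join(dataset_dir, "UCF101_Frames", video_name,
                        "frame_%d.jpg" % k)

def annotation_path(dataset_dir, split):
    # split is "train" or "test", matching trainAnnot.mat / testAnnot.mat
    return os.path.join(dataset_dir, "UCF101_Annotations",
                        "%sAnnot.mat" % split)
```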
Training the Model
Once the data is set up you can train (and test) the network by calling <code>python3 caps_main.py</code>.
To get results similar to those in the paper, the pretrained C3D weights (see <code>readme.txt</code>) are needed in the pretrained_weights folder.
The <code>config.py</code> file contains several hyper-parameters which are useful for training the network.
Output File
During training and testing, metrics are printed to stdout as well as an output*.txt file. During training/validation, the losses and accuracies are printed out. At test time, the accuracy, f-mAP and v-mAP scores (for many IoU thresholds), and f-AP@IoU=0.5 and v-AP@IoU=0.5 for each class, are printed out.
An example of this is found in <code>output2.txt</code>. These are not the same results as those in the paper (cleaning the code changed variable names, making it difficult to transfer the original weights), but they are comparable.
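The mAP scores above are thresholded on IoU between predicted and ground-truth regions. As a minimal sketch, here is the standard 2-D box IoU; this helper is illustrative only and is not necessarily how <code>get_iou.py</code> computes its overlaps:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two identical boxes score 1.0, and disjoint boxes score 0.0; a detection counts as correct at a threshold t when its IoU with the ground truth is at least t.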
Saved Weights
As the network is trained, the best weights are saved to the network_saves folder. The weights for the network trained on UCF-101 can be found here. Unzip the file and place the three .ckpt files in the network_saves folder. These weights correspond to the results found in <code>output2.txt</code>.
Testing the Model
If you just want to test the model using the weights above, uncomment <code>#iou()</code> at the bottom of the <code>get_iou.py</code> file, and run <code>python3 get_iou.py</code>.
Inference
If you just want to obtain the segmentation for a single video, you can use <code>inference.py</code>. An example video from UCF-101 is given.

Running <code>inference.py</code> saves the cropped video (first resized to HxW=120x160 and cropped to HxW=112x112) as well as the segmented video: <code>cropped_vid.avi</code> and <code>segmented_vid.avi</code> respectively.
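The resize-then-crop step maps a 120x160 frame to a centered 112x112 window. A small sketch of the crop arithmetic (the helper name is an assumption, not from <code>inference.py</code>):

```python
def center_crop_box(h, w, ch=112, cw=112):
    """Return (top, left, bottom, right) of a centered ch x cw window
    inside an h x w frame."""
    top = (h - ch) // 2
    left = (w - cw) // 2
    return top, left, top + ch, left + cw
```

For a 120x160 frame this yields the window (4, 24, 116, 136), i.e. 4 pixels trimmed from the top and bottom and 24 from each side.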

