VideoCapsuleNet
Code for VideoCapsuleNet: A Simplified Network for Action Detection
This is the code for the NeurIPS 2018 paper VideoCapsuleNet: A Simplified Network for Action Detection.
The paper can be found here: http://papers.nips.cc/paper/7988-videocapsulenet-a-simplified-network-for-action-detection
The network is implemented using TensorFlow 1.4.1.
Python packages used: numpy, scipy, scikit-video
Files and their use
- caps_layers.py: Contains the functions required to construct capsule layers (primary, convolutional, and fully-connected).
- caps_network.py: Contains the VideoCapsuleNet model.
- caps_main.py: Contains the main function, which is called to train the network.
- config.py: Contains several different hyperparameters used for the network, training, or inference.
- get_iou.py: Contains the function used to evaluate the network.
- inference.py: Contains the inference code.
- load_ucf101_data.py: Contains the data-generator for UCF-101.
- output2.txt: A sample output file from training and testing.
Data Used
We have supplied the code for training and testing the model on the UCF-101 dataset. The file <code>load_ucf101_data.py</code> creates two DataLoaders - one for training and one for testing. The <code>dataset_dir</code> variable at the top of the file should be set to the base directory which contains the frames and annotations.
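A video data loader of this kind typically samples a fixed-length window of consecutive frames from each video. The sketch below illustrates that sampling step only; the helper name and the clip length of 8 are assumptions for illustration, not taken from <code>load_ucf101_data.py</code>:

```python
import random

def sample_clip_indices(num_frames, clip_len=8, rng=None):
    """Pick clip_len consecutive frame indices from a video of num_frames frames.

    Hypothetical helper for illustration; the repo's loader may differ.
    """
    rng = rng or random.Random()
    # Clamp the start so the clip never runs past the last frame.
    start = rng.randint(0, max(0, num_frames - clip_len))
    return list(range(start, start + clip_len))
```

For a video with exactly 8 frames the only valid clip is frames 0 through 7.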
To run this code, you need to do the following:
- Download the UCF-101 dataset at http://crcv.ucf.edu/data/UCF101.php
- Extract the frames from each video (downsized to 160x120) and store them as .jpg files named "frame_K.jpg", where K is the frame number from 0 to T-1. The path to the frames should be: <code>[dataset_dir]/UCF101_Frames/[Video Name]/frame_K.jpg</code>.
- Download the trainAnnot.mat and testAnnot.mat annotations from https://github.com/gurkirt/corrected-UCF101-Annots. The path to the annotations should be <code>[dataset_dir]/UCF101_Annotations/*.mat</code>.
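The directory layout above can be encoded in small path helpers; these function names are hypothetical and shown only to make the expected layout concrete:

```python
import os

def frame_path(dataset_dir, video_name, k):
    # Frames live at [dataset_dir]/UCF101_Frames/[Video Name]/frame_K.jpg
    return os.path.join(dataset_dir, "UCF101_Frames", video_name,
                        "frame_%d.jpg" % k)

def annotation_path(dataset_dir, split):
    # split is "train" or "test", matching trainAnnot.mat / testAnnot.mat
    return os.path.join(dataset_dir, "UCF101_Annotations",
                        "%sAnnot.mat" % split)
```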
Training the Model
Once the data is set up you can train (and test) the network by calling <code>python3 caps_main.py</code>.
To get results similar to those in the paper, the pretrained C3D weights (see <code>readme.txt</code>) are needed in the pretrained_weights folder.
The <code>config.py</code> file contains several hyper-parameters which are useful for training the network.
Output File
During training and testing, metrics are printed to stdout as well as an output*.txt file. During training/validation, the losses and accuracies are printed out. At test time, the accuracy, f-mAP and v-mAP scores (for many IoU thresholds), and f-AP@IoU=0.5 and v-AP@IoU=0.5 for each class, are printed out.
An example of this is found in <code>output2.txt</code>. These are not the same results as those in the paper (cleaning the code changed variable names, making it difficult to transfer the original weights), but they are comparable.
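The mAP scores above are thresholded on IoU between predicted and ground-truth regions. As a minimal sketch, here is the standard 2-D box IoU; this helper is illustrative only and is not necessarily how <code>get_iou.py</code> computes its overlaps:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two identical boxes score 1.0, and disjoint boxes score 0.0; a detection counts as correct at a threshold t when its IoU with the ground truth is at least t.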
Saved Weights
As the network is trained, the best weights are saved to the network_saves folder. The weights for the network trained on UCF-101 can be found here. Unzip the file and place the three .ckpt files in the network_saves folder. These weights correspond to the results found in <code>output2.txt</code>.
Testing the Model
If you just want to test the model using the weights above, uncomment <code>#iou()</code> at the bottom of the <code>get_iou.py</code> file, and run <code>python3 get_iou.py</code>.
Inference
If you just want to obtain the segmentation for a single video, you can use <code>inference.py</code>. An example video from UCF-101 is given.

Running <code>inference.py</code> saves the cropped video (first resized to HxW=120x160 and cropped to HxW=112x112) as well as the segmented video: <code>cropped_vid.avi</code> and <code>segmented_vid.avi</code> respectively.
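The resize-then-crop step maps a 120x160 frame to a centered 112x112 window. A small sketch of the crop arithmetic (the helper name is an assumption, not from <code>inference.py</code>):

```python
def center_crop_box(h, w, ch=112, cw=112):
    """Return (top, left, bottom, right) of a centered ch x cw window
    inside an h x w frame."""
    top = (h - ch) // 2
    left = (w - cw) // 2
    return top, left, top + ch, left + cw
```

For a 120x160 frame this yields the window (4, 24, 116, 136), i.e. 4 pixels trimmed from the top and bottom and 24 from each side.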

