Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation) [arXiv] [paper]
@inproceedings{hou2021multiview,
  title={Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)},
  author={Hou, Yunzhong and Zheng, Liang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia (MM ’21)},
  year={2021}
}
Overview
We release the PyTorch code for MVDeTr, a state-of-the-art multiview pedestrian detector. Its superior performance should be credited to transformer architectures, updated loss terms, and view-coherent data augmentations. Moreover, MVDeTr is also very efficient and can be trained on a single RTX 2080 Ti. This repo also includes a simplified version of MVDet, which likewise runs on a single RTX 2080 Ti.
Content
MVDeTr Code
This repo is dedicated to the code for MVDeTr.
Dependencies
This code uses the following libraries:
- python
- pytorch & torchvision
- numpy
- matplotlib
- pillow
- opencv-python
- kornia
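For reference, a typical setup might look like the following (package names simply mirror the list above; pin versions to match your Python and CUDA toolkit as needed):

```
pip install torch torchvision numpy matplotlib pillow opencv-python kornia
```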
Data Preparation
By default, all datasets are in ~/Data/. We use MultiviewX and Wildtrack in this project.
Your ~/Data/ folder should look like this:

```
Data
├── MultiviewX/
│   └── ...
└── Wildtrack/
    └── ...
```
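As a quick sanity check, you can verify the layout before training (a minimal sketch assuming the default ~/Data/ root; this check is illustrative and not part of the repo):

```python
import os

# Verify the expected dataset layout under ~/Data/ (illustrative check only).
data_root = os.path.expanduser('~/Data')
for dataset in ('MultiviewX', 'Wildtrack'):
    path = os.path.join(data_root, dataset)
    status = 'found' if os.path.isdir(path) else 'MISSING'
    print(f'{dataset}: {status} at {path}')
```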
Code Preparation
Before running the code, one should go to multiview_detector/models/ops and run bash make.sh to build the deformable transformer modules (forked from Deformable DETR).
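Concretely, the build step looks like this (the optional test script is an assumption based on the Deformable DETR fork this module comes from):

```
cd multiview_detector/models/ops
bash make.sh
# optional, if the fork keeps Deformable DETR's unit test for the CUDA ops:
python test.py
```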
Training
In order to train the detector, please run the following:
python main.py -d wildtrack
python main.py -d multiviewx
This should automatically produce evaluation results similar to the reported 91.5% MODA on the Wildtrack dataset and 93.7% MODA on the MultiviewX dataset.
Architectures
This repo supports multiple architecture variants. For MVDeTr, please specify --world_feat deform_trans; for a fully convolutional architecture similar to MVDet, please specify --world_feat conv.
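For example, combined with the training command above:

```
python main.py -d wildtrack --world_feat deform_trans   # MVDeTr (shadow transformer)
python main.py -d wildtrack --world_feat conv           # MVDet-style fully convolutional
```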
Loss terms
This repo supports multiple loss terms. For the focal loss variant as in MVDeTr, please specify --use_mse 0; for the MSE loss as in MVDet, please specify --use_mse 1.
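For example:

```
python main.py -d multiviewx --use_mse 0   # focal loss variant (MVDeTr)
python main.py -d multiviewx --use_mse 1   # MSE loss (as in MVDet)
```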
Augmentations
This repo includes support for view-coherent data augmentation, which applies affine transformations to the per-view inputs and then inverts those transformations on the per-view feature maps to maintain multiview coherency; see the sketch below.
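The idea can be sketched with kornia (already a dependency); the variable names and the fixed translation below are illustrative, not the repo's actual API:

```python
import torch
import kornia.geometry.transform as KT

# Illustrative sketch of view-coherent augmentation, not the repo's actual API.
B, C, H, W = 1, 3, 256, 512
image = torch.rand(B, C, H, W)  # one per-view input image

# A (B, 2, 3) affine matrix; here a fixed translation, in practice sampled randomly.
M = torch.tensor([[[1.0, 0.0, 10.0],
                   [0.0, 1.0,  5.0]]])

# 1) Augment the per-view input with the affine transform.
augmented = KT.warp_affine(image, M, dsize=(H, W))

# 2) Run the per-view feature extractor (identity stand-in here).
feat = augmented

# 3) Invert the affine on the feature maps so all views re-align
#    before projection and aggregation on the ground plane.
feat_aligned = KT.warp_affine(feat, KT.invert_affine_transform(M), dsize=(H, W))
```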
Pre-trained models
You can download the checkpoints at this link.