SNIPER / AutoFocus: Efficient Multi-Scale Training / Inference
<p align="center"> <img src="https://mahyarnajibi.github.io/github_readme_data/sniper_object_detector.gif" /> </p>

SNIPER is an efficient multi-scale training approach for instance-level recognition tasks such as object detection and instance-level segmentation. Instead of processing all pixels in an image pyramid, SNIPER selectively processes context regions around the ground-truth objects (a.k.a. chips). This significantly speeds up multi-scale training, as it operates on low-resolution chips. Thanks to its memory-efficient design, SNIPER can benefit from Batch Normalization during training and makes larger batch sizes possible for instance-level recognition tasks on a single GPU. Hence, there is no need to synchronize batch-normalization statistics across GPUs, and we can train object detectors much like we train image classifiers!
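The chip idea above can be sketched in a few lines. This is a hypothetical illustration, not the repository's implementation: the 512-pixel chip size, the `(x1, y1, x2, y2)` box format, and both helper names are assumptions made for the example.

```python
# Illustrative sketch of SNIPER-style positive chip selection (assumed names/values).
CHIP_SIZE = 512  # chip side length in rescaled pixels; an assumption for this sketch

def valid_boxes(boxes, valid_range):
    """Keep boxes whose sqrt(area) lies within this scale's valid range [lo, hi]."""
    lo, hi = valid_range
    out = []
    for (x1, y1, x2, y2) in boxes:
        side = ((x2 - x1) * (y2 - y1)) ** 0.5
        if lo <= side <= hi:
            out.append((x1, y1, x2, y2))
    return out

def chip_around(box, img_w, img_h, chip=CHIP_SIZE):
    """Place a fixed-size chip containing the box, clipped to the image bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    left = min(max(0, cx - chip / 2), max(0, img_w - chip))
    top = min(max(0, cy - chip / 2), max(0, img_h - chip))
    return (left, top, min(left + chip, img_w), min(top + chip, img_h))
```

Only objects that are "valid" at a given scale are covered by chips at that scale, so each low-resolution chip is trained on appropriately sized objects.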
AutoFocus, on the other hand, is an efficient multi-scale inference algorithm for deep-learning based object detectors. Instead of processing an entire image pyramid, AutoFocus adopts a coarse-to-fine approach and only processes regions that are likely to contain small objects at finer scales. This is achieved by predicting category-agnostic segmentation maps for small objects at coarser scales, called FocusPixels. FocusPixels can be predicted with high recall, and in many cases they cover only a small fraction of the entire image. To make efficient use of FocusPixels, an algorithm is proposed which generates compact rectangular FocusChips enclosing the FocusPixels. While processing finer scales, the detector is applied only inside these FocusChips.
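The FocusChip idea can be illustrated with a toy sketch. This is not the paper's exact chip-generation algorithm (which also dilates FocusPixels and merges overlapping chips); the function name and the use of 4-connectivity are assumptions for the example: given a binary FocusPixel map, group connected FocusPixels and enclose each group in a rectangle.

```python
# Illustrative sketch of grouping FocusPixels into rectangular FocusChips.
def focus_chips(mask):
    """mask: list of lists of 0/1 FocusPixel predictions.
    Returns inclusive (x1, y1, x2, y2) boxes enclosing each
    4-connected component of FocusPixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    chips = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Flood-fill this connected component, tracking its bounding box
                stack, x1, y1, x2, y2 = [(x, y)], x, y, x, y
                seen[y][x] = True
                while stack:
                    cx, cy = stack.pop()
                    x1, y1 = min(x1, cx), min(y1, cy)
                    x2, y2 = max(x2, cx), max(y2, cy)
                    for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((nx, ny))
                chips.append((x1, y1, x2, y2))
    return chips
```

At the next (finer) scale, only the image regions inside these rectangles need to be upsampled and run through the detector.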
SNIPER is initially described in the following paper published at NeurIPS 2018:
<div class="highlight highlight-html"><pre> <b>SNIPER: Efficient Multi-Scale Training <a href=https://github.com/bharatsingh430>Bharat Singh*</a>, <a href=https://github.com/mahyarnajibi>Mahyar Najibi*</a>, and Larry S. Davis (* denotes equal contribution)</b> NeurIPS, 2018. </pre></div>

AutoFocus is initially described in the following paper published at ICCV 2019:

<div class="highlight highlight-html"><pre> <b>AutoFocus: Efficient Multi-Scale Inference <a href=https://github.com/mahyarnajibi>Mahyar Najibi*</a>, <a href=https://github.com/bharatsingh430>Bharat Singh*</a>, and Larry S. Davis (* denotes equal contribution)</b> ICCV, 2019. </pre></div>

Features
- Train with a batch size of 160 images with a ResNet-101 backbone on 8 V100 GPUs
- NO PYTHON LAYERS (Every layer is optimized for large batch sizes in CUDA/C++)
- HALF PRECISION TRAINING with no loss in accuracy
- 5 Images/second during inference on a single V100 GPU, 47.8/68.2 on COCO using ResNet-101 and without training on segmentation masks
- Use the lightweight MobileNetV2 model trained with SNIPER to get 34.3/54.5 on COCO without training on segmentation masks
- The R-FCN-3K branch is also powered by SNIPER. Now 21% better than YOLO-9000 on ImageNetDet. This branch also supports on-the-fly training (in seconds) with very few samples (no bounding boxes needed!)
- Train on OpenImagesV4 (14x bigger than COCO) with ResNet-101 in 3 days on a p3.16xlarge AWS instance!
Results
COCO dataset
Here are the COCO results for SNIPER trained using this repository. The models are trained on the trainval set (using only the bounding box annotations) and evaluated on the test-dev set.
| | <sub>network architecture</sub> | <sub>pre-trained dataset</sub> | <sub>test dataset</sub> | <sub>mAP</sub> | <sub>mAP@0.5</sub> | <sub>mAP@0.75</sub> | <sub>mAP@S</sub> | <sub>mAP@M</sub> | <sub>mAP@L</sub> |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| <sub>SNIPER</sub> | <sub>ResNet-101</sub> | <sub>ImageNet</sub> | <sub>test-dev15</sub> | 46.5 | 67.5 | 52.2 | 30.0 | 49.4 | 58.4 |
| <sub>SNIPER</sub> | <sub>ResNet-101</sub> | <sub>OpenImagesV4</sub> | <sub>test-dev15</sub> | 47.8 | 68.2 | 53.6 | 31.5 | 50.4 | 59.8 |
| <sub>SNIPER</sub> | <sub>MobileNetV2</sub> | <sub>ImageNet</sub> | <sub>test-dev15</sub> | 34.3 | 54.4 | 37.9 | 18.5 | 36.9 | 46.4 |
| <sub>AutoFocus</sub> | <sub>ResNet-101</sub> | <sub>OpenImagesV4</sub> | <sub>val-2017</sub> | 47.5 | 67.7 | 53.2 | 33.3 | 51.2 | 60.8 |
You can download the OpenImages pre-trained model by running `bash scripts/download_pretrained_models.sh`. The SNIPER detectors trained on the COCO (ResNet-101 and MobileNetV2) and PASCAL VOC datasets, as well as the AutoFocus model trained on the COCO dataset (ResNet-101), can be downloaded by running `bash scripts/download_sniper_autofocus_detectors.sh`.
License
SNIPER is released under the Apache license. See LICENSE for details.
Citing
```
@article{najibi2019autofocus,
  title={{AutoFocus}: Efficient Multi-Scale Inference},
  author={Najibi, Mahyar and Singh, Bharat and Davis, Larry S},
  journal={ICCV},
  year={2019}
}

@article{sniper2018,
  title={{SNIPER}: Efficient Multi-Scale Training},
  author={Singh, Bharat and Najibi, Mahyar and Davis, Larry S},
  journal={NeurIPS},
  year={2018}
}

@article{analysissnip2017,
  title={An analysis of scale invariance in object detection-snip},
  author={Singh, Bharat and Davis, Larry S},
  journal={CVPR},
  year={2018}
}
```
Contents
- Installation
- Running the demo
- Training a model with SNIPER / AutoFocus
- Evaluating SNIPER / AutoFocus models
- Other methods and branches in this repo (SSH Face Detector, R-FCN-3K, open-images)
<a name="install"> </a>
Installation
- Clone the repository:
git clone --recursive https://github.com/mahyarnajibi/SNIPER.git
- Compile the provided MXNet fork in the repository.
You need to install CUDA, CuDNN, OpenCV, and OpenBLAS. These libraries are set to be used by default in the provided config.mk file in the SNIPER-mxnet repository. You can use the make command to build the MXNet library:
cd SNIPER-mxnet
make -j [NUM_OF_PROCESS] USE_CUDA_PATH=[PATH_TO_THE_CUDA_FOLDER]
If you plan to train models on multiple GPUs, installing NCCL and building MXNet with NCCL support is optional but recommended:
make -j [NUM_OF_PROCESS] USE_CUDA_PATH=[PATH_TO_THE_CUDA_FOLDER] USE_NCCL=1
In this case, you may also need to set the USE_NCCL_PATH variable in the above command to point to your NCCL installation path.
If you need more information on how to compile MXNet, please see the official MXNet build instructions.
- Compile the C++ files in the lib directory. The following script compiles them all:
bash scripts/compile.sh
- Install the required python packages:
pip install -r requirements.txt
<a name="demo"> </a>
Running the demo
<p align="center"> <img src="https://mahyarnajibi.github.io/github_readme_data/sniper_object_detector_detections.jpg" width="700px"/> </p>

To run the demo, you first need to download the provided SNIPER models. The following script downloads them and extracts them into the default location:
bash scripts/download_sniper_autofocus_detectors.sh
After downloading the models, the following command runs the SNIPER detector trained on the COCO dataset, with the default configs, on the provided sample image:
python demo.py
If everything goes well, the sample detections will be saved to data/demo/demo_detections.jpg.
You can also run the detector on an arbitrary image by providing its path to the script:
python demo.py --im_path [PATH to the image]
However, if you plan to run the detector on multiple images, please consider using the provided multi-process and multi-batch main_test module.
You can also test the provided SNIPER model based on the MobileNetV2 architecture trained on the COCO dataset by passing the provided config file as follows:
python demo.py --cfg configs/faster/sniper_mobilenetv2_e2e.yml
<a name="training"></a>
Training a model with SNIPER / AutoFocus
For training SNIPER/AutoFocus, you first need to download the pre-trained models and configure the datasets as described below.
Downloading pre-trained models
Running the following script downloads and extracts the pre-trained models into the default path (data/pretrained_model):
bash scripts/download_pretrained_models.sh
Configuring the dataset
COCO dataset:
Please follow the official COCO dataset website to download the dataset. After downloading the dataset you should have the following directory structure:
data
|--coco
|--annotations
|--images
PASCAL VOC dataset:
Please download the training, validation, and test subsets from the [official Pascal VOC dataset website](http://host.robots.ox.ac.uk/pascal/VOC/). After downloading the dataset you should have the following directory structure:
data
|--VOCdevkit
|--VOC2007
|--VOC2012
Training the SNIPER detector
You can train the SNIPER detector with or without negative chip mining as described below.
Training with Negative Chip Mining:
Negative chip mining results in a relative improvement in AP (please refer to the paper for the details). To determine the candidate hard negative regions, SNIPER uses proposals extracted from a proposal network trained for a short training schedule.
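Conceptually, negative chip mining can be sketched as follows. This is a simplified illustration, not the repository's code: the helper names and the center-coverage criterion are assumptions made for the example, and the actual method additionally accounts for per-scale valid ranges when sampling negative chips.

```python
# Illustrative sketch: proposals from a briefly-trained proposal network that are
# NOT covered by any positive chip mark likely hard-negative regions, around which
# negative chips can then be sampled.
def center_inside(box, chip):
    """True if the box's center falls inside the chip (both are x1, y1, x2, y2)."""
    bx = (box[0] + box[2]) / 2
    by = (box[1] + box[3]) / 2
    return chip[0] <= bx <= chip[2] and chip[1] <= by <= chip[3]

def negative_candidates(proposals, pos_chips):
    """Keep proposals whose center lies in no positive chip."""
    return [p for p in proposals if not any(center_inside(p, c) for c in pos_chips)]
```

Since positive chips already cover the ground-truth objects, the surviving proposals concentrate on regions the detector is likely to falsely fire on, which is what makes them useful as hard negatives.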
For COCO and Pascal VOC datasets, we provide the pre-computed proposals. The following commands download the pre-computed proposals, extra
