> Use this instead: https://github.com/facebookresearch/maskrcnn-benchmark

# A Pytorch Implementation of Detectron
<div align="center">
  <img src="demo/33823288584_1d21cf0a26_k-pydetectron-R101-FPN.jpg" width="700px"/>
  <p>Example output of <b>e2e_mask_rcnn-R-101-FPN_2x</b> using Detectron pretrained weight.</p>
  <img src="demo/33823288584_1d21cf0a26_k-detectron-R101-FPN.jpg" width="700px"/>
  <p>Corresponding example output from Detectron.</p>
  <img src="demo/img1_keypoints-pydetectron-R50-FPN.jpg" width="700px"/>
  <p>Example output of <b>e2e_keypoint_rcnn-R-50-FPN_s1x</b> using Detectron pretrained weight.</p>
</div>

This code follows the implementation architecture of Detectron. Only part of the functionality is supported. Check this section for more information.
With this code, you can...
- Train your model from scratch.
- Run inference using pretrained weight files (`*.pkl`) from Detectron.
This repository was originally built on jwyang/faster-rcnn.pytorch. However, after many modifications, the structure has changed a lot and is now more similar to Detectron. I deliberately made everything similar or identical to Detectron's implementation, so results can be reproduced directly from official pretrained weight files.
This implementation has the following features:

- It is pure Pytorch code, apart from some custom CUDA ops.
- It supports multi-image batch training.
- It supports multiple-GPU training.
- It supports three pooling methods. Notice that only roi align is revised to match the implementation in Caffe2, so use that one.
- It is memory efficient. For data batching, two techniques are available to reduce memory usage: 1) aspect grouping: group images with similar aspect ratios into a batch; 2) aspect cropping: crop images that are too long. Aspect grouping is implemented in Detectron, so it is used by default. Aspect cropping is an idea from jwyang/faster-rcnn.pytorch and is not used by default.

Besides that, I implement a customized `nn.DataParallel` module which enables different batch blob sizes on different GPUs. Check the My `nn.DataParallel` section for more details.
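The aspect-grouping idea above can be sketched roughly as follows; the helper below is a hypothetical illustration, not the repo's actual batching code:

```python
# Hypothetical sketch of aspect grouping: put "wide" and "tall" images
# into separate batches, so images within a batch have similar aspect
# ratios and per-batch zero-padding stays small.
def group_by_aspect(widths, heights, batch_size):
    wide = [i for i, (w, h) in enumerate(zip(widths, heights)) if w >= h]
    tall = [i for i, (w, h) in enumerate(zip(widths, heights)) if w < h]
    batches = []
    for bucket in (wide, tall):
        for start in range(0, len(bucket), batch_size):
            batches.append(bucket[start:start + batch_size])
    return batches
```

Each batch then only needs to be padded up to the largest image inside it, which is what saves memory compared to mixing orientations.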
## News

- (2018/05/25) Support ResNeXt backbones.
- (2018/05/22) Add group normalization baselines.
- (2018/05/15) PyTorch 0.4 is now supported!
## Getting Started

Clone the repo:

```shell
git clone https://github.com/roytseng-tw/mask-rcnn.pytorch.git
```
## Requirements
Tested under python3.
- python packages
  - pytorch>=0.3.1
  - torchvision>=0.2.0
  - cython
  - matplotlib
  - numpy
  - scipy
  - opencv
  - pyyaml
  - packaging
  - pycocotools: for the COCO dataset, also available from pip
  - tensorboardX: for logging the losses to Tensorboard
- An NVIDIA GPU and CUDA 8.0 or higher. Some operations only have GPU implementations.
- NOTICE: different versions of the Pytorch package have different memory usages.
## Compilation

Compile the CUDA code:

```shell
cd lib  # please change to this directory
sh make.sh
```
If you are using Volta GPUs, uncomment this line in `lib/make.sh` and remember to append a backslash at the end of the line above. `CUDA_PATH` defaults to `/usr/local/cuda`. If you want to use a CUDA library at a different path, change this line accordingly.

It will compile all the modules you need, including NMS, ROI_Pooling, ROI_Crop and ROI_Align. (Actually the GPU NMS is never used ...)

Note that, if you use `CUDA_VISIBLE_DEVICES` to set GPUs, make sure at least one GPU is visible when compiling the code.
## Data Preparation

Create a data folder under the repo:

```shell
cd {repo_root}
mkdir data
```
- **COCO**: Download the coco images and annotations from the coco website, and make sure to put the files in the following structure:

  ```
  coco
  ├── annotations
  │   ├── instances_minival2014.json
  │   ├── instances_train2014.json
  │   ├── instances_train2017.json
  │   ├── instances_val2014.json
  │   ├── instances_val2017.json
  │   ├── instances_valminusminival2014.json
  │   └── ...
  └── images
      ├── train2014
      ├── train2017
      ├── val2014
      ├── val2017
      └── ...
  ```

  Download the coco mini annotations from here. Please note that minival is exactly equivalent to the recently defined 2017 val set. Similarly, the union of valminusminival and the 2014 train set is exactly equivalent to the 2017 train set.
  Feel free to put the dataset at any place you want, and then soft link the dataset under the `data/` folder:

  ```shell
  ln -s path/to/coco data/coco
  ```

  It is recommended to put the images on an SSD for possible better training performance.
## Pretrained Model

I use ImageNet pretrained weights from Caffe for the backbone networks. Download them and put them into `{repo_root}/data/pretrained_model`. You can use the following command to download them all:
- extra required packages: `argparse_color_formater`, `colorama`, `requests`

```shell
python tools/download_imagenet_weights.py
```
NOTE: Caffe pretrained weights have slightly better performance than Pytorch pretrained ones. We suggest using the Caffe pretrained models from the above link to reproduce the results. By the way, Detectron also uses pretrained weights from Caffe.
If you want to use Pytorch pre-trained models, please remember to convert images from BGR to RGB, and also use the same data preprocessing (subtract mean and normalize) as used for the Pytorch pretrained models.
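For illustration, a minimal sketch of that preprocessing, assuming an OpenCV-loaded BGR `uint8` image and the standard torchvision ImageNet mean/std (the function name is made up):

```python
import numpy as np

# Standard ImageNet normalization constants used by torchvision models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess_for_torchvision(img_bgr):
    # BGR -> RGB and scale to [0, 1]
    img_rgb = img_bgr[:, :, ::-1].astype(np.float32) / 255.0
    # Subtract mean and normalize per channel
    img_rgb = (img_rgb - IMAGENET_MEAN) / IMAGENET_STD
    # HWC -> CHW, as Pytorch models expect
    return img_rgb.transpose(2, 0, 1)
```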
### ImageNet Pretrained Model provided by Detectron
Besides using the pretrained weights for ResNet above, you can also use the weights from Detectron by changing the corresponding line in the model config file as follows:
```yaml
RESNETS:
  IMAGENET_PRETRAINED_WEIGHTS: 'data/pretrained_model/R-50.pkl'
```
`R-50-GN.pkl` and `R-101-GN.pkl` are required for gn_baselines.

`X-101-32x8d.pkl`, `X-101-64x4d.pkl` and `X-152-32x8d-IN5k.pkl` are required for ResNeXt backbones.
## Training

**DO NOT CHANGE anything in the provided config files (`configs/**/xxxx.yml`) unless you know what you are doing.**

Use the environment variable `CUDA_VISIBLE_DEVICES` to control which GPUs to use.
### Adaptive config adjustment

Let's define some terms first:

- batch_size: `NUM_GPUS` x `TRAIN.IMS_PER_BATCH`
- effective_batch_size: batch_size x `iter_size`
- change of something: new value of something / old value of something

The following config options will be adjusted automatically according to the actual training setup: 1) number of GPUs `NUM_GPUS`, 2) batch size per GPU `TRAIN.IMS_PER_BATCH`, 3) update period `iter_size`.

- `SOLVER.BASE_LR`: adjusted directly proportional to the change of batch_size.
- `SOLVER.STEPS`, `SOLVER.MAX_ITER`: adjusted inversely proportional to the change of effective_batch_size.
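The two scaling rules can be sketched as follows (function and argument names are illustrative, not taken from the repo's config code):

```python
# Sketch of the automatic adjustment described above.
# change = new value / old value: the LR follows batch_size directly,
# while the schedule follows effective_batch_size inversely.
def adjust_config(base_lr, steps, max_iter,
                  old_bs, new_bs, old_eff_bs, new_eff_bs):
    lr = base_lr * (new_bs / old_bs)   # SOLVER.BASE_LR: directly proportional
    scale = old_eff_bs / new_eff_bs    # SOLVER.STEPS, SOLVER.MAX_ITER: inversely proportional
    steps = [int(round(s * scale)) for s in steps]
    return lr, steps, int(round(max_iter * scale))
```

For example, with a hypothetical default of batch_size 16 and iter_size 1, running with `--bs 4 --iter_size 4` keeps the effective batch size at 16: the LR shrinks 4x while `STEPS` and `MAX_ITER` stay unchanged.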
### Train from scratch

Take mask-rcnn with a res50 backbone for example:

```shell
python tools/train_net_step.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-50-C4.yml --use_tfboard --bs {batch_size} --nw {num_workers}
```
Use `--bs` to overwrite the default batch size to a proper value that fits into your GPUs. Similarly for `--nw`: the number of data loader threads defaults to 4 in config.py.

Specify `--use_tfboard` to log the losses on Tensorboard.
NOTE: use `--dataset keypoints_coco2017` when training for keypoint-rcnn.
### The use of `--iter_size`

As in Caffe, the network is updated once (`optimizer.step()`) every `iter_size` iterations (forward + backward). This gives a larger effective batch size for training. Notice that the step count is only increased after a network update.
```shell
python tools/train_net_step.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-50-C4.yml --bs 4 --iter_size 4
```

`iter_size` defaults to 1.
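A minimal sketch of this accumulation loop (a hypothetical helper, not the repo's actual training loop; assumes a standard Pytorch model and optimizer):

```python
import torch

# Hypothetical sketch of gradient accumulation with iter_size: the
# optimizer updates the network once every iter_size forward/backward
# passes, and the step count advances only on an update.
def train_steps(model, optimizer, loss_fn, batches, iter_size):
    step = 0
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches, 1):
        loss = loss_fn(model(x), y) / iter_size  # average over the window
        loss.backward()                          # gradients accumulate
        if i % iter_size == 0:
            optimizer.step()                     # one network update
            optimizer.zero_grad()
            step += 1                            # step count after update
    return step
```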
### Finetune from a pretrained checkpoint

```shell
python tools/train_net_step.py ... --load_ckpt {path/to/the/checkpoint}
```

or using Detectron's checkpoint file:

```shell
python tools/train_net_step.py ... --load_detectron {path/to/the/checkpoint}
```
### Resume training with the same dataset and batch size

```shell
python tools/train_net_step.py ... --load_ckpt {path/to/the/checkpoint} --resume
```

When resuming training, the step count and optimizer state are also restored from the checkpoint.
