CRAFT
Code for the CVPR 2016 paper "CRAFT Objects from Images"
This repository contains the code accompanying the CVPR 2016 paper "CRAFT Objects from Images".
In short, we extend the conventional two-stage object detection framework (first locating object proposals, then classifying them into object categories) to a four-stage pipeline. The proposal localization task is solved with a cascade of a Region Proposal Network (RPN) and a Fast R-CNN network to improve proposal quality. The object classification task is handled by a cascade of two Fast R-CNN networks with different objective functions (one-hot classification and one-vs-rest classification) to eliminate false positives.
We name our approach "CRAFT" (short for "Cascade Rpn And FasT-rcnn") and show considerable improvement over Fast R-CNN and Faster R-CNN baselines on PASCAL VOC 07/12 and ILSVRC datasets. For more details please refer to our CVPR2016 paper.
The code is built on RPN (Stage 1) and Fast R-CNN (Stages 2, 3, and 4). It will be easier to use if you are already familiar with these two projects.
The code was tested on Ubuntu 14.04 with 256 GB of memory, a Titan X GPU, and MATLAB R2015a.
Preparation
- Follow instructions in Faster R-CNN to make the codes in 1_RPN, using Caffe provided by Shaoqing Ren
- Follow instructions in Fast R-CNN to make the codes in 2_CasRPN, 3_FRCN, and 4_CasFRCN, using our slightly modified Caffe
- Download the VGG16 pre-trained model and PASCAL VOC 2012 dataset and make proper links pointing to them
- You can create a soft link of folders caffe-fast-rcnn and data for 2_CasRPN, 3_FRCN, and 4_CasFRCN for convenience.
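The soft links mentioned in the last item could be created along the following lines. This is only a sketch: the function name link_shared_dirs and the idea of a single shared directory holding both folders are assumptions made for illustration, not part of the released code.

```shell
#!/usr/bin/env bash
# Sketch: link one shared caffe-fast-rcnn folder and one shared data folder
# into each Fast R-CNN based stage directory.
# Assumption: both folders live under a single directory chosen by the user.

link_shared_dirs() {
    local repo_root="$1"    # directory containing 2_CasRPN, 3_FRCN, 4_CasFRCN
    local shared_dir="$2"   # hypothetical directory holding caffe-fast-rcnn/ and data/
    local stage name
    for stage in 2_CasRPN 3_FRCN 4_CasFRCN; do
        for name in caffe-fast-rcnn data; do
            # -s: symbolic link, -f: replace an existing link,
            # -n: do not follow an existing link at the destination
            ln -sfn "$shared_dir/$name" "$repo_root/$stage/$name"
        done
    done
}

# Example call (paths are placeholders):
# link_shared_dirs /path/to/CRAFT /path/to/shared
```

If a stage directory already contains a real caffe-fast-rcnn or data folder rather than a link, move it aside first; otherwise ln places the new link inside that folder instead of replacing it.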
Training and testing
The whole pipeline is stage-wise. Below we show how to train an object detector with the CRAFT approach on the PASCAL VOC 2012 train+val set and test it on the PASCAL VOC 2012 test set. For simplicity, we do not use joint training between the RPN and Fast R-CNN networks.
Stage 1. RPN
cd 1_RPN
matlab ./experiments/script_faster_rcnn_VOC2012_VGG16.m
matlab saveProposals.m
Stage 2. CasRPN
cd 2_CasRPN
bash train.sh
bash test.sh
matlab saveProposals.m
Stage 3. FRCN
cd 3_FRCN
bash train.sh
bash test.sh
matlab saveDetections.m
Stage 4. CasFRCN
cd 4_CasFRCN
bash train.sh
bash test.sh
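
Because each stage consumes the output of the previous one, the commands above can be chained in a small driver that stops at the first failure. The sketch below copies the commands verbatim from the stage listings. The function names run_step and run_craft_pipeline and the CRAFT_DRY_RUN switch are hypothetical helpers introduced here, not part of the released code.

```shell
#!/usr/bin/env bash
# Sketch: run the four CRAFT stages in order, stopping at the first failure.
# CRAFT_DRY_RUN=1 (a hypothetical switch) prints each command instead of
# executing it, which is handy for checking the order before a long run.

run_step() {
    local dir="$1"; shift
    if [ "${CRAFT_DRY_RUN:-0}" = "1" ]; then
        echo "($dir) $*"
    else
        # Subshell keeps the caller's working directory unchanged.
        ( cd "$dir" && "$@" )
    fi
}

run_craft_pipeline() {
    local root="$1"   # directory containing the four stage folders
    # Stage 1. RPN
    run_step "$root/1_RPN" matlab ./experiments/script_faster_rcnn_VOC2012_VGG16.m &&
    run_step "$root/1_RPN" matlab saveProposals.m &&
    # Stage 2. CasRPN
    run_step "$root/2_CasRPN" bash train.sh &&
    run_step "$root/2_CasRPN" bash test.sh &&
    run_step "$root/2_CasRPN" matlab saveProposals.m &&
    # Stage 3. FRCN
    run_step "$root/3_FRCN" bash train.sh &&
    run_step "$root/3_FRCN" bash test.sh &&
    run_step "$root/3_FRCN" matlab saveDetections.m &&
    # Stage 4. CasFRCN
    run_step "$root/4_CasFRCN" bash train.sh &&
    run_step "$root/4_CasFRCN" bash test.sh
}

# Example call (path is a placeholder):
# CRAFT_DRY_RUN=1 run_craft_pipeline /path/to/CRAFT
```

The `&&` chain means a failed training or testing step prevents later stages from running on stale or missing proposals.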
Results
|               | training data                     | test data     | mAP   |
|---------------|:---------------------------------:|:-------------:|:-----:|
| CRAFT, VGG-16 | VOC 2007 trainval + 2012 trainval | VOC 2007 test | 75.7% |
| CRAFT, VGG-16 | VOC 2012 trainval                 | VOC 2012 test | 71.3% |
Note: The mAP you obtain may differ slightly from the results above, which are the ones reported in the paper. We do not currently adopt joint training between RPN and Fast R-CNN.
Reference
If you use our code in your research, we would be grateful if you cited the paper:
@inproceedings{binyang16craft,
title={{CRAFT} Objects from Images},
author={Yang, Bin and Yan, Junjie and Lei, Zhen and Li, Stan},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2016}
}
Acknowledgement
We give our sincere gratitude to the following people, groups and institutions:
- Anonymous reviewers
- Ross Girshick for the Fast R-CNN project
- Shaoqing Ren for the Faster R-CNN project
- Caffe team
- VGG team
- SenseTime Group Limited
- NVIDIA Corporation
