PSROIAlign
(Oriented) PsRoIAlign Operation In Tensorflow C++ API
PsRoIAlign applies interpolation to PsRoIPooling (the position-sensitive RoI pooling operation). The interpolation idea was proposed in RoIAlign to avoid any quantization of the RoI boundaries. The first adoption of PsRoIAlign may have been in the paper Light-Head R-CNN: In Defense of Two-Stage Object Detector.
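To illustrate what "avoiding quantization" means, here is a minimal pure-Python sketch of bilinear sampling at a fractional location. It is not taken from this repository's C++/CUDA kernels; the function name `bilinear_sample` and the list-of-rows feature map are assumptions made for illustration only.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map (list of rows) at a fractional (y, x) location."""
    height, width = len(feature), len(feature[0])
    # Clamp the location so that all four neighbours exist.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    # Blend along x on the two neighbouring rows, then blend the rows along y.
    top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
    bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


feature = [[0.0, 1.0],
           [2.0, 3.0]]
# The point halfway between all four cells blends them equally.
center = bilinear_sample(feature, 0.5, 0.5)
print(center)
```

A quantizing RoI pooling would first round (0.5, 0.5) to an integer cell; the interpolated version keeps the fractional coordinate and weights the four surrounding cells instead.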
You can find more details about the RoiPooling technique in Fast R-CNN and Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.
This repository contains an implementation of the PsRoIAlign operation using the TensorFlow C++ API. You can use this operation in many popular two-stage object detectors. Both research work using PsRoIAlign and contributions to this repository are welcome.
To use this op on your own machine, follow these steps:
- Copy the header file "cuda_config.h" from "your_python_path/site-packages/external/local_config_cuda/cuda/cuda/cuda_config.h" to "your_python_path/site-packages/tensorflow/include/tensorflow/stream_executor/cuda/cuda_config.h".
- Run the following commands:
      mkdir build
      cd build && cmake ..
      make
- Run "test_op.py" and check the numeric errors to verify your installation.
- Use the following code snippet to integrate this op into your own code:

      import tensorflow as tf
      from tensorflow.python.framework import ops

      # Load the compiled shared library; so_lib_path is the path to the generated *.so file.
      op_module = tf.load_op_library(so_lib_path)
      ps_roi_align = op_module.ps_roi_align

      pool_method = 'max'  # or 'mean'

      @ops.RegisterGradient("PsRoiAlign")
      def _ps_roi_align_grad(op, grad, _):
          '''The gradients for `PsRoiAlign`.'''
          inputs_features = op.inputs[0]
          rois = op.inputs[1]
          # The second output of the forward op is passed on to the backward op.
          pooled_index = op.outputs[1]
          grid_dim_width = op.get_attr('grid_dim_width')
          grid_dim_height = op.get_attr('grid_dim_height')
          # Only the feature map receives a gradient; the RoIs get None.
          return [op_module.ps_roi_align_grad(inputs_features, rois, grad, pooled_index,
                                              grid_dim_width, grid_dim_height, pool_method),
                  None]

      pool_result = ps_roi_align(features, rois, 2, 2, pool_method)

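To make the "position-sensitive" part of a call such as `ps_roi_align(features, rois, 2, 2, pool_method)` more concrete, the following pure-Python reference sketch shows the general idea for a single axis-aligned RoI. It is an illustration only, not this repository's kernel: the function names, the channel layout (`channel = out_channel * bins + bin_index`), the RoI format `(y_min, x_min, y_max, x_max)` in pixels, and the number of sample points per bin are all assumptions.

```python
def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2-D map (list of rows) at fractional (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
    bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def ps_roi_align_reference(features, roi, grid_h, grid_w, samples=2, method="max"):
    """Pool one RoI from features shaped [channels][height][width].

    Assumed layout: input channel = out_channel * (grid_h * grid_w) + bin_index.
    Returns pooled values shaped [out_channels][grid_h][grid_w].
    """
    bins = grid_h * grid_w
    out_channels = len(features) // bins
    y_min, x_min, y_max, x_max = roi
    bin_h = (y_max - y_min) / grid_h
    bin_w = (x_max - x_min) / grid_w
    pooled = []
    for c in range(out_channels):
        plane = []
        for gy in range(grid_h):
            row = []
            for gx in range(grid_w):
                # Position sensitivity: every bin reads from its own input channel.
                fmap = features[c * bins + gy * grid_w + gx]
                values = []
                for sy in range(samples):
                    for sx in range(samples):
                        # Regularly spaced sample points that are never rounded.
                        y = y_min + (gy + (sy + 0.5) / samples) * bin_h
                        x = x_min + (gx + (sx + 0.5) / samples) * bin_w
                        values.append(bilinear(fmap, y, x))
                if method == "max":
                    row.append(max(values))
                else:
                    row.append(sum(values) / len(values))
            plane.append(row)
        pooled.append(plane)
    return pooled


# Four constant 4x4 planes (values 1..4) feed a 2x2 grid with one output channel.
features = [[[float(c + 1)] * 4 for _ in range(4)] for c in range(4)]
pooled = ps_roi_align_reference(features, (0.3, 0.3, 2.7, 2.7), 2, 2, method="mean")
print(pooled)
```

Because each plane is constant, every output bin simply reports the value of the channel assigned to it, which makes the bin-to-channel mapping easy to see.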
The code was tested with TensorFlow 1.6 and CUDA 8.0 on Ubuntu 16.04. This PsRoIAlign op has been used to successfully train an Xception-based Light-Head R-CNN that reaches ~75% mAP on the PASCAL VOC 2007 test set; you can see the code here.
Update:
- Added support for mean pooling (default is max pooling)
- PsRoIAlign now supports oriented RoI inputs (for both max and mean pooling).
Future Work:
- Check whether the convexity of the RoI polygon needs to be ensured
- Improve performance
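As a rough idea of what a convexity check for the first item could look like, here is a small pure-Python sketch. It assumes, hypothetically, that an oriented RoI is given as an ordered list of corner points `(x, y)`; the function name `is_convex` is made up for this example, and none of it is part of the op itself.

```python
def is_convex(polygon):
    """Return True if the vertices (x, y), listed in order, form a convex polygon.

    All consecutive edge pairs must turn in the same direction. This is
    sufficient for quadrilaterals; polygons with more vertices would also
    need a self-intersection test.
    """
    n = len(polygon)
    if n < 3:
        return False
    sign = 0
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        cx, cy = polygon[(i + 2) % n]
        # z-component of the cross product of edge a->b with edge b->c.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            current = 1 if cross > 0 else -1
            if sign == 0:
                sign = current
            elif current != sign:
                return False
    # A polygon whose points are all collinear is degenerate, not convex.
    return sign != 0


rotated_box = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0), (-1.0, 2.0)]
dented_quad = [(0.0, 0.0), (2.0, 0.0), (0.5, 0.5), (0.0, 2.0)]
print(is_convex(rotated_box), is_convex(dented_quad))
```

Whether such a check belongs in the Python wrapper or in the kernel is an open design question; the sketch only shows the geometric test.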
If you encounter linkage problems when generating or loading the *.so file, you are strongly encouraged to read this section in the official tutorial to make sure you are using the same C++ ABI version.
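One way to see which C++ ABI setting your TensorFlow build expects is to inspect its compile flags. The sketch below assumes a TensorFlow version that provides `tf.sysconfig.get_compile_flags()`; the helper `cxx11_abi_from_flags` is a hypothetical name introduced here, and the script falls back to an empty flag list when TensorFlow is not available.

```python
def cxx11_abi_from_flags(flags):
    """Return the value of _GLIBCXX_USE_CXX11_ABI found in compiler flags, or None."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in flags:
        if flag.startswith(prefix):
            return flag[len(prefix):]
    return None


try:
    import tensorflow as tf
    tf_flags = tf.sysconfig.get_compile_flags()
except (ImportError, AttributeError):
    # TensorFlow is missing or too old to expose its compile flags.
    tf_flags = []

print("_GLIBCXX_USE_CXX11_ABI expected by TensorFlow:", cxx11_abi_from_flags(tf_flags))
```

If a value is reported, the op would need to be compiled with the same `-D_GLIBCXX_USE_CXX11_ABI=...` definition, for example through the compiler flags of the CMake build.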
The MIT License
