PSROIAlign

(Oriented) PsRoIAlign Operation In Tensorflow C++ API

Generate Convert Improve

Install / Use

/learn @HiKapok/PSROIAlign

About this skill

Quality Score

0/100

README

PsRoIAlign Operation In Tensorflow C++ API

PsRoIAlign involves interpolation techniques for PsRoiPooling (position-sensitive RoI pooling operation), the interpolation idea is proposed in RoIAlign to avoid any quantization of the RoI boundaries. The first adoption of PsRoIAlign might be in this paper Light-Head R-CNN: In Defense of Two-Stage Object Detector.

You can find more details about the RoiPooling technique in Fast R-CNN and Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.

This repository contains code of the implement of PsRoIAlign operation in Tensorflow C++ API. You can use this operation in many popular two-stage object detector. Both research work using PsRoIAlign and contribution to this repository are welcomed.

For using this op in your own machine, just following these steps:

copy the header file "cuda_config.h" from "your_python_path/site-packages/external/local_config_cuda/cuda/cuda/cuda_config.h" to "your_python_path/site-packages/tensorflow/include/tensorflow/stream_executor/cuda/cuda_config.h".
run the following script:

mkdir build
cd build && cmake ..
make

run "test_op.py" and check the numeric errors to test your install

follow the below codes snippet to integrate this Op into your own code:

op_module = tf.load_op_library(so_lib_path)
ps_roi_align = op_module.ps_roi_align

@ops.RegisterGradient("PsRoiAlign")
def _ps_roi_align_grad(op, grad, _):
  '''The gradients for `PsRoiAlign`.
  '''
  inputs_features = op.inputs[0]
  rois = op.inputs[1]
  pooled_features_grad = op.outputs[0]
  pooled_index = op.outputs[1]
  grid_dim_width = op.get_attr('grid_dim_width')
  grid_dim_height = op.get_attr('grid_dim_height')

  return [op_module.ps_roi_align_grad(inputs_features, rois, grad, pooled_index, grid_dim_width, grid_dim_height, pool_method), None]

pool_method = 'max' # or 'mean'
pool_result = ps_roi_align(features, rois, 2, 2, pool_method)

The code is tested under TensorFlow 1.6 with CUDA 8.0 using Ubuntu 16.04. This PsRoIAlign Op had been used to train Xception based Light-Head RCNN successfully with performance at ~75%mAP on PASCAL VOC 2007 Test dataset, you can see codes here.

Update:

Added support for mean pooling (default is max pooling)
PsRoIAlign now support oriented RoI inputs (for both max and mean pooling).

Future Work:

Check if there is need to ensure the convex of polygon
Improve performance

If you encountered some linkage problem when generating or loading *.so, you are highly recommended to read this section in the official tourial to make sure you were using the same C++ ABI version.

The MIT License

Related Skills

node-connect

343.3k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

92.1k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

343.3k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

343.3k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。