Bounding-box regression in R-CNN: my understanding

0. bounding-box regression

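Bounding-box regression is explained in detail in Appendix C of the R-CNN paper, but the later papers (Fast R-CNN, Faster R-CNN, Mask R-CNN, the SSD family, the YOLO family) do not describe it carefully.
This post uses the R-CNN paper to explain how bounding-box regression works, and the Faster R-CNN code to show how the formulas are implemented.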

R-CNN paper:       https://arxiv.org/pdf/1311.2524v3.pdf
Faster R-CNN code: https://github.com/ShaoqingRen/faster_rcnn

1. Bounding-box parameters explained

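In the RPN, the bbox layer sits in parallel with the classification (cls) layer. Its weights are the parameters that relate the box $P$ to the ground truth $G$. Its output, written here as $W*P$ (the weights applied to the features of $P$), is the set of four transformation coefficients that map $P$ onto the GT box coordinates. Here $P$ is an anchor in the RPN layer, or a proposal in the Fast R-CNN layer.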

What parameters are learned during training

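The learned parameters are the weights of the bbox layer. For each anchor, the bbox layer has four output channels, one per output value. With $k$ anchors per position, the RPN bbox layer has $4k$ channels in total. Call the convolution parameters of these four channels $w_x, w_y, w_w, w_h$. After this convolution, each anchor gets four values $d_x(P), d_y(P), d_w(P), d_h(P)$. For convenience we write them as $t_x', t_y', t_w', t_h'$; these are the outputs of the RPN's bbox layer.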
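In R-CNN each $d_\star(P)$ is a linear function of the features of $P$. The RPN plays the same role with a 1×1 convolution applied at every feature-map position. The Python sketch below shows only the per-box linear map. The feature vector and weights are made-up toy numbers, and predict_deltas is a hypothetical helper name.

```python
from typing import List, Sequence

def predict_deltas(features: Sequence[float],
                   weights: List[Sequence[float]]) -> List[float]:
    """Return [d_x, d_y, d_w, d_h], one dot product w_*^T phi(P) per output.

    `weights` holds one row per output (w_x, w_y, w_w, w_h); each row has
    the same length as the feature vector phi(P).
    """
    if len(weights) != 4:
        raise ValueError("expected four weight vectors: w_x, w_y, w_w, w_h")
    deltas = []
    for w in weights:
        if len(w) != len(features):
            raise ValueError("weight/feature length mismatch")
        deltas.append(sum(wi * fi for wi, fi in zip(w, features)))
    return deltas


# Toy example: a 3-dimensional feature vector and hand-picked weights.
phi = [1.0, 2.0, 0.5]
W = [
    [0.1, 0.0, 0.0],   # w_x
    [0.0, 0.1, 0.0],   # w_y
    [0.0, 0.0, 0.2],   # w_w
    [0.0, 0.0, -0.2],  # w_h
]
print(predict_deltas(phi, W))  # these four numbers play the role of t'_x, t'_y, t'_w, t'_h
```

In the real network the weights are learned and the features come from the convolutional backbone. Only the "one linear map per coordinate" structure carries over.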

2. Network training

Theoretical formulas

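On one side, the bbox-layer outputs $t_x', t_y', t_w', t_h'$ serve as the predictions. On the other side, the following values serve as the labels:
$t_x = (G_x - P_x)/P_w$, $t_y = (G_y - P_y)/P_h$, $t_w = \log(G_w/P_w)$, $t_h = \log(G_h/P_h)$
Here $(P_x, P_y)$, $P_w$ and $P_h$ are the center, width and height of $P$, and likewise for $G$. Training then looks for the weights $w_x, w_y, w_w, w_h$ that minimize the difference between labels and predictions. This is how the bbox-layer weights are updated.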
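The targets above can be written as a small Python sketch. Boxes are assumed to be in center form (x, y, w, h), and encode and smooth_l1 are hypothetical helper names. The loss shown is the smooth L1 loss used by Fast/Faster R-CNN; R-CNN itself used ridge-regularized least squares instead.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)

def encode(p: Box, g: Box) -> Tuple[float, float, float, float]:
    """Regression targets (t_x, t_y, t_w, t_h) that map box P onto box G."""
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    return ((gx - px) / pw,
            (gy - py) / ph,
            math.log(gw / pw),
            math.log(gh / ph))

def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Smooth L1 summed over coordinates: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    total = 0.0
    for a, b in zip(pred, target):
        x = abs(a - b)
        total += 0.5 * x * x if x < 1.0 else x - 0.5
    return total

P = (50.0, 50.0, 20.0, 40.0)   # anchor or proposal
G = (60.0, 40.0, 40.0, 40.0)   # ground truth
t = encode(P, G)               # (0.5, -0.25, log 2, 0.0)
print(t, smooth_l1([0.0, 0.0, 0.0, 0.0], t))
```

Dividing the center offsets by $P_w$ and $P_h$ and taking logs of the size ratios makes the targets scale-invariant. A small box and a large box that need "the same" relative correction get the same labels.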

Here $G$ is the ground-truth box. Where does $P$ come from?

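In R-CNN, the boxes come from selective search and are called proposals. Each proposal's x, y, w, h is compared with the ground truth.
In the RPN, $P$ is an anchor kept from among the 9 anchors at a location. Applying the formulas above to the anchor yields a first refined bounding box, called a proposal.
In Fast R-CNN, the proposal output by the RPN is used as $P$, and the transformation from $P$ to $G$ is learned a second time.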
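Which anchors serve as $P$ during RPN training is decided by overlap with the ground truth. The Faster R-CNN paper uses these rules:

- An anchor is positive if its IoU with some ground-truth box exceeds 0.7, or if it is the best-matching anchor for a ground-truth box.
- An anchor is negative if its IoU is below 0.3 for every ground-truth box.
- All other anchors are ignored.

The sketch below implements only the threshold rule, not the best-match rule. Boxes are (x1, y1, x2, y2) in continuous coordinates, and iou and label_anchors are hypothetical names.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), continuous coordinates

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-form boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors: Sequence[Box], gts: Sequence[Box],
                  pos_thr: float = 0.7, neg_thr: float = 0.3) -> List[int]:
    """1 = positive (used as P for regression), 0 = negative, -1 = ignored.

    Only the IoU threshold rule is implemented here.
    """
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gts), default=0.0)
        if best > pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels

gt = [(0.0, 0.0, 10.0, 10.0)]
anchors = [(0.0, 0.0, 10.0, 9.0), (5.0, 0.0, 15.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
print(label_anchors(anchors, gt))  # [1, -1, 0]
```

Only the positive anchors contribute to the regression loss, so they are the $P$ boxes in the target formulas.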
Code implementation

function [regression_label] = fast_rcnn_bbox_transform(ex_boxes, gt_boxes)
% [regression_label] = fast_rcnn_bbox_transform(ex_boxes, gt_boxes)
% --------------------------------------------------------
% Fast R-CNN
% Reimplementation based on Python Fast R-CNN (https://github.com/rbgirshick/fast-rcnn)
% Copyright (c) 2015, Shaoqing Ren
% Licensed under The MIT License [see LICENSE for details]
% --------------------------------------------------------

    ex_widths = ex_boxes(:, 3) - ex_boxes(:, 1) + 1;
    ex_heights = ex_boxes(:, 4) - ex_boxes(:, 2) + 1;
    ex_ctr_x = ex_boxes(:, 1) + 0.5 * (ex_widths - 1);
    ex_ctr_y = ex_boxes(:, 2) + 0.5 * (ex_heights - 1);
    
    gt_widths = gt_boxes(:, 3) - gt_boxes(:, 1) + 1;
    gt_heights = gt_boxes(:, 4) - gt_boxes(:, 2) + 1;
    gt_ctr_x = gt_boxes(:, 1) + 0.5 * (gt_widths - 1);
    gt_ctr_y = gt_boxes(:, 2) + 0.5 * (gt_heights - 1);
    
    targets_dx = (gt_ctr_x - ex_ctr_x) ./ (ex_widths+eps);
    targets_dy = (gt_ctr_y - ex_ctr_y) ./ (ex_heights+eps);
    targets_dw = log(gt_widths ./ ex_widths);
    targets_dh = log(gt_heights ./ ex_heights);
    
    regression_label = [targets_dx, targets_dy, targets_dw, targets_dh];
end
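Below is a Python transliteration of fast_rcnn_bbox_transform for a single pair of boxes. It keeps the MATLAB conventions:

- Boxes are [x1, y1, x2, y2] in inclusive pixel coordinates.
- Width is x2 - x1 + 1.
- The center is x1 + 0.5 * (w - 1).

The EPS guard mirrors MATLAB's eps. The function name bbox_transform is my own.

```python
import math
import sys
from typing import Sequence, Tuple

EPS = sys.float_info.epsilon  # same value as MATLAB's eps (2^-52)

def bbox_transform(ex: Sequence[float], gt: Sequence[float]) -> Tuple[float, float, float, float]:
    """Targets (dx, dy, dw, dh) that turn box `ex` into box `gt`.

    Both boxes are [x1, y1, x2, y2] with inclusive pixel coordinates.
    """
    ex_w = ex[2] - ex[0] + 1
    ex_h = ex[3] - ex[1] + 1
    ex_cx = ex[0] + 0.5 * (ex_w - 1)
    ex_cy = ex[1] + 0.5 * (ex_h - 1)

    gt_w = gt[2] - gt[0] + 1
    gt_h = gt[3] - gt[1] + 1
    gt_cx = gt[0] + 0.5 * (gt_w - 1)
    gt_cy = gt[1] + 0.5 * (gt_h - 1)

    dx = (gt_cx - ex_cx) / (ex_w + EPS)
    dy = (gt_cy - ex_cy) / (ex_h + EPS)
    dw = math.log(gt_w / ex_w)
    dh = math.log(gt_h / ex_h)
    return dx, dy, dw, dh

# A 16x16 anchor at the origin and a GT box twice as wide whose center is 16 px to the right.
print(bbox_transform([0, 0, 15, 15], [8, 0, 39, 15]))  # about (1.0, 0.0, log 2, 0.0)
```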

How this carries over to the RPN

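In the code above, ex_boxes holds the anchors selected by the filtering procedure described in the Faster R-CNN paper, each one of the 9 anchors at some location. Each anchor has four parameters, its corners [x1, y1, x2, y2], which the code converts to center, width and height.
Fast R-CNN training is the second bounding-box regression in Faster R-CNN. The anchors pass through the RPN layer and yield the first refined bounding boxes, called proposals. NMS then discards proposals that overlap a higher-scoring proposal too heavily, so far fewer redundant boxes remain per object, though not necessarily only one. The four parameters of each remaining proposal are combined with the ground truth again to form the Fast R-CNN layer's targets $t_x, t_y, t_w, t_h$. The proposal is then adjusted by the corresponding predicted offsets to give the final output.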

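During RPN training, the anchor's four numbers play the role of $P$ in the formulas.
During Fast R-CNN training, $P$ is no longer an anchor but the four values of a proposal box produced by the RPN.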
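To make the two roles of $P$ concrete, here is a toy Python sketch that applies the prediction formulas twice:

1. RPN stage: an anchor is used as $P$, and the result is a proposal.
2. Fast R-CNN stage: that proposal is used as $P$, and the result is the final box.

Boxes are in center form (x, y, w, h). The delta values are made-up stand-ins for network outputs, and apply_deltas is a hypothetical helper.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)

def apply_deltas(p: Box, d: Tuple[float, float, float, float]) -> Box:
    """G_hat = (P_w*d_x + P_x, P_h*d_y + P_y, P_w*exp(d_w), P_h*exp(d_h))."""
    px, py, pw, ph = p
    dx, dy, dw, dh = d
    return (pw * dx + px, ph * dy + py, pw * math.exp(dw), ph * math.exp(dh))

anchor = (100.0, 100.0, 128.0, 128.0)

# Stage 1 (RPN): P is the anchor; the output is a proposal.
rpn_deltas = (0.125, 0.0, math.log(1.5), 0.0)      # stand-in for the RPN bbox-layer output
proposal = apply_deltas(anchor, rpn_deltas)        # about (116, 100, 192, 128)

# Stage 2 (Fast R-CNN): P is now the proposal, not the anchor.
frcnn_deltas = (0.0, 0.25, 0.0, math.log(0.5))     # stand-in for the Fast R-CNN bbox output
final_box = apply_deltas(proposal, frcnn_deltas)   # about (116, 132, 192, 64)
print(proposal, final_box)
```

The second-stage offsets are normalized by the proposal's size, not the anchor's. This is why the Fast R-CNN targets must be computed against the proposal.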

For how the anchors are generated, see this blog post.
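Here is a rough sketch of the idea. Faster R-CNN places k = 9 anchors at each feature-map position, from 3 scales (box areas 128², 256², 512²) times 3 aspect ratios (1:2, 1:1, 2:1). The Python below builds those 9 boxes around one center in continuous coordinates. It does not reproduce the exact integer rounding of the official anchor-generation code, and make_anchors is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def make_anchors(cx: float, cy: float,
                 scales: Sequence[float] = (128.0, 256.0, 512.0),
                 ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Anchors centred at (cx, cy); ratio = height / width, area = scale**2."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)   # chosen so that w * h = s^2 and h / w = r
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(300.0, 200.0)
print(len(anchors))  # 9
```

In the network, the same 9 shapes are replicated at every feature-map position, shifted by the feature stride.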

3. Prediction

Theoretical formulas

$\hat{G}_x = P_w d_x(P) + P_x$, $\hat{G}_y = P_h d_y(P) + P_y$, $\hat{G}_w = P_w \exp(d_w(P))$, $\hat{G}_h = P_h \exp(d_h(P))$
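The training targets are $t_x=(G_x-P_x)/P_w$, $t_y=(G_y-P_y)/P_h$, $t_w=\log(G_w/P_w)$ and $t_h=\log(G_h/P_h)$. The prediction formulas are exactly their inverse: substituting $d_\star(P) = t_\star$ recovers $G$. The quick Python check below uses center-form boxes; encode and decode are hypothetical helper names.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]      # (center_x, center_y, width, height)
Deltas = Tuple[float, float, float, float]   # (d_x, d_y, d_w, d_h)

def encode(p: Box, g: Box) -> Deltas:
    """Training targets t_* for mapping P to G."""
    return ((g[0] - p[0]) / p[2], (g[1] - p[1]) / p[3],
            math.log(g[2] / p[2]), math.log(g[3] / p[3]))

def decode(p: Box, d: Deltas) -> Box:
    """Prediction formulas: G_hat from P and d(P)."""
    return (p[2] * d[0] + p[0], p[3] * d[1] + p[1],
            p[2] * math.exp(d[2]), p[3] * math.exp(d[3]))

P = (50.0, 60.0, 30.0, 20.0)
G = (58.0, 55.0, 45.0, 10.0)
G_hat = decode(P, encode(P, G))
print(all(math.isclose(a, b) for a, b in zip(G_hat, G)))  # True
```

With a perfectly trained regressor, $d_\star(P)$ would equal $t_\star$ and the predicted box would coincide with the ground truth. In practice the network's outputs only approximate the targets.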
Code framework

for j = 1:2 % we warm up 2 times
    im = uint8(ones(375, 500, 3)*128);
    if opts.use_gpu
        im = gpuArray(im);
    end
    % proposal_im_detect runs the RPN and returns its boxes and scores
    [boxes, scores] = proposal_im_detect(proposal_detection_model.conf_proposal, rpn_net, im);
    % aboxes: the boxes kept after NMS and related filtering
    aboxes = boxes_filter([boxes, scores], opts.per_nms_topN, opts.nms_overlap_thres, ...
                          opts.after_nms_topN, opts.use_gpu);
    if proposal_detection_model.is_share_feature
        % The RPN and Fast R-CNN conv layers share parameters; achieving this
        % requires training the network with the four-step procedure from the paper
        [boxes, scores] = fast_rcnn_conv_feat_detect(proposal_detection_model.conf_detection, ...
            fast_rcnn_net, im, ...
            rpn_net.blobs(proposal_detection_model.last_shared_output_blob_name), ...
            aboxes(:, 1:4), opts.after_nms_topN);
    else
        [boxes, scores] = fast_rcnn_im_detect(proposal_detection_model.conf_detection, ...
            fast_rcnn_net, im, aboxes(:, 1:4), opts.after_nms_topN);
    end
end
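boxes_filter is called with per_nms_topN, nms_overlap_thres and after_nms_topN, which suggests the usual sequence:

1. Keep the top-scoring boxes.
2. Run non-maximum suppression.
3. Keep the top survivors.

The Python below is a sketch of that sequence under that assumption, not a port of the repository's function. Boxes are (x1, y1, x2, y2, score) in continuous coordinates, and filter_boxes is a hypothetical name.

```python
from typing import List, Tuple

Det = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)

def _iou(a: Det, b: Det) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_boxes(dets: List[Det], pre_nms_top_n: int,
                 nms_thresh: float, post_nms_top_n: int) -> List[Det]:
    """Top-N by score -> greedy NMS -> top-N survivors."""
    ranked = sorted(dets, key=lambda d: d[4], reverse=True)[:pre_nms_top_n]
    keep: List[Det] = []
    for d in ranked:
        # keep d only if it does not overlap any already-kept (higher-scoring) box too much
        if all(_iou(d, k) <= nms_thresh for k in keep):
            keep.append(d)
    return keep[:post_nms_top_n]

dets = [(0, 0, 10, 10, 0.9), (1, 0, 11, 10, 0.8), (20, 20, 30, 30, 0.7)]
print(filter_boxes(dets, pre_nms_top_n=100, nms_thresh=0.7, post_nms_top_n=10))
```

The second box overlaps the first with IoU of about 0.82, so it is suppressed. The third box does not overlap the first and survives.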

How the formulas are applied in the code

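    % In the RPN, use the anchors to predict the first set of boxes
    box_deltas = output_blobs{1};    % output of the RPN bbox layer
    % generate the anchors for every feature-map position (NMS is applied later, in boxes_filter)
    anchors = proposal_locate_anchors(conf, size(im), conf.test_scales, featuremap_size);
    % compute the predicted boxes from the anchors and box_deltas, i.e. the prediction formulas above
    pred_boxes = fast_rcnn_bbox_transform_inv(anchors, box_deltas);

    % Second bounding-box regression in Faster R-CNN, i.e. the Fast R-CNN regression
    box_deltas = output_blobs{1};
    box_deltas = squeeze(box_deltas)';
    % here the boxes produced by the previous step (the RPN proposals) play the role of P
    pred_boxes = fast_rcnn_bbox_transform_inv(boxes, box_deltas);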
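Here is a Python sketch of what fast_rcnn_bbox_transform_inv computes for a single box and a single set of deltas. It assumes the same inclusive-pixel convention as fast_rcnn_bbox_transform above: width x2 - x1 + 1 and center x1 + 0.5*(w - 1). The real function works on whole matrices. In the Fast R-CNN stage it also handles one set of four deltas per class, since that head predicts class-specific offsets. The name bbox_transform_inv is my own.

```python
import math
from typing import Sequence, Tuple

def bbox_transform_inv(box: Sequence[float],
                       deltas: Sequence[float]) -> Tuple[float, float, float, float]:
    """Apply (dx, dy, dw, dh) to box [x1, y1, x2, y2]; returns the predicted corners."""
    w = box[2] - box[0] + 1
    h = box[3] - box[1] + 1
    cx = box[0] + 0.5 * (w - 1)
    cy = box[1] + 0.5 * (h - 1)

    dx, dy, dw, dh = deltas
    pred_cx = dx * w + cx          # G_x = P_w * d_x(P) + P_x
    pred_cy = dy * h + cy          # G_y = P_h * d_y(P) + P_y
    pred_w = math.exp(dw) * w      # G_w = P_w * exp(d_w(P))
    pred_h = math.exp(dh) * h      # G_h = P_h * exp(d_h(P))

    # back to inclusive corners, undoing the "+1" width convention
    return (pred_cx - 0.5 * (pred_w - 1), pred_cy - 0.5 * (pred_h - 1),
            pred_cx + 0.5 * (pred_w - 1), pred_cy + 0.5 * (pred_h - 1))

# A 16x16 box shifted right by one box width and doubled in width.
print(bbox_transform_inv([0, 0, 15, 15], [1.0, 0.0, math.log(2), 0.0]))  # about (8, 0, 39, 15)
```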

4. Bounding-box regression in the R-CNN paper
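The core of the paper's Appendix C is the regression objective. Each transformation is modeled as $d_\star(P) = \mathbf{w}_\star^{T}\phi_5(P)$, where $\phi_5(P)$ is the pool5 feature vector of proposal $P$. Each $\mathbf{w}_\star$ is learned by regularized least squares (ridge regression):

$$
\mathbf{w}_\star = \arg\min_{\hat{\mathbf{w}}_\star} \sum_{i=1}^{N} \left( t_\star^i - \hat{\mathbf{w}}_\star^{T}\phi_5(P^i) \right)^2 + \lambda \left\lVert \hat{\mathbf{w}}_\star \right\rVert^2, \qquad \star \in \{x, y, w, h\}
$$

The targets are $t_x^i = (G_x^i - P_x^i)/P_w^i$, $t_y^i = (G_y^i - P_y^i)/P_h^i$, $t_w^i = \log(G_w^i/P_w^i)$ and $t_h^i = \log(G_h^i/P_h^i)$. The paper sets $\lambda = 1000$. It trains only on proposals whose IoU with their assigned ground-truth box exceeds 0.6, because far-away proposals make the regression task meaningless.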

I also have to say that the figures in the R-CNN appendix are really beautiful, with both strong detection results and great looks!
