[Paper Study Notes] Accurate Face Detection for High Performance

Paper: Accurate Face Detection for High Performance.

Building on RetinaNet, the paper proposes a new face detector, AInnoFace, that focuses on improving the detection of small faces.

1. Overview

The backbone is a ResNet-152 with a 6-level feature pyramid, which produces multi-scale feature maps. Two subnets follow: one classifies the backbone's output, and the other regresses bounding boxes.

To address the imbalance between positive and negative samples that one-stage detectors face, the paper also adopts focal loss:

$$L_{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t), \qquad p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}$$

where

$y \in \{\pm 1\}$ is the ground-truth class (positive or negative sample)

$p \in [0, 1]$ is the predicted probability of the label $y = 1$

$\alpha_t$ is a balancing factor

$\gamma$ is a tunable focusing parameter
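As a rough illustration, focal loss for a single anchor can be sketched in plain Python. The defaults `alpha=0.25` and `gamma=2.0` are common choices and are assumptions here, not values taken from these notes; `eps` is a hypothetical guard against `log(0)`.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one anchor.

    p: predicted probability that the anchor is a face (label y = 1).
    y: ground-truth label, +1 (face) or -1 (background).
    alpha, gamma: assumed defaults, not taken from the notes.
    """
    p_t = p if y == 1 else 1.0 - p                 # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha     # class-dependent balancing factor
    # (1 - p_t)^gamma down-weights anchors that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
```

With `gamma=0` and `alpha=0.5` the function reduces to half of the ordinary cross-entropy, which is a quick way to sanity-check an implementation.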

2. IoU Regression Loss

The paper adopts the IoU regression loss from UnitBox:

$$L_{IoU} = -\ln \frac{\mathrm{Intersection}(B_p, B_{gt})}{\mathrm{Union}(B_p, B_{gt})}$$

where

$B_p$ is the predicted bounding box

$B_{gt}$ is the ground-truth bounding box

The loss directly optimizes the IoU between the two boxes.
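A minimal sketch of an IoU loss of this kind for one pair of boxes, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates. The function name and the small `eps` used for numerical stability are illustrative assumptions.

```python
import math

def iou_loss(box_p, box_gt, eps=1e-9):
    """-ln(IoU) between a predicted and a ground-truth box, each (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box_p
    gx1, gy1, gx2, gy2 = box_gt
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / (union + eps)
    return -math.log(iou + eps)
```

Because the four coordinates are coupled through a single IoU term, the box is regressed as a whole rather than as four independent offsets.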

3. Selective Refinement Network

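RetinaNet still has two problems in face detection: (1) low recall efficiency and (2) low localization accuracy. To address them, the paper introduces STC (Selective Two-step Classification) and STR (Selective Two-step Regression) from SRN (Selective Refinement Network).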

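STC performs two-step classification on the three low-level detection layers, filtering out most of the negative samples and shrinking the search space for the subsequent classifier:

$$L_{STC}(\{p_i\}, \{q_i\}) = \frac{1}{N_{s1}} \sum_{i \in \Omega} L_{FL}(p_i, l_i^*) + \frac{1}{N_{s2}} \sum_{i \in \Phi} L_{FL}(q_i, l_i^*)$$

STR performs two-step regression on the three high-level detection layers, adjusting the anchors further and giving the subsequent regressor better initial locations:

$$L_{STR}(\{x_i\}, \{t_i\}) = \sum_{i \in \Psi} [l_i^* = 1] L_{IoU}(x_i, g_i^*) + \sum_{i \in \Phi} [l_i^* = 1] L_{IoU}(t_i, g_i^*)$$

Here $\Omega$ and $\Psi$ are the sets of anchors selected for two-step classification and two-step regression, respectively.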

In these losses, $i$ is the index of an anchor

$p_i$ and $q_i$ are the predicted classification scores in the first and second steps

$x_i$ and $t_i$ are the predicted regression values in the first and second steps

$l_i^*$ and $g_i^*$ are the ground-truth class label and bounding box

$N_{s1}$ and $N_{s2}$ are the numbers of positive anchors in the first and second steps

$\Phi$ is the set of anchors fed into the second step

$L_{FL}$ is the sigmoid focal loss

$[l_i^* = 1] L_{IoU}$ means the IoU regression loss is computed only on positive anchors
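The two-step classification loss can be sketched as follows. This is an illustrative reading of the definitions above, not the authors' code: labels are assumed to be 1 (face) or 0 (background), the anchor sets are passed as hypothetical index lists `omega` (anchors used in step 1) and `phi` (anchors kept for step 2), and the focal-loss defaults are assumptions.

```python
import math

def focal(p, label, alpha=0.25, gamma=2.0):
    """Sigmoid focal loss for one anchor; label is 1 (face) or 0 (background)."""
    p_t = p if label == 1 else 1.0 - p
    a_t = alpha if label == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))

def stc_loss(p, q, labels, omega, phi):
    """Two-step classification loss.

    p, q:   per-anchor face probabilities from step 1 and step 2.
    labels: per-anchor ground-truth labels.
    omega:  indices of anchors used in step 1 (hypothetical name).
    phi:    indices of anchors that remain for step 2 (hypothetical name).
    """
    # Each step is normalized by its own number of positive anchors.
    n_s1 = max(1, sum(labels[i] == 1 for i in omega))
    n_s2 = max(1, sum(labels[i] == 1 for i in phi))
    step1 = sum(focal(p[i], labels[i]) for i in omega) / n_s1
    step2 = sum(focal(q[i], labels[i]) for i in phi) / n_s2
    return step1 + step2
```

The regression counterpart would have the same two-term shape, with the per-anchor loss replaced by an IoU loss and the sums restricted to positive anchors.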

4. Data Augmentation

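Besides the usual data augmentation methods, the paper applies a data-anchor-sampling method similar to PyramidBox with a probability of 50%.

A face of size $S_{face}$ is randomly selected from a batch of training images, and the anchor scale nearest to it is found. A target size $S_{target}$ is then randomly chosen around that anchor scale. Finally, the image is resized by the ratio $S^* = S_{target} / S_{face}$ and randomly cropped to the training size (the crop contains the selected face), which yields the anchor-sampled training data.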
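The scale-selection step can be sketched as below. The sampling ranges follow the PyramidBox description of data-anchor-sampling as I understand it (target index drawn from the anchors up to one level above the nearest one, target size drawn from half to double that anchor scale); the anchor scale list and the function name are assumptions for illustration, and cropping is left out.

```python
import random

# Assumed anchor scales for illustration only.
ANCHOR_SCALES = [16, 32, 64, 128, 256, 512]

def sample_resize_ratio(s_face, anchor_scales=ANCHOR_SCALES, rng=random):
    """Return the image resize ratio S* = S_target / S_face for one selected face."""
    # Index of the anchor scale nearest to the selected face size.
    i_anchor = min(range(len(anchor_scales)),
                   key=lambda i: abs(anchor_scales[i] - s_face))
    # Pick a target anchor index no larger than one level above the nearest one.
    i_target = rng.randint(0, min(len(anchor_scales) - 1, i_anchor + 1))
    # Pick the target face size around that anchor scale.
    s_target = rng.uniform(anchor_scales[i_target] / 2, anchor_scales[i_target] * 2)
    return s_target / s_face
```

Since the target index is biased toward the smaller anchors, the resized images tend to contain more small faces than the originals.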

5. Max-out Label

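To reduce false positives, the paper also uses the max-out operation: the classification subnet predicts $c_p + c_n$ scores for each anchor and then takes $\max c_p$ and $\max c_n$ as the final face and non-face scores. The paper sets $c_p$ = … and $c_n$ = ….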
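A small sketch of the max-out selection for one anchor. The layout of the score vector (the first `c_p` entries are face scores, the remaining `c_n` are non-face scores) is an assumption made for illustration.

```python
def max_out_scores(scores, c_p, c_n):
    """Reduce c_p + c_n raw scores of one anchor to (face_score, non_face_score)."""
    if len(scores) != c_p + c_n:
        raise ValueError("expected c_p + c_n scores per anchor")
    face = max(scores[:c_p])       # best of the face scores
    non_face = max(scores[c_p:])   # best of the non-face scores
    return face, non_face
```

For example, `max_out_scores([0.2, 0.7, 0.1, 0.4], c_p=2, c_n=2)` keeps 0.7 as the face score and 0.4 as the non-face score.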

6. Multi-scale Testing

The trained model is run on the image at several scales, and the resulting detections are merged by bounding box voting to produce the final result.
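The notes do not spell out the voting rule, so the following is only one common variant, sketched under assumptions: detections from all scales are already mapped back to original image coordinates, each is `(x1, y1, x2, y2, score)` with a positive score, and boxes overlapping the current best box by at least `iou_thresh` are averaged with their scores as weights.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def box_voting(dets, iou_thresh=0.5):
    """Merge detections pooled from several test scales by score-weighted voting."""
    remaining = sorted(dets, key=lambda d: d[4], reverse=True)
    merged = []
    while remaining:
        top, rest = remaining[0], remaining[1:]
        # Boxes that overlap the current best box vote on its final position.
        group = [top] + [d for d in rest if iou(top[:4], d[:4]) >= iou_thresh]
        remaining = [d for d in rest if iou(top[:4], d[:4]) < iou_thresh]
        total = sum(d[4] for d in group)
        box = tuple(sum(d[k] * d[4] for d in group) / total for k in range(4))
        merged.append(box + (top[4],))  # keep the best score of the group
    return merged
```

Compared with plain NMS, the overlapping boxes are not discarded but contribute to the coordinates of the box that survives.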

7. Experimental Results


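The anchors use one aspect ratio, 1.25, and two scales, $2S$ and $2\sqrt{2}S$ (where $S$ is the downsampling stride of the detection layer). Each location on a feature map therefore has $A = 2$ anchors, which together cover faces of 8 to 362 pixels (for a 1024 x 1024 input image).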
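The 8 to 362 pixel range can be checked with a few lines of arithmetic. The stride list below is an assumption: six pyramid levels with strides from 4 to 128, which is the choice that reproduces the stated range.

```python
import math

# Assumed downsampling strides of the six detection layers.
STRIDES = [4, 8, 16, 32, 64, 128]

def anchor_scales(strides=STRIDES):
    """Return the two anchor scales (2S, 2*sqrt(2)*S) for each stride S."""
    return [(2 * s, 2 * math.sqrt(2) * s) for s in strides]

scales = anchor_scales()
smallest = scales[0][0]    # 2 * 4
largest = scales[-1][1]    # 2 * sqrt(2) * 128
print(smallest, round(largest))
```

Under that assumption the smallest anchor is 8 pixels and the largest is about 362 pixels.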

In the STC stage, the paper sets …, ….

In the STR stage, the paper sets …, ….

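The backbone is initialized from a model pretrained on ImageNet, and the newly added convolutional layers are randomly initialized with the "xavier" method. The optimizer is SGD with momentum 0.9, weight decay 0.0001, and batch size 32. The learning rate is warmed up from 0.0003125 to 0.1 over the first 5 epochs and is then divided by 10 at epochs 10 and 100; training runs for 130 epochs in total.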
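The schedule described above can be written as a small function of the epoch index. The numbers are taken from these notes as written; the linear shape of the warmup and the function and parameter names are assumptions.

```python
def learning_rate(epoch, base_lr=0.1, warmup_start=0.0003125,
                  warmup_epochs=5, milestones=(10, 100), factor=0.1):
    """Learning rate at the start of a given epoch (0-based)."""
    if epoch < warmup_epochs:
        # Assumed linear ramp from warmup_start to base_lr.
        return warmup_start + (base_lr - warmup_start) * epoch / warmup_epochs
    # After warmup, multiply by `factor` once per milestone already reached.
    drops = sum(epoch >= m for m in milestones)
    return base_lr * factor ** drops
```

Calling it for epochs 0 through 129 gives the full 130-epoch schedule, which makes it easy to plot or to compare against a framework scheduler.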
