【SENet】《Squeeze-and-Excitation Networks》
CVPR-2018


Caffe code: https://github.com/hujie-frank/SENet
Caffe network visualization tool: http://ethereon.github.io/netscope/#/editor



1 Background and Motivation

CNNs extract informative features by fusing spatial and channel-wise information together within local receptive fields.

To enhance the representation power of CNNs:

  • Most prior methods: enhancing spatial encoding (spatial correlations, spatial attention), e.g., concatenating the outputs of multiple receptive-field branches in Inception
  • This paper: focus on the channel relationship

The authors therefore design the Squeeze-and-Excitation (SE) block, which emphasises informative features and suppresses less useful ones, channel-wise.

2 Advantages

  • First place in the ILSVRC 2017 classification competition
  • reduced the top-5 error to 2.251%

The development of new CNN architectures is a challenging engineering task, typically involving the selection of many new hyperparameters and layer configurations.

3 Related work

  • Deep architecture

  • Attention and gating mechanisms

4 Method

The role the SE block performs at different depths adapts to the needs of the network:

  • In earlier layers, it learns to excite informative features in a class-agnostic manner
  • In later layers, the SE block becomes increasingly specialised, responding in a highly class-specific manner

SE blocks can either be stacked on their own to build a new network (SENet), or be used plug-and-play, as a drop-in replacement for the original block at any depth in an existing architecture (e.g., the bottleneck blocks of ResNet and ResNeXt).

(Figure: schematic of the SE block)
$F_{tr}$ is a convolution operator, $F_{tr}: X \rightarrow U$, computed as follows:

$$u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x^s$$
The following figure helps in understanding this formula:
(Figure: the convolution $u_c = v_c * X$ decomposed per input channel, annotated by the blog author)
$X$ is the input feature map, $v_c$ (lowercase $c$, shown in red in the figure; black symbols are uppercase) is one filter, and $u_c$ (likewise lowercase $c$) is the corresponding output channel.

where $V = [v_1, v_2, ..., v_C]$ (uppercase $C$)
and $U = [u_1, u_2, ..., u_C]$ (uppercase $C$)
$v_c = [v_c^1, v_c^2, ..., v_c^{C'}]$
$X = [x^1, x^2, ..., x^{C'}]$
The superscript denotes a single-channel spatial kernel, i.e., the per-channel slices of $X$ and $v_c$ (lowercase $c$) drawn out in my figure. That makes the notation much clearer. As for how this familiar formula motivates thinking about channel-wise feature importance, and from there the SE block, I cannot yet see it myself. A small numerical check of the formula follows.
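
The following minimal check (assuming PyTorch; the tensor shapes are illustrative) verifies that one output channel $u_c$ of a multi-channel convolution equals the sum of per-input-channel responses $v_c^s * x^s$:

```python
import torch
import torch.nn.functional as F

C_in, H, W = 3, 8, 8
x = torch.randn(1, C_in, H, W)       # input feature map X with C' = 3 channels
v_c = torch.randn(1, C_in, 3, 3)     # one filter v_c = [v_c^1, ..., v_c^{C'}]

u_c = F.conv2d(x, v_c, padding=1)    # u_c = v_c * X

# Sum of single-channel convolutions v_c^s * x^s over s = 1..C'
u_c_sum = sum(F.conv2d(x[:, s:s+1], v_c[:, s:s+1], padding=1)
              for s in range(C_in))

print(torch.allclose(u_c, u_c_sum, atol=1e-5))  # True
```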

4.1 Squeeze: Global Information Embedding

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$
$u_c$ is the $c$-th channel of the feature map $U$; the formula above applies global average pooling to that channel. The scalar $z_c$ is one cell of the channel descriptor $z \in \mathbb{R}^C$ (pictured below).
(Figure: the $1 \times 1 \times C$ channel descriptor $z$ produced by the squeeze step)

Of course, global average pooling is only one way of summarising global information; more sophisticated aggregation strategies could be employed here as well. In code, the squeeze step is a one-liner, as sketched below.
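
A one-line sketch of the squeeze step (assuming PyTorch; shapes are illustrative):

```python
import torch

U = torch.randn(1, 64, 32, 32)   # feature map U with C = 64, H = W = 32
z = U.mean(dim=(2, 3))           # z_c = (1/HW) * sum_ij u_c(i, j); shape (1, 64)
```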

4.2 Excitation: Adaptive Recalibration

Two design criteria:

  • flexible (capable of learning non-linear interactions between channels)
  • non-mutually-exclusive (multiple channels may be emphasised at once, avoiding a one-hot activation)

The authors realise this by employing a simple gating mechanism with a sigmoid activation; more concretely, two fully connected layers. As shown below, the first FC reduces the dimension with a ReLU activation, and the second restores the original dimension with a sigmoid activation (borrowing the gate mechanism of LSTMs).

(Figure: the FC–ReLU–FC–sigmoid excitation structure)
The formula is:
$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \, \delta(W_1 z))$$
where $\delta$ is ReLU and $\sigma$ is sigmoid, with $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$.
$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$$
This formula means that each channel of $U$ is multiplied by the corresponding dimension of $s$, i.e., a channel-wise reweighting of the feature map. It corresponds to the $F_{scale}$ part of the figure below.
(Figure: the $F_{scale}$ step in the SE block diagram)
The final output is:
$$\tilde{X} = [\tilde{x}_1, \tilde{x}_2, ..., \tilde{x}_C]$$
To summarise (a code sketch follows this list):

  • $F_{sq}$: global average pooling
  • $F_{ex}$: two fully connected layers
  • $F_{scale}$: multiply the feature map $U$ by the channel weights (the output of $F_{ex}$)
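
Putting the three steps together, here is a minimal SE block sketch (assuming PyTorch; a hypothetical re-implementation for illustration, not the authors' Caffe code linked above):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),  # first FC: reduce to C/r
            nn.ReLU(inplace=True),               # δ
            nn.Linear(channels // r, channels),  # second FC: restore to C
            nn.Sigmoid(),                        # σ: non-mutually-exclusive gate
        )

    def forward(self, u):
        b, c, _, _ = u.shape
        z = u.mean(dim=(2, 3))            # F_sq: global average pooling
        s = self.fc(z)                    # F_ex: s = σ(W2 δ(W1 z))
        return u * s.view(b, c, 1, 1)     # F_scale: channel-wise reweighting
```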

4.3 Exemplars: SE-Inception and SE-ResNet

(Figure: the original Inception module on the left, the SE-Inception module on the right)

  • SE-ResNet:
    (Figure: the SE-ResNet module)
    $F_{tr}$ becomes the non-identity branch of a residual module; squeeze and excitation act before summation with the identity branch (see the sketch below)
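
A sketch of how SE plugs into a residual block (assuming PyTorch; the two-conv branch is a stand-in for the paper's bottleneck $F_{tr}$, and the two FC layers are written as 1×1 convolutions so the block is self-contained):

```python
import torch.nn as nn

class SEResidualBlock(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.branch = nn.Sequential(      # non-identity branch, i.e. F_tr
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.se = nn.Sequential(          # SE applied to the branch output
            nn.AdaptiveAvgPool2d(1),                # F_sq
            nn.Conv2d(channels, channels // r, 1),  # FC1 as a 1×1 conv
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1),  # FC2 as a 1×1 conv
            nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        u = self.branch(x)
        return self.relu(x + u * self.se(u))  # rescale before the summation
```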

4.4 Model and Computational Complexity

SE blocks offer a good trade-off between model complexity and performance:

                                                      ResNet-50   SE-ResNet-50
GPU (training, mini-batch of 256 images, 8 Titan X)    190 ms      209 ms
CPU (inference)                                        164 ms      167 ms

The extra latency mainly comes from global pooling and inner product operations, which are less optimised in existing GPU libraries.

The additional parameters all come from the two FC layers of the gating mechanism:
$$\frac{2}{r} \sum_{s=1}^{S} N_s \cdot C_s^2$$

  • $s$ indexes the stage (of $S$ stages in total)
  • $r$ is the reduction ratio
  • $N_s$ is the number of repeated blocks in stage $s$
  • $C_s$ is the dimension of the output channels (i.e., the number of channels) of stage $s$

(Figure: the two FC layers of the gating mechanism, $C \rightarrow C/r \rightarrow C$)
This figure makes the computation clear: one block in a given stage adds $\frac{C}{r} \times C + \frac{C}{r} \times C$ parameters. Since $C$ grows in later stages, those stages contribute most of the added parameters. The authors' experiments show that removing the SE blocks from the final stage costs very little accuracy while noticeably reducing the parameter overhead. A worked example follows.
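
A worked instance of the formula for SE-ResNet-50 with $r = 16$ (stage widths and block counts taken from the standard ResNet-50 configuration):

```python
r = 16
stages = [(3, 256), (4, 512), (6, 1024), (3, 2048)]  # (N_s, C_s) per stage

extra = (2 / r) * sum(n * c ** 2 for n, c in stages)
print(f"{extra / 1e6:.2f}M extra parameters")  # ≈ 2.51M, roughly 10% of ResNet-50
```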

5 Experiments

Datasets:

  • ImageNet 2012
  • COCO
  • Places365-Challenge

The reduction ratio $r$ is set to 16.

5.1 ImageNet Classification

(Table: single-crop error rates on ImageNet for original architectures vs. their SE counterparts)
Looking at the parenthesised numbers for the SE variants, adding SE blocks improves every architecture. The training and validation curves below tell the same story:
(Figure: training and validation curves of baseline networks vs. their SE counterparts)
Performance on lightweight networks:
(Table: error rates of lightweight architectures with and without SE blocks)
The parenthesised improvements in Tables 2 and 3 say it all: SE is effective and general-purpose, and can be used in combination with a wide range of architectures, residual or non-residual.

A showdown with the state of the art:
(Table: comparison with state-of-the-art models on ImageNet)

5.2 Scene Classification

(Table: results on Places365-Challenge scene classification)
This provides evidence that SE blocks can perform well on different datasets.

5.3 Object Detection on COCO

(Table: object detection results on COCO)
Using Faster R-CNN as the detector, the SE backbones bring consistent, substantial gains.

5.4 Analysis and Interpretation

1) Reduction ratio

The authors set the reduction ratio $r$ to 16, a good trade-off between model complexity and performance:
(Table: accuracy vs. model size for different reduction ratios $r$)

2) The role of Excitation

This studies the distribution of the excitation outputs (the sigmoid gate values) across classes: 5 classes, 50 samples each, plotting the average activations of fifty uniformly sampled channels.

(Figure 5: average excitation activations at different depths for five classes)

The authors make three observations:

  • lower-layer features are typically more general (e.g., panel (a); the features are shared across classes)
  • higher-layer features have greater specificity (e.g., panels (c) and (d); different classes assign different weights to different channels)
  • in panel (e), the activations saturate near 1, so the block acts almost like an identity mapping; the SE block matters little at this depth, and removing it saves a large number of parameters, as analysed in Section 4.4 above

Summary

SENet exploits a gating mechanism and generalises across architectures. The feature analysis in Figure 5 is especially valuable, as is the trade-off between complexity and performance governed by the reduction ratio of the two FC layers.
