Please credit the source when reposting:
https://www.cnblogs.com/darkknightzh/p/12150128.html
Paper:
HYBRID COARSE-FINE CLASSIFICATION FOR HEAD POSE ESTIMATION
Paper URL:
https://arxiv.org/abs/1901.06778
Official PyTorch code:
https://github.com/haofanwang/accurate-head-pose
This paper proposes a coarse-fine classification scheme for head pose estimation.
The network structure is shown in the figure below. The input image passes through a backbone network to produce a feature vector, which is then fed into several parallel fc layers. These fc layers map the feature to angle bins of different widths within -99° to 102° (bin widths of 1°, 3°, 11°, 33°, and 99°). Each head's logits go through softmax to obtain normalized probabilities, which split into two branches: one computes the expected angle and the MSE loss between the expectation and the ground truth, the other computes a cross-entropy loss. The losses are then summed to obtain the final loss.
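As a concrete illustration of the coarse-to-fine binning, the helper below converts a continuous angle into the class index of each head. This is my own hypothetical sketch, not code from the repository; it assumes the bins span -99° to 99°, so a bin width of w degrees yields 198 // w classes (198, 66, 18, 6, 2).

```python
def angle_to_bins(angle, widths=(1, 3, 11, 33, 99), lo=-99):
    # Map a continuous angle (in degrees) to its class index in each
    # coarse-to-fine head; width w gives 198 // w bins over [-99, 99).
    return [int((angle - lo) // w) for w in widths]

angle_to_bins(0.0)  # class indices of angle 0 in the 198/66/18/6/2-bin heads
```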
1) The MSE loss is close to that of Deep Head Pose (the difference is that here the expectation is computed from the 198-class classification result, whereas Deep Head Pose uses 66 classes).
2) The other angle-bin heads (all except the 198-class one) are only used to compute cross-entropy losses (as shown in the figure below).
3) The cross-entropy losses of the different bin widths carry different weights.
4) The MSE loss in this paper has a relatively large weight (2).
5) Training uses plain softmax to compute probabilities. Testing uses softmax with temperature (since T=1 in the code, this is effectively the same as plain softmax).
6) From https://arxiv.org/abs/1503.02531, given input logits ${{z}_{i}}$, the softmax-with-temperature output ${{q}_{i}}$ is computed as:
${{q}_{i}}=\frac{\exp ({{z}_{i}}/T)}{\sum\nolimits_{j}{\exp ({{z}_{j}}/T)}}$
where T is the temperature, usually set to 1 (i.e., plain softmax). The larger T is, the smaller the differences among the output probabilities; the smaller T is (the closer to 0), the larger the differences.
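The effect of the temperature can be seen in a few lines of pure Python (my own illustration of the formula above):

```python
import math

def softmax_t(logits, T=1.0):
    # Softmax with temperature: q_i = exp(z_i / T) / sum_j exp(z_j / T)
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.5]
q1 = softmax_t(logits, T=1.0)    # T=1: identical to plain softmax
q5 = softmax_t(logits, T=5.0)    # large T: flatter distribution
q01 = softmax_t(logits, T=0.1)   # small T: sharper distribution
```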
Therefore, I feel the figure above would be easier to understand if redrawn as below:
The loss function of this paper is as follows:
$Loss=\alpha \cdot MSE(y,{{y}^{*}})+\sum\limits_{i=1}^{num}{{{\beta }_{i}}\cdot H({{y}_{i}},y_{i}^{*})}$
where H denotes the cross-entropy loss and ${{\beta }_{i}}$ is the weight of the cross-entropy loss for each bin width (see the code for the specific weights).
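A minimal sketch of this combined loss for a single angle head (e.g. yaw) might look like the following. This is my own illustration, not the repository's code: α=2 follows the post, while the β weights and the function signature are placeholders (the actual weights are in the official code).

```python
import torch
import torch.nn.functional as F

def hybrid_loss(fine_logits, coarse_logits, fine_label, coarse_labels,
                angle_gt, alpha=2.0, betas=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """fine_logits: (B, 198); coarse_logits: list of (B, n) for n in (66, 18, 6, 2).
    fine_label / coarse_labels: bin indices; angle_gt: ground-truth angle in degrees."""
    # Expected angle from the 198-bin softmax: bin i covers 1 degree, offset -99
    probs = F.softmax(fine_logits, dim=1)
    idx = torch.arange(198, dtype=torch.float32)
    angle_pred = torch.sum(probs * idx, dim=1) - 99  # expectation in degrees
    mse = F.mse_loss(angle_pred, angle_gt)

    # Cross-entropy over the fine head plus every coarse head, each weighted
    ce = betas[0] * F.cross_entropy(fine_logits, fine_label)
    for b, logits, lab in zip(betas[1:], coarse_logits, coarse_labels):
        ce = ce + b * F.cross_entropy(logits, lab)
    return alpha * mse + ce
```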
2. Code
2.1 Network structure
class Multinet(nn.Module):
    # Hopenet with 3 output layers for yaw, pitch and roll
    # Predicts Euler angles by binning and regression with the expected value
    def __init__(self, block, layers, num_bins):
        self.inplanes = 64
        super(Multinet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7)  # ResNet backbone ends here
        self.fc_yaw = nn.Linear(512 * block.expansion, num_bins)    # same as Hopenet, except num_bins=198
        self.fc_pitch = nn.Linear(512 * block.expansion, num_bins)  # same as Hopenet, except num_bins=198
        self.fc_roll = nn.Linear(512 * block.expansion, num_bins)   # same as Hopenet, except num_bins=198

        self.fc_yaw_1 = nn.Linear(512 * block.expansion, 66)  # 66 matches Deep Head Pose
        self.fc_yaw_2 = nn.Linear(512 * block.expansion, 18)  # the rest are new fc layers
        self.fc_yaw_3 = nn.Linear(512 * block.expansion, 6)
        self.fc_yaw_4 = nn.Linear(512 * block.expansion, 2)

        self.fc_pitch_1 = nn.Linear(512 * block.expansion, 66)
        self.fc_pitch_2 = nn.Linear(512 * block.expansion, 18)
        self.fc_pitch_3 = nn.Linear(512 * block.expansion, 6)
        self.fc_pitch_4 = nn.Linear(512 * block.expansion, 2)

        self.fc_roll_1 = nn.Linear(512 * block.expansion, 66)
        self.fc_roll_2 = nn.Linear(512 * block.expansion, 18)
        self.fc_roll_3 = nn.Linear(512 * block.expansion, 6)
        self.fc_roll_4 = nn.Linear(512 * block.expansion, 2)

        # Vestigial layer from previous experiments
        self.fc_finetune = nn.Linear(512 * block.expansion + 3, 3)  # unused

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)  # backbone feature vector
        pre_yaw = self.fc_yaw(x)   # the remaining lines produce the yaw, pitch and roll heads
        pre_pitch = self.fc_pitch(x)
        pre_roll = self.fc_roll(x)

        pre_yaw_1 = self.fc_yaw_1(x)
        pre_pitch_1 = self.fc_pitch_1(x)
        pre_roll_1 = self.fc_roll_1(x)

        pre_yaw_2 = self.fc_yaw_2(x)
        pre_pitch_2 = self.fc_pitch_2(x)
        pre_roll_2 = self.fc_roll_2(x)

        pre_yaw_3 = self.fc_yaw_3(x)
        pre_pitch_3 = self.fc_pitch_3(x)
        pre_roll_3 = self.fc_roll_3(x)

        pre_yaw_4 = self.fc_yaw_4(x)
        pre_pitch_4 = self.fc_pitch_4(x)
        pre_roll_4 = self.fc_roll_4(x)

        return pre_yaw, pre_yaw_1, pre_yaw_2, pre_yaw_3, pre_yaw_4, pre_pitch, pre_pitch_1, pre_pitch_2, pre_pitch_3, pre_pitch_4, pre_roll, pre_roll_1, pre_roll_2, pre_roll_3, pre_roll_4