Please credit the source when reposting:

https://www.cnblogs.com/darkknightzh/p/12150128.html

Paper:

HYBRID COARSE-FINE CLASSIFICATION FOR HEAD POSE ESTIMATION

Paper link:

https://arxiv.org/abs/1901.06778

Official PyTorch code:

https://github.com/haofanwang/accurate-head-pose

 

This paper proposes a hybrid coarse-fine classification scheme for head pose estimation.

 

The network structure of the paper is shown in the figure below. The input image is passed through a backbone network to extract features, which are then fed to several parallel fc layers. These fc layers map the features onto angle bins of different widths (1, 3, 11, 33 and 99 degrees) within the range of roughly -99° to 102°. Each set of logits is normalized by a softmax, and the result is used in two ways: on one hand, the expected value is computed and an MSE loss between the expectation and the ground truth is taken; on the other hand, a cross-entropy loss is computed. These losses are then summed to obtain the final loss.
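The multi-granularity binning described above can be sketched as follows. This is a hypothetical helper (not from the repository), assuming 0-based bin indices over a 198-degree range starting at -99°; with bin widths of 1, 3, 11, 33 and 99 degrees this yields 198, 66, 18, 6 and 2 classes respectively.

```python
# Sketch of the multi-granularity binning (hypothetical helper, not from the repo).
# An angle in [-99, 99) receives one class label per bin width.
def multi_bin_labels(angle, widths=(1, 3, 11, 33, 99)):
    labels = {}
    for w in widths:
        labels[w] = int((angle + 99) // w)   # 0-based bin index for width w
    return labels

# e.g. an angle of 10 degrees falls into bin 109 of the 1-degree grid,
# bin 36 of the 3-degree grid, and so on down to bin 1 of the 99-degree grid.
print(multi_bin_labels(10.0))
```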

1)     The MSE loss is close to that of deep head pose (the difference is that here the expectation is computed from the 198-class classification result, whereas deep head pose uses 66 classes).

2)     The other angle granularities (all except the 198-class one) are used only to compute cross-entropy losses (as shown in the figure below).

3)     The cross-entropy losses at different granularities carry different weights.

4)     The MSE loss in this paper has a relatively large weight (it is 2).

5)     Training uses a plain softmax to compute probabilities. Testing uses a softmax with temperature (since T=1 in the code, this is effectively equivalent to a plain softmax).

6)     As given in https://arxiv.org/abs/1503.02531, for input logits ${{z}_{i}}$, the output ${{q}_{i}}$ of the softmax with temperature is computed as:

${{q}_{i}}=\frac{\exp ({{z}_{i}}/T)}{\sum\nolimits_{j}{\exp ({{z}_{j}}/T)}}$

where T is the temperature, usually set to 1 (which recovers the standard softmax). The larger T is, the smaller the differences between the output probabilities (a softer distribution); the smaller T is (the closer to 0), the larger the differences (a sharper distribution).
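A quick numerical check of the temperature behaviour (plain NumPy, not from the paper's code; the function name is an assumption):

```python
import numpy as np

def softmax_t(z, T=1.0):
    # Softmax with temperature T; T=1 is the ordinary softmax.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.0]
p1 = softmax_t(logits, T=1.0)   # sharper distribution
p5 = softmax_t(logits, T=5.0)   # softer distribution, closer to uniform
print(p1, p5)
```

As expected, the largest probability shrinks as T grows, which is the "smaller differences for larger T" behaviour described above.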

[Figure: network structure from the paper]

Accordingly, the figure above may be easier to understand if redrawn as follows:

[Figure: redrawn network structure]

The loss function of this paper is:

$Loss=\alpha \cdot MSE(y,y^{*})+\sum\limits_{i=1}^{num}{\beta_{i}\cdot H(y_{i},y_{i}^{*})}$

where H denotes the cross-entropy loss and ${{\beta }_{i}}$ is the weight of the cross-entropy loss at the i-th angle granularity (see the code for the specific weights).
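The formula above can be sketched roughly as the following PyTorch snippet for a single angle (e.g. yaw). The function name, the 1-degree bin centres starting at -99°, and the default values of `alpha` and `betas` are assumptions here; see the repository for the exact weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of the hybrid loss for one angle (e.g. yaw).
# logits_list: logits at each granularity, finest (198 bins) first.
# bin_labels:  ground-truth class index tensors, one per granularity.
# angle_gt:    continuous ground-truth angle in degrees, shape (B,).
def hybrid_loss(logits_list, bin_labels, angle_gt,
                alpha=2.0, betas=(1.0, 1.0, 1.0, 1.0, 1.0)):
    ce = nn.CrossEntropyLoss()
    # Weighted cross-entropy at every granularity (beta_i * H(y_i, y_i*)).
    loss = sum(b * ce(l, t) for b, l, t in zip(betas, logits_list, bin_labels))
    # Expected angle from the finest 198-bin head (1-degree bins from -99).
    probs = F.softmax(logits_list[0], dim=1)
    centers = torch.arange(198, dtype=torch.float32) - 99.0
    angle_pred = (probs * centers).sum(dim=1)
    # alpha * MSE(y, y*) between expectation and ground truth.
    return loss + alpha * F.mse_loss(angle_pred, angle_gt)
```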

2. Code

2.1 Network structure

import math
import torch.nn as nn


class Multinet(nn.Module):
    # Hopenet with 3 output layers for yaw, pitch and roll
    # Predicts Euler angles by binning and regression with the expected value
    def __init__(self, block, layers, num_bins):
        self.inplanes = 64
        super(Multinet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7)    # the ResNet backbone ends here
        self.fc_yaw = nn.Linear(512 * block.expansion, num_bins)     # like Hopenet, but num_bins=198
        self.fc_pitch = nn.Linear(512 * block.expansion, num_bins)   # like Hopenet, but num_bins=198
        self.fc_roll = nn.Linear(512 * block.expansion, num_bins)    # like Hopenet, but num_bins=198

        self.fc_yaw_1 = nn.Linear(512 * block.expansion, 66)   # 66 bins, matching deep head pose
        self.fc_yaw_2 = nn.Linear(512 * block.expansion, 18)   # the remaining fc layers are new
        self.fc_yaw_3 = nn.Linear(512 * block.expansion, 6)
        self.fc_yaw_4 = nn.Linear(512 * block.expansion, 2)

        self.fc_pitch_1 = nn.Linear(512 * block.expansion, 66)
        self.fc_pitch_2 = nn.Linear(512 * block.expansion, 18)
        self.fc_pitch_3 = nn.Linear(512 * block.expansion, 6)
        self.fc_pitch_4 = nn.Linear(512 * block.expansion, 2)

        self.fc_roll_1 = nn.Linear(512 * block.expansion, 66)
        self.fc_roll_2 = nn.Linear(512 * block.expansion, 18)
        self.fc_roll_3 = nn.Linear(512 * block.expansion, 6)
        self.fc_roll_4 = nn.Linear(512 * block.expansion, 2)

        # Vestigial layer from previous experiments (unused)
        self.fc_finetune = nn.Linear(512 * block.expansion + 3, 3)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)   # backbone features
        pre_yaw = self.fc_yaw(x)    # below: logits for yaw, pitch and roll at each granularity
        pre_pitch = self.fc_pitch(x)
        pre_roll = self.fc_roll(x)

        pre_yaw_1 = self.fc_yaw_1(x)
        pre_pitch_1 = self.fc_pitch_1(x)
        pre_roll_1 = self.fc_roll_1(x)

        pre_yaw_2 = self.fc_yaw_2(x)
        pre_pitch_2 = self.fc_pitch_2(x)
        pre_roll_2 = self.fc_roll_2(x)

        pre_yaw_3 = self.fc_yaw_3(x)
        pre_pitch_3 = self.fc_pitch_3(x)
        pre_roll_3 = self.fc_roll_3(x)

        pre_yaw_4 = self.fc_yaw_4(x)
        pre_pitch_4 = self.fc_pitch_4(x)
        pre_roll_4 = self.fc_roll_4(x)

        return (pre_yaw, pre_yaw_1, pre_yaw_2, pre_yaw_3, pre_yaw_4,
                pre_pitch, pre_pitch_1, pre_pitch_2, pre_pitch_3, pre_pitch_4,
                pre_roll, pre_roll_1, pre_roll_2, pre_roll_3, pre_roll_4)
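At test time (point 5 above), the continuous angle is recovered as the softmax expectation over the finest 198-bin head. A self-contained sketch of that decode step; the helper name, the optional temperature parameter, and the 1-degree bin centres starting at -99° are assumptions:

```python
import torch
import torch.nn.functional as F

# Hypothetical decode step: turn the finest 198-bin logits into degrees.
def decode_angle(logits, temperature=1.0):
    probs = F.softmax(logits / temperature, dim=1)          # T=1 -> plain softmax
    centers = torch.arange(198, dtype=torch.float32) - 99.0  # 1-degree bin centres
    return (probs * centers).sum(dim=1)                      # expected value in degrees

logits = torch.zeros(1, 198)   # uniform distribution over all bins
print(decode_angle(logits))    # tensor([-0.5000]): the mean of the bin centres
```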
