End-to-end comparative attention networks for person re-identification

A brief summary of the attention-related parts of this paper.

The overall network architecture is as follows:

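The CNN part uses a truncated pre-trained VGG Net with its last three fully connected layers removed, producing the feature X.
H is the output vector of the network (how it is obtained is explained later).
Note that the network branches share parameters.
Training is end-to-end and optimizes the multi-task loss function described below.
Multi-task loss functions:
Triplet loss: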

Softmax loss for classification:

Combined multi-task loss:
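The formulas themselves are not reproduced in this note, so the following is only a minimal sketch of how a triplet loss and a softmax classification loss are commonly combined into one objective. The hinge form with squared Euclidean distances, the `margin` value, and the `weight` balancing the two terms are all assumptions for illustration and may differ from the paper.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances (assumed form)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    # Zero once the negative is farther than the positive by at least `margin`.
    return max(0.0, d_pos - d_neg + margin)


def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true identity under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def multi_task_loss(anchor, positive, negative, logits, label,
                    weight=0.5, margin=1.0):
    """Weighted sum of both terms; `weight` is a hypothetical hyperparameter."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * softmax_cross_entropy(logits, label))
```

Here `anchor`, `positive`, and `negative` stand for the output vectors H of the three parameter-sharing branches, and `logits` for the identity scores of one image.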

Attention component:

LSTM unit at a single time step:
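X_t denotes the feature map at time step t, but every time step uses the same feature map X, i.e. the feature map taken directly from the CNN.
l_(t-1) denotes the attention map generated from h_(t-1); W_(i,h) denotes the weight parameters, which are learned end-to-end together with the LSTM parameters.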
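As a rough illustration of how an attention map can be produced from the previous hidden state, the sketch below applies a softmax over spatial locations to the scores of one weight vector per location. This softmax form is an assumption (it is the usual soft-attention formulation); the names `attention_map`, `h_prev`, and `W` are hypothetical, with `W` playing the role of W_(i,h).

```python
import math


def attention_map(h_prev, W):
    """Softmax over locations of the scores W[i] . h_prev.

    h_prev: previous hidden state, a list of floats.
    W: one weight vector per spatial location (assumed layout).
    """
    scores = [sum(w_j * h_j for w_j, h_j in zip(w, h_prev)) for w in W]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The result is non-negative and sums to 1, so it can be read as a distribution over the locations of the feature map X.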
Context vector:
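The formula is not reproduced in this note. In the usual soft-attention formulation, the context vector is the attention-weighted sum of the per-location features; the sketch below assumes that form, and `context_vector`, `attn`, and `features` are hypothetical names.

```python
def context_vector(attn, features):
    """Attention-weighted sum of per-location feature vectors.

    attn: attention weights, one per location, summing to 1.
    features: one feature vector of X per location (assumed layout).
    """
    dim = len(features[0])
    return [sum(a * f[d] for a, f in zip(attn, features)) for d in range(dim)]
```

For example, with `attn = [0.25, 0.75]` and `features = [[1.0, 0.0], [0.0, 1.0]]`, the second location dominates the resulting vector fed to the LSTM.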

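Attention component built from an LSTM network with multiple time steps:

(h_0 and c_0 are initialized by a two-layer perceptron)

The concatenation layer selects m hidden states h_i and combines them

Because the whole network is fairly complex and the loss fluctuates strongly, R is L2-normalized at the end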
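The last two steps can be sketched as follows, assuming R is the plain concatenation of the m selected hidden states and that the L2 step divides R by its Euclidean norm. The function name and the `eps` guard are assumptions for illustration.

```python
import math


def concat_and_normalize(hidden_states, eps=1e-12):
    """Concatenate the selected hidden states into R and scale R to unit L2 norm."""
    r = [v for h in hidden_states for v in h]
    norm = math.sqrt(sum(v * v for v in r))
    # eps avoids division by zero for an all-zero vector.
    return [v / max(norm, eps) for v in r]
```

Scaling to unit length keeps the distances used in a triplet-style comparison within a bounded range, which is one plausible reason it would damp the loss fluctuation mentioned above.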


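Parts I do not understand yet:

1) Since the network branches share parameters, how does the network handle the attended regions lying at different physical locations in the three images? (Judging from the figures in the paper there is no such problem, so I have probably misunderstood something.)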

Thanks to Liu H, Feng J, Qi M, et al. End-to-end comparative attention networks for person re-identification[J]. IEEE Transactions on Image Processing, 2017, 26(7): 3492-3506.