【distill.&transfer】Deep Face Recognition Model Compression via Knowledge Transfer and Distillation

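Motivation:

This paper proposes a method for compressing face recognition models by applying the student-teacher paradigm to face recognition. Acceleration comes from lowering the resolution of the input images while keeping the same network architecture, so the number of model parameters is not reduced; because the images are smaller, the storage needed for the data also shrinks. Overall architecture diagram: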

Proposed method:

The authors use three approaches to improve the performance of a model that takes low-resolution images as input:

1) via knowledge transfer (KT)

2) via knowledge distillation (KD)

3) via their combination

KD paradigm: features are extracted at the global pooling layer and matched between student and teacher. Two loss functions are used: a normal classification loss and a feature matching loss.
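The two-term KD objective described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the squared L2 form of the feature matching loss, the `weight` balancing factor, and all function names are assumptions made for the example.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def feature_matching_loss(student_feat, teacher_feat):
    """Squared L2 distance between pooled features (assumed form)."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat))

def kd_student_loss(logits, label, student_feat, teacher_feat, weight=1.0):
    """Classification loss plus weighted feature matching loss.

    `weight` is a hypothetical balancing factor, not taken from the paper.
    """
    cls = softmax_cross_entropy(logits, label)
    match = feature_matching_loss(student_feat, teacher_feat)
    return cls + weight * match

# Toy values: same logits, features either identical to or different from the teacher's.
logits = [2.0, 0.5, -1.0]
loss_same = kd_student_loss(logits, 0, [0.1, 0.2], [0.1, 0.2])
loss_diff = kd_student_loss(logits, 0, [0.1, 0.2], [0.4, 0.6])
```

When the student features equal the teacher features, only the classification term remains; any mismatch adds a penalty on top of it.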

KT paradigm: the student model is initialized with the teacher model's parameters and then trained with the classification loss.
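Because student and teacher share the same architecture, the KT initialization amounts to copying the parameters one-to-one. A minimal sketch, assuming parameters are stored in a plain dictionary; the layer names and values are hypothetical:

```python
import copy

def init_student_from_teacher(teacher_params):
    """Return an independent copy of the teacher's parameters for the student."""
    return copy.deepcopy(teacher_params)

# Hypothetical parameter dictionary; real models hold tensors instead of lists.
teacher_params = {
    "conv1.weight": [[0.1, -0.2], [0.3, 0.4]],
    "fc.bias": [0.0, 0.5],
}

student_params = init_student_from_teacher(teacher_params)

# Fine-tuning on low-resolution inputs then updates only the student's copy.
student_params["fc.bias"][1] = 0.7
```

The deep copy matters: with a shallow copy, updating the student would silently modify the teacher as well.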

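Combined KD and KT: the two paradigms above are used together, i.e. the student is initialized from the teacher and trained with both the classification loss and the feature matching loss.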
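The three training setups differ only in how the student is initialized and which loss terms it is trained with. The table-like sketch below summarizes this; the `StudentRecipe` type and its field names are hypothetical, and the assumption that plain KD starts from a default (non-teacher) initialization is not stated in this note.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StudentRecipe:
    init_from_teacher: bool      # KT: copy teacher parameters into the student
    use_feature_matching: bool   # KD: add the feature matching loss

RECIPES = {
    "KT": StudentRecipe(init_from_teacher=True, use_feature_matching=False),
    # Assumption: plain KD does not start from the teacher's weights.
    "KD": StudentRecipe(init_from_teacher=False, use_feature_matching=True),
    "KT+KD": StudentRecipe(init_from_teacher=True, use_feature_matching=True),
}

def describe(mode):
    """Return a one-line summary of a training setup."""
    recipe = RECIPES[mode]
    init = "teacher weights" if recipe.init_from_teacher else "default init"
    if recipe.use_feature_matching:
        loss = "classification + feature matching"
    else:
        loss = "classification only"
    return f"{mode}: init={init}, loss={loss}"

summaries = [describe(mode) for mode in RECIPES]
```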

Loss functions used in the paper:

Loss function used for the teacher network:

Loss function used for the student network:

Loss function used for feature matching:

Ls = Eq. (2) + Eq. (3)

The authors combine these losses:

The authors trained models with two network architectures: Inception-BN and a 100-layer deep residual architecture.

Training data used by the authors: MS1M. Test datasets: LFW, IJB-C.

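My own conclusion: since the number of parameters is unchanged, this method is better suited to improving a model's performance than to compressing it.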