Face Alignment (15): PIFA with a Single CNN

Pose-Invariant Face Alignment with a Single CNN

4.3 FPS on a Titan X GPU

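This paper addresses large-pose face alignment (LPFA), where large face poses include profile views with yaw angles of ±90°.
For large-pose face alignment, the prevailing approach is a cascade of CNN regressors combined with different regression designs and feature extraction methods.
The cascade of CNNs currently has three main problems in large-pose face alignment:
1) Lack of end-to-end training. In existing methods, the CNN at each cascade stage is usually trained independently. Some methods even use multiple independent CNNs per stage, for example a separate CNN to localize each landmark, and then combine the results. These CNNs cannot be jointly optimized, which might lead to a sub-optimal solution.
2) Hand-crafted feature extraction. Because the input of each stage's CNN depends on the output of the previous stage's CNN, each CNN can only be a shallow CNN, so this framework cannot extract deep features.
3) Slow training speed. Each CNN is trained independently, which makes training the whole network slow.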

To address these problems, the authors propose a visualization layer.
[Figure: proposed CNN architecture]

1. 3D and 2D Face Shapes
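The basic idea is that a face is actually a 3D object, while the face in an image is a 2D shape that corresponds to a 3D shape. Using landmark correspondences, we want to recover the parameters that relate the 2D shape to the 3D shape; in the end, a CNN learns these parameters.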
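
To make the 2D/3D relationship concrete, here is a minimal sketch of projecting 3D landmarks to a 2D shape. It assumes a weak-perspective camera (rotation, scale, 2D translation), which is a common choice but is not spelled out in these notes. The function names, the yaw-only rotation, and the three landmark coordinates are hypothetical and chosen only for illustration; they are not the paper's model or data.

```python
import math

def yaw_rotation(yaw_deg):
    """3x3 rotation about the vertical (y) axis by yaw_deg degrees."""
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def project_weak_perspective(points_3d, rotation, scale=1.0, translation=(0.0, 0.0)):
    """Project 3D points to 2D: rotate, drop depth, scale, then translate."""
    tx, ty = translation
    projected = []
    for x, y, z in points_3d:
        # Only the first two rows of the rotation contribute to the 2D result.
        u = scale * (rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z) + tx
        v = scale * (rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z) + ty
        projected.append((u, v))
    return projected

# Hypothetical 3D landmarks (arbitrary units): nose tip, left eye, right eye.
landmarks_3d = [(0.0, 0.0, 1.0), (-1.0, 1.0, 0.0), (1.0, 1.0, 0.0)]

frontal = project_weak_perspective(landmarks_3d, yaw_rotation(0))
profile = project_weak_perspective(landmarks_3d, yaw_rotation(90))
```

In the frontal view the two eyes stay on opposite sides of the nose. In the 90 degree profile view both eyes collapse onto nearly the same horizontal position, which illustrates why large poses are hard to handle from the 2D shape alone.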

[Figure: 3D face model]

[Figure: visualization block]

2. Visualization Layer
Here, in the Z-buffering step, we use the z coordinate of the surface normal of each vertex, transformed with the pose. It is an indicator of the "frontability" of a vertex, i.e., the amount that the surface normal is pointing towards the camera. This quantity is used to assign an intensity value at the vertex's projected 2D location to construct the visualization image.
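
The idea above can be sketched in a few lines. This is a simplified illustration, not the paper's layer: it splats single vertices onto a small pixel grid instead of rasterizing triangles, it is not differentiable, and it assumes a yaw-only pose with the camera looking at the face from the +z side. The function names, grid size, vertices, and normals are all hypothetical.

```python
import math

def rotate_yaw(vec, yaw_deg):
    """Rotate a 3D vector about the vertical (y) axis by yaw_deg degrees."""
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    x, y, z = vec
    return (c * x + s * z, y, -s * x + c * z)

def render_visualization(vertices, normals, yaw_deg, size=7, scale=2.0):
    """Build a size x size image whose pixels hold vertex 'frontability'."""
    image = [[0.0] * size for _ in range(size)]
    depth = [[-math.inf] * size for _ in range(size)]  # z-buffer
    center = size // 2
    for vertex, normal in zip(vertices, normals):
        x, y, z = rotate_yaw(vertex, yaw_deg)
        # z coordinate of the pose-transformed normal: how much it faces the camera.
        frontability = rotate_yaw(normal, yaw_deg)[2]
        if frontability <= 0:
            continue  # vertex faces away from the camera
        col = center + round(scale * x)
        row = center - round(scale * y)
        if not (0 <= row < size and 0 <= col < size):
            continue  # projected outside the image
        # Z-buffering: keep only the vertex closest to the camera per pixel.
        if z > depth[row][col]:
            depth[row][col] = z
            image[row][col] = frontability
    return image

# Hypothetical vertices and unit normals: nose tip, left cheek, right cheek.
vertices = [(0.0, 0.0, 1.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (-0.6, 0.0, 0.8), (0.6, 0.0, 0.8)]

frontal_image = render_visualization(vertices, normals, yaw_deg=0)
turned_image = render_visualization(vertices, normals, yaw_deg=45)
```

At yaw 0 the nose tip gets the brightest pixel and both cheeks are slightly dimmer. At yaw 45 the right cheek projects onto the same pixel as the nose tip but lies farther from the camera, so the z-buffer keeps the nose tip's value there.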

Experimental Results

The testing speed of the proposed method is 4.3 FPS on a Titan X GPU. This is much faster than the 0.6 FPS of [LPFA] and similar to the 4 FPS of [40].