Reproducing dlib frontal_face_detector() training

【Question title】: Reproducing dlib frontal_face_detector() training
【Posted】: 2017-12-02 16:41:23
【Question】:

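I'm trying to reproduce the training process of dlib's frontal_face_detector(). I'm using the same dataset (from http://dlib.net/files/data/dlib_face_detector_training_data.tar.gz) that dlib says was used: the union of the frontal and profile faces plus their mirrored copies.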

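My problems are:

1. Memory usage for the whole dataset is very high (30+ GB).
2. Training on part of the dataset does not yield high recall: 50-60% versus 80-90% for frontal_face_detector (tested on a subset of images not used for training).
3. The detector performs poorly on low-resolution images, so it fails to detect faces more than about 1-1.5 m from the camera.
4. Training time grows significantly with the SVM's C parameter, which I have to increase to get better recall (I suspect the gain is just an artifact of overfitting).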

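My original motivations for training were: a. gaining the ability to adapt to the specific environment where the camera is installed, e.g. through hard negative mining; b. improving detection at depth and runtime by reducing the 80x80 window to 64x64 or even 48x48.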

Am I on the right track? Am I missing anything? Please help...

【Comments on the question】:

Tags: face-detection dlib

【Solution 1】:

The training parameters that were used are recorded in a comment in the dlib source: http://dlib.net/dlib/image_processing/frontal_face_detector.h.html. For reference:

    It is built out of 5 HOG filters. A front looking, left looking, right looking, 
    front looking but rotated left, and finally a front looking but rotated right one.

    Moreover, here is the training log and parameters used to generate the filters:
    The front detector:
        trained on mirrored set of labeled_faces_in_the_wild/frontal_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        loss per missed target: 1
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 700
        nuclear norm regularizer: 9
        cell_size: 8
        num filters: 78
        num images: 4748
        Train detector (precision,recall,AP): 0.999793 0.895517 0.895368 
        singular value threshold: 0.15

    The left detector:
        trained on labeled_faces_in_the_wild/left_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        loss per missed target: 2
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 250
        nuclear norm regularizer: 8
        cell_size: 8
        num filters: 63
        num images: 493
        Train detector (precision,recall,AP): 0.991803  0.86019 0.859486 
        singular value threshold: 0.15

    The right detector:
        trained left-right flip of labeled_faces_in_the_wild/left_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        loss per missed target: 2
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 250
        nuclear norm regularizer: 8
        cell_size: 8
        num filters: 66
        num images: 493
        Train detector (precision,recall,AP): 0.991781  0.85782 0.857341 
        singular value threshold: 0.19

    The front-rotate-left detector:
        trained on mirrored set of labeled_faces_in_the_wild/frontal_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        rotated left 27 degrees
        loss per missed target: 1
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 700
        nuclear norm regularizer: 9
        cell_size: 8
        num images: 4748
        singular value threshold: 0.12

    The front-rotate-right detector:
        trained on mirrored set of labeled_faces_in_the_wild/frontal_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        rotated right 27 degrees
        loss per missed target: 1
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 700
        nuclear norm regularizer: 9
        cell_size: 8
        num filters: 89
        num images: 4748
        Train detector (precision,recall,AP):        1 0.897369 0.897369 
        singular value threshold: 0.15

The dlib documentation explains what the parameters are and how to set them. There is also a paper describing the training algorithm: Max-Margin Object Detection.
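As a rough illustration of how a few entries from the front detector's log could be carried over to dlib's Python trainer, here is a minimal sketch. It assumes the `dlib.simple_object_detector_training_options` binding, which is not necessarily the tool the original detector was trained with. `FRONT_LOG`, `to_option_values` and `train` are hypothetical names introduced here, and the sketch does not cover the upsampling, loss per missed target, or singular value threshold entries.

```python
# Values copied from the "front detector" entry of the training log.
FRONT_LOG = {
    "C": 700.0,
    "epsilon": 0.05,
    "detection window size": (80, 80),
    "nuclear norm regularizer": 9.0,
    "mirrored": True,  # "trained on mirrored set of ..."
}


def to_option_values(log):
    """Map log entries to attribute names of dlib's Python training options."""
    width, height = log["detection window size"]
    return {
        "C": log["C"],
        "epsilon": log["epsilon"],
        # The Python option is a pixel count (window area), not a (w, h) pair.
        "detection_window_size": width * height,
        "add_left_right_image_flips": log["mirrored"],
        "nuclear_norm_regularization_strength": log["nuclear norm regularizer"],
    }


def train(xml_path, out_path, log=FRONT_LOG, threads=4):
    """Train a HOG detector from an imglab-style XML file (paths are the caller's)."""
    import dlib  # imported lazily so the mapping can be inspected without dlib

    options = dlib.simple_object_detector_training_options()
    for name, value in to_option_values(log).items():
        # Older dlib builds may lack some attributes; skip those instead of failing.
        if hasattr(options, name):
            setattr(options, name, value)
    options.num_threads = threads
    options.be_verbose = True
    dlib.train_simple_object_detector(xml_path, out_path, options)
```

A call would look like `train("frontal_faces.xml", "front.svm")`, where both file names are placeholders for your own paths.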

Yes, running the trainer can take a lot of RAM.
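To put rough numbers on this, the sketch below does the arithmetic implied by three log entries: the 80x80 detection window, the 2:1 upsampling, and `pyramid_down<6>` (each pyramid level is 5/6 the size of the previous one). It returns the smallest face, in original-image pixels, that can fill the window, and how many pixels the upsampled pyramid holds relative to the original image. The function name, the 640x480 example size, and the rule of stopping once a level is smaller than the window are assumptions made for illustration; the pixel ratio is only a proxy for memory use, not a measurement of dlib.

```python
def pyramid_summary(width, height, window=80, upsample=2, pyramid_n=6):
    """Return (min_face_px, levels, pixel_multiplier) for one image.

    Assumes the scan stops once a level can no longer contain the window;
    dlib's actual stopping rule may differ.
    """
    # A face must be at least window-sized after upsampling to be found.
    min_face_px = window / upsample

    ratio = (pyramid_n - 1) / pyramid_n  # pyramid_down<6> -> 5/6 per level
    w, h = width * upsample, height * upsample
    levels = 0
    total_pixels = 0.0
    while w >= window and h >= window:
        total_pixels += w * h
        levels += 1
        w *= ratio
        h *= ratio

    return min_face_px, levels, total_pixels / (width * height)


if __name__ == "__main__":
    for win in (80, 64, 48):
        print(win, pyramid_summary(640, 480, window=win))
```

Under these assumptions a smaller window lowers the minimum detectable face size in proportion, while the pixel count is dominated by the 2:1 upsampling (4x) and the first few pyramid levels.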

【Comments】:

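Where can I find the dataset and the XML files to download?
It's at the URL you posted.
Found it, thanks. About the rotated-left and rotated-right versions: were they augmented, i.e. computed artificially from the frontal faces, and if so, how?
I'd like to know whether it is a simple in-plane rotation or a projective transform with a perspective change (around the image's y axis). I assume it is in-plane, but I'm not sure.
Just in-plane rotation.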