How is the IoU calculated for multiple bounding box predictions in the TensorFlow Object Detection API? Answers

【Question title】: How is the IoU calculated for multiple bounding box predictions in the TensorFlow Object Detection API?
【Posted】: 2020-01-06 08:51:55
【Question description】:

How is the IoU metric calculated for multiple bounding box predictions in the TensorFlow Object Detection API?

【Comments】:

Tags: tensorflow object-detection

【Answer 1】:

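Each bounding box predicted around an object has an IoU (intersection over union) with that object's ground-truth box. It is calculated by dividing the area shared by the predicted box and the ground-truth box (the overlap) by the area of their union, that is, the sum of the two box areas minus the overlap. Once the IoUs of all boxes around an object have been computed, the box with the highest IoU is selected as the result. It is explained in more detail here.

You can also print the IoU values after this line.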
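
The calculation described above can be sketched in plain Python. This is only an illustration of the formula, not the TensorFlow Object Detection API's own code: the helper names `box_iou` and `best_match` are hypothetical, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in continuous coordinates.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (it may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get an intersection area of 0.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_match(ground_truth, predictions):
    """Return (index, iou) of the prediction with the highest IoU."""
    ious = [box_iou(ground_truth, p) for p in predictions]
    best = max(range(len(ious)), key=ious.__getitem__)
    return best, ious[best]


# Made-up example: one ground-truth box and three candidate predictions.
gt = (0.0, 0.0, 2.0, 2.0)
preds = [(1.0, 1.0, 3.0, 3.0), (0.0, 0.0, 2.0, 1.0), (5.0, 5.0, 6.0, 6.0)]
index, iou = best_match(gt, preds)
print(index, iou)
```

Subtracting the overlap from the sum of the two areas is what keeps the shared region from being counted twice in the denominator.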

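【Comments】:

【Answer 2】:

I'm not sure exactly how TensorFlow does it, but here is an approach I used recently, since I could not find a good solution online. I used NumPy matrices to get the IoU along with other metrics for multi-object detection (TP, FP, TN, FN).

Let's assume your image is 6x6.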

import cv2
import numpy as np

# 6x6 single-channel "image" filled with zeros
empty_array = np.zeros(36).reshape([6, 6])

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

You have ground truth for 2 objects: one in the bottom-left corner of the image and a smaller one in the top-right corner.

bbox_actual_obj1 = [[0, 3], [2, 5]] # top left coord & bottom right coord
bbox_actual_obj2 = [[4, 0], [5, 1]]

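Using OpenCV, you can draw these objects onto a copy of the empty image array.

actual = empty_array.copy()
# Value 1 marks ground-truth pixels; thickness -1 fills the rectangle.
# Points are passed as tuples, which every OpenCV version accepts.
actual = cv2.rectangle(
    actual,
    tuple(bbox_actual_obj1[0]),
    tuple(bbox_actual_obj1[1]),
    1,
    -1
)
actual = cv2.rectangle(
    actual,
    tuple(bbox_actual_obj2[0]),
    tuple(bbox_actual_obj2[1]),
    1,
    -1
)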

array([[0., 0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])
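
If OpenCV is not available, the same mask can be built with NumPy slicing alone. The sketch below assumes, as the output above suggests, that both corners of a filled rectangle are inclusive pixel coordinates in `(x, y)` order; the helper name `fill_box` is hypothetical.

```python
import numpy as np


def fill_box(mask, top_left, bottom_right, value):
    """Fill an axis-aligned box in place; both (x, y) corners are inclusive."""
    (x0, y0), (x1, y1) = top_left, bottom_right
    # Rows are indexed by y and columns by x, hence the swapped order.
    mask[y0:y1 + 1, x0:x1 + 1] = value
    return mask


actual = np.zeros((6, 6))
fill_box(actual, (0, 3), (2, 5), 1)  # bottom-left object, 3x3 pixels
fill_box(actual, (4, 0), (5, 1), 1)  # top-right object, 2x2 pixels
print(actual)
```

Because the slice assigns the value instead of adding it, overlapping boxes of the same class still produce a single label per pixel.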

Now suppose the following are our predicted bounding boxes:

bbox_pred_obj1 = [[1, 3], [3, 5]] # top left coord & bottom right coord
bbox_pred_obj2 = [[3, 0], [5, 2]]

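Now we do the same as above, but change the value we write into the array.

pred = empty_array.copy()
# Value 2 marks predicted pixels, so they can be told apart from ground truth (1).
pred = cv2.rectangle(
    pred,
    tuple(bbox_pred_obj1[0]),
    tuple(bbox_pred_obj1[1]),
    2,
    -1
)
pred = cv2.rectangle(
    pred,
    tuple(bbox_pred_obj2[0]),
    tuple(bbox_pred_obj2[1]),
    2,
    -1
)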

array([[0., 0., 0., 2., 2., 2.],
       [0., 0., 0., 2., 2., 2.],
       [0., 0., 0., 2., 2., 2.],
       [0., 2., 2., 2., 0., 0.],
       [0., 2., 2., 2., 0., 0.],
       [0., 2., 2., 2., 0., 0.]])

If we convert these arrays to matrices and add them together, we get the following result:

actual_matrix = np.matrix(actual)
pred_matrix = np.matrix(pred)
combined = actual_matrix + pred_matrix

matrix([[0., 0., 0., 2., 3., 3.],
        [0., 0., 0., 2., 3., 3.],
        [0., 0., 0., 2., 2., 2.],
        [1., 3., 3., 2., 0., 0.],
        [1., 3., 3., 2., 0., 0.],
        [1., 3., 3., 2., 0., 0.]])

Now all we need to do is count how often each value occurs in the combined matrix to get the TP, FP, TN and FN counts.

combined = np.squeeze(
    np.asarray(
        pred_matrix + actual_matrix
    )
)
unique, counts = np.unique(combined, return_counts=True)
zipped = dict(zip(unique, counts))

{0.0: 15, 1.0: 3, 2.0: 8, 3.0: 10}
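
The same four counts can also be read off boolean masks directly, without encoding classes as 1 and 2 and adding them. This is an alternative sketch, not the approach used above; the masks are rebuilt here with slicing so the snippet runs on its own.

```python
import numpy as np

# Ground-truth mask: 3x3 object bottom-left, 2x2 object top-right.
actual = np.zeros((6, 6), dtype=bool)
actual[3:6, 0:3] = True
actual[0:2, 4:6] = True

# Prediction mask: both predicted boxes from the example.
pred = np.zeros((6, 6), dtype=bool)
pred[3:6, 1:4] = True
pred[0:3, 3:6] = True

tp = int(np.count_nonzero(actual & pred))    # predicted and really object
fp = int(np.count_nonzero(~actual & pred))   # predicted but background
fn = int(np.count_nonzero(actual & ~pred))   # object but not predicted
tn = int(np.count_nonzero(~actual & ~pred))  # background, not predicted
print(tp, fp, fn, tn)
```

The logical form always yields all four counts, whereas `np.unique` only returns the values that actually occur in the array, so a missing key would need handling there.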

Legend:

True negative: 0
False negative: 1
False positive: 2
True positive / intersection: 3
Union: 1 + 2 + 3

IoU: 0.48 = 10 / (3 + 8 + 10)
Precision: 0.56 = 10 / (10 + 8)
Recall: 0.77 = 10 / (10 + 3)
F1: 0.65 = 10 / (10 + 0.5 * (3 + 8))
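
The arithmetic above can be wrapped in a small helper. This is a minimal sketch that assumes the pixel counts have already been extracted; the name `pixel_metrics` is hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something the answer specifies.

```python
def pixel_metrics(tp, fp, fn):
    """Pixel-level IoU, precision, recall and F1 from TP/FP/FN counts."""
    def ratio(numerator, denominator):
        # Guard against empty masks, where the denominator is zero.
        return numerator / denominator if denominator else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        # Same value as tp / (tp + 0.5 * (fp + fn)).
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


metrics = pixel_metrics(tp=10, fp=8, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that these are image-level scores over all objects merged into one mask, so they can differ from the per-box IoU that an object detection evaluator reports for each prediction.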

【Comments】: