classification_report: labels and target_names

【Question title】: classification_report: labels and target_names
【Posted】: 2018-07-07 16:19:10
【Question】:

I have the following classification report output:

             precision    recall  f1-score   support

          0     0.6772    0.5214    0.5892       491
          1     0.8688    0.9273    0.8971      1678

avg / total     0.8254    0.8354    0.8274      2169

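The true labels in the dataset are s and p.

Question: how do I know which label is "0" and which is "1"? Or: how can I assign the labels in the correct order via labels= or target_names=?

【Comments】:

Tags: python scikit-learn classification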

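【Answer 1】:

Unless specified otherwise, the labels are taken in sorted (alphabetical) order. So it is most likely:

0 -> 'p'

1 -> 's'

In any case, if you pass the actual labels, they should be displayed as-is. For example: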

from sklearn.metrics import classification_report

y_true = ['p', 's', 'p', 's', 'p']
y_pred = ['p', 'p', 's', 's', 'p']

print(classification_report(y_true, y_pred))

Output:
             precision    recall  f1-score   support

          p       0.67      0.67      0.67         3
          s       0.50      0.50      0.50         2

avg / total       0.60      0.60      0.60         5

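So nothing needs to be done. However, if you have changed the labels, you can pass the original names in the target_names parameter to have them displayed in the report.

Suppose you have converted 'p' to 0 and 's' to 1; then your code becomes: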

y_true = [0, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0]

# Without the target_names
print(classification_report(y_true, y_pred))

          0       0.67      0.67      0.67         3
          1       0.50      0.50      0.50         2

avg / total       0.60      0.60      0.60         5

# With target_names
print(classification_report(y_true, y_pred, target_names=['p', 's']))

          p       0.67      0.67      0.67         3
          s       0.50      0.50      0.50         2

avg / total       0.60      0.60      0.60         5
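
If the row order matters, one option is to pass labels= and target_names= together, so that each display name is paired with an explicit label value instead of relying on the default sorted order. A minimal sketch, reusing the encoded example above; output_dict=True is used only to make the result easy to inspect and assumes scikit-learn 0.20 or later:

```python
from sklearn.metrics import classification_report

y_true = [0, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0]

# labels= fixes which label values are reported and in what order;
# target_names= must list the display names in that same order.
report = classification_report(
    y_true, y_pred, labels=[1, 0], target_names=['s', 'p'], output_dict=True
)

# The support (number of true samples) confirms which row is which class.
print(report['s']['support'], report['p']['support'])
```

Here 's' is bound to label 1 and 'p' to label 0 because both lists are given in the same order.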

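【Comments】:

For those interested: if labels are not provided, they are generated automatically using the unique_labels function here: github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/… If you override target_names, the names should map onto the return value of this function, which does sort the labels before returning them.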
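
The sorting behaviour described in this comment can be checked directly. A small sketch with made-up toy labels; the import path is the one used in current scikit-learn releases and may differ in other versions:

```python
from sklearn.utils.multiclass import unique_labels

# Toy data (hypothetical): 's' appears first, 'p' second.
y_true = ['s', 'p', 's', 'p', 'p']
y_pred = ['s', 's', 'p', 'p', 'p']

# Labels come back sorted, regardless of the order they first appear in.
default_labels = unique_labels(y_true, y_pred)
print(list(default_labels))
```

Any target_names list should therefore follow this sorted order unless labels= is passed explicitly.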

【Answer 2】:

If you encoded the original labels with sklearn.preprocessing.LabelEncoder, you can use inverse_transform to recover the original labels:

import numpy as np
from sklearn import metrics
target_strings = label_encoder.inverse_transform(np.arange(num_classes))  # class names in encoded order
print(metrics.classification_report(dev_gold, dev_predicted, target_names=target_strings))
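
The snippet above depends on variables defined elsewhere (label_encoder, num_classes, dev_gold, dev_predicted). A self-contained sketch of the same idea, with toy data and variable names that are assumptions made for illustration only:

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# Toy string labels (hypothetical data).
raw_true = ['p', 's', 'p', 's', 'p']
raw_pred = ['p', 'p', 's', 's', 'p']

label_encoder = LabelEncoder().fit(raw_true)
y_true = label_encoder.transform(raw_true)   # 'p' and 's' become integer codes
y_pred = label_encoder.transform(raw_pred)

# Map the integer codes 0..n-1 back to the original strings, in encoded order.
num_classes = len(label_encoder.classes_)
target_strings = list(label_encoder.inverse_transform(np.arange(num_classes)))

print(classification_report(y_true, y_pred, target_names=target_strings))
```

Because the names are derived from the encoder itself, they stay aligned with the integer codes even if the set of classes changes.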

【Comments】:

【Answer 3】:

You can use the classifier's classes_ attribute to get the list of labels, which are indexed by their position in the array.

classes_ : array of shape = [n_classes] or a list of such arrays

The class labels (single-output problem), or a list of arrays of class labels (multi-output problem).
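
As an illustration of how classes_ lines up with array positions, here is a short sketch using a DecisionTreeClassifier on toy data. The choice of estimator and the data are assumptions for the example; other scikit-learn classifiers expose classes_ in the same way:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training set (hypothetical): small values are 's', large values are 'p'.
X = [[0], [1], [2], [3]]
y = ['s', 's', 'p', 'p']

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# classes_ holds the labels in sorted order; column i of predict_proba
# corresponds to classes_[i].
print(list(clf.classes_))
proba = clf.predict_proba([[0]])
print(dict(zip(clf.classes_, proba[0])))
```

Index 0 of classes_ is the class reported as "0" and index 1 the class reported as "1" when the model works with positional indices.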

【Comments】: