High classification metric results: answers

【Question title】: High classification metric results
【Posted】: 2020-11-08 21:15:19
【Question】:

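I am trying to use machine learning to identify crop types. It is a pixel-level classification. I have 16 classes (targets), and these are the shapes of my training and test datasets: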

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(Features, Labels, test_size=0.25)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
# ((48330, 420), (16110, 420), (48330,), (16110,))

I wanted to experiment with a baseline model first, so I did the following:

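from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Baseline: random forest with default hyperparameters
classifier = RandomForestClassifier()
classifier.fit(X_train, Y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(Y_test, y_pred))
print(classification_report(Y_test, y_pred))
print(accuracy_score(Y_test, y_pred))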

This is the final result:

I am not sure what is happening here. Why are my metrics so high? PS: my dataset is highly imbalanced.

【Question discussion】:

Tags: python machine-learning random-forest multiclass-classification

【Answer 1】:

Your dataset is imbalanced. Try fixing that first, then tune the hyperparameters.
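One possible way to act on this, sketched under assumptions: the data below is synthetic (generated with `make_classification`, not the asker's crop pixels), and `stratify=y` plus `class_weight="balanced"` are just one common way to account for imbalance, not necessarily what this answer had in mind. Reporting balanced accuracy and macro F1 next to plain accuracy helps show whether a high score is carried mostly by the majority class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an imbalanced dataset (assumption: 4 classes, not 16).
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=10, n_classes=4,
    weights=[0.85, 0.08, 0.05, 0.02], random_state=0,
)

# stratify=y keeps the class proportions the same in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" weights each class inversely to its frequency.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy alone can look high on imbalanced data; the other two metrics
# average over classes, so rare classes count as much as common ones.
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "balanced_accuracy": balanced_accuracy_score(y_test, y_pred),
    "macro_f1": f1_score(y_test, y_pred, average="macro"),
}
print(scores)
```

If accuracy is high but balanced accuracy and macro F1 are clearly lower, the rare classes are probably being missed.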

【Discussion】:

【Answer 2】:

Take a look at your training and test data; most likely they are not arranged the way you intended.

【Discussion】:

Can you elaborate?
Are you sure there is no overlap between your training and test datasets?
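The overlap question can be checked directly. The sketch below is a rough illustration on random data, not the asker's dataset. `count_shared_rows` is a hypothetical helper that counts test rows whose feature vector also appears verbatim in the training set. The second half assumes that each pixel carries a group label such as a field or parcel ID (the `field_ids` array is invented for illustration) and uses `GroupShuffleSplit` so that pixels from one field never land on both sides of the split.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def count_shared_rows(X_train, X_test):
    """Count test rows that also appear, byte for byte, in the training set."""
    train = np.asarray(X_train, dtype=np.float64)
    test = np.asarray(X_test, dtype=np.float64)
    train_rows = {row.tobytes() for row in train}
    return sum(row.tobytes() in train_rows for row in test)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Hypothetical grouping: 20 fields with 10 pixels each.
field_ids = np.repeat(np.arange(20), 10)

# A deliberately leaky split: the test set reuses 10 training rows.
X_train, X_test = X[:150], np.vstack([X[150:], X[:10]])
print(count_shared_rows(X_train, X_test))  # 10, by construction

# Group-aware split: every field goes entirely to train or entirely to test.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=field_ids))
shared_fields = set(field_ids[train_idx]) & set(field_ids[test_idx])
print(shared_fields)  # set()
```

Exact duplicates are only the simplest form of overlap. With pixel-level data, neighbouring pixels from the same field can be nearly identical, so a purely random split may still leak information even when the duplicate count is zero.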

【Answer 3】:

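Why not start with fewer trees in the classifier and set max_depth to 2 or 3? That should be a good starting point. If it still behaves the same way, simplify the model further.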
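A minimal sketch of that comparison, on synthetic data (an assumption, not the asker's dataset): it fits a default forest and a deliberately small one (`n_estimators=10`, `max_depth=3`; the value 10 is an arbitrary choice for "fewer trees") and records training and test accuracy for each.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: 4 classes, 20 features).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

configs = {
    "default": {},
    "small": {"n_estimators": 10, "max_depth": 3},
}

results = {}
for name, params in configs.items():
    clf = RandomForestClassifier(random_state=0, **params).fit(X_train, y_train)
    # Store (train accuracy, test accuracy) for each configuration.
    results[name] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(name, results[name])
```

If a forest this shallow still scores almost perfectly on the real test set, model capacity is unlikely to be the explanation, and the split or the features deserve a closer look.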

【Discussion】: