Match id/index to prediction with xgboost / predict individual data points

【Question title】: Match id/index to prediction with xgboost / predict individual data points
【Posted】: 2021-06-06 07:27:40
【Question】:

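I have been trying, without success, to build a data frame with a column that holds the values predicted by a model.

As a simple example, I will use the iris dataset: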

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(np.concatenate((iris.data, np.array([iris.target]).T), axis=1), columns=iris.feature_names + ['target'])
df.head()

This displays the first five rows of the data frame.

For the next steps, building the model, I have:

# Get the X and y for the experiment
X = df.drop(columns='target').values
y = df["target"].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Create an XGBClassifier instance and fit it
from xgboost import XGBClassifier
clf = XGBClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

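At this point I am stuck. I have looked at several posts on how to retrieve the index/ID of individual data points (each row is one data point), but without success.

Is there any way to match the predictions to each row? Or, as an alternative, to test individual rows so that I know the prediction for each of them?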

【Discussion】:

Tags: python-3.x pandas dataframe scikit-learn xgboost

【Solution 1】:

A simple way to do this is to keep your X and y as pandas objects, a data frame and a series (i.e. drop the .values):

X = df.drop(columns='target')
y = df["target"]
# rest of your code as is
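
Keeping pandas objects works because train_test_split selects rows from data frames and series without resetting their index, so every test row still carries its original row label. A minimal sketch to check this, assuming the same split parameters as the question (as_frame=True is used here only as a shortcut for building the data frame and is not part of the original post):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# as_frame=True returns a DataFrame and a Series (a shortcut, not the question's construction)
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Features and labels are shuffled together, so their index labels stay aligned
print(X_test.index.equals(y_test.index))

# The test index holds the original row labels of the full data frame
print(X_test.index[:5].tolist())
```

With NumPy arrays (the .values version) this information is lost, because an array has no index to carry along.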

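After running the rest of your code, that is, fitting the model and getting the predictions y_pred, you can add the target and prediction columns back to your X_test (now a data frame), whose index still holds the original row numbers: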

X_test = X_test.assign(target = y_test.values)
X_test = X_test.assign(prediction = y_pred)

print(X_test.head())
# result:
     sepal length (cm)  sepal width (cm)  ...  target  prediction
14                 5.8               4.0  ...     0.0         0.0
98                 5.1               2.5  ...     1.0         1.0
75                 6.6               3.0  ...     1.0         1.0
16                 5.4               3.9  ...     0.0         0.0
131                7.9               3.8  ...     2.0         2.0

[5 rows x 6 columns]

【Discussion】: