Get the predicted probability of class 1 rather than a 0/1 label using a Logistic Regression model

【Question title】: Get predictions probability of 1 rather than [0;1] using Logistic Regression model
【Posted】: 2021-10-30 13:43:39
【Question】:

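I fitted a LogisticRegression on a train/test split and got an accuracy of about 80%.

I now want to make predictions on the test set and give each student_id a score for answered_correctly [1 for yes, 0 for no].

This is what I did: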

features_X = X.columns  # column names of X

# X_test is an array created from a previous train_test_split step.
test_df = pd.DataFrame(columns=features_X, data=X_test)

predictions = grid_logit.predict(test_df[features_X])
# Create a DataFrame with the predictions
submission = pd.DataFrame({'Id': test_df['student_id'], 'Answered_correctly': predictions})

# Show the first 5 rows
submission.head()

Id           Answered_correctly
12992348        0
7268428         0
9497321         1 
588792          1
5045118         1

As you can see, it classifies each user as either 0 or 1.

What I want is something like this:

Id            Answered_correctly
12992348            0.32
7268428             0.52
9497321             0.65

Here the Answered_correctly values correspond to the probability of belonging to class 1.

Note: using the predict_proba function returns an error:

Exception: Data must be 1-dimensional

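Edit: I replaced predict with predict_proba(test_df[[features_X]]), but it returns an error: None of [[ features_X cols]] are in the [columns]

【Comments】:

predict_proba gives you what you need. You didn't show the code where you tried to call it.
@krisograbek I didn't show it because I simply swapped it into the code above: grid_logit.predict_proba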

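Tags: python machine-learning scikit-learn classification logistic-regression

【Solution 1】:

predict_proba returns probability estimates for every class. Assuming you have two classes (0 and 1), it returns an array of shape (n_samples, 2), with one column per class.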
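To make the shape difference concrete, here is a minimal sketch on made-up toy data (the data and variable names are assumptions, not the asker's). It compares predict with predict_proba and looks up the class 1 column through classes_ instead of assuming its position:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, binary target (0 or 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

labels = model.predict(X)        # hard class labels, shape (n_samples,)
proba = model.predict_proba(X)   # one column per class, shape (n_samples, 2)

# Columns follow the order of model.classes_, so look up class 1 explicitly.
class1_col = list(model.classes_).index(1)
p_class1 = proba[:, class1_col]  # 1-D array with the probability of class 1

print(labels.shape, proba.shape, p_class1.shape)
```

Each row of proba sums to 1, so in the binary case the class 0 column is simply one minus the class 1 column.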

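The error message comes from the pandas DataFrame constructor, which requires each column passed to it to be one-dimensional. As explained above, once you switch to predict_proba, predictions is a two-dimensional array.

Pass only the probability estimates for class 1 (predictions[:, 1]) to the DataFrame constructor and it should work: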

submission = pd.DataFrame({'Id': test_df['student_id'], 'Answered_correctly': predictions[:, 1]})
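For a runnable end-to-end version, the sketch below rebuilds the whole flow on synthetic data. Only student_id and Answered_correctly come from the question; prior_accuracy, the data itself, and the plain LogisticRegression standing in for grid_logit are assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the asker's data.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "student_id": np.arange(1, n + 1),
    "prior_accuracy": rng.uniform(0, 1, size=n),  # made-up feature
})
y = (X["prior_accuracy"] + rng.normal(scale=0.3, size=n) > 0.5).astype(int)

features_X = X.columns
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A plain model stands in for grid_logit here.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

test_df = X_test.reset_index(drop=True)
proba = model.predict_proba(test_df[features_X])  # shape (n_test, 2)

submission = pd.DataFrame({
    "Id": test_df["student_id"],
    "Answered_correctly": proba[:, 1],  # 1-D: probability of class 1
})
print(submission.head())
```

The Answered_correctly column now holds values between 0 and 1 instead of hard labels.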

Additional note:

If test_df contains exactly the columns given by features_X, you don't need to pass test_df[features_X]; test_df on its own is enough:

predictions = grid_logit.predict_proba(test_df)
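The following sketch checks that equivalence on a small made-up frame (the column names f1 and f2 and the data are hypothetical). It also shows the case where the selection does matter, namely when the frame carries extra columns:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical two-feature frame; the column names are made up.
rng = np.random.default_rng(1)
train_df = pd.DataFrame({"f1": rng.normal(size=100), "f2": rng.normal(size=100)})
y = (train_df["f1"] - train_df["f2"] > 0).astype(int)
features_X = train_df.columns

model = LogisticRegression().fit(train_df, y)

test_df = pd.DataFrame({"f1": rng.normal(size=10), "f2": rng.normal(size=10)})

# Selecting with features_X changes nothing when the columns already match.
same = np.allclose(
    model.predict_proba(test_df),
    model.predict_proba(test_df[features_X]),
)
print(same)

# If the frame carries extra columns (e.g. a target), the selection is needed.
test_with_extra = test_df.assign(answered_correctly=0)
proba = model.predict_proba(test_with_extra[features_X])
```

Note that features_X is already a list-like of column names, so it goes inside single brackets. Wrapping it a second time, as in test_df[[features_X]], is presumably what produced the "None of [...] are in the [columns]" error mentioned in the question's edit.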
