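【Posted】: 2022-01-25 09:02:26
【Question】:
I am using the xgboost multiclass classifier outlined in the example below. For each row in the X_test dataframe, the model outputs a list whose elements are the probabilities for each of the classes "a", "b", "c" and "d", e.g. [0.44767836 0.2043365 0.15775423 0.19023092].
How can I tell which element of the list corresponds to which class (a, b, c or d)? My goal is to add 4 extra columns a, b, c, d to the dataframe, with the matching probability as the row value in each column.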
import numpy as np
import pandas as pd
import xgboost as xgb
import random
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
#Create Example Data
np.random.seed(312)
random.seed(312)  # seed the stdlib RNG too, since random.choice generates the labels
data = np.random.random((10000, 3))
y = [random.choice('abcd') for _ in range(data.shape[0])]
features = ["x1", "x2", "x3"]
df = pd.DataFrame(data=data, columns=features)
df['y']=y
#Encode target variable
labelencoder = preprocessing.LabelEncoder()
df['y_target'] = labelencoder.fit_transform(df['y'])
#Train Test Split
X_train, X_test, y_train, y_test = train_test_split(df[features], df['y_target'], test_size=0.2, random_state=42, stratify=y)
#Train Model
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
param = {
    'objective': 'multi:softprob',  # predict() returns one probability per class
    'random_state': 20,
    'tree_method': 'gpu_hist',  # needs a CUDA-enabled xgboost build
    'num_class': 4,
}
xgb_model = xgb.train(param, dtrain, 100)
predictions=xgb_model.predict(dtest)
print(predictions)
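For reference, here is a minimal sketch of one way the column order could be recovered. It relies on two facts: `LabelEncoder` stores the sorted labels in `classes_`, so the label at position i is the one encoded as integer i, and `multi:softprob` returns one column per integer class in that same order. The `predictions` array and the small `X_test` frame below are hand-written stand-ins (assumptions for illustration), so that the snippet runs without a GPU or a trained model.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Fit the encoder on string labels, as in the question.
labelencoder = preprocessing.LabelEncoder()
labelencoder.fit(list("dcbaabcd"))

# classes_ is sorted; the label at position i was encoded as integer i.
class_names = list(labelencoder.classes_)

# Stand-in for xgb_model.predict(dtest): one row per sample, one column per class.
predictions = np.array([
    [0.44767836, 0.2043365, 0.15775423, 0.19023092],
    [0.10, 0.20, 0.30, 0.40],
])

# Stand-in for X_test; train_test_split shuffles rows, so the index is not 0..n-1.
X_test = pd.DataFrame(
    {"x1": [0.1, 0.2], "x2": [0.3, 0.4], "x3": [0.5, 0.6]},
    index=[7, 3],
)

# Name the probability columns after the classes and reuse X_test's index
# so the join aligns row by row.
proba = pd.DataFrame(predictions, columns=class_names, index=X_test.index)
result = X_test.join(proba)
print(result)
```

With the code from the question, the same idea would apply by passing `xgb_model.predict(dtest)` as the data and `labelencoder.classes_` as the column names. Reusing `X_test.index` matters because the split shuffles the rows.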
【Comments】:
Tags: python scikit-learn xgboost multiclass-classification