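[Posted]: 2021-12-22 11:49:33
[Question]:
My pipeline runs fine, and now I want to check its top-k accuracy. I could obviously do this the hard way with a loop, but how can I do the same thing with the provided function?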
import numpy as np
from sklearn.metrics import top_k_accuracy_score
from sklearn.model_selection import train_test_split
# x and y can be any features and labels; please assume they exist
y = df_whole['target'].values.ravel() # 1-D y labels, currently strings
set_y = set(y) # unique classes
class_int_mapping = dict(zip(set_y, range(len(set_y)))) # map e.g. car: 0, bus: 1, ...
y = np.array([class_int_mapping[i] for i in y]) # array; a list also works
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, stratify=y)
When I train and test my pipeline, it gives the expected results. Please assume any classification pipeline. When I run:
print(pipeline.predict_proba(x_train).shape, pipeline.predict_proba(x_test).shape)
>> (19794, 269) (6599, 269)
But when I run:
top_k_accuracy_score(y_test,pipeline.predict_proba(x_test), k = 5)
I get this error:
ValueError: Number of classes in 'y_true' (255) not equal to the number of classes in 'y_score' (269).
What is going on here?
P.S.: For now, this is how I compute it:
probs = pipeline.predict_proba(x_test)
topn = np.argsort(probs, axis = 1)[:,-5:]
top_k_acc_result = np.mean(np.array([1 if y_test[k] in topn[k] else 0 for k in range(len(topn))]))
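The same manual computation can probably be written without the Python-level loop. The sketch below is one way to do it, not a definitive implementation. It assumes the labels are the integers 0..n_classes-1 and that column j of the probability matrix corresponds to label j. The helper name `manual_top_k_accuracy` and the toy arrays are made up for illustration.

```python
import numpy as np

def manual_top_k_accuracy(y_true, probs, k=5):
    """Fraction of rows whose true label is among the k highest-scoring columns.

    Assumption: labels are integers 0..n_classes-1 and column j of `probs`
    holds the score for label j.
    """
    y_true = np.asarray(y_true)
    # argpartition moves the k largest scores of each row into the last k
    # positions (in no particular order), which is all a membership test needs.
    topk = np.argpartition(probs, -k, axis=1)[:, -k:]
    # Compare each row's true label against its k candidate columns.
    hits = (topk == y_true[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data, invented for this example: 4 samples, 3 classes.
probs = np.array([
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
    [0.7, 0.2, 0.1],
])
y_true = np.array([2, 1, 2, 1])

result = manual_top_k_accuracy(y_true, probs, k=2)
print(result)  # 0.75: three of the four true labels are in the top 2
```

`np.argpartition` avoids a full sort per row, which may matter with hundreds of classes; `np.argsort(probs, axis=1)[:, -k:]` as in the snippet above gives the same set of columns when there are no tied scores.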
[Discussion]:
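Judging by the error message, `y_test` seems to contain only 255 of the 269 classes the model was trained on. When `labels` is not given, `top_k_accuracy_score` infers the classes from `y_true`, so the count no longer matches the 269 columns of `predict_proba`. Passing the full, sorted class list through the `labels` parameter should resolve this. The sketch below reproduces the situation on a small synthetic dataset. The data, the `LogisticRegression` model, and all variable names are assumptions made for illustration, not the asker's pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import top_k_accuracy_score

# Synthetic stand-in for the real data: 6 classes (illustrative assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Build an evaluation set in which class 5 never occurs, mimicking a test
# split that misses some of the classes seen during training.
mask = y != 5
X_eval, y_eval = X[mask], y[mask]
probs = clf.predict_proba(X_eval)  # still has one column per trained class

# Without `labels`, the classes are inferred from y_eval (5 of them),
# which does not match the 6 columns of `probs`.
try:
    top_k_accuracy_score(y_eval, probs, k=3)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# With `labels`, the columns of `probs` are matched to the trained classes.
score = top_k_accuracy_score(y_eval, probs, k=3, labels=clf.classes_)

# Cross-check against a manual computation.
topk = np.argsort(probs, axis=1, kind="stable")[:, -3:]
manual = np.mean((clf.classes_[topk] == y_eval[:, None]).any(axis=1))
print(score, manual)
```

In the question's setting, the corresponding call would presumably be `top_k_accuracy_score(y_test, pipeline.predict_proba(x_test), k=5, labels=pipeline.classes_)`, assuming the final step of the pipeline is a classifier so that `classes_` is available. `labels` must be sorted and must have as many entries as `y_score` has columns.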
Tags: python numpy machine-learning scikit-learn classification