top_k_accuracy_score() giving shape mismatch error: Number of classes in 'y_true' (255) not equal to the number of classes in 'y_score' (269)

【Question title】: top_k_accuracy_score() giving shape mismatch error: Number of classes in 'y_true' (255) not equal to the number of classes in 'y_score' (269)
【Posted】: 2021-12-22 11:49:33
【Question description】:

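My pipeline runs fine, and now I want to check its top-k accuracy. I could obviously do this the hard way by running a loop, but how can I do the same thing with the provided function?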

import numpy as np
from sklearn.metrics import top_k_accuracy_score
from sklearn.model_selection import train_test_split

# x and y can be any features and labels; please assume they are defined

y = df_whole['target'].values.ravel() # get 1-D y labels currently in String format

set_y = set(y) # unique classes
class_int_mapping = dict(zip(set_y,range(len(set_y)))) # change car : 0, bus : 1 etc..

y = np.array([class_int_mapping[i] for i in y]) # array. List also works

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.25,stratify = y)

When I train and test my pipeline, it gives the expected results. Please assume any classification pipeline. When I do this,

print(pipeline.predict_proba(x_train).shape, pipeline.predict_proba(x_test).shape)

>> (19794, 269) (6599, 269)

But when I do this:

top_k_accuracy_score(y_test,pipeline.predict_proba(x_test), k = 5)

the error it gives me is:

ValueError: Number of classes in 'y_true' (255) not equal to the number of classes in 'y_score' (269).

What is going on here?

P.S.: For now, this is how I compute it:

probs = pipeline.predict_proba(x_test)
topn = np.argsort(probs, axis = 1)[:,-5:]

top_k_acc_result = np.mean(np.array([1 if y_test[k] in topn[k] else 0 for k in range(len(topn))]))
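The manual computation above treats the column indices returned by argsort as class labels, which holds here only because the labels were mapped to 0..n-1. A minimal vectorized sketch of the same idea follows; `manual_top_k_accuracy` and its `classes` argument are hypothetical names introduced for illustration, and the toy scores are made up.

```python
import numpy as np

def manual_top_k_accuracy(y_true, probs, classes, k=5):
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    # Column indices of the k largest scores in each row.
    top_k_cols = np.argsort(probs, axis=1)[:, -k:]
    # Map column indices back to class labels (column j corresponds to classes[j]).
    top_k_labels = np.asarray(classes)[top_k_cols]
    hits = (top_k_labels == np.asarray(y_true)[:, None]).any(axis=1)
    return float(np.mean(hits))

# Hypothetical toy scores: three samples, four classes, no ties.
probs = np.array([[0.10, 0.20, 0.30, 0.40],
                  [0.60, 0.25, 0.10, 0.05],
                  [0.25, 0.35, 0.30, 0.10]])
y_true = np.array([2, 1, 0])

result = manual_top_k_accuracy(y_true, probs, classes=np.arange(4), k=2)
print(result)  # 2 of the 3 rows are hits
```

With a fitted scikit-learn classifier, `classes` would typically be its `classes_` attribute, since the columns of `predict_proba` follow that order.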


Tags: python numpy machine-learning scikit-learn classification

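【Solution 1】:

Some labels are missing from y_true (here, y_test), so the number of classes found in the labels does not match the number of columns in the probabilities. You can pass the full set of labels with top_k_accuracy_score(.., labels=).

For example: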

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import top_k_accuracy_score
from sklearn.model_selection import train_test_split

X, Y = make_classification(n_samples=500,n_classes=6,n_informative=7,random_state=33)

x_train, x_test, y_train, y_test = train_test_split(X,Y,test_size = 0.25,stratify = Y)

clf = RandomForestClassifier()
clf.fit(x_train,y_train)

If we do this, it works fine:

top_k_accuracy_score(y_test,clf.predict_proba(x_test), k = 2)

If for some reason class 5 is missing from the labels we score against, an error is raised:

ix = y_test != 5
top_k_accuracy_score(y_test[ix],clf.predict_proba(x_test[ix,:]), k = 2)
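To confirm that this is the cause, it can help to list which classes the model knows about but which never occur in the true labels. A minimal sketch, assuming the model's classes are available as an array (for a fitted scikit-learn classifier that is usually `classes_`); `missing_classes` is a hypothetical helper and the toy data are made up.

```python
import numpy as np

def missing_classes(y_true, classes):
    """Return the classes known to the model that never occur in y_true."""
    return np.setdiff1d(classes, np.unique(y_true))

# Hypothetical toy data: the model knows classes 0..5, but class 5 is absent from y_true.
model_classes = np.arange(6)
y_true = np.array([0, 1, 2, 3, 4, 4, 2, 0])

absent = missing_classes(y_true, model_classes)
print(absent)  # [5]
```

In the question's setting, a non-empty result for y_test would account for the gap between 255 and 269.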

You can provide the labels:

top_k_accuracy_score(y_test[ix],clf.predict_proba(x_test[ix,:]), k = 2,labels=np.unique(Y))
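Putting the pieces together, the following is a self-contained sketch of the failure and the fix. It assumes a RandomForestClassifier as a stand-in for any classifier, and it passes `clf.classes_` as the labels, which should equal `np.unique(Y)` here because the stratified training split contains every class. The `random_state` values are arbitrary choices added for repeatability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import top_k_accuracy_score
from sklearn.model_selection import train_test_split

X, Y = make_classification(n_samples=500, n_classes=6, n_informative=7, random_state=33)
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.25, stratify=Y, random_state=0
)
clf = RandomForestClassifier(random_state=0).fit(x_train, y_train)

# Simulate a test set in which class 5 never occurs.
ix = y_test != 5
probs = clf.predict_proba(x_test[ix])  # still has one column per trained class

# Without labels, the classes are inferred from y_true alone (5 of them), so this fails.
try:
    top_k_accuracy_score(y_test[ix], probs, k=2)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With labels, the columns of probs are matched against all six known classes.
score = top_k_accuracy_score(y_test[ix], probs, k=2, labels=clf.classes_)
print(score)
```

Using the fitted estimator's `classes_` keeps the labels in the same order as the columns of `predict_proba`, which is what the `labels` parameter expects.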
