【Question title】: top_k_accuracy_score() giving shape mismatch error: Number of classes in 'y_true' (255) not equal to the number of classes in 'y_score' (269)
【Posted】: 2021-12-22 11:49:33
【Question】:

My pipeline works fine, and now I want to check top-k accuracy. I could obviously do it the hard way with a loop, but how can I do the same thing with the built-in function?

import numpy as np
from sklearn.metrics import top_k_accuracy_score
from sklearn.model_selection import train_test_split

# x and y can be any features and labels; please assume x is already defined

y = df_whole['target'].values.ravel() # 1-D labels, currently strings

set_y = sorted(set(y)) # unique classes, sorted so the mapping is the same on every run
class_int_mapping = dict(zip(set_y, range(len(set_y)))) # e.g. bus: 0, car: 1, ...

y = np.array([class_int_mapping[i] for i in y]) # an array (a list also works)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, stratify=y)

When I train and test my pipeline, it gives the expected results. Please assume any classification pipeline. When I run:

print(pipeline.predict_proba(x_train).shape, pipeline.predict_proba(x_test).shape)

>> (19794, 269) (6599, 269)

But when I run:

top_k_accuracy_score(y_test,pipeline.predict_proba(x_test), k = 5)

it gives me this error:

ValueError: Number of classes in 'y_true' (255) not equal to the number of classes in 'y_score' (269).

What is going on here?

P.S.: For now, I do it like this:

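probs = pipeline.predict_proba(x_test)
# column indices of the 5 highest probabilities in each row
topn = np.argsort(probs, axis = 1)[:,-5:]

# fraction of rows whose true (integer) label is among those 5 columns
top_k_acc_result = np.mean(np.array([1 if y_test[k] in topn[k] else 0 for k in range(len(topn))]))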
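A vectorized version of the same manual check might look like the sketch below. It assumes a fitted classifier that exposes classes_, and it maps the argsort column positions back through classes_, because the columns of predict_proba follow the order of classes_ rather than the label values themselves (in the question the two coincide only because the labels are the integers 0..268 and all of them were seen during training). The helper name manual_top_k and the small probability matrix are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import top_k_accuracy_score


def manual_top_k(y_true, proba, classes, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper: `proba` must have one column per entry of `classes`,
    in the same order (as with clf.classes_ and clf.predict_proba).
    Ties may be broken differently than in scikit-learn.
    """
    # Column positions of the k largest scores in each row (order within the k does not matter).
    top_cols = np.argsort(proba, axis=1)[:, -k:]
    # Translate column positions into the actual class labels.
    top_labels = np.asarray(classes)[top_cols]
    # A row is a hit if its true label equals any of its k candidate labels.
    hits = np.any(top_labels == np.asarray(y_true)[:, None], axis=1)
    return hits.mean()


# Tiny hand-made example: 4 samples, 4 classes; class "d" never occurs in y_true.
classes = np.array(["a", "b", "c", "d"])
proba = np.array([
    [0.10, 0.20, 0.30, 0.40],
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.35, 0.30, 0.10],
    [0.10, 0.10, 0.50, 0.30],
])
y_true = np.array(["a", "b", "c", "a"])

print(manual_top_k(y_true, proba, classes, k=2))                  # 0.5
print(top_k_accuracy_score(y_true, proba, k=2, labels=classes))  # 0.5
```

Because y_true here contains only 3 of the 4 classes, the scikit-learn call needs labels=classes; without it, it would raise the same kind of error as in the question.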

【Question comments】:

    Tags: python numpy machine-learning scikit-learn classification


    【Solution 1】:

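    Some classes are missing from the labels you are scoring: y_test contains only 255 distinct classes, while predict_proba returns one column for each of the 269 classes the pipeline saw during training. Even with stratify=y, a class with very few samples can end up entirely in the training split. When labels is not given, top_k_accuracy_score infers the classes from y_true, so the two counts do not match. You can supply the full list of classes with top_k_accuracy_score(..., labels=...); it has to be in the same order as the columns of the score matrix, e.g. pipeline.classes_.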
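To see where the two numbers in the error message come from, here is a minimal reproduction on hypothetical random data. Instead of a random split, the test labels are built by hand so that class 3 is seen during training but never appears in y_test; LogisticRegression stands in for "any classification pipeline".

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import top_k_accuracy_score

rng = np.random.default_rng(0)

# Hypothetical data: four classes are seen during fit ...
X_train = rng.normal(size=(40, 3))
y_train = np.repeat([0, 1, 2, 3], 10)
# ... but class 3 never appears in the test labels.
X_test = rng.normal(size=(12, 3))
y_test = np.repeat([0, 1, 2], 4)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# Distinct labels in y_true vs. columns in y_score: 3 vs. 4.
print(len(np.unique(y_test)), proba.shape[1])

error_message = None
try:
    top_k_accuracy_score(y_test, proba, k=2)
except ValueError as err:
    # Same "Number of classes in 'y_true' ..." error as in the question.
    error_message = str(err)
    print(error_message)

# Passing the classifier's own class order fixes the mismatch.
score = top_k_accuracy_score(y_test, proba, k=2, labels=clf.classes_)
print(score)
```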

    For example:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import top_k_accuracy_score
    from sklearn.model_selection import train_test_split
    
    X, Y = make_classification(n_samples=500,n_classes=6,n_informative=7,random_state=33)
    
    x_train, x_test, y_train, y_test = train_test_split(X,Y,test_size = 0.25,stratify = Y)
    
    clf = RandomForestClassifier()
    clf.fit(x_train,y_train)
    

    This works fine:

    top_k_accuracy_score(y_test,clf.predict_proba(x_test), k = 2)
    

    If, for some reason, class 5 is missing from the labels being scored, it raises the error:

    ix = y_test != 5
    top_k_accuracy_score(y_test[ix],clf.predict_proba(x_test[ix,:]), k = 2)
    

    You can provide the labels explicitly:

    top_k_accuracy_score(y_test[ix], clf.predict_proba(x_test[ix, :]), k=2, labels=np.unique(Y))
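The same fix carries over to a Pipeline as in the question: Pipeline exposes classes_ from its final estimator, which gives the column order of predict_proba and is therefore a safe value for labels=. Most scikit-learn classifiers also accept string targets directly, so the manual integer mapping is not strictly needed. The sketch below uses hypothetical random data and class names, and cross-checks the result against the loop-based approach from the question (with column indices mapped back to labels).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import top_k_accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical data with string labels; "van" is seen in training only.
X_train = rng.normal(size=(60, 4))
y_train = np.repeat(["bus", "car", "truck", "van"], 15)
X_test = rng.normal(size=(9, 4))
y_test = np.array(["bus", "car", "truck"] * 3)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)
proba = pipeline.predict_proba(X_test)

# pipeline.classes_ comes from the final step and matches the columns of proba.
score = top_k_accuracy_score(y_test, proba, k=2, labels=pipeline.classes_)

# Loop-style cross-check: map the top-2 column indices back to class labels.
top_cols = np.argsort(proba, axis=1)[:, -2:]
manual = np.mean([label in pipeline.classes_[cols] for label, cols in zip(y_test, top_cols)])

print(score, manual)
```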
    

    【Discussion】:
