【Question Title】: sklearn cross_val_score() returns NaN values when I use "r2" as scoring
【Posted】: 2021-03-21 17:16:21
【Question】:

I am trying to use sklearn's cross_val_score(). Here is an example of what I tried:

# loocv evaluate random forest on the housing dataset
from numpy import mean
from numpy import std
from numpy import absolute
from pandas import read_csv
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor

# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

# create loocv procedure
cv = LeaveOneOut()
# create model
model = RandomForestRegressor(random_state=1)

# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force positive
scores = absolute(scores)
# report performance
print('MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

The code above runs without any problems. However, when I change scoring to 'r2', every value in scores becomes nan.
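The question's code downloads the housing CSV. The sketch below reproduces the same symptom offline. It assumes synthetic data from make_regression as a stand-in for that dataset and a smaller forest (n_estimators=10) so that LeaveOneOut stays fast. With the same LOOCV splitter, MAE is finite for every fold while every r2 score is nan.

```python
# Offline reproduction sketch: synthetic data stands in for the housing CSV (assumption).
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.model_selection import LeaveOneOut, cross_val_score

# 20 synthetic samples keep LOOCV (20 fits) quick
X, y = make_regression(n_samples=20, n_features=5, noise=1.0, random_state=0)
model = RandomForestRegressor(n_estimators=10, random_state=1)

with warnings.catch_warnings():
    # r2_score warns that R^2 is undefined for a single sample; silence it here
    warnings.simplefilter("ignore", UndefinedMetricWarning)
    mae = cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=LeaveOneOut())
    r2 = cross_val_score(model, X, y, scoring="r2", cv=LeaveOneOut())

print("MAE finite in every fold:", np.isfinite(mae).all())
print("R2 nan in every fold:", np.isnan(r2).all())
```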

【Discussion】:

    Tags: scikit-learn regression nan cross-validation


    【Answer 1】:

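    The problem is the combination of LeaveOneOut() and r2 as the scoring function. LeaveOneOut() splits the data so that exactly one sample is used for testing and all remaining samples are used for training. The trouble starts when r2 is computed on that one-sample validation set with the formula:

    $$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

    With n=1 (only one sample to validate), the mean equals that single value, i.e. $\bar{y} = y_i$, so the denominator becomes zero and R^2 is undefined. That is the nan you observe (scikit-learn's r2_score returns nan, with an UndefinedMetricWarning, when given fewer than two samples). This is bound to happen whenever cv equals the number of data points, as shown below: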

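    # evaluate model: 10 samples with cv=10, so every test fold holds exactly one sample
    scores = cross_val_score(model, X[0:10], y[0:10], scoring='r2', cv=10, n_jobs=-1)
    # kept from the MAE example; note that abs() would hide negative R2 values
    scores = absolute(scores)
    # report performance (mean and std of |R2| across folds)
    print('|R2|: %.3f (%.3f)' % (mean(scores), std(scores)))
    |R2|: nan (nan)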
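To see the mechanism at the metric level, the sketch below evaluates both terms of the formula by hand on a single held-out sample. The target and prediction values are hypothetical. The residual sum is nonzero, but the total sum of squares is exactly zero because $\bar{y} = y_i$, and r2_score returns nan for this input in recent scikit-learn versions.

```python
# Sketch: the two terms of R^2 on a one-sample "test fold" (values are hypothetical).
import warnings

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([24.0])   # hypothetical single held-out target
y_pred = np.array([22.5])   # hypothetical prediction for it

ss_res = np.sum((y_true - y_pred) ** 2)          # numerator: (24.0 - 22.5)^2 = 2.25
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # denominator: y_bar == y_i, so 0.0
print("SS_res =", ss_res, "SS_tot =", ss_tot)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # r2_score warns that R^2 is undefined for n < 2
    score = r2_score(y_true, y_pred)
print("r2_score:", score)  # nan in recent scikit-learn versions
```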
    

    Now, when I set cv to a smaller value, so that each test fold contains more than one sample, it works fine:

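    # evaluate model: 10 samples with cv=3, so every test fold holds at least 3 samples
    scores = cross_val_score(model, X[0:10], y[0:10], scoring='r2', cv=3, n_jobs=-1)
    # kept from the MAE example; note that abs() would hide negative R2 values
    scores = absolute(scores)
    # report performance (mean and std of |R2| across folds)
    print('|R2|: %.3f (%.3f)' % (mean(scores), std(scores)))
    |R2|: 0.662 (0.229)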
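If you want to keep leave-one-out rather than switch to fewer folds, one common workaround (not part of the original answer) is to collect every out-of-fold prediction with cross_val_predict and compute a single R^2 over all of them, so the denominator uses the variance of the whole target vector. The sketch below assumes synthetic make_regression data as a stand-in for the housing dataset.

```python
# Sketch of a pooled-LOOCV workaround (a swapped-in technique, not the answer's method).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for the housing data (assumption: replace with your own X, y)
X, y = make_regression(n_samples=30, n_features=5, noise=5.0, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=1)

# Each sample is predicted by a model trained on the other n-1 samples
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())

# One R^2 over all n held-out predictions: the denominator is no longer zero
pooled_r2 = r2_score(y, y_pred)
pooled_mae = mean_absolute_error(y, y_pred)
print("LOOCV R2: %.3f  MAE: %.3f" % (pooled_r2, pooled_mae))
```

A pooled R^2 is a different quantity from the mean of per-fold R^2 scores, so compare it only with other pooled values. The pooled MAE, by contrast, is simply the average of the n single-sample absolute errors.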
    
