【Question title】: Error in using accuracy_score from sklearn in Logistic Regression
【Posted】: 2021-12-24 02:35:09
【Question】:

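I am fitting a logistic regression with Elastic Net regularization, and I am trying to find out which variables are positively or negatively associated with the outcome. Running accuracy_score(y_true, y_pred) raises an error: "ValueError: Found input variables with inconsistent numbers of samples: [9076, 9075]". The data frame has 18151 observations. How can I fix this? Is it because splitting 50/50 with train_test_split gives me one odd-sized and one even-sized subsample?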
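To check the arithmetic behind that guess: with a float `test_size`, scikit-learn rounds the test-set size up and gives the training set the remainder, so 18151 rows split at 0.5 become 9075 training rows and 9076 test rows. A minimal sketch, using a hypothetical all-zeros array in place of the real data frame (only the row count matters here):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 18151-row data frame; only the row count matters
n_rows = 18151
X = np.zeros((n_rows, 3))
y = np.zeros(n_rows)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1234
)

# Test size = ceil(0.5 * 18151) = 9076; the training set gets the other 9075 rows
print(len(X_train), len(X_test))  # 9075 9076
```

The two halves can never be equal with an odd row count, so any comparison that mixes a training-set array with a test-set array will fail the length check.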

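from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# df is the 18151-row data frame, loaded elsewhere
X2=df.iloc[:,23:41]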
y2=df["diab_inc"].values.reshape(-1,1)
X2_train,X2_test,y2_train,y2_test=train_test_split(X2,y2,test_size=0.5,random_state=1234)

print (len(X2_train),len(X2_test),len(y2_train),len(y2_test))
[9075 9076 9075 9076]

l1_ratio=(.001,.005,.01,.05,.1,.3,.5,.7,.9,1)
select=SelectFromModel(LogisticRegressionCV(cv=5, penalty='elasticnet', solver="saga", l1_ratios=l1_ratio, max_iter=10000)).fit(X2_train, y2_train)
print("Accuracy {0:2%}".format(accuracy_score(y2_test,select.estimator_.predict(X2_train))))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
----> 1 print("Accuracy {0:2%}".format(accuracy_score(y2_test,select.estimator_.predict(X2_train))))

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/metrics/_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
    200 
    201     # Compute accuracy for each possible representation
--> 202     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    203     check_consistent_length(y_true, y_pred, sample_weight)
    204     if y_type.startswith('multilabel'):

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
     81     y_pred : array or indicator matrix
     82     """
---> 83     check_consistent_length(y_true, y_pred)
     84     type_true = type_of_target(y_true)
     85     type_pred = type_of_target(y_pred)

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    317     uniques = np.unique(lengths)
    318     if len(uniques) > 1:
--> 319         raise ValueError("Found input variables with inconsistent numbers of"
    320                          " samples: %r" % [int(l) for l in lengths])
    321 

ValueError: Found input variables with inconsistent numbers of samples: [9076, 9075]
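The traceback shows that the failure comes from `check_consistent_length`, which compares the lengths of `y_true` and `y_pred` before any accuracy is computed. A minimal reproduction, assuming hypothetical all-zero label arrays with the two lengths from the error message:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels with the two lengths from the traceback
y_true = np.zeros(9076)  # same length as y2_test
y_pred = np.zeros(9075)  # same length as a prediction made on X2_train

message = ""
try:
    accuracy_score(y_true, y_pred)
except ValueError as err:
    message = str(err)
    print(message)

# With matching lengths (predicting on the test features) the call succeeds
ok = accuracy_score(y_true, np.zeros(9076))
print(ok)  # 1.0
```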

【Question comments】:

    Tags: python logistic-regression train-test-split


    【Answer 1】:

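    What you want is to predict on X2_test and compare those predictions with the ground truth y2_test. At the moment you are predicting on the training data instead. The training and test sets differ in size (9075 vs. 9076 rows) because your full dataset has an odd number of rows and you split it 50/50, which is why the lengths do not match and you get the error.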

    accuracy_score(y2_test,select.estimator_.predict(X2_test))
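Putting the fix into the full workflow, the sketch below is a hedged, self-contained version of the question's code, not the asker's actual data. It assumes synthetic data from `make_classification` in place of `df` (18 features, mirroring the 18 columns of `df.iloc[:,23:41]`, and an odd 1001 rows), a shorter `l1_ratios` grid to keep it fast, and a 1-D target: `y2` as built with `.reshape(-1,1)` is a column vector, which makes scikit-learn emit a `DataConversionWarning` when fitting. It also formats with `{0:.2%}` rather than `{0:2%}`, since the latter sets a field width of 2, not two decimal places.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption) for df.iloc[:,23:41] and df["diab_inc"];
# the odd row count reproduces the uneven 50/50 split
X, y = make_classification(
    n_samples=1001, n_features=18, n_informative=5, random_state=1234
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1234
)

# Shorter grid than in the question, only to keep this example quick
l1_ratios = (0.1, 0.5, 0.9)
select = SelectFromModel(
    LogisticRegressionCV(cv=5, penalty="elasticnet", solver="saga",
                         l1_ratios=l1_ratios, max_iter=10000)
).fit(X_train, y_train)  # y_train is 1-D, so no DataConversionWarning

# Predict on the TEST features so the predictions line up with y_test
y_pred = select.estimator_.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy {0:.2%}".format(acc))
```

The key line is the `predict(X_test)` call: predictions and ground truth must come from the same split, so their lengths always agree regardless of whether the row count is odd.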
    

    【Comments】:
