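[Posted]: 2020-02-02 18:24:34
[Question]:
I'm building a logistic regression model and want to understand which features contribute most to my output (1 or 0). I'm trying to work out whether a customer returns to my website and which features bring them back. I'm stuck on this fit function. It raises an error and I don't know why. The error seems to say I have null values, but I've already cleaned my data and removed the nulls.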
import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib.pyplot as plt
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
#load data
df = pd.read_csv('jupyter.csv', header = 0)
array = dataframe.values
X = array[:,1:13]
Y = array[:,14]
print(X.shape)
print(Y.shape)
(544219, 12)
(544219,)
# feature extraction
test = SelectKBest(score_func=chi2, k=4)
fit = test.fit(X, Y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-63-f91db4d08897> in <module>
1 # feature extraction
2 test = SelectKBest(score_func=chi2, k=4)
----> 3 fit = test.fit(X, Y)
4 # summarize scores
5 #numpy.set_printoptions(precision=3)
/opt/anaconda3/lib/python3.7/site-packages/sklearn/feature_selection/univariate_selection.py in fit(self, X, y)
339 self : object
340 """
--> 341 X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
342
343 if not callable(self.score_func):
/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
720 if multi_output:
721 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
--> 722 dtype=None)
723 else:
724 y = column_or_1d(y, warn=True)
/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
540 if force_all_finite:
541 _assert_all_finite(array,
--> 542 allow_nan=force_all_finite == 'allow-nan')
543
544 if ensure_min_samples > 0:
/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
58 elif X.dtype == np.dtype('object') and not allow_nan:
59 if _object_dtype_isnan(X).any():
---> 60 raise ValueError("Input contains NaN")
61
62
ValueError: Input contains NaN
[Comments]:
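- Before fitting, try X = X.reset_index() and Y = Y.reset_index()
- So df.iloc[:, :14].isnull().sum().sum() returns 0? If not, I suggest you look at df[df.iloc[:, :14].isnull().any(1)]
- Yes @ALollz. It returns 0.
- So I just noticed that array = dataframe.values seems to be using some variable dataframe, which is not the df you read in on the previous line?
- Thank you! Yes, that was a mistake on my part.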
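One more thing that may be worth checking: the traceback shows the error being raised while validating y, and df.iloc[:, :14] only covers columns 0 to 13, whereas Y is read from column 14. Below is a minimal sketch of slicing X and Y from the same df and counting missing values in exactly those columns before fitting. The synthetic frame, its column names (c0 to c14) and the planted NaN are assumptions standing in for jupyter.csv, which isn't available here; dropping the affected rows is just one possible way to handle them.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical stand-in for jupyter.csv: 15 non-negative columns, binary target last.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(200, 15)),
                  columns=[f"c{i}" for i in range(15)]).astype(float)
df["c14"] = rng.integers(0, 2, size=200).astype(float)
df.loc[3, "c14"] = np.nan  # planted NaN in the target column (assumption)

# Slice features and target from the same DataFrame that was just loaded.
X_df = df.iloc[:, 1:13]
y_s = df.iloc[:, 14]

# Count missing values in exactly the columns passed to fit().
print("NaN in X:", X_df.isnull().sum().sum())
print("NaN in Y:", y_s.isnull().sum())

# Keep only rows where every feature and the target are present.
mask = X_df.notnull().all(axis=1) & y_s.notnull()
X = X_df[mask].to_numpy(dtype=float)
y = y_s[mask].to_numpy(dtype=int)

# chi2 expects non-negative feature values.
selector = SelectKBest(score_func=chi2, k=4).fit(X, y)
selected = X_df.columns[selector.get_support()]
print(list(selected))
```

Keeping X_df as a DataFrame until the last moment also means the selected feature names can be read back through get_support(), rather than working with bare column positions.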
Tags: python pandas machine-learning scikit-learn missing-data