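【Posted】: 2014-10-04 15:09:49
【Question】:
I am trying to implement a grid search with scikit-learn to select the best parameters for KNN regression. Specifically, this is what I am trying to do: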
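# GridSearchCV is imported from sklearn.grid_search, the module named in the traceback
from sklearn import neighbors
from sklearn.grid_search import GridSearchCV
parameters = [{'weights': ['uniform', 'distance'], 'n_neighbors': [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]}]
clf = GridSearchCV(neighbors.KNeighborsRegressor(), parameters)
clf.fit(features, rewards)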
Unfortunately, I get ValueError: Array contains NaN or infinity.
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/grid_search.pyc in fit(self, X, y, **params)
705 " The params argument will be removed in 0.15.",
706 DeprecationWarning)
--> 707 return self._fit(X, y, ParameterGrid(self.param_grid))
708
709
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/grid_search.pyc in _fit(self, X, y, parameter_iterable)
491 X, y, base_estimator, parameters, train, test,
492 self.scorer_, self.verbose, **self.fit_params)
--> 493 for parameters in parameter_iterable
494 for train, test in cv)
495
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
515 try:
516 for function, args, kwargs in iterable:
--> 517 self.dispatch(function, args, kwargs)
518
519 self.retrieve()
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs)
310 """
311 if self._pool is None:
--> 312 job = ImmediateApply(func, args, kwargs)
313 index = len(self._jobs)
314 if not _verbosity_filter(index, self.verbose):
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs)
134 # Don't delay the application, to avoid keeping the input
135 # arguments in memory
--> 136 self.results = func(*args, **kwargs)
137
138 def get(self):
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/grid_search.pyc in fit_grid_point(X, y, base_estimator, parameters, train, test, scorer, verbose, loss_func, **fit_params)
309 this_score = scorer(clf, X_test, y_test)
310 else:
--> 311 this_score = clf.score(X_test, y_test)
312 else:
313 clf.fit(X_train, **fit_params)
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/base.pyc in score(self, X, y)
320
321 from .metrics import r2_score
--> 322 return r2_score(y, self.predict(X))
323
324
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/metrics/metrics.pyc in r2_score(y_true, y_pred)
2181
2182 """
-> 2183 y_type, y_true, y_pred = _check_reg_targets(y_true, y_pred)
2184
2185 if len(y_true) == 1:
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/metrics/metrics.pyc in _check_reg_targets(y_true, y_pred)
59 Estimated target values.
60 """
---> 61 y_true, y_pred = check_arrays(y_true, y_pred)
62
63 if y_true.ndim == 1:
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_arrays(*arrays, **options)
231 else:
232 array = np.asarray(array, dtype=dtype)
--> 233 _assert_all_finite(array)
234
235 if copy and array is array_orig:
/Users/zikesjan/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.pyc in _assert_all_finite(X)
25 if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
26 and not np.isfinite(X).all()):
---> 27 raise ValueError("Array contains NaN or infinity.")
28
29
ValueError: Array contains NaN or infinity.
Based on this post, I tried replacing the fit call above with:
clf.fit(np.asarray(features).astype(float), np.asarray(rewards).astype(float))
Then, based on this other post, I even tried this:
scaler = preprocessing.StandardScaler().fit(np.asarray(features).astype(float))
transformed_features = scaler.transform(np.asarray(features).astype(float))
clf.fit(transformed_features, rewards)
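Unfortunately, neither attempt worked. Does anyone know where the problem might be and how to make my code work?
Thank you very much.
Edit:
I found that I do not get this error if I use only the following parameters: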
parameters = [{'weights': ['uniform'], 'n_neighbors': [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]}]
So the problem seems to be in the weights='distance' case. Does anyone know why?
Another question related to this one came up, which I am asking here.
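Since only weights='distance' fails, one thing that may be worth checking is whether features contains duplicate rows. This is an assumption, not something established in the question: distance weighting is the only one of the two settings that depends on the actual distance values, and duplicate rows produce distances of exactly zero. The sketch below counts duplicates; the helper name count_duplicate_rows and the sample array are made up for illustration.

```python
import numpy as np

def count_duplicate_rows(X):
    """Return how many rows of X repeat an earlier row exactly."""
    X = np.asarray(X, dtype=float)
    # np.unique with axis=0 collapses identical rows into one
    unique_rows = np.unique(X, axis=0)
    return X.shape[0] - unique_rows.shape[0]

# Made-up stand-in for the real `features` array: rows 0 and 2 are identical
sample_features = np.array([[0.0, 1.0],
                            [2.0, 3.0],
                            [0.0, 1.0]])
n_duplicates = count_duplicate_rows(sample_features)
print("duplicate rows:", n_duplicates)
```

A non-zero count on the real data would be consistent with zero distances between a test point and a training point during cross-validation.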
Edit 2:
If I run my code with logging set to debug, I get the following warning:
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/neighbors/regression.py:160: RuntimeWarning: invalid value encountered in divide
y_pred[:, j] = num / denom
So there is clearly a division-by-zero problem. My question is: why does scikit-learn divide by 0 at line 160 of regression.py?
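One possible explanation, offered as an assumption because nothing in the thread confirms it: NumPy reports "invalid value encountered" for operations such as 0/0 or inf/inf, not for a plain x/0. If the weights are inverse distances, a neighbor at distance 0 gets an infinite weight, so both num and denom become infinite and their ratio is NaN. The sketch below reproduces that arithmetic with plain NumPy on made-up numbers; it does not call scikit-learn, and inverse_distance_with_floor is a hypothetical helper.

```python
import numpy as np

# Made-up distances from one query point to its 3 nearest neighbours;
# the first neighbour coincides with the query point (distance 0).
dist = np.array([0.0, 0.5, 2.0])
y_neighbors = np.array([1.0, 2.0, 3.0])

with np.errstate(divide="ignore", invalid="ignore"):
    weights = 1.0 / dist                  # first weight is inf
    num = np.sum(y_neighbors * weights)   # inf
    denom = np.sum(weights)               # inf
    y_pred = num / denom                  # inf / inf gives nan

print("naive prediction:", y_pred)

def inverse_distance_with_floor(dist, eps=1e-8):
    """Hypothetical weight function: inverse distance, with distances
    clipped from below at eps so the weights stay finite."""
    return 1.0 / np.maximum(dist, eps)

safe_weights = inverse_distance_with_floor(dist)
safe_pred = np.sum(y_neighbors * safe_weights) / np.sum(safe_weights)
print("prediction with floored distances:", safe_pred)
```

KNeighborsRegressor documents that weights may also be a callable that takes an array of distances and returns an array of weights of the same shape, so a function like this could be passed as weights=inverse_distance_with_floor. Whether that resolves the error on this data is untested here, and the floor value slightly changes the weighting.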
【Comments】:
- Does numpy.isnan(features).any() or numpy.isnan(rewards).any() yield True?
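The check suggested in this comment can be sketched as follows, extended to cover infinities as well, since the error message mentions both. The helper name report_non_finite and the sample values are assumptions for illustration; in practice it would be called on the real features and rewards.

```python
import numpy as np

def report_non_finite(name, values):
    """Print and return the number of NaN and infinite entries in values."""
    arr = np.asarray(values, dtype=float)
    n_nan = int(np.isnan(arr).sum())
    n_inf = int(np.isinf(arr).sum())
    print("{}: {} NaN, {} inf".format(name, n_nan, n_inf))
    return n_nan, n_inf

# Made-up sample data standing in for `features` and `rewards`
sample_counts = report_non_finite("sample", [1.0, np.nan, np.inf, 4.0])
clean_counts = report_non_finite("clean", [[0.0, 1.0], [2.0, 3.0]])
```

If both arrays come back clean, the NaN is more likely produced during prediction than present in the inputs.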
Tags: python numpy scikit-learn