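[Posted]: 2018-05-27 05:09:23
[Question]:
I've just started learning machine learning. While practicing one of the tasks I ran into a ValueError, even though I followed the same steps as the instructor. Please help.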
dff
  Country     Name
0     AUS      Sri
1     USA  Vignesh
2     IND    Pechi
3     USA      Raj
First, I applied label encoding:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
X = dff.values
label_encoder = LabelEncoder()
X[:, 0] = label_encoder.fit_transform(X[:, 0])
out:
X
array([[0, 'Sri'],
       [2, 'Vignesh'],
       [1, 'Pechi'],
       [2, 'Raj']], dtype=object)
Then I applied one-hot encoding to the same X:
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-472-be8c3472db63> in <module>()
----> 1 X=onehotencoder.fit_transform(X).toarray()
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in fit_transform(self, X, y)
1900 """
1901 return _transform_selected(X, self._fit_transform,
-> 1902 self.categorical_features, copy=True)
1903
1904 def _transform(self, X):
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in _transform_selected(X, transform, selected, copy)
1695 X : array or sparse matrix, shape=(n_samples, n_features_new)
1696 """
-> 1697 X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
1698
1699 if isinstance(selected, six.string_types) and selected == "all":
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
380 force_all_finite)
381 else:
--> 382 array = np.array(array, dtype=dtype, order=order, copy=copy)
383
384 if ensure_2d:
ValueError: could not convert string to float: 'Raj'
Please tell me what is wrong with my code. Thanks in advance!
[Comments]:
- Why not convert the 'Name' column to numbers, as you did for 'Country'? OneHotEncoder only handles numeric X. So either drop that column from X before passing it to OneHotEncoder, or convert it to numbers.
- I passed only one column, X[:,0]=onehotencoder.fit_transform(X[:,0]).toarray(), but I still get \sklearn\utils\validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)
- Yes, that is because you are passing a rank-1 array, X[:,0], to onehotencoder.fit_transform, which is deprecated. You need to reshape it with X[:,0].reshape(-1,1) or by using np.newaxis.
- @AruneshSingh, thanks. Could you post an answer using my data? I tried reshaping and got DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True), and my output is array([[1.0, 2], [2.0, 3], [1.0, 0], [2.0, 1]], dtype=object). Shouldn't it be 1s and 0s?
- These are only warnings, so they will not affect your results.
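
The suggestions in the comments above (encode only the 'Country' column, and pass it as a 2-D array via reshape(-1, 1)) could be put together roughly as follows. This is only a sketch under some assumptions: it rebuilds dff from the data shown in the question, and it assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string categories directly, so neither LabelEncoder nor the categorical_features argument is needed. The names country and country_onehot are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Rebuild the DataFrame from the question's data.
dff = pd.DataFrame({
    "Country": ["AUS", "USA", "IND", "USA"],
    "Name": ["Sri", "Vignesh", "Pechi", "Raj"],
})

# Take only the column to encode and reshape it to 2-D, shape (4, 1),
# as the DeprecationWarning in the comments suggests.
country = dff["Country"].values.reshape(-1, 1)

# Assumption: scikit-learn >= 0.20, so strings can be encoded directly.
encoder = OneHotEncoder()
# The default output is a sparse matrix; .toarray() makes it dense.
country_onehot = encoder.fit_transform(country).toarray()

print(encoder.categories_[0])  # learned categories, in sorted order
print(country_onehot)          # one 0/1 column per category

# Put the untouched 'Name' column back next to the encoded columns.
# Mixing floats and strings gives an object-dtype array.
X = np.hstack([country_onehot, dff[["Name"]].values])
print(X)
```

Because 'Name' never reaches the encoder, nothing tries to convert 'Raj' to a float, which is what triggered the ValueError in the question. Each row of country_onehot contains a single 1 in the column for its country and 0 elsewhere.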
Tags: python scikit-learn preprocessor sklearn-pandas one-hot-encoding