【Posted】: 2021-09-27 08:22:04
【Question】:
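I'm working with some customer_data. As a first step I want to run PCA, and as a second step I want to cluster the result.
Since the data has to be encoded (and scaled) before it is fed to PCA, I thought it would be nice to put all of these steps into a pipeline. Unfortunately, this doesn't seem to work.
How can I build this pipeline, and does it even make sense to do so?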
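```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Creating pipeline objects
encoder = OneHotEncoder(drop='first')     # returns a sparse matrix by default
scaler = StandardScaler(with_mean=False)  # with_mean=False: centering is not allowed on sparse input
pca = PCA()

# Create pipeline
pca_pipe = make_pipeline(encoder,
                         scaler,
                         pca)

# Fit data to pipeline
pca_pipe.fit_transform(customer_data_raw)
```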
I get the following error message:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-c4ce88042a66> in <module>()
     20 
     21 # Fit data to pipeline
---> 22 pca_pipe.fit_transform(customer_data_raw)

2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/decomposition/_pca.py in _fit(self, X)
    385         # This is more informative than the generic one raised by check_array.
    386         if issparse(X):
--> 387             raise TypeError('PCA does not support sparse input. See '
    388                             'TruncatedSVD for a possible alternative.')
    389 

TypeError: PCA does not support sparse input. See TruncatedSVD for a possible alternative.
```
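The error arises because `OneHotEncoder` returns a SciPy sparse matrix by default, and `PCA` refuses sparse input. This is presumably also why `StandardScaler(with_mean=False)` was needed, since centering is not supported on sparse data. Below is a minimal sketch of one way to keep everything in a single pipeline. It assumes all columns of `customer_data_raw` are categorical; the small DataFrame is a hypothetical stand-in, and `to_dense` is a helper introduced here, not a scikit-learn function. A `FunctionTransformer` densifies the encoded matrix before scaling and PCA, and `KMeans` is appended as the clustering step. `n_components=2` and `n_clusters=2` are arbitrary illustration values.

```python
import numpy as np
import pandas as pd
from scipy.sparse import issparse
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


def to_dense(X):
    """Hypothetical helper: turn a SciPy sparse matrix into a dense array, pass dense input through."""
    return X.toarray() if issparse(X) else np.asarray(X)


# Hypothetical stand-in for customer_data_raw (all columns categorical).
customer_data_raw = pd.DataFrame({
    "region": ["north", "south", "east", "west", "north", "south", "east", "west"],
    "segment": ["retail", "retail", "corporate", "corporate",
                "retail", "corporate", "retail", "corporate"],
    "channel": ["web", "store", "web", "store", "store", "web", "store", "web"],
})

cluster_pipe = make_pipeline(
    OneHotEncoder(drop="first"),                   # sparse output by default
    FunctionTransformer(to_dense, validate=False), # densify so PCA accepts the data
    StandardScaler(),                              # dense input, so centering is allowed again
    PCA(n_components=2),                           # step 1: dimensionality reduction
    KMeans(n_clusters=2, n_init=10, random_state=0),  # step 2: clustering
)

# Fit every step and return one cluster label per customer.
labels = cluster_pipe.fit_predict(customer_data_raw)

# All fitted steps except KMeans: the PCA coordinates of each customer.
X_pca = cluster_pipe[:-1].transform(customer_data_raw)
```

Instead of the `FunctionTransformer`, the encoder itself can be asked for dense output: `OneHotEncoder(drop='first', sparse=False)` in older releases, or `sparse_output=False` from scikit-learn 1.2 on, where `sparse` was renamed. Densifying is fine for a modest number of categories but can use a lot of memory for high-cardinality columns. If the data also contains numeric columns, they would normally be routed around the encoder with a `ColumnTransformer` rather than one-hot encoded.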
【Discussion】:
-
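What doesn't work? If there is an error message, please include the full traceback in the question. Otherwise, please give an example of the expected and the actual behavior.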
-
I get the following error:
TypeError: PCA does not support sparse input. See TruncatedSVD for a possible alternative.
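The error message points to `TruncatedSVD`, which accepts sparse input directly, so the encoder's sparse output never has to be densified. A minimal sketch, again using a hypothetical stand-in DataFrame and arbitrary component and cluster counts. Note that `TruncatedSVD` does not center the data, so its components are generally not the same as PCA's; it is a related decomposition, not a drop-in equivalent.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for customer_data_raw (all columns categorical).
customer_data_raw = pd.DataFrame({
    "region": ["north", "south", "east", "west", "north", "south", "east", "west"],
    "segment": ["retail", "retail", "corporate", "corporate",
                "retail", "corporate", "retail", "corporate"],
    "channel": ["web", "store", "web", "store", "store", "web", "store", "web"],
})

svd_pipe = make_pipeline(
    OneHotEncoder(drop="first"),                   # sparse output, kept sparse
    StandardScaler(with_mean=False),               # scales sparse columns without centering
    TruncatedSVD(n_components=2, random_state=0),  # works on sparse input; no centering
    KMeans(n_clusters=2, n_init=10, random_state=0),
)

labels = svd_pipe.fit_predict(customer_data_raw)

# Fitted steps except KMeans: the reduced 2-D representation.
X_svd = svd_pipe[:-1].transform(customer_data_raw)
```

Keeping the data sparse is the main advantage here: memory stays proportional to the number of non-zero entries even when the encoder produces many columns.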
Tags: python scikit-learn pipeline pca