fit_transform: Only size-1 arrays can be converted to Python scalars

【Question Title】: fit_transform Only Size-1 Arrays Can Be Converted To Python Scalars
【Posted】: 2021-07-08 02:14:44
【Question Description】:

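As I understand it, this error occurs when an array is passed where a single value is expected, but I thought StandardScaler was supposed to accept a matrix.

For context on the data: I process a directory of 1,000 garbage bin images, which I reshape later on. I thought the reshape would solve the problem, but here we are.

Edit: updated the code to include the imports and the error message. It should run with what is provided.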

import os
from os import listdir
from os.path import isfile, join
import matplotlib as mpl
import matplotlib.pyplot as plt
from IPython.display import display
%matplotlib inline
import pandas as pd
import numpy as np
from PIL import Image
from skimage.feature import hog
from skimage.color import rgb2gray
from skimage.color import rgba2rgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc

def returnImageArray(filename, directory = "./garbage bins/"):
    filePath = os.path.join(directory, filename)
    image = Image.open(filePath)
    return np.array(image)

def processImageFeatures(image):
    features = image.flatten()
    grayscaleConversion = rgb2gray(rgba2rgb(image))
    hogify = hog(grayscaleConversion, block_norm='L2-Hys', pixels_per_cell=(16, 16))
    flatify = np.hstack(features)
    return flatify

allFeatures = []
for file in fileNames:
    image = returnImageArray(file)
    feat = processImageFeatures(image)
    allFeatures.append(feat)
    
allFeaturesArray = np.array(allFeatures, dtype=object)

print(allFeaturesArray.shape)
reshapedF = np.array(allFeaturesArray).reshape(-1, 1)
print(reshapedF.shape)
print(reshapedF[1])
scaler = StandardScaler()

 ##----
garbage = scaler.fit_transform(reshapedF) ##ERROR HERE: only size-1 arrays can be converted to Python scalars
 ##----

pca = PCA(n_components=1000)
garbagePCA = pca.fit_transform(garbage)

print(allFeaturesArray.shape)
#(1000,)
#(1000, 1)
#[array([196, 179, 146, ..., 187, 164, 255], dtype=uint8)]

Warning

<ipython-input-6-6c460760299f>:7: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray   
    allFeaturesArray = np.array(allFeatures)

Error

TypeError                                 Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-7-4533ce454d0d> in <module>
      6 scaler = StandardScaler()
      7 
----> 8 garbage = scaler.fit_transform(reshapedF)
      9 
     10 pca = PCA(n_components=1000)

~\anaconda3\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    688         if y is None:
    689             # fit method of arity 1 (unsupervised transformation)
--> 690             return self.fit(X, **fit_params).transform(X)
    691         else:
    692             # fit method of arity 2 (supervised transformation)

~\anaconda3\lib\site-packages\sklearn\preprocessing\_data.py in fit(self, X, y)
    665         # Reset internal state before fitting
    666         self._reset()
--> 667         return self.partial_fit(X, y)
    668 
    669     def partial_fit(self, X, y=None):

~\anaconda3\lib\site-packages\sklearn\preprocessing\_data.py in partial_fit(self, X, y)
    694             Transformer instance.
    695         """
--> 696         X = self._validate_data(X, accept_sparse=('csr', 'csc'),
    697                                 estimator=self, dtype=FLOAT_DTYPES,
    698                                 force_all_finite='allow-nan')

~\anaconda3\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    418                     f"requires y to be passed, but the target y is None."
    419                 )
--> 420             X = check_array(X, **check_params)
    421             out = X
    422         else:

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    596                     array = array.astype(dtype, casting="unsafe", copy=False)
    597                 else:
--> 598                     array = np.asarray(array, order=order, dtype=dtype)
    599             except ComplexWarning:
    600                 raise ValueError("Complex data not supported\n"

~\anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

ValueError: setting an array element with a sequence.

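Image set

This is the listing of the directory that returnImageArray() reads the images from.

【Question Comments】:

Why object? np.array(allFeatures, dtype=object)
Please provide a working piece of code, including the import statements, so that people trying to reproduce your error don't have to spend time recreating your setup.
Please provide the error and the stack trace.
I used "dtype=object" because NumPy gave me a warning. Should I remove it, or is there a better way to handle the warning I'm getting? Specifically, the warning is: <ipython-input-6-6c460760299f>:7: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray allFeaturesArray = np.array(allFeatures)
But even when I remove the dtype, it seems to give the same result.

Tags: python pandas numpy scikit-learn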

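【Solution 1】:

After looking into your problem a bit, I realized it is related to hpaulj's question about the use of dtype=object. I'm guessing you used it to get around the warning you hit when converting a list of differently sized arrays into a single array.

The problem is that reshapedF is an object array whose elements are themselves arrays of different sizes, so it cannot be converted to a numeric matrix.

You need to build a single array by combining the others, for example with numpy.stack() or numpy.concatenate(), or whatever fits what you are trying to achieve.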
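To illustrate the point above, here is a minimal sketch using only NumPy. The two short vectors are made-up stand-ins for per-image feature vectors, not the asker's data. It shows that reshaping an object array does not make it numeric, while equal-length vectors stack into an ordinary 2-D array.

```python
import numpy as np

# Hypothetical feature vectors of different lengths,
# standing in for features from images of different sizes.
ragged = [np.arange(6, dtype=np.uint8), np.arange(4, dtype=np.uint8)]

obj = np.array(ragged, dtype=object)  # 1-D object array holding two arrays
print(obj.shape)                      # (2,)
reshaped = obj.reshape(-1, 1)         # shape (2, 1), but each cell is still an array

# Converting to float fails, which is the conversion the scaler attempts internally.
try:
    np.asarray(reshaped, dtype=np.float64)
    converted = True
except (TypeError, ValueError) as exc:
    converted = False
    print(type(exc).__name__, exc)

# Equal-length vectors combine into a regular numeric matrix.
uniform = [np.arange(4, dtype=np.uint8), np.arange(4, 8, dtype=np.uint8)]
X = np.stack(uniform)
print(X.shape, X.dtype)               # (2, 4) uint8
```

numpy.stack() raises an error when the inputs differ in shape, which makes it a convenient early check that every sample really has the same number of features.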

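The key point is that the input to StandardScaler.fit_transform() must be

array-like of shape (n_samples, n_features)

according to the manual.

For that, every sample needs the same number of features. You can get there either by extracting the same number of features from each image (if that makes sense in your case) or by using images of the same size.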
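As a rough sketch of the shape requirement, the snippet below feeds StandardScaler a (n_samples, n_features) matrix built from equal-length vectors. The random values and the sizes (5 samples, 12 features) are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assumed toy data: 5 samples with 12 features each (random pixel-like values).
rng = np.random.default_rng(0)
features = [rng.integers(0, 256, size=12).astype(np.uint8) for _ in range(5)]

# One row per sample, one column per feature.
X = np.stack(features).astype(np.float64)

scaled = StandardScaler().fit_transform(X)
print(scaled.shape)                              # (5, 12)
print(np.allclose(scaled.mean(axis=0), 0.0))     # each column is centered
```

Note that no reshape(-1, 1) is needed here. That reshape would turn every sample into a single feature, which is not what an image pipeline wants.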

【Comments】:

I used "dtype=object" because NumPy gave me a warning. Should I remove it, or is there a better way to handle the warning I'm getting? Specifically, the warning is: <ipython-input-6-6c460760299f>:7: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray allFeaturesArray = np.array(allFeatures)
To clarify, I have updated my post.

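【Solution 2】:

The images must all have exactly the same resolution, because images of different shapes produce feature arrays of different sizes. Once they all share the same resolution, the problem goes away.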
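One way to get a common resolution is to resize every image before flattening it. The sketch below uses Pillow with in-memory placeholder images. The 64x64 target size and the helper name image_to_features are assumptions for illustration and are not part of the original code.

```python
import numpy as np
from PIL import Image

TARGET_SIZE = (64, 64)  # assumed (width, height); pick what suits your data


def image_to_features(image, size=TARGET_SIZE):
    """Hypothetical helper: normalize mode and size, then flatten to 1-D."""
    resized = image.convert("RGB").resize(size)
    return np.asarray(resized).flatten()


# Placeholder images with different sizes and modes, created in memory.
images = [
    Image.new("RGB", (120, 80), (196, 179, 146)),
    Image.new("RGBA", (50, 200), (10, 20, 30, 255)),
]

X = np.stack([image_to_features(im) for im in images])
print(X.shape)  # (2, 12288), i.e. 64 * 64 * 3 features per image
```

Converting to a single mode matters as much as resizing, since an RGBA image has four channels per pixel and would otherwise yield a longer vector than an RGB image of the same size.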

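【Comments】:

This is what my answer means in more general terms: you need the same number of features per sample, either by using images of the same size or by fixing the feature count at the minimum found across the images (if that makes sense for your case). I have updated my answer to make this more explicit.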
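The "keep the minimum number" option from this comment could look like the sketch below. The vectors are made up, and truncation is only one possible reading of the comment. For raw pixels it discards spatial alignment between images, so resizing is usually the more meaningful choice.

```python
import numpy as np

# Hypothetical per-image feature vectors with differing lengths.
features = [np.arange(10), np.arange(7), np.arange(9)]

# Diagnose the problem first: more than one distinct length means ragged data.
lengths = sorted({f.size for f in features})
print(lengths)  # [7, 9, 10]

# Keep only as many features as the shortest vector provides.
n_keep = min(f.size for f in features)
X = np.stack([f[:n_keep] for f in features])
print(X.shape)  # (3, 7)
```

Printing the set of lengths before building the array is also a quick way to confirm whether the VisibleDeprecationWarning comes from differently sized images.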