【Question title】: Sparse matrix error in MLPRegressor
【Posted】: 2018-08-17 18:45:57
【Question description】:

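Context

I'm getting an error when trying to use a sparse matrix as input to sklearn.neural_network.MLPRegressor. Nominally, this estimator can handle sparse matrices. I think this might be a bug in scikit-learn, but I wanted to check here before submitting an issue.

Problem

When I pass scipy.sparse input to sklearn.neural_network.MLPRegressor, I get: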

ValueError: input must be a square array

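The error is raised by the matrix_power function in numpy.matrixlib.defmatrix. This seems to happen because matrix_power passes the sparse matrix to numpy.asanyarray (L137), which returns an array with size=1 and ndim=0 that contains the sparse matrix object. matrix_power then performs some dimension checks (L138-141) to make sure the input is a square matrix. These fail because the array returned by numpy.asanyarray is not square, even though the underlying sparse matrix is square.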

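As far as I can tell, the problem stems from numpy.asanyarray preventing the dimensions of the sparse matrix from being determined. The sparse matrix itself has a shape attribute that would let it pass the dimension checks, but only if it has not been through asanyarray.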
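The wrapping behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming a small identity matrix as a hypothetical stand-in for the real data; it only illustrates what numpy.asanyarray does to a scipy.sparse matrix and is not taken from the original traceback.

```python
import numpy as np
from scipy import sparse

# A small square sparse matrix (hypothetical stand-in for the real data).
m = sparse.csr_matrix(np.eye(3))

# The sparse matrix itself reports its two-dimensional shape.
print(m.shape)

# NumPy does not unpack a scipy.sparse matrix; it wraps the whole object
# in a 0-dimensional array of dtype=object, so the shape information is lost.
wrapped = np.asanyarray(m)
print(wrapped.shape, wrapped.ndim, wrapped.size, wrapped.dtype)

# An explicit conversion yields a regular dense 2-D array instead.
dense = m.toarray()
print(dense.shape)
```

Any NumPy routine that calls asanyarray on its input and then inspects shape or ndim sees the 0-dimensional wrapper rather than the matrix, which is consistent with the "input must be a square array" message.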

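I think this might be a bug, but I don't want to go filing issues until I've confirmed I'm not just being an idiot! Please see below and check.

If it is a bug, where is the most appropriate place to raise the issue? NumPy? SciPy? Or Scikit-Learn?

Minimal example

Environment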

Arch Linux
kernel     4.15.7-1

Python     3.6.4
numpy      1.14.1
scipy      1.0.0
sklearn    0.19.1

Code

import numpy as np
from scipy import sparse
from sklearn import model_selection
from sklearn.preprocessing import StandardScaler, Imputer
from sklearn.neural_network import MLPRegressor

## Generate some synthetic data

def fW(A, B, C):
    return A * np.random.normal(.3, .1) + B * np.random.normal(.6, .1)

def fX(A, B, C):
    return B * np.random.normal(-1, .1) + A * np.random.normal(-.9, .1) / C

# independent variables
N = int(1e4)
A = np.random.uniform(2, 12, N)
B = np.random.uniform(2, 12, N)
C = np.random.uniform(2, 12, N)

# synthetic data
mW = fW(A, B, C)
mX = fX(A, B, C)

# combine datasets
real = np.vstack([A, B, C]).T
meas = np.vstack([mW, mX]).T

# add noise to meas
meas *= np.random.normal(1, 0.0001, meas.shape)

## Make data sparse
prob_null = 0.2
real[np.random.choice([True, False], real.shape, p=[prob_null, 1-prob_null])] = np.nan
meas[np.random.choice([True, False], meas.shape, p=[prob_null, 1-prob_null])] = np.nan

# NB: problem persists whichever sparse matrix method is used.
real = sparse.csr_matrix(real)
meas = sparse.csr_matrix(meas)

# replace missing values with mean
rmnan = Imputer()
real = rmnan.fit_transform(real)
meas = rmnan.fit_transform(meas)

# split into test/training sets
real_train, real_test, meas_train, meas_test = model_selection.train_test_split(real, meas, test_size=0.3)

# create scalers and apply to data
real_scaler = StandardScaler(with_mean=False)
meas_scaler = StandardScaler(with_mean=False)

real_scaler.fit(real_train)
meas_scaler.fit(meas_train)

treal_train = real_scaler.transform(real_train)
tmeas_train = meas_scaler.transform(meas_train)

treal_test = real_scaler.transform(real_test)
tmeas_test = meas_scaler.transform(meas_test)

nn = MLPRegressor((100,100,10), solver='lbfgs', early_stopping=True, activation='tanh')
nn.fit(tmeas_train, treal_train)

## ERROR RAISED HERE

## The problem:

# the sparse matrix has a shape attribute that would pass the square matrix validation
tmeas_train.shape

# but not after it's been through asanyarray
np.asanyarray(tmeas_train).shape

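【Comments】:

  • If asarray is applied to that input, then the function is not set up to work with sparse matrices. The wrapping does more than hide the shape: it hides the matrix from any further use as a matrix. You need to call toarray yourself.

Tags: python numpy scipy scikit-learn sparse-matrix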


【Solution 1】:

MLPRegressor.fit(), as given in the documentation, supports sparse matrices for X but not for y.

Parameters:

X : array-like or sparse matrix, shape (n_samples, n_features)

The input data.

y : array-like, shape (n_samples,) or (n_samples, n_outputs)

The target values (class labels in classification, real numbers in regression).

I was able to run your code successfully with:

nn.fit(tmeas_train, treal_train.toarray())
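The same idea can be shown on a self-contained toy problem. This is a sketch under stated assumptions: the random matrices stand in for tmeas_train and treal_train, the helper to_dense_target is a hypothetical name introduced here, and the network size and iteration count are arbitrary choices to keep the run short.

```python
import numpy as np
from scipy import sparse
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-ins for tmeas_train (inputs) and treal_train (targets).
X_sparse = sparse.random(200, 2, density=0.8, format="csr", random_state=0)
y_sparse = sparse.random(200, 3, density=0.8, format="csr", random_state=1)


def to_dense_target(y):
    """Return y as a dense ndarray, converting only if it is sparse."""
    return y.toarray() if sparse.issparse(y) else np.asarray(y)


# X may stay sparse; y is densified before being passed to fit().
nn = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                  max_iter=200, random_state=0)
nn.fit(X_sparse, to_dense_target(y_sparse))

# Predictions come back as a dense array with one column per target.
pred = nn.predict(X_sparse)
print(pred.shape)
```

Densifying only y keeps the memory benefit of a sparse X, which is usually the larger of the two arrays. A convergence warning may appear with so few iterations; it does not affect the point being illustrated.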

【Comments】:

  • Thanks! I suspected I was being an idiot. Confirmed!!