Sklearn: ValueError: Found input variables with inconsistent numbers of samples: [500, 1] (answer)

【Question title】: Sklearn: ValueError: Found input variables with inconsistent numbers of samples: [500, 1]
【Posted】: 2019-06-06 08:49:42
【Question description】:

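I am working on a machine learning problem using Python's sklearn library.

I am using a pandas DataFrame, and I want to train a linear regression model on my local data and predict new values. Here is a sample of my code: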

import pandas as pd

customers = pd.read_csv('Ecommerce Customers')
X = customers[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = ['Yearly Amount Spent']

When I run the following code:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

I get this error:

Found input variables with inconsistent numbers of samples: [500, 1]

My dataset has 500 rows and 8 columns. My sklearn version is:

import sklearn
format(sklearn.__version__)
'0.20.1'

Please help me. Thanks in advance.


Tags: python pandas machine-learning scikit-learn

【Solution 1】:

Look closely at your code: you are not defining y as a column of the customers DataFrame, as you probably intended. As you have it,

y=['Yearly Amount Spent']

y is just a one-element list:

y
# ['Yearly Amount Spent']

So scikit-learn rightly complains: X has 500 samples, but your labels y have length 1.
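A minimal sketch that reproduces the mismatch. Since the 'Ecommerce Customers' file is not available here, it assumes a hypothetical synthetic DataFrame with only the four feature columns plus the target, filled with random numbers; only the row count of 500 matters.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 'Ecommerce Customers' data: 500 rows of random numbers.
rng = np.random.default_rng(0)
features = ['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']
customers = pd.DataFrame(rng.normal(size=(500, 5)), columns=features + ['Yearly Amount Spent'])

X = customers[features]
y = ['Yearly Amount Spent']  # the bug: a plain Python list holding one string

print(len(X), len(y))  # 500 1

error_message = None
try:
    train_test_split(X, y, test_size=0.3, random_state=101)
except ValueError as err:
    # sklearn checks that every array passed in has the same number of samples
    error_message = str(err)
    print(error_message)
```

Comparing len(X) with len(y) before calling train_test_split is a quick way to catch this kind of mistake.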

Change it to

y=customers['Yearly Amount Spent']
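With y selected as a column, the rest of the workflow from the question goes through. The sketch below again assumes hypothetical synthetic data (random features and a made-up linear target) in place of the real CSV; it splits the data and fits a LinearRegression model to predict new values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for 'Ecommerce Customers'.
rng = np.random.default_rng(101)
features = ['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']
customers = pd.DataFrame(rng.normal(size=(500, 4)), columns=features)
# Made-up linear target plus noise, only so the example has something to fit.
weights = np.array([25.0, 39.0, 0.5, 61.0])
customers['Yearly Amount Spent'] = customers[features].to_numpy() @ weights + rng.normal(size=500)

X = customers[features]
y = customers['Yearly Amount Spent']  # a Series with one value per row
assert len(X) == len(y)  # both have 500 samples now

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(X_train.shape, X_test.shape)  # (350, 4) (150, 4)
```

With test_size=0.3 and 500 rows, train_test_split puts 150 rows in the test set and 350 in the training set.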
