Python SKLearn: 'Bad input shape' error when predicting a sequence

【Question title】: Python SKLearn: 'Bad input shape' error when predicting a sequence
【Posted】: 2019-04-08 10:22:07
【Question】:

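I have an Excel file that stores one sequence per column (read from the top cell to the bottom cell), and each sequence follows a trend similar to the previous column's. I want to predict the sequence for the nth column of this dataset.

A sample of my dataset:

Each column holds a set of values (a sequence), and the values progress as we move to the right, so I want to predict, for example, the values in column Z.

Here is my code so far: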

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Read the Excel file in rows
df = pd.read_excel(open('vec_sol2.xlsx', 'rb'),
                header=None, sheet_name='Sheet1')
print(type(df))
length = len(df.columns)
# Get the sequence for each row

x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)

print("y_train shape: ", y_train.shape)

pred_model = LogisticRegression()
pred_model.fit(x_train, y_train)
print(pred_model)

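I'll try to explain the logic:

x_train and x_test are just the indices/column numbers associated with the sequences.
y_train is an array of sequences.
There are 51 columns in total, so splitting off 25% as test data yields 37 training sequences and 13 test sequences.

While debugging I managed to get the shape of each variable: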

x_train : (37, 1)
x_test : (13, 1)
y_train : (37, 51)
y_test : (13, 51)

But now, running the program gives me this error:

ValueError: bad input shape (37, 51)

What is my mistake?

【Question comments】:

Tags: python pandas scikit-learn prediction valueerror

【Answer 1】:

I don't understand why you are using this:

x_train, x_test, y_train, y_test = train_test_split(
    np.reshape(range(0, length - 1), (-1, 1)), df, test_size=0.25, random_state=0)

You already have the data in df. Extract X and y from it, then split them into training and test sets.

Try this:

X = df.iloc[:,:-1]
y = df.iloc[:, -1:]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
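
To make the shapes concrete, here is a minimal sketch of the same split run on a synthetic DataFrame. The 50 x 51 random frame is an assumption standing in for the Excel sheet, which is not available here; only the slicing and the `train_test_split` call come from the snippet above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Excel sheet (assumption): 50 rows x 51 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 51)))

X = df.iloc[:, :-1]   # every column except the last one -> features
y = df.iloc[:, -1:]   # last column only, kept as a 2-D frame -> shape (50, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

print(X_train.shape, X_test.shape)  # (40, 50) (10, 50)
print(y_train.shape, y_test.shape)  # (40, 1) (10, 1)
```

Each row is now one sample, the first 50 columns are its features, and the last column is the single target, so X and y agree on the number of samples.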

Otherwise, the shapes you shared show that you are trying to get a 51-column output from a single feature, which is odd if you think about it.
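
The mismatch can be reproduced without the Excel file. The sketch below uses made-up arrays with the shapes reported in the question (the 0/1 values are an assumption, chosen only so that the shape is the sole problem). The exact wording of the message depends on the scikit-learn version, but it should mention the offending shape.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_train = np.arange(37).reshape(-1, 1)        # one feature per sample: an index
y_train = rng.integers(0, 2, size=(37, 51))   # 51 target columns per sample

error_message = None
try:
    # LogisticRegression expects y with shape (n_samples,), not (n_samples, 51).
    LogisticRegression().fit(x_train, y_train)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

`LogisticRegression` predicts one class label per sample, so a two-dimensional y with 51 columns is rejected before any training happens.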

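【Comments】:

Thanks. But what does X refer to now? Also, a second question: is it possible to predict a set of values from the values of the previous columns, as I described at the start of this thread?
I now get this error when using your solution: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
X is the input vector and y is the output vector. If you mean a brand-new column with the same number of rows as the input given to the model for prediction, then yes. For the error, check this link: stackoverflow.com/questions/34165731/…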
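
As a rough illustration of the `ravel()` suggestion in that message, the sketch below flattens a single-column y before fitting. The data is synthetic (an assumption): four random feature columns plus a 0/1 label column. A label column is used because `LogisticRegression` is a classifier; continuous sequence values would call for a regressor instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-in (assumption): 4 feature columns plus a 0/1 label column.
features = rng.normal(size=(50, 4))
labels = (features[:, 0] > 0).astype(int)
df = pd.DataFrame(np.column_stack([features, labels]))

X = df.iloc[:, :-1]
y_column = df.iloc[:, -1:]                         # DataFrame, shape (50, 1)
y_flat = y_column.to_numpy().ravel().astype(int)   # 1-D array, shape (50,)

# Fitting on the flattened labels avoids the column-vector message.
model = LogisticRegression().fit(X, y_flat)
predictions = model.predict(X)

print(y_column.shape, y_flat.shape, predictions.shape)
```

Selecting the column with `df.iloc[:, -1]` (no trailing colon) should have the same effect, since it returns a one-dimensional Series rather than a one-column DataFrame.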