【Question title】: ValueError: Found arrays with inconsistent numbers of samples [1,299]
【Posted】: 2016-05-17 00:37:57
【Question】:

Here are the data files: here and here. You can download them by clicking the links. I'm using Pandas, NumPy and Python 3.

Here is my code:

import pandas as pa
import numpy as nu
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

def get_accuracy(X_train, y_train, X_test, y_test):
    perceptron = Perceptron()
    perceptron.fit(X_train, y_train)
    perceptron.transform(X_train)
    prediction = perceptron.predict(X_test)
    result = accuracy_score(y_test, prediction)
    return result

test_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-test.csv")
test_data.columns = ["class", "f1", "f2"]
train_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-train.csv")
train_data.columns = ["class", "f1", "f2"]

scaler = StandardScaler()
scaler.fit_transform(train_data[train_data.columns[1:]]).reshape(-1,1)
X_train = scaler.transform(train_data[train_data.columns[1:]])

scaler.fit_transform(train_data[train_data.columns[0]])
y_train = scaler.transform(train_data[train_data.columns[0]])

scaler.fit_transform(test_data[test_data.columns[1:]])
X_test = scaler.transform(test_data[test_data.columns[1:]])

scaler.fit_transform(test_data[test_data.columns[0]])
y_test = scaler.transform(test_data[test_data.columns[0]])

scaled_accuracy = get_accuracy(nu.ravel(X_train), nu.ravel(y_train), nu.ravel(X_test), nu.ravel(y_test))
print(scaled_accuracy)

This is the error I get:

Traceback (most recent call last):
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 33, in <module>
    scaled_accuracy = get_accuracy(nu.ravel(X_train), nu.ravel(y_train), nu.ravel(X_test), nu.ravel(y_test))
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 9, in get_accuracy
    perceptron.fit(X_train, y_train)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\linear_model\stochastic_gradient.py", line 545, in fit
    sample_weight=sample_weight)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\linear_model\stochastic_gradient.py", line 389, in _fit
    X, y = check_X_y(X, y, 'csr', dtype=np.float64, order="C")
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 520, in check_X_y
    check_consistent_length(X, y)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
**ValueError: Found arrays with inconsistent numbers of samples: [  1 299]**

Without scaling the data, everything works fine. After scaling, it doesn't.
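Where the `1` in `[  1 299]` most likely comes from: `nu.ravel(X_train)` flattens the two-column feature matrix into one long 1-D array. scikit-learn 0.17, current when this was posted, accepted a 1-D `X` with a deprecation warning and reshaped it into a single row, i.e. one sample, while `y_train` still had one entry per sample. The sketch below reproduces that shape mismatch using a hypothetical random 299×2 array in place of the real CSV data (an assumption, since the files aren't reproduced here) and calls `sklearn.utils.check_consistent_length`, the check that raises in the traceback. Newer scikit-learn versions reject a 1-D `X` outright with a different message.

```python
import numpy as np
from sklearn.utils import check_consistent_length

# Hypothetical stand-in for the scaled training data: 299 rows, 2 features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(299, 2))
y_train = rng.choice([-1, 1], size=299)

flat = np.ravel(X_train)          # (299, 2) -> (598,): the per-sample rows are lost
print(X_train.shape, flat.shape)  # (299, 2) (598,)

# Old scikit-learn turned a 1-D X into a single row, i.e. one sample.
as_one_sample = flat.reshape(1, -1)  # shape (1, 598)

try:
    check_consistent_length(as_one_sample, y_train)  # 1 sample vs 299 labels
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Keeping X two-dimensional passes the same check.
check_consistent_length(X_train, y_train)
```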

【Comments】:

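  • Could you share the contents of your CSV files? Without the data there's no way to reproduce the output!
  • Calling fit_transform returns the scaled data; try assigning the results of your fit_transform calls to your X and y train/test objects.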
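To illustrate the second comment: `StandardScaler.fit_transform` returns a new array (with the default `copy=True`) rather than scaling its input in place, so a bare `scaler.fit_transform(...)` statement, as on several lines of the question's code, throws the scaled values away. A minimal sketch with a hypothetical toy array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 3 samples, 2 features on very different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler()
scaler.fit_transform(X)  # return value discarded: X itself is left unchanged
print(X)                 # still the original, unscaled values

X_scaled = scaler.fit_transform(X)  # keep the returned array instead
print(X_scaled.mean(axis=0))        # approximately zero in each column
```

`fit_transform(X)` gives the same result as `fit(X).transform(X)`; the scaled data only exists in the returned array.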

Tags: python numpy pandas machine-learning scikit-learn


【Answer 1】:

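You shouldn't call fit_transform every time you use the scaler. Fit it once on the training data and then only call transform; otherwise the training and test data end up with different representations (which leads to errors). Scaling the labels doesn't make sense either.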
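A minimal sketch of what this answer describes, assuming the column layout from the question (`class`, `f1`, `f2`). Because the CSV files aren't reproduced here, `make_frame` is a hypothetical helper that generates synthetic data with one feature on a much larger scale than the other, and `random_state=0` is an arbitrary choice. Besides fitting the scaler once and leaving the labels alone, the sketch drops the `nu.ravel` calls so the feature matrices stay 2-D, and drops `perceptron.transform(X_train)`, whose result was never used (current scikit-learn's `Perceptron` no longer has that method).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler


def get_accuracy(X_train, y_train, X_test, y_test):
    """Fit a Perceptron on the training data and return test-set accuracy."""
    perceptron = Perceptron(random_state=0)
    perceptron.fit(X_train, y_train)
    prediction = perceptron.predict(X_test)
    return accuracy_score(y_test, prediction)


def make_frame(n, seed):
    """Hypothetical stand-in for perceptron-*.csv: columns class, f1, f2."""
    rng = np.random.default_rng(seed)
    f1 = rng.normal(scale=100.0, size=n)  # large-scale feature
    f2 = rng.normal(scale=1.0, size=n)    # small-scale feature
    cls = np.where(f1 / 100.0 + f2 > 0, 1, -1)
    return pd.DataFrame({"class": cls, "f1": f1, "f2": f2})


# With the real files (assuming they have no header row) you would use e.g.:
# train_data = pd.read_csv("perceptron-train.csv", header=None, names=["class", "f1", "f2"])
train_data = make_frame(300, seed=0)
test_data = make_frame(200, seed=1)

# Features stay 2-D (n_samples, 2); labels stay 1-D and unscaled.
X_train = train_data[["f1", "f2"]].to_numpy()
y_train = train_data["class"].to_numpy()
X_test = test_data[["f1", "f2"]].to_numpy()
y_test = test_data["class"].to_numpy()

# Fit the scaler on the training features only, then reuse it for both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

raw_accuracy = get_accuracy(X_train, y_train, X_test, y_test)
scaled_accuracy = get_accuracy(X_train_scaled, y_train, X_test_scaled, y_test)
print(raw_accuracy, scaled_accuracy)
```

The test set is transformed with the mean and standard deviation learned from the training set, so both sets share one representation, which is the point the answer makes.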
