ValueError: Found input variables with inconsistent numbers of samples: [4, 103]

【Question title】: ValueError: Found input variables with inconsistent numbers of samples: [4, 103]
【Posted】: 2019-10-20 12:42:46
【Question description】:

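I've been trying to teach myself machine learning from a book, and this is my first attempt at an "off the beaten path" algorithm. After preparing the data, I used the imported split function and then tried to make some predictions. However, even after manually verifying that every feature has the same number of entries, I get the following error: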

Traceback (most recent call last):
  File "main.py", line 89, in <module>
    xTrain, xTest, yTrain, yTest = tts(new_data, netGood, random_state=0)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2096, in train_test_split
    arrays = indexable(*arrays)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 230, in indexable
    check_consistent_length(*result)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 205, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [4, 103]

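The thing is, I used print statements to verify that each feature has exactly 103 entries, so I don't understand why the error says the inputs are inconsistent. Any help would be greatly appreciated. If I solve it before someone answers, I'll post an update.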

from yahoo_historical import Fetcher
import pandas as pd
from IPython.display import display

data_Range = Fetcher("AAPL", [2019, 1, 1], [2019, 6, 1])

data = data_Range.getHistorical()

slopes = []

volumes = data['Volume'][1:]
highes = data['High']

for index in range(len(highes) - 1):
  slopes.append(highes[index + 1] - highes[index])

rLocale = []

for index in range(len(slopes)):

  #need to implement base cases
  if index == 0:
    if slopes[index] > slopes[index + 1]:
      rLocale.append(1)
    else:
      rLocale.append(-1)

  elif index == len(slopes) - 1:
    if slopes[index] > slopes[index - 1]:
      rLocale.append(1)
    else:
      rLocale.append(-1)

  else:
    behind = slopes[index - 1]
    current = slopes[index]
    infront = slopes[index + 1]

    if current > behind and current > infront:
      rLocale.append(1)
    if (current > behind and current < infront) or (current < behind and current > infront):
      rLocale.append(0)
    if current < behind and current < infront:
      rLocale.append(-1)


netGood = []

for index in range(1, len(highes)):
  if highes[index] >= highes[index - 1]:
    netGood.append(1)
  else:
    netGood.append(-1)

highes = highes[:-1]

new_data = [slopes, rLocale, highes, volumes]
print(len(new_data[0]))
print(len(new_data[1]))
print(len(new_data[2]))
print(len(new_data[3]))
print(len(netGood))

print('---------------------------')

from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)

from sklearn.model_selection import train_test_split as tts
xTrain, xTest, yTrain, yTest = tts(new_data, netGood, random_state=0)

clf.fit(new_data, netGood)
print(clf.predict(new_data))

Console log:

103
103
103
103
103
---------------------------

【Discussion】:

Tags: python scikit-learn

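【Solution 1】:

new_data needs to be a collection of observations, with one row per sample. Right now it is a list of features: 4 rows of 103 values each, which is why scikit-learn reports [4, 103]. Transposing it should fix the problem: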

import numpy as np
new_data = np.transpose(new_data)
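
To see why the print statements did not catch the problem, it may help to reproduce the mismatch on synthetic data. The sketch below is an illustration rather than the asker's actual pipeline: the random values stand in for the Yahoo data, and the names features_first, samples_first, r_locale, highs, and net_good are placeholders chosen here. It shows that the outer list has length 4 while the labels have length 103, that train_test_split rejects that pair, and that the transposed array splits cleanly.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the four feature lists and the label list.
# The values are made up; only the lengths matter for this demonstration.
n_samples = 103
rng = np.random.default_rng(0)
slopes = rng.normal(size=n_samples).tolist()
r_locale = rng.integers(-1, 2, size=n_samples).tolist()
highs = rng.normal(loc=180.0, size=n_samples).tolist()
volumes = rng.integers(1_000, 10_000, size=n_samples).tolist()
net_good = rng.choice([-1, 1], size=n_samples).tolist()

# One inner list per feature: the outer length is the number of features,
# even though every inner list holds 103 values.
features_first = [slopes, r_locale, highs, volumes]
print(len(features_first), len(net_good))  # 4 103

# The split fails because the inputs disagree on the number of samples.
try:
    train_test_split(features_first, net_good, random_state=0)
    split_error = None
except ValueError as exc:
    split_error = str(exc)
print(split_error)

# After transposing, each row is one observation with four feature values.
samples_first = np.transpose(features_first)
print(samples_first.shape)  # (103, 4)

x_train, x_test, y_train, y_test = train_test_split(
    samples_first, net_good, random_state=0
)
print(x_train.shape, x_test.shape)
```

Printing len(new_data[0]) through len(new_data[3]) checks the inner lists, whereas the consistency check looks at the outer length of each argument, so printing len(new_data) or np.shape(new_data) would have exposed the mismatch directly.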

【Discussion】: