【Question Title】: ValueError: Found input variables with inconsistent numbers of samples: [4, 103]
【Posted】: 2019-10-20 12:42:46
【Question】:

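I've been teaching myself machine learning from a book, and this is my first attempt at an algorithm that goes "off the beaten path". After preparing the data, I used the imported split function and then tried to make some predictions. However, even after manually verifying that every feature has the same number of entries, I get this error: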

Traceback (most recent call last):
  File "main.py", line 89, in <module>
    xTrain, xTest, yTrain, yTest = tts(new_data, netGood, random_state=0)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2096, in train_test_split
    arrays = indexable(*arrays)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 230, in indexable
    check_consistent_length(*result)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 205, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [4, 103]

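The thing is, I used print statements to verify that every feature has exactly 103 entries, so I don't understand why the error says the inputs are inconsistent. Any help would be appreciated. If I solve it before someone answers, I'll post an update.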

from yahoo_historical import Fetcher
import pandas as pd
from IPython.display import display

data_Range = Fetcher("AAPL", [2019, 1, 1], [2019, 6, 1])

data = data_Range.getHistorical()

slopes = []

volumes = data['Volume'][1:]
highes = data['High']

for index in range(len(highes) - 1):
  slopes.append(highes[index + 1] - highes[index])

rLocale = []

for index in range(len(slopes)):

  #need to implement base cases
  if index == 0:
    if slopes[index] > slopes[index + 1]:
      rLocale.append(1)
    else:
      rLocale.append(-1)

  elif index == len(slopes) - 1:
    if slopes[index] > slopes[index - 1]:
      rLocale.append(1)
    else:
      rLocale.append(-1)

  else:
    behind = slopes[index - 1]
    current = slopes[index]
    infront = slopes[index + 1]

    # Exactly one branch runs, so rLocale always gets one entry per slope
    # (ties with a neighbour fall through to 0).
    if current > behind and current > infront:
      rLocale.append(1)
    elif current < behind and current < infront:
      rLocale.append(-1)
    else:
      rLocale.append(0)


netGood = []

for index in range(1, len(highes)):
  if highes[index] >= highes[index - 1]:
    netGood.append(1)
  else:
    netGood.append(-1)

highes = highes[:-1]

new_data = [slopes, rLocale, highes, volumes]
print(len(new_data[0]))
print(len(new_data[1]))
print(len(new_data[2]))
print(len(new_data[3]))
print(len(netGood))

print('---------------------------')

from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)

from sklearn.model_selection import train_test_split as tts
xTrain, xTest, yTrain, yTest = tts(new_data, netGood, random_state=0)

clf.fit(new_data, netGood)
print(clf.predict(new_data))

Console log:

103
103
103
103
103
---------------------------

【Comments】:

    Tags: python scikit-learn


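    【Solution 1】:

    new_data needs to be an array of observations, with one row per sample. Right now it is a list of features: four sequences of 103 values each, which scikit-learn reads as 4 samples and compares against the 103 labels in netGood, hence the [4, 103] in the error. Simply transposing it should fix that: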

    import numpy as np
    new_data = np.transpose(new_data)
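
A minimal sketch of the same fix, using synthetic stand-in data rather than the Yahoo download from the question. The variable names, the random values, and the choice to fit on the training split only are assumptions made for illustration, not part of the original answer. It shows the shape before and after transposing, and that the split then goes through:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-ins for the question's four features (103 values each).
n_samples = 103
rng = np.random.default_rng(0)
slopes = rng.normal(size=n_samples)
r_locale = rng.integers(-1, 2, size=n_samples)
highs = rng.normal(loc=180.0, scale=5.0, size=n_samples)
volumes = rng.integers(1_000_000, 5_000_000, size=n_samples)
labels = np.where(slopes >= 0, 1, -1)  # hypothetical target, one per sample

# A list of features stacks into (n_features, n_samples): 4 rows, 103 columns.
features_first = np.array([slopes, r_locale, highs, volumes])
print(features_first.shape)  # (4, 103)

# scikit-learn expects (n_samples, n_features), so transpose.
X = features_first.T
print(X.shape)  # (103, 4)

# The first dimension of X now matches len(labels), so the split succeeds.
x_train, x_test, y_train, y_test = train_test_split(X, labels, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(x_train, y_train)
predictions = clf.predict(x_test)
print(len(predictions) == len(y_test))
```

Printing the length of each feature, as in the question, cannot reveal the problem because every feature really does have 103 entries. Checking np.shape(new_data) against len(netGood) compares the dimension that scikit-learn actually validates.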
    
