【Question title】: How to solve "ValueError: y should be a 1d array, got an array of shape () instead." for Extremely Random Forest Regressor?
【Posted】: 2021-11-05 03:03:45
【Question】:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, mean_absolute_error
from sklearn import cross_validation, preprocessing
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import classification_report

# Load input data
input_file = 'traffic_data.txt'
data = []
with open(input_file, 'r') as f:
    for line in f.readlines():
        items = line[:-1].split(',')
        data.append(items)

data = np.array(data)

# Convert string data to numerical data
label_encoder = [] 
X_encoded = np.empty(data.shape)
for i, item in enumerate(data[0]):
    if item.isdigit():
        X_encoded[:, i] = data[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(data[:, i])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)

# Split data into training and testing datasets 
X_train, X_test, y_train, y_test = cross_validation.train_test_split(
        X, y, test_size=0.25, random_state=5)

# Extremely Random Forests regressor
params = {'n_estimators': 100, 'max_depth': 4, 'random_state': 0}
regressor = ExtraTreesRegressor(**params)
regressor.fit(X_train, y_train)

# Compute the regressor performance on test data
y_pred = regressor.predict(X_test)
print("Mean absolute error:", round(mean_absolute_error(y_test, y_pred), 2))

# Testing encoding on single data instance
test_datapoint = ['Saturday', '10:20', 'Atlanta', 'no']
test_datapoint_encoded = [-1] * len(test_datapoint)
count = 0
for i, item in enumerate(test_datapoint):
    if item.isdigit():
        test_datapoint_encoded[i] = int(test_datapoint[i])
    else:
        test_datapoint_encoded[i] = int(label_encoder[count].transform(test_datapoint[i]))
        count = count + 1 

test_datapoint_encoded = np.array(test_datapoint_encoded)

# Predict the output for the test datapoint
print("Predicted traffic:", int(regressor.predict([test_datapoint_encoded])[0]))

The error seems to occur in the else branch of the second for loop, but I can't figure out why:

File "F:\Python_Workspace\AI\traffic_prediction.py", line 114, in test_datapoint_encoded[i] = int(label_encoder[count].transform(test_datapoint[i]))

File "C:\Users\BISWADEEP\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py", line 133, in transform y = column_or_1d(y, warn=True)

File "C:\Users\BISWADEEP\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f return f(*args, **kwargs)

File "C:\Users\BISWADEEP\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 864, in column_or_1d raise ValueError(

ValueError: y should be a 1d array, got an array of shape () instead.

【Comments】:

    Tags: python arrays numpy scikit-learn artificial-intelligence


    【Solution 1】:

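    Can you post the line number where the error occurs, and check the shape of the input array? It looks like the input is empty. Sometimes, to predict a value, you need to reshape the array for the model: reshape_array = np.reshape(some_array, (-1, 1))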
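A shape of `()` in this message denotes a 0-dimensional (scalar) array rather than an empty one: `LabelEncoder.transform` was handed a single string such as `'Saturday'` where it expects a 1-D array-like. The sketch below illustrates this on a small made-up column (the `days` values are an assumption for illustration, not the contents of `traffic_data.txt`); wrapping the string in a list and taking element `[0]` of the result appears to be enough.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for one categorical column (illustrative values only).
days = np.array(['Friday', 'Saturday', 'Sunday', 'Saturday'])
encoder = LabelEncoder().fit(days)

# A bare string becomes a 0-d array, so its shape is the empty tuple.
scalar_shape = np.asarray('Saturday').shape
# Wrapping the string in a list yields the 1-D input transform() expects.
list_shape = np.asarray(['Saturday']).shape

# Passing the bare string reproduces the ValueError on recent scikit-learn.
try:
    encoder.transform('Saturday')
    scalar_error = None
except ValueError as exc:
    scalar_error = str(exc)

# Pass a one-element list, then unwrap the single encoded label.
encoded = int(encoder.transform(['Saturday'])[0])

print(scalar_shape, list_shape, encoded)
print(scalar_error)
```

Applied to the question's loop, this would mean calling `label_encoder[count].transform([test_datapoint[i]])[0]` instead of `label_encoder[count].transform(test_datapoint[i])`.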

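    【Discussion】:

    • Sorry for the late reply. Here is the error: File "F:\Python_Workspace\AI\traffic_prediction.py", line 61, in test_datapoint_encoded[i] = int(label_encoder[count].transform(test_datapoint[i]))
    • I think the book I'm following uses older versions of Python, which is why not all of its code works in the newer versions.
    • Which array should I reshape? I tried reshaping X and y, but most of the time it throws ValueError: Found input variables with inconsistent numbers of samples
    • X has shape (17568, 4) and y has shape (17568,)
    • Shape () in the error suggests you have an empty array. Which line of your code raises the error?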
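For reference, X with shape (17568, 4) and y with shape (17568,) already match what `fit` expects, so reshaping them is probably what triggered the "inconsistent numbers of samples" error; the only input that needs changing is the single string passed to `transform`. Below is a minimal end-to-end sketch of the question's pipeline with that one call adjusted. The rows are generated toy data (an assumption, since `traffic_data.txt` is not available), `label_encoders` is simply a renamed `label_encoder`, and `sklearn.model_selection.train_test_split` stands in for the `sklearn.cross_validation` module, which was removed in scikit-learn 0.20.

```python
import itertools

import numpy as np
from sklearn import preprocessing
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

# Made-up rows with four feature columns and a numeric target (illustrative only).
combos = itertools.product(['Friday', 'Saturday', 'Sunday'], ['10:20', '10:25'],
                           ['Atlanta', 'Boston'], ['no', 'yes'])
data = np.array([list(c) + [str(5 + 3 * i)] for i, c in enumerate(combos)])

# Encode string columns; keep one fitted encoder per non-numeric column, in order.
label_encoders = []
X_encoded = np.empty(data.shape)
for i, item in enumerate(data[0]):
    if item.isdigit():
        X_encoded[:, i] = data[:, i].astype(float)
    else:
        encoder = preprocessing.LabelEncoder()
        X_encoded[:, i] = encoder.fit_transform(data[:, i])
        label_encoders.append(encoder)

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=5)

regressor = ExtraTreesRegressor(n_estimators=100, max_depth=4, random_state=0)
regressor.fit(X_train, y_train)

# Encode a single datapoint: wrap each string in a list before transform().
test_datapoint = ['Saturday', '10:20', 'Atlanta', 'no']
test_datapoint_encoded = []
count = 0
for item in test_datapoint:
    if item.isdigit():
        test_datapoint_encoded.append(int(item))
    else:
        encoded = label_encoders[count].transform([item])[0]
        test_datapoint_encoded.append(int(encoded))
        count += 1
test_datapoint_encoded = np.array(test_datapoint_encoded)

prediction = regressor.predict([test_datapoint_encoded])[0]
print("Predicted traffic:", int(prediction))
```

The `count` index relies on the test datapoint having its non-numeric columns in the same order as the training data, as in the original code.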