【Question title】:Visualize Trees and OOB error: 'numpy.ndarray' object is not callable
【Posted】:2020-04-28 06:54:02
【Question】:

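I want to plot the OOB error against the number of trees for my RandomForestRegressor and GradientBoostingRegressor. I wrote the lines below, but for some reason I get "'numpy.ndarray' object is not callable". Does anyone know why this doesn't work? Have a nice day, and thank you!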

train_results = []
test_results = []
list_nb_trees = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65,70 , 75, 80, 85, 90, 95, 100]

for nb_trees in list_nb_trees:
    rf = RandomForestRegressor(n_estimators=nb_trees,
                               max_depth= None,
                               max_features= 50,
                               min_samples_leaf= 5,
                               min_samples_split= 2,
                               random_state= 42,
                               oob_score= True, 
                               n_jobs= -1)
    rf.fit(X_train_v1, y_train_v1)

train_results.append(mean_squared_error(y_train_v1, rf.oob_prediction_(X_train_v1)))
test_results.append(mean_squared_error(y_test_v1, rf.oob_prediction_(X_test_v1)))

plt.figure(figsize=(15, 5))
line2, = plt.plot(list_nb_trees, test_results, color="g", label="Test OOB Score")
line1, = plt.plot(list_nb_trees, train_results, color="b", label="Training  OOB Score")
plt.title('Trainings- und Test Out-of-Bag Score')
plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
plt.ylabel('MSE')
plt.xlabel('n_estimators')
plt.show()
/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/forest.py:737: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates.
  warn("Some inputs do not have OOB scores. "
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-282-80f6bbb31b23> in <module>
     14     rf.fit(X_train_v1, y_train_v1)
     15 
---> 16 train_results.update(mean_squared_error(y_train_v1, rf.oob_prediction_(X_train_v1)))
     17 test_results.update(mean_squared_error(y_test_v1, rf.oob_prediction_(X_test_v1)))
     18 

TypeError: 'numpy.ndarray' object is not callable

【Comments】:

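  • It would be much easier if you posted the traceback of the error, but I suspect rf.oob_prediction_ is an array rather than a function, so you can't call it.
  • Traceback added.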
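
The suspicion in the first comment can be reproduced without scikit-learn. Below is a minimal sketch, assuming only NumPy; the array `arr` is a hypothetical stand-in for rf.oob_prediction_ and is not taken from the question.

```python
import numpy as np

# Hypothetical stand-in for rf.oob_prediction_: a plain ndarray of predictions.
arr = np.array([1.0, 2.0, 3.0])

try:
    # Putting parentheses after an array tries to call it like a function.
    arr(0)
except TypeError as exc:
    message = str(exc)

print(message)
```

Indexing with square brackets (arr[0]) or passing the array itself as an argument works; only the call syntax fails.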

Tags: python numpy matplotlib random-forest mse


【Solution 1】:

Have a look here: oob_prediction_ is an array containing the OOB predictions on your training set.

So your code should look more like:

train_oob_mse = mean_squared_error(y_train_v1, rf.oob_prediction_)

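In a sense, all test samples are "out-of-bag", but it is not common to call them that. This is simply the test error. You have to predict in order to compute it: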

test_mse = mean_squared_error(y_test_v1, rf.predict(X_test_v1))

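That said, your code only keeps the last trained rf, so your *_results lists will contain only one value each, but I assume that is just a copy/paste mistake. Also, the warning "Some inputs do not have OOB scores. " indicates that the OOB error computed this way is not reliable, because some samples will not actually have an OOB prediction.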
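
Putting these points together, here is a minimal sketch of the loop with both error computations inside it. The synthetic data from make_regression stands in for X_train_v1, y_train_v1, X_test_v1 and y_test_v1, which are not shown in the question, and the tree counts and hyperparameters are illustrative assumptions rather than the asker's settings. The tree counts start at 25 so that, in most runs, every training sample is out-of-bag for at least one tree.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: synthetic data replaces the asker's X_train_v1 / y_train_v1 / etc.
X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

list_nb_trees = [25, 50, 75, 100]  # illustrative values, not the original list
oob_results = []
test_results = []

for nb_trees in list_nb_trees:
    rf = RandomForestRegressor(
        n_estimators=nb_trees,
        min_samples_leaf=5,
        oob_score=True,
        random_state=42,
    )
    rf.fit(X_train, y_train)

    # oob_prediction_ is an attribute holding an array: no parentheses, no arguments.
    oob_results.append(mean_squared_error(y_train, rf.oob_prediction_))
    # The test error needs an explicit prediction on the held-out data.
    test_results.append(mean_squared_error(y_test, rf.predict(X_test)))

for nb_trees, oob_mse, test_mse in zip(list_nb_trees, oob_results, test_results):
    print(f"{nb_trees:3d} trees: OOB MSE = {oob_mse:.1f}, test MSE = {test_mse:.1f}")
```

Because both append calls sit inside the loop, each list ends up with one value per forest and can be passed to plt.plot(list_nb_trees, ...) as in the question.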

【Comments】:

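  • OK, thanks for your answer. Do you know how I can get all the rfs, not just the last one? I don't think this code works either. I get this traceback warning: /opt/conda/lib/python3.7/site-packages/sklearn/ensemble/forest.py:737: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates. warn("Some inputs do not have OOB scores. "
  • A naive approach is to compute the errors inside the loop (i.e. for each forest you try). To avoid the OOB error problem, you could consider computing the training error instead of the OOB error.
  • You thought this might be a copy/paste error. The original code was train_results.append(mean_squared_error(y_train, rf.predict(X_train))) test_results.append(mean_squared_error(y_test, rf.predict(X_test))) to compute the training and test errors. Does that make sense to you?
  • I meant that when you copied/pasted, you "lost" the indentation of those two lines. They should be inside the loop.
  • Ah, I think I understand you now. The two lines go below the rf.fit(X_train, X_test) statement?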