What n_estimators and max_features mean in RandomForestRegressor

【Question title】: What n_estimators and max_features mean in RandomForestRegressor
【Posted】: 2018-02-24 08:45:18
【Question】:

I am reading about fine-tuning a model with GridSearchCV, and I came across a parameter grid like the one shown below:

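from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = [
    # 3 x 4 = 12 combinations
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    # 1 x 2 x 3 = 6 combinations, with bootstrapping turned off
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
forest_reg = RandomForestRegressor(random_state=42)
# train across 5 folds, that's a total of (12+6)*5=90 rounds of training
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
# housing_prepared and housing_labels are the prepared features and targets (not shown here)
grid_search.fit(housing_prepared, housing_labels)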

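I don't understand the concepts of n_estimators and max_features here. Does n_estimators mean the number of records in the data, and max_features the number of attributes to select from the data?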

After going further, I got this result:

>> grid_search.best_params_
{'max_features': 8, 'n_estimators': 30}

So my problem is that I don't understand what this result is actually telling me.

【Comments】:

Please read the documentation: RandomForestRegressor and the user guide.

Tags: scikit-learn

【Answer 1】:

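If you read the documentation for RandomForestRegressor, you can see that n_estimators is the number of trees in the forest. Since a random forest is an ensemble method that builds many decision trees, this parameter controls how many trees are built.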

max_features, on the other hand, sets the maximum number of features considered when looking for the best split at each node. For more about max_features, see this answer.
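As a minimal sketch of the two parameters described above, the snippet below fits a small forest and inspects it. The synthetic make_regression data and the values 10 and 4 are assumptions chosen for illustration; they do not come from the question. It checks that n_estimators is the number of fitted trees and that max_features is counted in columns of the data, not in rows.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption): 200 records (rows), 8 features (columns).
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=42)

forest = RandomForestRegressor(n_estimators=10, max_features=4, random_state=42)
forest.fit(X, y)

# n_estimators: how many decision trees the forest holds after fitting.
n_trees = len(forest.estimators_)

# max_features: how many of the columns each tree considers at a split.
features_per_split = forest.estimators_[0].max_features_

n_rows, n_columns = X.shape
print(f"trees: {n_trees}")
print(f"features considered per split: {features_per_split} of {n_columns}")
print(f"records in the data: {n_rows}")
```

Neither parameter is tied to the number of records: the row count here is 200, while the forest has 10 trees and looks at 4 of the 8 columns per split.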

【Comments】:

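So who decides how many features are considered for a good split? And which features are we talking about: are the attributes of the data the features, or is the number of records the features?
@Virtsu The features are the attributes (columns) of the data. Since you are using GridSearchCV, it chooses the best value of max_features based on how the model scores on the dataset under cross-validation.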

【Answer 2】:

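n_estimators: the number of trees you want to build before taking the majority vote (classification) or the average of the predictions (regression). By default each tree is trained on its own bootstrap sample of the data, and the trees' outputs are then aggregated into the final answer. More trees generally give better and more stable performance, with diminishing returns, but make training and prediction slower.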

max_features: the number of features to consider when looking for the best split.
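The aggregation step described for n_estimators can be checked directly. This is a rough sketch on synthetic data (the dataset and parameter values are assumptions, not the asker's housing data): for a RandomForestRegressor, the forest's prediction should match the plain average of the individual trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption) standing in for a real regression problem.
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

forest = RandomForestRegressor(n_estimators=30, max_features=4, random_state=0)
forest.fit(X, y)

sample = X[:5]

# One row of predictions per tree: shape (n_estimators, n_samples).
tree_predictions = np.stack([tree.predict(sample) for tree in forest.estimators_])

# Averaging over the trees reproduces what the forest itself returns.
manual_average = tree_predictions.mean(axis=0)
forest_prediction = forest.predict(sample)

print(tree_predictions.shape)
print(np.allclose(manual_average, forest_prediction))
```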

>> grid_search.best_params_ gives {'max_features': 8, 'n_estimators': 30}

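This means that, among the candidates n_estimators {3, 10, 30} and max_features {2, 4, 6, 8}, these are the best hyperparameters to run the model with, i.e. the combination with the best cross-validated score.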
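To make the meaning of best_params_ concrete, here is a hedged sketch that runs the question's parameter grid on synthetic data. The make_regression dataset is an assumption standing in for housing_prepared and housing_labels, so the winning values will not necessarily match the asker's. It counts the candidate combinations with ParameterGrid and shows that best_params_ is simply the candidate ranked first by mean cross-validated score.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Synthetic stand-in for the asker's housing data (an assumption).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

# 12 combinations from the first dict plus 6 from the second.
n_candidates = len(ParameterGrid(param_grid))

grid_search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                           cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X, y)

best = grid_search.best_params_
best_index = grid_search.best_index_
best_rank = grid_search.cv_results_['rank_test_score'][best_index]

print(f"candidates: {n_candidates}, fits across 5 folds: {n_candidates * 5}")
print(f"best parameters: {best}")
print(f"rank of the best candidate: {best_rank}")
```

If the best value sits at the edge of the grid, as 8 and 30 do in the question, it may be worth searching larger values as well, since the score could still be improving.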
