【Question Title】: GridSearchCV for random forest on a sentiment analysis dataset
【Posted】: 2018-09-03 18:07:06
【Question】:

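I am tuning a random forest to get different results. With GridSearchCV I got different results for an SVM, but I'm having trouble getting the same kind of results for a random forest. When I run the model, I get the following error: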

>     # Tuning hyper-parameters for precision
>
>     ---------------------------------------------------------------------------
>     AttributeError                            Traceback (most recent call last)
>     <ipython-input-26-2d3979d9cbc5> in <module>()
>          24
>          25     clf = GridSearchCV(clf, tuned_parameters, cv=10,
>     ---> 26                        scoring='%s_macro' % score)
>          27     clf.fit(X_train, Y_train)
>          28
>
>     /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_search.py in __init__(self, estimator, param_grid, scoring, fit_params, n_jobs, iid, refit, cv, verbose, pre_dispatch, error_score, return_train_score)
>        1075             return_train_score=return_train_score)
>        1076         self.param_grid = param_grid
>     -> 1077         _check_param_grid(param_grid)
>        1078
>        1079     def _get_param_iterator(self):
>
>     /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_search.py in _check_param_grid(param_grid)
>         346
>         347     for p in param_grid:
>     --> 348         for name, v in p.items():
>         349             if isinstance(v, np.ndarray) and v.ndim > 1:
>         350                 raise ValueError("Parameter array should be one-dimensional.")
>
>     AttributeError: 'set' object has no attribute 'items'
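The last frame of the traceback explains the failure: each entry of `param_grid` is expected to be a dict, and `.items()` is called on it. Braces around a single value with no `key: value` pair build a Python `set`, not a dict. The sketch below reproduces this in plain Python; `check_grid` is a hypothetical, simplified mimic of the loop shown in the traceback, not scikit-learn's actual code.

```python
# Braces with a bare value (no "key: value") create a set, as in the question.
bad_grid = [{"an estimator object"}]
# Braces with "name: list of values" create a dict, which is what is expected.
good_grid = [{"max_depth": [2, 4, 8]}]


def check_grid(param_grid):
    """Hypothetical simplified mimic of the loop in the traceback above."""
    for p in param_grid:
        for name, v in p.items():  # raises AttributeError when p is a set
            pass


print(type(bad_grid[0]).__name__)   # set
print(type(good_grid[0]).__name__)  # dict
check_grid(good_grid)               # no error
try:
    check_grid(bad_grid)
except AttributeError as exc:
    print(exc)  # 'set' object has no attribute 'items'
```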

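I used the following code to set up the parameters. Please help me solve this problem; I'm running this process on a sentiment analysis dataset stored in CSV format.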

# Create a validation dataset
# Split out the validation set
from sklearn.model_selection import train_test_split

X = df.iloc[:, 1:18]  # features: columns 1-17
Y = df.iloc[:, 0]     # class label: column 0
validation_size = 0.20
#seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=validation_size, random_state=0)
# Test options and evaluation metric
num_folds = 10
num_instances = len(X_train)
scoring = 'accuracy'
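A self-contained sketch of this 80/20 split, using `DataFrame.iloc` and `sklearn.model_selection.train_test_split` (the older `DataFrame.ix` and `sklearn.cross_validation` module have since been removed from pandas and scikit-learn). The real `df` is not shown, so the DataFrame below is a hypothetical random stand-in with the label in column 0 and features in columns 1 to 17.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CSV: 100 rows, 18 columns.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(100, 18))
df[0] = rng.randint(0, 2, size=100)  # column 0 holds binary class labels

X = df.iloc[:, 1:18]  # positional columns 1..17 (end index is exclusive)
Y = df.iloc[:, 0]     # positional column 0
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.20, random_state=0)

print(X_train.shape, X_test.shape)  # (80, 17) (20, 17)
```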

Set the parameters by cross-validation:

tuned_parameters = [{RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=2, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=0, verbose=0, warm_start=False)}]
X, Y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, Y)

scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    clf = GridSearchCV(clf, tuned_parameters, cv=10,
                       scoring='%s_macro' % score)
    clf.fit(X_train, Y_train)

    print("Best parameters set found on development set:")
    print()
    print(clf.best_params_)
    print()
    print("Grid scores on development set:")
    print()
    means = clf.cv_results_['mean_test_score']
    stds = clf.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r"
              % (mean, std * 2, params))
    print()

    print("Detailed classification report:")
    print()
    # print("The model is trained on the full development set.")
    # print("The scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = Y_test ,  clf.predict(X_test)
    print(classification_report(y_true, y_pred))
    print()

【Discussion】:

You are passing the tuning parameters to GridSearchCV incorrectly. See the examples here for how to use them.
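A minimal corrected sketch, assuming toy data from `make_classification` in place of the sentiment CSV. `param_grid` is a list of dicts mapping `RandomForestClassifier` parameter names to lists of candidate values (the values here are illustrative only), and a fresh `GridSearchCV` is built for each scoring metric rather than reassigning `clf` so that the second search wraps the first. `cv=5` is used only to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy data standing in for the sentiment dataset (assumption).
X, Y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.20, random_state=0)

# A list of dicts: parameter NAME -> list of candidate values.
tuned_parameters = [{
    "n_estimators": [10, 50],
    "max_depth": [2, 4, None],
    "min_samples_leaf": [1, 5],
}]

for score in ["precision", "recall"]:
    print("# Tuning hyper-parameters for %s" % score)
    # New search around a new, unfitted estimator for every metric.
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          tuned_parameters, cv=5,
                          scoring="%s_macro" % score)
    search.fit(X_train, Y_train)
    print("Best parameters:", search.best_params_)
    means = search.cv_results_["mean_test_score"]
    stds = search.cv_results_["std_test_score"]
    for mean, std, params in zip(means, stds, search.cv_results_["params"]):
        print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))
    # Evaluate the refit best model on the held-out split.
    print(classification_report(Y_test, search.predict(X_test)))
```

The grid above expands to 2 x 3 x 2 = 12 candidate settings, each scored with 5-fold cross-validation.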

Tags: python-3.x random-forest grid-search

【Answer 1】:

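Have you tried setting the random seed at the very beginning of your code? A random forest relies on random sampling, so its results will differ somewhat from run to run unless the seed is fixed.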

import numpy as np
np.random.seed(0)

I expect that adding the code above will make your results reproducible.
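A sketch of the reproducibility point, offered as an alternative to the global seed: passing a fixed `random_state` to `RandomForestClassifier` pins its bootstrap and feature sampling, so two fits on the same data give identical outputs. A global `np.random.seed(0)` has this effect only for estimators left at `random_state=None`, because those draw from NumPy's global generator. The data and the helper `fit_predict_proba` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)


def fit_predict_proba(seed):
    """Hypothetical helper: fit a forest with a fixed seed, return probabilities."""
    clf = RandomForestClassifier(n_estimators=20, random_state=seed)
    clf.fit(X, y)
    return clf.predict_proba(X)


run_a = fit_predict_proba(0)
run_b = fit_predict_proba(0)
print(np.array_equal(run_a, run_b))  # True
```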

【Discussion】: