【Question title】: How can I get the param names for TargetEncoder in a grid search?
【Posted】: 2021-03-25 07:16:57
【Question description】:

I have the following setup:

preprocess = make_column_transformer(
    (SimpleImputer(strategy='constant',fill_value = 0),numeric_cols),
    (ce.TargetEncoder(),['country'])
    )

pipeline = make_pipeline(preprocess,XGBClassifier())

pipeline[0].get_params().keys()

dict_keys(['n_jobs', 'remainder', 'sparse_threshold', 'transformer_weights', 'transformers', 'verbose', 'simpleimputer', 'targetencoder', 'simpleimputer__add_indicator', 'simpleimputer__copy', 'simpleimputer__fill_value', 'simpleimputer__missing_values', 'simpleimputer__strategy', 'simpleimputer__verbose', 'targetencoder__cols', 'targetencoder__drop_invariant', 'targetencoder__handle_missing', 'targetencoder__handle_unknown', 'targetencoder__min_samples_leaf', 'targetencoder__return_df', 'targetencoder__smoothing', 'targetencoder__verbose'])

I then want to run a grid search over the smoothing factor, so:

param_grid = {
    'xgbclassifier__learning_rate': [0.01, 0.005, 0.001],
    'targetencoder__smoothing': [1, 10, 30, 50],
}

pipeline = make_pipeline(preprocess,XGBClassifier())

# Initialize the grid search model
clf = GridSearchCV(pipeline, param_grid=param_grid, scoring='neg_mean_squared_error',
                   verbose=1, iid=True,
                   refit=True, cv=3)
clf.fit(X_train,y_train)

But I get this error:

ValueError: Invalid parameter transformer_targetencoder for estimator Pipeline(steps=[('columntransformer', ColumnTransformer(transformers...

How can I access the smoothing parameter?

【Question discussion】:

    Tags: python scikit-learn pipeline grid-search encoder


    【Solution 1】:

    Using your example, the name would be columntransformer__targetencoder__smoothing. To reproduce the pipeline, I first build an example dataset and define the columns:

    import numpy as np
    import pandas as pd
    from sklearn.compose import make_column_transformer
    from sklearn.pipeline import make_pipeline
    from sklearn.impute import SimpleImputer
    import category_encoders as ce
    from xgboost import XGBClassifier
    from sklearn.model_selection import GridSearchCV
    
    X_train = pd.DataFrame({'x1':np.random.normal(0,1,50),
                       'x2':np.random.normal(0,1,50),
                      'country':np.random.choice(['A','B','C'],50)})
    y_train = np.random.binomial(1,0.5,50)
    
    numeric_cols = ['x1','x2']
    
    preprocess = make_column_transformer(
        (SimpleImputer(strategy='constant',fill_value = 0),numeric_cols),
        (ce.TargetEncoder(),['country'])
        )
    
    pipeline = make_pipeline(preprocess,XGBClassifier())
    

    You should look at the keys one level up, on the full pipeline rather than on pipeline[0]:

    pipeline.get_params().keys()
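
To make the difference between the two levels concrete, here is a minimal sketch that compares the key reported by `pipeline[0]` with the one reported by the whole pipeline. It is an illustration under assumptions, not the original setup: `OneHotEncoder` and `LogisticRegression` are stand-ins for `ce.TargetEncoder` and `XGBClassifier` so that only scikit-learn is needed, and `handle_unknown` stands in for `smoothing`.

```python
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Stand-in pipeline (assumption): same shape as the one in the question,
# but built only from scikit-learn estimators.
preprocess = make_column_transformer(
    (SimpleImputer(strategy="constant", fill_value=0), ["x1", "x2"]),
    (OneHotEncoder(handle_unknown="ignore"), ["country"]),
)
pipeline = make_pipeline(preprocess, LogisticRegression())

# Keys as seen from inside the ColumnTransformer (what pipeline[0] reports).
inner_keys = [k for k in pipeline[0].get_params() if k.endswith("handle_unknown")]

# Keys as seen from the Pipeline, which is what GridSearchCV receives.
outer_keys = [k for k in pipeline.get_params() if k.endswith("handle_unknown")]

print(inner_keys)
print(outer_keys)
```

The pipeline-level key carries the extra `columntransformer__` prefix because `make_pipeline` names each step after its lowercased class name, and nested parameters are joined with a double underscore.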
    

    Then set up the grid, making sure smoothing is a float (see this issue):

    param_grid = {
        'columntransformer__targetencoder__smoothing': [1.0, 10.0],
        'xgbclassifier__learning_rate': [0.01, 0.001],
    }
    
    pipeline = make_pipeline(preprocess, XGBClassifier())
    
    clf = GridSearchCV(pipeline, param_grid=param_grid,
                       scoring='neg_mean_squared_error',
                       verbose=1, refit=True, cv=3)
    clf.fit(X_train, y_train)
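
A mistyped grid key only surfaces once `fit` is called, so it can help to check the grid against `get_params()` beforehand. The sketch below is one possible way to do that. `unknown_grid_keys` is a hypothetical helper, not part of scikit-learn, and the pipeline again uses `OneHotEncoder` and `LogisticRegression` as stand-ins (an assumption) for `ce.TargetEncoder` and `XGBClassifier`.

```python
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder


def unknown_grid_keys(estimator, param_grid):
    """Hypothetical helper: return grid keys the estimator does not expose."""
    valid = set(estimator.get_params(deep=True))
    return sorted(key for key in param_grid if key not in valid)


# Stand-in pipeline (assumption), built only from scikit-learn estimators.
preprocess = make_column_transformer(
    (SimpleImputer(strategy="constant", fill_value=0), ["x1", "x2"]),
    (OneHotEncoder(handle_unknown="ignore"), ["country"]),
)
pipeline = make_pipeline(preprocess, LogisticRegression())

# Missing the "columntransformer__" prefix, like the grid in the question.
bad_grid = {
    "onehotencoder__handle_unknown": ["ignore"],
    "logisticregression__C": [0.1, 1.0],
}
# Fully qualified names, like the grid in the answer.
good_grid = {
    "columntransformer__onehotencoder__handle_unknown": ["ignore"],
    "logisticregression__C": [0.1, 1.0],
}

bad_result = unknown_grid_keys(pipeline, bad_grid)
good_result = unknown_grid_keys(pipeline, good_grid)
print(bad_result)
print(good_result)
```

An empty list means every key in the grid is a parameter the pipeline exposes. It does not check that the candidate values themselves are valid.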
    

    It should work.

    【Discussion】:
