【Question title】: Force RFECV to keep some features
【Posted】: 2021-08-09 19:28:37
【Question】:

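I am doing feature selection and have been using RFECV to find the optimal number of features. However, there are certain features I want to keep regardless, so I would like to know whether there is a way to force the algorithm to retain those chosen features and run RFECV only on the remaining ones.

So far I run it on all features, using: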

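import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RepeatedKFold
from sklearn.tree import DecisionTreeRegressor


# csv_file_path and split_data are defined elsewhere in the asker's code (not shown)
def main():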

    df_data = pd.read_csv(csv_file_path, index_col=0)
    
    X_train, y_train, X_test, y_test = split_data(df_data)
    feats_selection(X_train, y_train, X_test, y_test)


def feats_selection(X_train, y_train, X_test, y_test):
    nr_splits = 10
    nr_repeats = 1
    features_step = 1
    est = DecisionTreeRegressor()

    cv_mode = RepeatedKFold(n_splits=nr_splits, n_repeats=nr_repeats, random_state=1)
    rfecv = RFECV(estimator=est, step=features_step, cv=cv_mode, scoring='neg_mean_squared_error', verbose=0)

    ## >>> here, the RFECV algorithm is automatically selecting the optimal features <<<
    X_train_transformed = rfecv.fit_transform(X_train, y_train)
    X_test_transformed = rfecv.transform(X_test)


    ## test on test subset
    est.fit(X_train_transformed, y_train)
    y_pred = est.predict(X_test_transformed)
    # RMSE as the square root of the MSE; squared=False is deprecated in newer scikit-learn releases
    rmse = mean_squared_error(y_test, y_pred) ** 0.5

【Comments】:

    Tags: python scikit-learn feature-selection rfe


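    【Solution 1】:

    No, RFECV has no such parameter.

    Perhaps the cleanest approach is to use a ColumnTransformer: pass the columns you always want through unchanged, and let RFECV act as the transformer for the remainder.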

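    from sklearn.compose import ColumnTransformer

    cols_to_always_keep = [...]  # column names if you'll fit on dataframe, column indices otherwise
    col_sel = ColumnTransformer(
        # each transformer is a (name, transformer, columns) tuple
        transformers=[("keep", "passthrough", cols_to_always_keep)],
        # rfecv is the RFECV object from the question; it only sees the other columns
        remainder=rfecv,
    )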
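
To make the idea concrete, here is a minimal, self-contained sketch of the ColumnTransformer approach. The synthetic data, the column names `f0` to `f7`, and the choice of `f0` and `f7` as forced columns are illustrative assumptions, not part of the original question. It checks that the forced columns come out first and unchanged, and reads the fitted RFECV back from the transformer to see which of the remaining columns it kept.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the asker's CSV (hypothetical column names).
X_arr, y = make_regression(n_samples=200, n_features=8, n_informative=3, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

cols_to_always_keep = ["f0", "f7"]  # illustrative choice of forced features

rfecv = RFECV(
    estimator=DecisionTreeRegressor(random_state=0),
    step=1,
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
    scoring="neg_mean_squared_error",
)

col_sel = ColumnTransformer(
    transformers=[("keep", "passthrough", cols_to_always_keep)],
    remainder=rfecv,  # RFECV is fitted only on the columns not listed above
)

# y is forwarded to RFECV, which needs it for the cross-validated elimination.
X_sel = col_sel.fit_transform(X, y)

# The fitted RFECV is stored under the name "remainder".
fitted_rfecv = col_sel.named_transformers_["remainder"]
remaining_cols = [c for c in X.columns if c not in cols_to_always_keep]
selected = [c for c, keep in zip(remaining_cols, fitted_rfecv.support_) if keep]

print("forced:", cols_to_always_keep)
print("selected by RFECV:", selected)
print("output shape:", X_sel.shape)
```

In the output array the forced columns come first, followed by whatever RFECV selected from the rest, so the column order generally differs from the input.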
    

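    【Comments】:

    • Thanks! So once the col_sel object is set up as you suggest, can I use it the same way I used the rfecv object before? That is, can I run something like: X_train_transformed = col_sel.fit_transform(X_train, y_train) X_test_transformed = col_sel.transform(X_test)
    • Yes, that should work!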
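
The fit-on-train, transform-on-test pattern from the comment above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the forced columns are given as indices `[0, 1]` because the input is a NumPy array, and the split uses `train_test_split` in place of the asker's own `split_data` helper, which is not shown in the question.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RepeatedKFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption: stands in for the asker's dataset).
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

cols_to_always_keep = [0, 1]  # column indices, since X is a NumPy array here

rfecv = RFECV(
    estimator=DecisionTreeRegressor(random_state=0),
    step=1,
    cv=RepeatedKFold(n_splits=5, n_repeats=1, random_state=1),
    scoring="neg_mean_squared_error",
)
col_sel = ColumnTransformer(
    transformers=[("keep", "passthrough", cols_to_always_keep)],
    remainder=rfecv,
)

# Selection is learned on the training split only, then reused on the test split.
X_train_transformed = col_sel.fit_transform(X_train, y_train)
X_test_transformed = col_sel.transform(X_test)

# Evaluate a fresh estimator on the reduced feature set.
est = DecisionTreeRegressor(random_state=0)
est.fit(X_train_transformed, y_train)
y_pred = est.predict(X_test_transformed)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"test RMSE: {rmse:.3f}")
```

Calling only `transform` on the test split keeps the feature choice free of test-set information, which mirrors how the rfecv object was used in the question.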