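Posted: 2021-08-09 19:28:37
Question:
I am doing feature selection and have been using RFECV to find the optimal number of features. However, there are certain features I want to keep no matter what. Is there any way to force the algorithm to retain those selected features and run RFECV only on the remaining ones?
So far I run it on all features, using: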
import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RepeatedKFold
from sklearn.tree import DecisionTreeRegressor

def main():
    # csv_file_path and split_data are defined elsewhere (not shown here)
    df_data = pd.read_csv(csv_file_path, index_col=0)
    X_train, y_train, X_test, y_test = split_data(df_data)
    feats_selection(X_train, y_train, X_test, y_test)

def feats_selection(X_train, y_train, X_test, y_test):
    nr_splits = 10
    nr_repeats = 1
    features_step = 1
    est = DecisionTreeRegressor()
    cv_mode = RepeatedKFold(n_splits=nr_splits, n_repeats=nr_repeats, random_state=1)
    rfecv = RFECV(estimator=est, step=features_step, cv=cv_mode, scoring='neg_mean_squared_error', verbose=0)
    # >>> here, the RFECV algorithm automatically selects the optimal features <<<
    X_train_transformed = rfecv.fit_transform(X_train, y_train)
    X_test_transformed = rfecv.transform(X_test)
    # test on the test subset
    est.fit(X_train_transformed, y_train)
    y_pred = est.predict(X_test_transformed)
    # square root of the MSE; avoids the `squared` argument, which newer scikit-learn releases no longer accept
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
Discussion:
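One possible workaround, sketched here under assumptions rather than offered as a built-in feature: as far as I know, RFECV has no parameter for pinning features, so you can run it only on the columns you are willing to drop and then add the forced columns back. The helper name `select_with_forced_features`, the `keep_cols` argument, the 5-fold setting, and the synthetic data are all made up for illustration, and `X_train` / `X_test` are assumed to be pandas DataFrames.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.model_selection import RepeatedKFold
from sklearn.tree import DecisionTreeRegressor


def select_with_forced_features(X_train, y_train, X_test, keep_cols):
    """Hypothetical helper: RFECV on the non-forced columns, forced columns always kept."""
    # Columns that RFECV is allowed to eliminate
    candidate_cols = [c for c in X_train.columns if c not in keep_cols]

    cv = RepeatedKFold(n_splits=5, n_repeats=1, random_state=1)
    rfecv = RFECV(
        estimator=DecisionTreeRegressor(random_state=0),
        step=1,
        cv=cv,
        scoring="neg_mean_squared_error",
    )
    # Elimination only ever sees the candidate columns
    rfecv.fit(X_train[candidate_cols], y_train)

    # support_ is a boolean mask over candidate_cols
    chosen = [c for c, keep in zip(candidate_cols, rfecv.support_) if keep]
    final_cols = list(keep_cols) + chosen
    return X_train[final_cols], X_test[final_cols], final_cols


# Synthetic data, purely for illustration
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(120, 6)), columns=[f"f{i}" for i in range(6)])
y = 3 * X["f0"] + 2 * X["f3"] + rng.normal(scale=0.1, size=120)
X_tr, X_te, y_tr = X.iloc[:90], X.iloc[90:], y.iloc[:90]

X_tr_sel, X_te_sel, final_cols = select_with_forced_features(
    X_tr, y_tr, X_te, keep_cols=["f1", "f5"]
)
print(final_cols)  # forced columns first, then whatever RFECV kept
```

One caveat with this design: the elimination step ranks the candidate columns without seeing the forced ones, so a candidate that is redundant with a forced feature may still be kept. If that matters, it is probably worth comparing the final model's cross-validated score against the plain RFECV result.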
Tags: python scikit-learn feature-selection rfe