How do I make the Pipeline skip a step (using "passthrough") along with all param_grid parameters that apply to that step? Answers

【Question title】: How do I make the Pipeline skip the step (using "passthrough") and all params that applies to that step in param_grid?
【Posted】: 2021-07-01 09:24:54
【Question description】:

I am building a pipeline in sklearn that contains a PCA step, and I use "passthrough" to skip that step. For PCA, I am testing several values of the n_components parameter.

from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

X_train, y_train = make_regression(n_samples=100, n_features=10)


param_grid = {
    'reduce_dim': [PCA(), 'passthrough'],
    'reduce_dim__n_components': [1,2,3]
}

pipeline = Pipeline(
        steps=[
            ('reduce_dim', None), 
            ('regressor', LinearRegression())
        ]
    )

grid_search = GridSearchCV(
    estimator=pipeline, 
    param_grid=param_grid, 
    verbose=10
)
grid_search.fit(X_train, y_train)

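What I want to achieve is 3 candidates with PCA, one for each of n_components=[1,2,3], and 1 candidate without PCA:

Fitting 5 folds for each of 4 candidates, totalling 20 fits

What I get instead is 3 candidates with PCA and 3 candidates without PCA (I do not need to test all three n_components values when PCA is skipped):

Fitting 5 folds for each of 6 candidates, totalling 30 fits

followed by a runtime error that basically says an n_components value cannot be assigned to "passthrough" (a str object):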

[CV 1/5; 4/6] START reduce_dim=passthrough, reduce_dim__n_components=1...
AttributeError: 'str' object has no attribute 'set_params'
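
The candidate count can be checked without fitting anything. The following is a minimal sketch using `ParameterGrid` from `sklearn.model_selection`; the variable names `candidates` and `passthrough_candidates` are chosen here for illustration only. It shows that a single dictionary is expanded as a Cartesian product, so "passthrough" is paired with every n_components value:

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import ParameterGrid

# Same grid as in the question: two parameters in one dictionary.
param_grid = {
    'reduce_dim': [PCA(), 'passthrough'],
    'reduce_dim__n_components': [1, 2, 3],
}

# ParameterGrid enumerates every combination of the listed values.
candidates = list(ParameterGrid(param_grid))

# Candidates whose 'reduce_dim' value is the plain string 'passthrough'.
passthrough_candidates = [
    c for c in candidates if isinstance(c['reduce_dim'], str)
]

print(len(candidates), len(passthrough_candidates))
```

Each of the "passthrough" candidates still carries a reduce_dim__n_components entry, which is what the pipeline then tries to set on the string.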

How do I make the pipeline skip the step (reduce_dim in this case) together with all the parameters that apply to that step?

I know I can use a param_grid like this:

param_grid = [
    {
        'reduce_dim': [PCA()],
        'reduce_dim__n_components': [1,2,3]
    },
    {}
]

But can this be done in a more elegant way? In more complex scenarios the code becomes very messy.

【Question discussion】:

Tags: python scikit-learn gridsearchcv
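
For comparison, the list-of-dictionaries form can be inspected the same way. This is a sketch under one assumption: the second grid is written out as `{'reduce_dim': ['passthrough']}` rather than the empty dictionary, so that it does not depend on whatever the pipeline's default step happens to be. The name `candidates` is illustrative only:

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import ParameterGrid

# Each dictionary is expanded on its own; the results are concatenated.
param_grid = [
    {
        'reduce_dim': [PCA()],
        'reduce_dim__n_components': [1, 2, 3],
    },
    {
        # Explicit skip, with no n_components attached to it.
        'reduce_dim': ['passthrough'],
    },
]

candidates = list(ParameterGrid(param_grid))
for candidate in candidates:
    print(candidate)
```

This yields three PCA candidates plus one "passthrough" candidate, at the cost of one extra dictionary per conditional step.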

【Solution 1】:

The parameter grid you want can also be defined in a single dictionary, using a single parameter:

param_grid = {
    'reduce_dim': [PCA(n_components=1), PCA(n_components=2), PCA(n_components=3), 'passthrough']
}

This has the advantage of avoiding the need to define several dictionaries, which is arguably less "messy".

【Discussion】:
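
Put together with the setup from the question, the single-dictionary approach might look like the sketch below. The list comprehension, `random_state=0`, and the initial "passthrough" placeholder step are assumptions added here for reproducibility and brevity; they are not part of the original answer:

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# random_state is an assumption, added only to make the run repeatable.
X_train, y_train = make_regression(n_samples=100, n_features=10, random_state=0)

# One parameter only: preconfigured PCA instances plus the skip marker.
param_grid = {
    'reduce_dim': [PCA(n_components=k) for k in (1, 2, 3)] + ['passthrough'],
}

pipeline = Pipeline(
    steps=[
        ('reduce_dim', 'passthrough'),  # placeholder, replaced by the grid
        ('regressor', LinearRegression()),
    ]
)

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid)
grid_search.fit(X_train, y_train)

# One entry per candidate that was evaluated.
tested_params = grid_search.cv_results_['params']
print(len(tested_params))
print(grid_search.best_params_)
```

Because n_components is baked into each PCA instance, no reduce_dim__n_components key exists for the "passthrough" candidate to trip over.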