【Posted】:2018-04-14 05:59:15
【Question】:
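After a lot of reading, and after inspecting pipeline.fit() under different verbose settings, I'm still confused about why my pipeline calls a step's transform method so many times.
Below is a simple example: a pipeline fit through GridSearchCV with 3-fold cross-validation, where the parameter grid contains only one set of hyperparameters. I therefore expected three runs through the pipeline. As expected, fit is called three times for both step1 and step2, but transform is called three times per fold (nine times in total) for each step. Why is that? A minimal code example and the log output follow.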
# library imports
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.pipeline import Pipeline

# Load toy data
iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='y')

# Define a couple of trivial pipeline steps that log every call
class mult_everything_by(TransformerMixin, BaseEstimator):
    def __init__(self, multiplier=2):
        self.multiplier = multiplier

    def fit(self, X, y=None):
        print("Fitting step 1")
        return self

    def transform(self, X, y=None):
        print("Transforming step 1")
        return X * self.multiplier

class do_nothing(TransformerMixin, BaseEstimator):
    def __init__(self, meaningless_param='hello'):
        self.meaningless_param = meaningless_param

    def fit(self, X, y=None):
        print("Fitting step 2")
        return self

    def transform(self, X, y=None):
        print("Transforming step 2")
        return X

# Define the steps in our Pipeline
pipeline_steps = [('step1', mult_everything_by()),
                  ('step2', do_nothing()),
                  ('classifier', LogisticRegression()),
                  ]
pipeline = Pipeline(pipeline_steps)

# To keep this example super minimal, this param grid only has one set
# of hyperparams, so we are only fitting one type of model
param_grid = {'step1__multiplier': [2],  # ,3],
              'step2__meaningless_param': ['hello']  # , 'howdy', 'goodbye']
              }

# Define model-search process/object
# (fit one model, 3-fits due to 3-fold cross-validation)
cv_model_search = GridSearchCV(pipeline,
                               param_grid,
                               cv=KFold(3),
                               refit=False,
                               verbose=0)

# Fit all (1) models defined in our model-search object
cv_model_search.fit(X, y)
Output:
Fitting step 1
Transforming step 1
Fitting step 2
Transforming step 2
Transforming step 1
Transforming step 2
Transforming step 1
Transforming step 2
Fitting step 1
Transforming step 1
Fitting step 2
Transforming step 2
Transforming step 1
Transforming step 2
Transforming step 1
Transforming step 2
Fitting step 1
Transforming step 1
Fitting step 2
Transforming step 2
Transforming step 1
Transforming step 2
Transforming step 1
Transforming step 2
【Discussion】:
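One possible reading of the log in the question, offered as a sketch rather than a confirmed explanation: within each fold the transformers are fit and applied once on the training split, then applied again every time the fitted pipeline is scored. If the search scores both the held-out split and the training split (scikit-learn releases from around the time of the question computed training scores by default through `return_train_score`), that gives one fit and three transforms per step per fold. The pure-Python model below does not use scikit-learn at all; `LoggingStep`, `run_fold`, and the `score_train` flag are hypothetical names invented for illustration.

```python
# Pure-Python model of the suspected call pattern; no scikit-learn involved.
# LoggingStep, run_fold and score_train are hypothetical, illustrative names.

class LoggingStep:
    """A stand-in transformer that records every fit/transform call."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def fit(self, X):
        self.log.append("Fitting " + self.name)
        return self

    def transform(self, X):
        self.log.append("Transforming " + self.name)
        return X

    def fit_transform(self, X):
        # Fit on X, then transform the same X.
        return self.fit(X).transform(X)


def run_fold(steps, train, test, score_train=True):
    """Fit the steps on `train`, then push each scored split through them."""
    data = train
    for step in steps:
        data = step.fit_transform(data)
    # (A final classifier would be fit on `data` here.)

    # Assumption: scoring happens on the test split and, optionally,
    # on the training split as well.
    splits_to_score = [test, train] if score_train else [test]
    for split in splits_to_score:
        data = split
        for step in steps:
            data = step.transform(data)


log = []
steps = [LoggingStep("step 1", log), LoggingStep("step 2", log)]
for _ in range(3):  # three folds, one hyperparameter combination
    run_fold(steps, train=[1, 2], test=[3])

print("\n".join(log))
```

Under these assumptions the printed sequence has the same shape as the log in the question. If training-set scoring is the cause, passing `return_train_score=False` to `GridSearchCV` should reduce the count to two transforms per step per fold.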
Tags: python machine-learning scikit-learn pipeline