How to access the best neural network model identified through RandomizedSearchCV: answers

【Question title】: How can I access the best neural network model identified through RandomizedSearchCV
【Posted】: 2021-08-25 14:57:24
【Question】:

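I would like to extract and use (in the same Jupyter notebook) the model that RandomizedSearchCV identified as the best, for further fitting and plotting. Specifically, I want to refit the Keras neural network identified as the best so that I can plot loss and accuracy against the same or other datasets.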

If I run the following code, I get the output I expect: the best score and the parameters used to obtain it.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
from skrebate import SURF
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense


url = "https://datahub.io/machine-learning/sonar/r/sonar.csv"
dataframe = pd.read_csv(url)

dataset = dataframe.values

# Columns 0-59 hold the numeric features; column 60 holds the class label.
features = dataset[:, 0:60].astype(float)
y = dataset[:, 60]

# Encode the string class labels as integers (0/1) for the sigmoid output.
encoder = LabelEncoder()
encoded_y = encoder.fit_transform(y)

def create_nn_model(input_dims):
    # Create model.
    model = Sequential()
    model.add(Dense(60, input_dim=input_dims, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model.
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

param_grid = {'model__epochs': (100, 200, 300),
              'model__batch_size': (10, 20, 20)}

kfold = StratifiedKFold(n_splits=10, shuffle=True)

for x in range(10, 11, 10):  # Runs only x = 10; range(10, 101, 10) would step through 10% to 100% of the features.

    num_features = int(features.shape[1] * x / 100)

    clf = Pipeline([('fs_step', SURF(n_features_to_select=num_features)),
                    ('model', KerasClassifier(build_fn=lambda: create_nn_model(num_features), epochs=100, batch_size=5, verbose=0))])

    grid = RandomizedSearchCV(clf, param_grid, n_jobs=-1, cv=kfold, n_iter=3)
    grid_result = grid.fit(features, encoded_y)

    print('Best score obtained: {0}'.format(grid_result.best_score_))

    print('Parameters:')
    for param, value in grid_result.best_params_.items():
        print('\t{}: {}'.format(param, value))
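If the loop is widened to cover 10% to 100% of the features, grid_result is reassigned on every pass, so only the last search's best_estimator_ survives the loop. A minimal sketch of keeping each pass's refit best pipeline in a dict follows. It is only an illustration under stated assumptions: scikit-learn's SelectKBest stands in for skrebate's SURF and LogisticRegression stands in for the Keras model (so it runs without skrebate or Keras), the data is synthetic, and the name best_by_pct is hypothetical.

```python
# Sketch only: SelectKBest replaces SURF and LogisticRegression replaces
# KerasClassifier; the data is synthetic, not the sonar set.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# 200 rows and 60 features, shaped like the question's feature matrix.
X, y = make_classification(n_samples=200, n_features=60, random_state=0)
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
param_distributions = {"model__C": [0.1, 1.0, 10.0]}

best_by_pct = {}  # hypothetical: feature percentage -> fitted best Pipeline
for pct in range(10, 101, 30):  # 10, 40, 70 and 100 percent of the features
    num_features = int(X.shape[1] * pct / 100)
    clf = Pipeline([
        ("fs_step", SelectKBest(f_classif, k=num_features)),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    search = RandomizedSearchCV(clf, param_distributions, n_iter=3,
                                cv=kfold, random_state=0)
    search.fit(X, y)
    # Store the refit best pipeline before the next pass rebinds `search`.
    best_by_pct[pct] = search.best_estimator_
    print(pct, num_features, round(search.best_score_, 3))
```

Each stored value is an independent fitted Pipeline, so its steps can still be looked up by name (for example best_by_pct[40]["model"]) after the loop finishes.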

I also know how to plot the data I'm looking for by building, compiling, and fitting the model, as in the following code:

import matplotlib.pyplot as plt

model = Sequential()
model.add(Dense(60, input_dim=60, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(features, encoded_y, epochs=100, batch_size=25, verbose=0)

# After fit(), Keras keeps the History callback on model.history; its .history
# dict holds one value per epoch for the loss and accuracy.
losses = pd.DataFrame(model.history.history)
losses.plot()
plt.show()

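Can I take the best model from RandomizedSearchCV, fit it to the data, and plot, or do I have to build, compile, and fit a new model based on best_params_? I ask because I can't identify/access the best model, which I understood to be grid_result.best_estimator_.model. Trying to do so gives: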

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-31-bbfe0b584f46> in <module>
----> 1 grid_result.best_estimator_.model

AttributeError: 'Pipeline' object has no attribute 'model'

Any help would be greatly appreciated. Thanks.

【Discussion】:

Tags: python keras

【Solution 1】:

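grid_result.best_estimator_ contains the refit estimator (because you kept the default value of the refit parameter), which is a fitted clone of your clf. In your case that happens to be a Pipeline object (with two steps); to reach the Keras model, you can index the pipeline like a dictionary: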

grid_result.best_estimator_['model'] will be a fitted KerasClassifier object, and those have a model attribute that contains the native Keras object:

grid_result.best_estimator_['model'].model
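The same access pattern can be checked without Keras or skrebate. The sketch below is an assumption-laden stand-in, not the question's pipeline: SelectKBest replaces SURF, scikit-learn's MLPClassifier replaces KerasClassifier, the data is synthetic, and max_iter/batch_size play the role of epochs/batch_size. It shows that the fitted Pipeline has no model attribute, that indexing by step name (or named_steps) returns the fitted network, and, for the "rebuild from best_params_" alternative in the question, how clone plus set_params reproduces the refit by hand.

```python
import warnings

from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

# Short max_iter values may stop before convergence; silence that noise here.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in for the sonar data: 200 rows, 60 features, binary labels.
X, y = make_classification(n_samples=200, n_features=60, random_state=0)

# Same two-step layout as the question: a feature selector, then the network.
clf = Pipeline([
    ("fs_step", SelectKBest(f_classif, k=6)),
    ("model", MLPClassifier(hidden_layer_sizes=(60,), random_state=0)),
])
param_distributions = {
    "model__max_iter": [100, 200, 300],
    "model__batch_size": [10, 20],
}
search = RandomizedSearchCV(
    clf, param_distributions, n_iter=3,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)

best_pipe = search.best_estimator_      # fitted Pipeline, refit on all of X
print(hasattr(best_pipe, "model"))      # False: step names are not attributes
best_model = best_pipe["model"]         # look the step up by name
print(best_model is best_pipe.named_steps["model"])  # True: same object

# MLPClassifier keeps one training-loss value per epoch in loss_curve_,
# roughly what history.history['loss'] holds for a Keras model.
print(len(best_model.loss_curve_), best_model.loss_curve_[-1])

# The manual alternative: clone the unfitted pipeline, apply best_params_, fit.
rebuilt = clone(search.estimator).set_params(**search.best_params_).fit(X, y)
print(rebuilt["model"].max_iter == best_model.max_iter)  # True
```

In the question's own code the corresponding objects would be grid_result.best_estimator_['model'] (the fitted KerasClassifier) and grid_result.best_estimator_['model'].model (the Keras model). Because that Keras model is trained during the refit, passing its model.history.history to pd.DataFrame(...).plot(), as in the question's plotting code, should plot the refit's loss and accuracy without rebuilding anything.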

【Comments】:

Thank you very much @ben-reiniger, I wasn't using the dictionary key. This seems to give me exactly what I need.