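【Posted】: 2021-08-25 14:57:24
【Question】:
I want to extract the model that RandomizedSearchCV identified as best and reuse it (in the same Jupyter notebook) for further fitting and plotting. Specifically, I want to refit the Keras neural network chosen as best so that I can plot its loss and accuracy against the same dataset or a different one.
If I run the following code, I get the output I expect: the best score and the parameters that produced it.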
```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
from skrebate import SURF
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
# keras.wrappers.scikit_learn is gone from recent TensorFlow/Keras releases; SciKeras offers a replacement KerasClassifier.
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense

url = "https://datahub.io/machine-learning/sonar/r/sonar.csv"
dataframe = pd.read_csv(url)
dataset = dataframe.values
X = dataset[:,0:60].astype(float)
y = dataset[:,60]
features, labels = dataset[:,0:60].astype(float), dataset[:,60]
encoder = LabelEncoder()
encoder.fit(y)
encoded_y = encoder.transform(y)

def create_nn_model(input_dims):
    # Create model.
    model = Sequential()
    model.add(Dense(60, input_dim=input_dims, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model.
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

param_grid = {'model__epochs': (100,200,300),
              'model__batch_size': (10,20,20)}
kfold = StratifiedKFold(n_splits=10, shuffle=True)
# As written this runs only x = 10 (top 10% of the features); range(10, 101, 10) would step from 10% to 100%.
for x in range(10, 11, 10):
    num_features = int(features.shape[1] * x / 100)
    clf = Pipeline([('fs_step', SURF(n_features_to_select=num_features)),
                    ('model', KerasClassifier(build_fn=lambda: create_nn_model(num_features), epochs=100, batch_size=5, verbose=0))])
    grid = RandomizedSearchCV(clf, param_grid, n_jobs=-1, cv=kfold, n_iter=3)
    grid_result = grid.fit(features, encoded_y)
    print('Best score obtained: {0}'.format(grid_result.best_score_))
    print('Parameters:')
    for param, value in grid_result.best_params_.items():
        print('\t{}: {}'.format(param, value))
```
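The keys printed from `best_params_` follow scikit-learn's `<step name>__<parameter>` convention (for example `model__epochs`), so rebuilding the network by hand from them means stripping that prefix. A minimal sketch, assuming a hypothetical helper named `split_step_params` (not part of scikit-learn) that groups the keys by pipeline step; the example dictionary is illustrative, not a real search result:

```python
def split_step_params(best_params):
    """Group 'step__param' keys (as found in best_params_) into {step: {param: value}}.

    Hypothetical helper for illustration; not a scikit-learn API.
    """
    grouped = {}
    for key, value in best_params.items():
        # Split on the first '__' only, so nested names like 'a__b__c' keep 'b__c'.
        step, sep, param = key.partition("__")
        if not sep:
            # A key without a step prefix is stored under None.
            grouped.setdefault(None, {})[key] = value
        else:
            grouped.setdefault(step, {})[param] = value
    return grouped


# Illustrative values taken from the question's param_grid, not actual search output.
example_params = {"model__epochs": 200, "model__batch_size": 10}
print(split_step_params(example_params))  # {'model': {'epochs': 200, 'batch_size': 10}}
```

Under the same assumptions, the grouped values could be passed straight to Keras's `Model.fit`, e.g. `create_nn_model(num_features).fit(selected, encoded_y, **split_step_params(grid_result.best_params_)["model"])`, where `selected` (a hypothetical name) would be the SURF-reduced features, for instance `grid_result.best_estimator_.named_steps["fs_step"].transform(features)`.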
I also know how to produce the plots I'm after by building, compiling and fitting a model directly, as in the following code:
```python
import matplotlib.pyplot as plt

model = Sequential()
model.add(Dense(60, input_dim=60, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(features, encoded_y, epochs=100, batch_size=25, verbose=0)
# model.history is the History object from the last fit(); .history maps each metric to its per-epoch values.
losses = pd.DataFrame(model.history.history)
losses.plot()
plt.show()
```
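Keras's `Model.fit` also accepts `validation_data=(x_val, y_val)` or `validation_split`; when either is given, `History.history` gains `val_`-prefixed entries, which is one way to plot loss and accuracy against a second dataset. A minimal sketch of a hypothetical `plot_history` helper (not a Keras or pandas API), assuming the input maps metric names to per-epoch lists the way `History.history` does:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_history(history_dict):
    """Plot one panel per metric, overlaying the matching 'val_' series when present.

    Hypothetical helper; expects a dict of metric name -> list of per-epoch values.
    """
    df = pd.DataFrame(history_dict)
    # Training metrics are the columns without the 'val_' prefix.
    metrics = [c for c in df.columns if not c.startswith("val_")]
    fig, axes = plt.subplots(1, len(metrics), figsize=(5 * len(metrics), 4), squeeze=False)
    for ax, metric in zip(axes[0], metrics):
        cols = [metric]
        if f"val_{metric}" in df.columns:
            cols.append(f"val_{metric}")
        df[cols].plot(ax=ax, title=metric)
        ax.set_xlabel("epoch")
    return fig, df
```

For example, after `model.fit(features, encoded_y, epochs=100, batch_size=25, verbose=0, validation_split=0.2)`, calling `plot_history(model.history.history)` and then `plt.show()` should give one panel for the loss and one for the accuracy, each with its validation curve.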
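Can I take the best model from RandomizedSearchCV, fit it to the data and plot the results, or do I have to build, compile and fit a new model from `best_params_`? I ask because I cannot identify or access the best model, which I understood to be `grid_result.best_estimator_.model`. Trying that gives: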
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-31-bbfe0b584f46> in <module>
----> 1 grid_result.best_estimator_.model
AttributeError: 'Pipeline' object has no attribute 'model'
```
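The error arises because `best_estimator_` is the whole refit `Pipeline`, so its final step has to be looked up by the name it was given (`'model'`), via `named_steps` or indexing. A minimal sketch reproducing the same two-step layout with scikit-learn stand-ins (`StandardScaler` and `LogisticRegression` swapped in for SURF and `KerasClassifier`, so it runs without Keras or skrebate; the `_demo` names are hypothetical and chosen so they do not overwrite the notebook's `X` and `y`):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Same step names as in the question, with stand-in estimators.
pipe = Pipeline([("fs_step", StandardScaler()),
                 ("model", LogisticRegression(max_iter=1000))])
search = RandomizedSearchCV(pipe, {"model__C": [0.1, 1.0, 10.0]},
                            n_iter=2, cv=3, random_state=0)
search.fit(X_demo, y_demo)

best = search.best_estimator_      # the whole Pipeline, refit on all data (refit=True by default)
print(type(best).__name__)         # Pipeline, hence "no attribute 'model'"
step = best.named_steps["model"]   # the fitted final step; best["model"] is equivalent
print(step.C, step.coef_.shape)

# With keras.wrappers.scikit_learn.KerasClassifier as the "model" step,
# the fitted wrapper stores the Keras model in its .model attribute:
#   keras_model = grid_result.best_estimator_.named_steps["model"].model
```

In the Keras case, `grid_result.best_estimator_.named_steps['model'].model.history.history` should then hold the per-epoch values from the final refit on the full data. To fit the winning configuration again on other data without disturbing the search result, `sklearn.base.clone(grid_result.best_estimator_)` returns an unfitted copy with the same parameters.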
Any help would be greatly appreciated. Thanks.
【Discussion】: