How to serialize Keras models to use with Joblib? (answer)

【Question title】: How to serialize Keras models to use with Joblib?
【Posted】: 2017-12-07 20:09:02
【Question】:

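I'm trying to combine Keras and Joblib to generate several simple models and store them in an array, so that I can project probe samples after the validation stage.

I have an implementation of Bootstrap Aggregating (Bagging) built from several simple binary neural-network models trained with Joblib. However, I get the following error when I try to predict: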

```
Traceback (most recent call last):
  File "../HFCN_openset_load.py", line 264, in <module>
    main()
  File "../HFCN_openset_load.py", line 107, in main
    pr, roc = fcnhface(args, parallel_pool)
  File "../HFCN_openset_load.py", line 194, in fcnhface
    pred = models[k][0].predict(feature_vector.reshape(1, feature_vector.shape[0]))
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 1004, in predict
    if not self.built:
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 339, in built
    return self._built
AttributeError: 'Sequential' object has no attribute '_built'
```

Below is the part of the code where I think the error is coming from:

```
from joblib import Parallel, delayed
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.utils import np_utils

# make_keras_picklable() is defined in the full file (not shown here)

def getModel(input_shape, nclasses=2):
    make_keras_picklable()
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=input_shape))
    model.add(Dropout(0.2))
    model.add(Dense(nclasses, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])  # RMSprop()
    return model

def learn_fc_model(X, Y, split):
    boolean_label = [(split[key]+1)/2 for key in Y]
    y_train = np_utils.to_categorical(boolean_label, 2)
    model = getModel(input_shape=X[0].shape)
    model.fit(X, y_train, batch_size=40, nb_epoch=100, verbose=0)
    return (model, split)

# Training using Joblib; models is a list of tuples (ANN model, any variable)
with Parallel(n_jobs=4, verbose=15, backend='multiprocessing') as parallel_pool:
    models = parallel_pool(
        delayed(learn_fc_model)(numpy_x, numpy_y, split) for split in numpy_s
    )

# Testing
for k in range(0, len(models)):
    pred = models[k][0].predict(feature_vector.reshape(1, feature_vector.shape[0]))
```
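In learn_fc_model, the expression (split[key]+1)/2 appears to map labels encoded as -1/+1 to 0/1 before one-hot encoding. That relies on integer division under Python 2.7 (the interpreter shown in the traceback); under Python 3, / yields floats (0.0, 1.0), while // keeps the labels as integers. A minimal numpy sketch, assuming split is a dict mapping each label in Y to -1 or +1; encode_labels is a hypothetical helper and np.eye stands in for np_utils.to_categorical:

```python
import numpy as np

def encode_labels(Y, split, n_classes=2):
    """Map each sample's label through `split` (assumed to give -1 or +1),
    shift it to 0/1, and one-hot encode it."""
    # (-1 + 1) // 2 == 0 and (1 + 1) // 2 == 1; `//` keeps integers on Python 3
    boolean_label = np.array([(split[key] + 1) // 2 for key in Y], dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(n_classes)[boolean_label]

# Hypothetical example: subjects "a" and "c" are positives, "b" is negative
split = {"a": 1, "b": -1, "c": 1}
Y = ["a", "b", "c", "a"]
result = encode_labels(Y, split)
print(result)
```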

The full file is available here.

【Discussion】:

Tags: python serialization keras pickle joblib

【Solution 1】:

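Below is a simple way to fit several Keras models in parallel with Joblib.

Define the basic parameters:

n_jobs: how many jobs to run in parallel
n_estimators: how many models to fit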
```
n_jobs, n_estimators = 4, 20
```

Generate dummy data:

```
import numpy as np

n_class = 2
X = np.random.uniform(0, 1, (100, 10))
y = np.random.randint(0, n_class, 100)
```

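Utility function that defines the (unfitted) model structure:

```
from keras.layers import Dense
from keras.models import Sequential

def get_model(input_shape):
    m = Sequential([Dense(n_class, input_shape=input_shape,
                          activation='softmax')])
    m.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
    return m
```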
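The model from get_model is a single Dense layer with a softmax activation. Its get_weights() returns a list of two numpy arrays, the kernel of shape (10, n_class) and the bias of shape (n_class,), and predict computes softmax(x @ kernel + bias). A minimal numpy sketch of that computation, with random weights as a hypothetical stand-in for fitted ones (dense_softmax_predict is an illustrative helper, not a Keras function), shows that these arrays are all that is needed to reproduce a prediction:

```python
import numpy as np

def dense_softmax_predict(weights, x):
    """Reproduce Dense(units, activation='softmax') from [kernel, bias]."""
    kernel, bias = weights
    logits = x @ kernel + bias
    # Subtract the row max before exponentiating for numerical stability
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_features, n_class = 10, 2
# Hypothetical weights with the same shapes get_weights() would return
weights = [rng.normal(size=(n_features, n_class)), np.zeros(n_class)]
X = rng.uniform(0, 1, (100, n_features))

proba = dense_softmax_predict(weights, X)
print(proba.shape)  # (100, 2)
```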

Utility function that fits several models (it must return the list of fitted weights):

```
def fit_models(n_estimators, x, y):
    weights = []
    for _ in range(n_estimators):
        m = get_model(input_shape=(10,))
        m.fit(x, y)
        weights.append(m.get_weights())
    return weights
```
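Returning weights instead of model objects matters because, with process-based backends, each job's return value is pickled in the worker and unpickled in the parent process. A list of numpy arrays survives that round trip unchanged, whereas (judging from the traceback in the question) the unpickled Sequential object comes back without its internal state. A minimal sketch of the round trip, calling pickle.dumps/pickle.loads directly in place of Joblib's transport, with random arrays as hypothetical fitted weights:

```python
import pickle

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output of fit_models(n_estimators=3, ...): one weight list per model
weights = [[rng.normal(size=(10, 2)), rng.normal(size=2)] for _ in range(3)]

# Simulate what a process-based backend does with a job's return value
payload = pickle.dumps(weights)
restored = pickle.loads(payload)

# Every array comes back with the same shape and values
same = all(
    np.array_equal(a, b)
    for w_orig, w_back in zip(weights, restored)
    for a, b in zip(w_orig, w_back)
)
print(same)  # True
```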

Utility function that splits the estimators among the jobs:

```
from joblib import Parallel, delayed, effective_n_jobs

def _partition_estimators(n_estimators, n_jobs):

    # Compute the number of jobs
    n_jobs = min(effective_n_jobs(n_jobs), n_estimators)

    # Partition estimators between jobs
    n_estimators_per_job = np.full(n_jobs, n_estimators // n_jobs,
                                   dtype=int)
    n_estimators_per_job[:n_estimators % n_jobs] += 1

    return n_jobs, n_estimators_per_job.tolist()
```
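_partition_estimators gives every job n_estimators // n_jobs models and hands the remainder out one at a time to the first jobs, so the per-job counts always add up to n_estimators. A numpy-only sketch of the same arithmetic, assuming n_jobs is already a positive integer (in the original, effective_n_jobs is what resolves values such as -1); partition_estimators is a hypothetical stand-in name:

```python
import numpy as np

def partition_estimators(n_estimators, n_jobs):
    """Split n_estimators as evenly as possible across n_jobs jobs.
    Assumes n_jobs is a positive integer (no -1 handling)."""
    n_jobs = min(n_jobs, n_estimators)  # never start more jobs than models
    per_job = np.full(n_jobs, n_estimators // n_jobs, dtype=int)
    per_job[:n_estimators % n_jobs] += 1  # spread the remainder over the first jobs
    return n_jobs, per_job.tolist()

print(partition_estimators(20, 4))  # (4, [5, 5, 5, 5])
print(partition_estimators(10, 4))  # (4, [3, 3, 2, 2])
print(partition_estimators(3, 8))   # (3, [1, 1, 1])
```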

Run the jobs in parallel:

```
import itertools

n_jobs, n_estimators = _partition_estimators(n_estimators, n_jobs)

res = Parallel(n_jobs=n_jobs, verbose=1)(
    delayed(fit_models)(
        n_estimators=n_estimators[i],
        x=X,
        y=y
    )
    for i in range(n_jobs))

all_weights = list(itertools.chain.from_iterable(res))  # get all fitted weights in a list
all_models = [get_model((10,)) for _ in all_weights]  # get empty models in a list
# put fitted weights into empty model structures
for w, m in zip(all_weights, all_models):
    m.set_weights(w)
```
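Parallel returns one result per job, in the order the jobs were submitted, and each result here is that job's list of weight lists, so res is a list of lists; itertools.chain.from_iterable flattens it into one entry per fitted model. A minimal sketch of that bookkeeping, with strings as hypothetical placeholders for the weight lists so that it runs without Keras:

```python
import itertools

# Hypothetical shape of `res` for n_estimators=[3, 3, 2, 2]:
# one inner list per job, one item per model fitted by that job
n_estimators = [3, 3, 2, 2]
res = [[f"job{j}-model{i}" for i in range(n)] for j, n in enumerate(n_estimators)]

all_weights = list(itertools.chain.from_iterable(res))
print(len(all_weights))  # 10
print(all_weights[:4])   # ['job0-model0', 'job0-model1', 'job0-model2', 'job1-model0']
```

Because the flattened list has exactly sum(n_estimators) entries, building one empty model per entry and zipping the two lists pairs every model with its own weights.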

A running notebook with the complete example is here.

【Discussion】: