Answer: Saving an Auto-encoder trained to reduce dimensions

【Question title】: Saving an Auto-encoder trained to reduce dimensions
【Posted】: 2018-10-11 09:12:11
【Question】:

I built an autoencoder for dimensionality reduction, and I want to save it so I can use it to reduce a test dataset. Here is my code:

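import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model, load_model
from numpy.random import seed
from sklearn.preprocessing import minmax_scale

# X: the training feature matrix (19 columns), loaded elsewhere
seed(123)  # fix NumPy's random seed for reproducibility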
print('Rescaling Data')
y = minmax_scale(X, axis=0)
ncol = y.shape[1] #here ncol = 19
print('Encoding Dimensions')
encoding_dim = 3
input_dim = Input(shape = (ncol,))

with tf.Session(config=tf.ConfigProto(intra_op_parallelism_threads=24)) as sess:
    K.set_session(sess)
    print('Initiating Encoder Layer')
    encoded1 = Dense(20, activation = 'relu')(input_dim)
    encoded2 = Dense(10, activation = 'relu')(encoded1)
    encoded3 = Dense(5, activation = 'relu')(encoded2)
    encoded4 = Dense(encoding_dim, activation = 'relu')(encoded3)
    print('Initiating Decoder Layer')
    decoded1 = Dense(5, activation = 'relu')(encoded4)
    decoded2 = Dense(10, activation = 'relu')(decoded1)
    decoded3 = Dense(20, activation = 'relu')(decoded2)
    decoded4 = Dense(ncol, activation = 'sigmoid')(decoded3)

    print('Combine Encoder and Decoder layers')
    autoencoder = Model(inputs=input_dim, outputs=decoded4)
    print('Compiling Model')
    autoencoder.compile(optimizer = 'Nadam', loss ='mse')
    autoencoder.fit(y, y, epochs = 300, batch_size = 20, shuffle = True)
    encoder = Model(inputs=input_dim, outputs=decoded4)
    encoder.save('reduction_param.h5')

    print('Initiating Dimension Reduction')
    model = load_model('reduction_param.h5')
    encoded_input = Input(shape = (encoding_dim, ))
    encoded_out = model.predict(y)

However, even though I set the encoding dimension to 3, model.predict(y) still returns all 19 columns instead of 3. I also get this warning:

UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file:

I understand this happens because the model saved by encoder.save('reduction_param.h5') was never compiled with an optimizer. Am I missing something?
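The warning only means the saved HDF5 file carries no optimizer or loss, which is expected for a model that was never compiled; it does not affect predict(). Below is a minimal sketch, assuming TensorFlow 2.x's tf.keras rather than the standalone Keras + TF1 session used above, with a tiny stand-in model and a hypothetical file name `encoder_demo.h5`. Passing `compile=False` to `load_model` tells Keras not to look for a training configuration, so the warning should not be emitted, and the reloaded model gives the same predictions.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Tiny stand-in model (hypothetical, not the question's architecture); never compiled.
inputs = keras.Input(shape=(19,))
outputs = keras.layers.Dense(3, activation="relu")(inputs)
model = keras.Model(inputs=inputs, outputs=outputs)

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "encoder_demo.h5")
model.save(path)

# compile=False: restore only architecture and weights, no optimizer/loss.
restored = keras.models.load_model(path, compile=False)

x = np.random.rand(4, 19).astype("float32")
original_out = model.predict(x, verbose=0)
restored_out = restored.predict(x, verbose=0)
```

If you later want to keep training the reloaded model, call `compile()` on it yourself.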

Edit:

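I'm not sure this is the right way to solve the problem, but basically I fit a MinMaxScaler() on the training dataset, save the fitted scaler as a pickle, and then reuse it while keeping the autoencoder, as in this code: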

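import pickle

import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model
from numpy.random import seed
from sklearn.preprocessing import MinMaxScaler

seed(123)  # fix NumPy's random seed for reproducibility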
print('Rescaling Data')
feature_space = MinMaxScaler()
feature_pkl = feature_space.fit(X)  # fit() returns the fitted scaler itself
filename = 'lc_feature_space.sav'
with open(filename, 'wb') as f:  # close the file after writing
    pickle.dump(feature_pkl, f)
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
y = loaded_model.transform(X)
ncol = y.shape[1]
print(ncol)
print('Encoding Dimensions')
encoding_dim = 3
input_dim = Input(shape = (ncol,))

with tf.Session(config=tf.ConfigProto(intra_op_parallelism_threads=24)) as sess:
    K.set_session(sess)
    print('Initiating Encoder Layer')
    encoded1 = Dense(20, activation = 'relu')(input_dim)
    encoded2 = Dense(10, activation = 'relu')(encoded1)
    encoded3 = Dense(5, activation = 'relu')(encoded2)
    encoded4 = Dense(encoding_dim, activation = 'relu')(encoded3)
    print('Initiating Decoder Layer')
    decoded1 = Dense(5, activation = 'relu')(encoded4)
    decoded2 = Dense(10, activation = 'relu')(decoded1)
    decoded3 = Dense(20, activation = 'relu')(decoded2)
    decoded4 = Dense(ncol, activation = 'sigmoid')(decoded3)

    print('Combine Encoder and Decoder layers')
    autoencoder = Model(inputs=input_dim, outputs=decoded4)
    print('Compiling Model')
    autoencoder.compile(optimizer = 'Nadam', loss ='mse')
    autoencoder.fit(y, y, epochs = 300, batch_size = 20, shuffle = True)

    print('Initiating Dimension Reduction')
    encoder = Model(inputs=input_dim, outputs=decoded4)
    encoded_input = Input(shape = (encoding_dim, ))
    encoded_out = encoder.predict(y)
    result = encoded_out[0:2]

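My reasoning is to capture the training dataset's feature ranges at the MinMaxScaler() stage, transform the test dataset using those same ranges, and then use the autoencoder for the reduction. I still don't know whether this is correct.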
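Fitting the scaler on the training set only and reusing the fitted object on the test set is the usual way to keep test statistics out of preprocessing. A minimal sketch with scikit-learn and pickle, where the random arrays are stand-ins for the real training and test matrices and the file is written to a temporary directory:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(123)
X_train = rng.normal(size=(100, 19))  # stand-in for the real training data
X_test = rng.normal(size=(20, 19))    # stand-in for the real test data

# Learn per-column min/max from the training data only.
scaler = MinMaxScaler().fit(X_train)

path = os.path.join(tempfile.mkdtemp(), "lc_feature_space.sav")
with open(path, "wb") as f:
    pickle.dump(scaler, f)

# Later, e.g. in a separate script: reload and apply the training ranges.
with open(path, "rb") as f:
    loaded_scaler = pickle.load(f)

y_train = loaded_scaler.transform(X_train)
y_test = loaded_scaler.transform(X_test)
```

Because the test set is scaled with the training ranges, some test values can fall slightly outside [0, 1]; that is expected. The scaled test matrix is then what you pass to the encoder's predict().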

Tags: python-3.x variables keras autoencoder dimensionality-reduction

【Answer 1】:

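I think the reason your encoder doesn't work as expected, i.e. doesn't reduce the dimensionality of the input tensor, is that you defined and saved the wrong model. You should use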

encoder = Model(inputs=input_dim, outputs=encoded4)

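whose output node is encoded4 rather than decoded4, so the model stops at the 3-unit bottleneck layer instead of returning the 19-column reconstruction.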
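To check the fix, here is a sketch assuming TensorFlow 2.x's tf.keras (the question uses standalone Keras with a TF1 session) and random stand-in data. It builds the same 19→20→10→5→3 encoder and mirrored decoder, trains briefly, and then defines the encoder as a separate Model ending at encoded4. Because the encoder reuses the autoencoder's layer objects, it already holds the trained weights and needs no extra training.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

ncol, encoding_dim = 19, 3
rng = np.random.default_rng(123)
y = rng.random((200, ncol)).astype("float32")  # stand-in for the min-max-scaled data

input_dim = keras.Input(shape=(ncol,))
encoded1 = layers.Dense(20, activation="relu")(input_dim)
encoded2 = layers.Dense(10, activation="relu")(encoded1)
encoded3 = layers.Dense(5, activation="relu")(encoded2)
encoded4 = layers.Dense(encoding_dim, activation="relu")(encoded3)
decoded1 = layers.Dense(5, activation="relu")(encoded4)
decoded2 = layers.Dense(10, activation="relu")(decoded1)
decoded3 = layers.Dense(20, activation="relu")(decoded2)
decoded4 = layers.Dense(ncol, activation="sigmoid")(decoded3)

autoencoder = keras.Model(inputs=input_dim, outputs=decoded4)
autoencoder.compile(optimizer="nadam", loss="mse")
autoencoder.fit(y, y, epochs=2, batch_size=20, shuffle=True, verbose=0)

# Encoder ends at the bottleneck, so it outputs encoding_dim columns.
encoder = keras.Model(inputs=input_dim, outputs=encoded4)
reduced = encoder.predict(y, verbose=0)
reconstructed = autoencoder.predict(y, verbose=0)
```

Saving `encoder` with `encoder.save(...)` then stores only the reduction part; applying it to a test set scaled with the training-fitted scaler yields the 3-column representation.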
