Answers: Transfer learning for DQN

【Question title】: Transfer learning for DQN
【Posted】: 2019-12-17 04:18:36
【Question】:

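I am looking into how to use transfer learning in deep reinforcement learning.

I want to use a pretrained model (an .h5f file) in my project through transfer learning. I have an image input and scalar inputs. The image is the input to a convolutional neural network (CNN).

I also tried loading the weights from the pretrained model and choosing which layers are trainable: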

dqn.load_weights('checkpoint_reward_176.h5f')

# Freeze the first four layers
for i in range(4):
    model.layers[i].trainable = False

# Leave layers 4 to 7 trainable
for i in range(4, 8):
    model.layers[i].trainable = True

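In short, how can I transfer layers from the pretrained model to the untrained layers? Is it possible to use transfer learning in this case?

Thanks in advance for any answers.

Here is the DQN code.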

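import gym
import numpy as np
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
from keras.models import Sequential, Model
from keras.layers import Input, Conv2D, Flatten, Reshape, Dense, concatenate
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.processors import MultiInputProcessor
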
env = gym.make(args.env_name)
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n

img_shape = env.simage.shape
vel_shape = env.svelocity.shape
dst_shape = env.sdistance.shape
geo_shape = env.sgeofence.shape


AE_shape = env.sAE.shape

img_kshape = (1,) + img_shape

#Sequential model for convolutional layers applied to image
image_model = Sequential()
image_model.add(Conv2D(32, (4, 4), strides=(4, 4) ,activation='relu', input_shape=img_kshape, data_format = "channels_first"))
image_model.add(Conv2D(64, (3, 3), strides=(2, 2),  activation='relu'))
image_model.add(Flatten())
print(image_model.summary())

#Input and output of the Sequential model
image_input = Input(img_kshape)
encoded_image = image_model(image_input)

#Inputs and reshaped tensors for concatenate after with the image
velocity_input = Input((1,) + vel_shape)
distance_input = Input((1,) + dst_shape)
geofence_input = Input((1,) + geo_shape)
vel = Reshape(vel_shape)(velocity_input)
dst = Reshape(dst_shape)(distance_input)
geo = Reshape(geo_shape)(geofence_input)


AE_input = Input((1,) + AE_shape)
ae = Reshape(AE_shape)(AE_input)

#Concatenation of image, position, distance and geofence values.
#3 dense layers of 256 units
denses = concatenate([encoded_image, vel, dst, geo, ae])
denses = Dense(256, activation='relu')(denses)
denses = Dense(256, activation='relu')(denses)
denses = Dense(256, activation='relu')(denses)

#Last dense layer with nb_actions for the output
predictions = Dense(nb_actions, kernel_initializer='zeros', activation='linear')(denses)

model = Model(
        inputs=[image_input, velocity_input, distance_input, geofence_input, AE_input],
        outputs=predictions
        )
print(model.summary())


train = True

memory = SequentialMemory(limit=100000, window_length=1)                        

processor = MultiInputProcessor(nb_inputs=5)

policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=0.0,
                              nb_steps=50000)

dqn = DQNAgent(model=model, processor=processor, nb_actions=nb_actions, memory=memory, nb_steps_warmup=50, 
               enable_double_dqn=True, 
               enable_dueling_network=False, dueling_type='avg', 
               target_model_update=1e-2, policy=policy, gamma=.99)

dqn.compile(Adam(lr=0.00025), metrics=['mae'])

Updated DQN code:

# Obtaining shapes from Gym environment
img_shape = env.simage.shape
vel_shape = env.svelocity.shape
dst_shape = env.sdistance.shape
geo_shape = env.sgeofence.shape

AE_shape = env.sAE.shape

# Keras-rl interprets an extra dimension at axis=0
# added on to our observations, so we need to take it into account
img_kshape = (1,) + img_shape

input_layer = Input(shape=img_kshape)
conv1 = Conv2D(32, (4, 4), strides=(4, 4), activation='relu', input_shape=img_kshape, name='conv1',
               data_format="channels_first")(input_layer)
conv2 = Conv2D(64, (3, 3), strides=(2, 2), activation='relu', name='conv2')(conv1)
flat1 = Flatten(name='flat1')(conv2)

auxiliary_input1 = Input(vel_shape, name='vel')
auxiliary_input2 = Input(dst_shape, name='dst')
auxiliary_input3 = Input(geo_shape, name='geo')
auxiliary_input4 = Input(AE_shape, name='ae')

denses = concatenate([flat1, auxiliary_input1, auxiliary_input2, auxiliary_input3, auxiliary_input4])
denses = Dense(256, activation='relu')(denses)
denses = Dense(256, activation='relu')(denses)
denses = Dense(256, activation='relu')(denses)

predictions = Dense(nb_actions, kernel_initializer='zeros', activation='linear')(denses)

model = Model(inputs=[input_layer, auxiliary_input1, auxiliary_input2, auxiliary_input3, auxiliary_input4],
              outputs=predictions)

print(model.summary())

Model summary

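【Comments】:

Could you elaborate on "In short, how can I transfer layers from the pretrained model to the untrained layers?"
First of all, thanks for your comment. For example, I want to reuse the image input layers, i.e. the CNN that takes the image, from the pretrained model.
Do you want to import the weights of a different pretrained CNN model (with the same shapes) into your non-trainable layers? Or do your trainable layers have different shapes from the pretrained CNN model?
Yes, something like that. The shapes and all other parameters are the same. I just want to use the pretrained model's weights for some of the layers.

Tags: tensorflow keras deep-learning reinforcement-learning transfer-learning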

【Answer 1】:

I think you should build the network with the Keras functional API and concatenate the two parts. So, instead of the following part of your code,

#Sequential model for convolutional layers applied to image
image_model = Sequential()
image_model.add(Conv2D(32, (4, 4), strides=(4, 4) ,activation='relu', input_shape=img_kshape, data_format = "channels_first"))
image_model.add(Conv2D(64, (3, 3), strides=(2, 2),  activation='relu'))
image_model.add(Flatten())

use the following snippet, which uses the Keras functional API.

input_layer = Input(shape=img_kshape)
conv1 = Conv2D(32, (4, 4), strides=(4, 4) ,activation='relu', input_shape=img_kshape, name='conv1', data_format = "channels_first")(input_layer)
conv2 = Conv2D(64, (3, 3), strides=(2, 2),  activation='relu', name='conv2')(conv1)
flat1 = Flatten(name='flat1')(conv2)

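You can then define an auxiliary input layer that takes all of the vel, dst and geo tensors (use the appropriate shape; I used 5 for convenience). Finally, concatenate these layers and build the model (so use the snippet below instead of your '#3 dense layers of 256 units' snippet).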

auxiliary_input = Input(shape=(5,), name='aux_input')
denses1 = concatenate([flat1, auxiliary_input])
denses2 = Dense(256, activation='relu')(denses1)
denses3 = Dense(256, activation='relu')(denses2)
denses4 = Dense(256, activation='relu')(denses3)

model = Model(inputs=[input_layer,auxiliary_input], outputs=denses4)

print(model.summary()) will yield:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 1, 96, 96)    0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 32, 24, 24)   544         input_1[0][0]                    
__________________________________________________________________________________________________
conv2 (Conv2D)                  (None, 15, 11, 64)   13888       conv1[0][0]                      
__________________________________________________________________________________________________
flat1 (Flatten)                 (None, 10560)        0           conv2[0][0]                      
__________________________________________________________________________________________________
aux_input (InputLayer)          (None, 5)            0                                            
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 10565)        0           flat1[0][0]                      
                                                                 aux_input[0][0]                  
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 256)          2704896     concatenate_1[0][0]              
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 256)          65792       dense_1[0][0]                    
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 256)          65792       dense_2[0][0]                    
==================================================================================================
Total params: 2,850,912
Trainable params: 2,850,912
Non-trainable params: 0
__________________________________________________________________________________________________
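
The parameter counts and output shapes in this summary can be checked by hand. The sketch below does that arithmetic in plain Python; the helper names (conv2d_params, dense_params, conv_out) are illustrative and not part of Keras, and 'valid' padding is assumed because the snippet does not set padding. One thing the numbers suggest: conv2's 13,888 parameters correspond to 24 input channels, which is consistent with conv2 using the default channels_last layout on conv1's (32, 24, 24) output, since only conv1 was given data_format="channels_first". Passing the same data_format to conv2 may be what was intended.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, units):
    """Weights plus one bias per unit."""
    return in_units * units + units


def conv_out(size, kernel, stride):
    """Output length along one axis, assuming 'valid' padding."""
    return (size - kernel) // stride + 1


# conv1: channels_first input (1, 96, 96), 32 filters of 4x4, stride 4
conv1 = conv2d_params(4, 4, 1, 32)
conv1_side = conv_out(96, 4, 4)

# conv2 has no data_format, so (32, 24, 24) is read as height 32, width 24, 24 channels
conv2 = conv2d_params(3, 3, 24, 64)
conv2_shape = (conv_out(32, 3, 2), conv_out(24, 3, 2), 64)

flat = conv2_shape[0] * conv2_shape[1] * conv2_shape[2]
dense_1 = dense_params(flat + 5, 256)   # flattened image features + 5 auxiliary values
dense_2 = dense_params(256, 256)
dense_3 = dense_params(256, 256)

total = conv1 + conv2 + dense_1 + dense_2 + dense_3
print(conv1, conv2, conv2_shape, flat, dense_1, total)
```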

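Once training is done, you can freeze some layers as in your original post and then import the trained weights into the non-trainable layers, as follows.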

conv1_weights = model.get_layer('conv1').get_weights()

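If conv1 is not trainable, assign the loaded weights as follows.

# conv1 above is the layer's output tensor, so fetch the layer itself by name
model.get_layer('conv1').set_weights(conv1_weights)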

I worked through your problem without a minimal reproducible example, so please let me know about any errors.
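
A minimal sketch of the hand-off described in this answer: copy weights between two models by layer name, then freeze the copied layers. It uses tensorflow.keras rather than the standalone keras imports in the question, and build_model, its layer sizes, the channels_last image shape and the names pretrained and new_model are illustrative assumptions, not the asker's network.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_model(nb_actions=4):
    """Small stand-in for the network above (hypothetical sizes, channels_last)."""
    image_in = keras.Input(shape=(24, 24, 1), name="image")
    aux_in = keras.Input(shape=(5,), name="aux_input")
    x = layers.Conv2D(8, (4, 4), strides=(4, 4), activation="relu", name="conv1")(image_in)
    x = layers.Conv2D(16, (3, 3), strides=(2, 2), activation="relu", name="conv2")(x)
    x = layers.Flatten(name="flat1")(x)
    x = layers.concatenate([x, aux_in])
    x = layers.Dense(32, activation="relu", name="dense1")(x)
    out = layers.Dense(nb_actions, activation="linear", name="q_values")(x)
    return keras.Model(inputs=[image_in, aux_in], outputs=out)


pretrained = build_model()   # stands in for the model whose weights were loaded from disk
new_model = build_model()    # freshly initialised model that will be trained further

transferred = ["conv1", "conv2"]
for name in transferred:
    source_layer = pretrained.get_layer(name)
    target_layer = new_model.get_layer(name)
    # Copy kernel and bias, then exclude the layer from gradient updates.
    target_layer.set_weights(source_layer.get_weights())
    target_layer.trainable = False

# Compile only after the trainable flags are set.
new_model.compile(optimizer="adam", loss="mse")
```

Keras expects trainable to be changed before compile (or the model to be recompiled afterwards). With keras-rl this would presumably mean freezing the layers before calling dqn.compile.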

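【Comments】:

Thank you very much for the detailed answer. I will try it and share the results as soon as possible.
I changed the code as you suggested and got a similar model summary. I get a reshaping error related to dqn.fit. I will share the details.
I have updated the question and added the model summary. The error after starting training occurs in dqn.fit: ValueError: Error when checking input: expected vel to have 2 dimensions, but got array with shape (1, 1, 2)
ValueError: Error when checking input: expected vel to have 2 dimensions, but got array with shape (1, 1, 2). This error occurs in dqn.fit. Previously I was reshaping the inputs; I have now removed the reshaping, following your suggestion, to create the new model. I think the environment may be the cause, since I receive the states from a Gym environment.
Could you output vel, dst, geo and AE in the correct shapes from your environment, so that the Reshape layers are no longer needed? Post the full code and I will take a look when I have time.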
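
The shape in this error is consistent with keras-rl stacking the last window_length observations and then adding a batch axis, which is what the code comment "Keras-rl interprets an extra dimension at axis=0" in the updated question refers to. Below is a NumPy-only sketch of that bookkeeping. It assumes a velocity observation of shape (2,), inferred from the error message, and uses a hypothetical helper fits that is not part of Keras or keras-rl.

```python
import numpy as np

vel_shape = (2,)      # assumed shape of one velocity observation, per the error message
window_length = 1     # as in SequentialMemory(limit=100000, window_length=1)

observation = np.zeros(vel_shape)

# Stack the most recent `window_length` observations, then add the batch axis.
state = np.stack([observation] * window_length)    # shape (1, 2)
batch = state[np.newaxis, ...]                      # shape (1, 1, 2)


def fits(array_shape, input_shape):
    """True if array_shape matches a Keras-style shape (None = any batch size)."""
    if len(array_shape) != len(input_shape):
        return False
    return all(want is None or want == got for got, want in zip(array_shape, input_shape))


plain_input = (None,) + vel_shape                     # Input(vel_shape)
windowed_input = (None, window_length) + vel_shape    # Input((1,) + vel_shape)

print(batch.shape)                        # (1, 1, 2)
print(fits(batch.shape, plain_input))     # False: 3 dimensions against an expected 2
print(fits(batch.shape, windowed_input))  # True

# What Reshape(vel_shape) would do after the windowed input: drop the window axis.
reshaped = batch.reshape((-1,) + vel_shape)
print(reshaped.shape)                     # (1, 2)
```

If that is what happens here, restoring the Input((1,) + vel_shape) and Reshape(vel_shape) pairs from the original model would presumably let the auxiliary inputs accept these arrays without changing the environment.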