DQN (Deep Q-Network) Basics


1 DQN architecture

[Figure: the DQN network architecture]

input

The input to the neural network is an 84*84*4 stack of image pixels produced by the preprocessing map.

hidden layer

The first hidden layer convolves 32 filters of 8*8 with stride 4 over the input image and applies a rectifier nonlinearity. The second hidden layer convolves 64 filters of 4*4 with stride 2, again followed by a rectifier nonlinearity. This is followed by a third convolutional layer that convolves 64 filters of 3*3 with stride 1, followed by a rectifier. The final hidden layer is fully connected and consists of 512 rectifier units.

output

The output layer is a fully-connected linear layer with a single output for each valid action. The number of valid actions varied between 4 and 18 on the games we considered.
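As a sanity check of the layer sizes above, the short sketch below traces the feature-map size through the three convolutions using the standard valid-convolution size formula (plain Python, no deep-learning framework; the `conv_out` helper is just an illustration, not from the paper):

```python
def conv_out(size, kernel, stride):
    """Output spatial size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

size = 84                      # input is 84*84*4
size = conv_out(size, 8, 4)    # conv1: 32 filters of 8*8, stride 4 -> 20*20*32
size = conv_out(size, 4, 2)    # conv2: 64 filters of 4*4, stride 2 -> 9*9*64
size = conv_out(size, 3, 1)    # conv3: 64 filters of 3*3, stride 1 -> 7*7*64

flat = size * size * 64        # flattened features feeding the 512-unit layer
```

The 7*7*64 = 3136 flattened features feed the fully-connected layer of 512 rectifier units, which in turn feeds the linear output layer with one unit per valid action.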

 

loss function

The loss function (objective function) of DQN is

L_i(θ_i) = E_{(s,a,r,s′)~U(D)} [ ( r + γ max_{a′} Q(s′, a′; θ_i⁻) − Q(s, a; θ_i) )² ]

in which γ is the discount factor determining the agent's horizon, θ_i are the parameters of the Q-network at iteration i, and θ_i⁻ are the network parameters used to compute the target at iteration i. The target network parameters θ_i⁻ are only updated with the Q-network parameters θ_i every C steps and are held fixed between individual updates.
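For a single transition (s, a, r, s′), the squared TD error inside this expectation can be sketched in plain Python; `td_loss`, the toy Q-values, and the specific numbers below are illustrative assumptions, with a list standing in for the target network's outputs:

```python
def td_loss(r, q_next_target, q_sa, gamma=0.99, terminal=False):
    """Squared TD error for one transition:
    (r + gamma * max_a' Q(s', a'; theta^-) - Q(s, a; theta))^2.
    q_next_target: target-network values Q(s', .; theta^-) as a list."""
    y = r if terminal else r + gamma * max(q_next_target)
    return (y - q_sa) ** 2

# toy numbers: reward 1.0, target-net values for s', online Q(s, a) = 0.5
loss = td_loss(1.0, [0.2, 0.7, 0.1], 0.5)
```

Note that the max over a′ is taken with the frozen parameters θ⁻, while only Q(s, a; θ) is differentiated during the update.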

 

 

2 Algorithm

[Algorithm: deep Q-learning with experience replay]
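The control flow of the algorithm (epsilon-greedy acting, an experience-replay buffer, TD updates on sampled minibatches, and target parameters synced every C steps) can be sketched as follows. `ChainEnv` and the tabular dicts standing in for the two Q-networks are hypothetical stand-ins, used only to make the loop runnable without a deep-learning framework:

```python
import random
from collections import defaultdict, deque

C, BATCH, GAMMA, ALPHA, EPSILON = 100, 32, 0.99, 0.1, 0.1
replay = deque(maxlen=10_000)          # experience-replay buffer D

class ChainEnv:
    """Hypothetical toy task: action 1 ends the episode with reward 1."""
    n_actions = 2
    def reset(self):
        return 0
    def step(self, a):
        return (1, 1.0, True) if a == 1 else (0, 0.0, False)

def train(env, q, q_target, episodes=20):
    step = 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action from the online Q
            if random.random() < EPSILON:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda b: q[(s, b)])
            s2, r, done = env.step(a)
            replay.append((s, a, r, s2, done))
            # one TD update per sampled transition (the "gradient step")
            for si, ai, ri, s2i, di in random.sample(
                    replay, min(BATCH, len(replay))):
                y = ri if di else ri + GAMMA * max(
                    q_target[(s2i, b)] for b in range(env.n_actions))
                q[(si, ai)] += ALPHA * (y - q[(si, ai)])
            step += 1
            if step % C == 0:          # theta^- held fixed between syncs
                q_target.update(q)
            s = s2

random.seed(0)
q, q_target = defaultdict(float), defaultdict(float)
train(ChainEnv(), q, q_target)
```

A real DQN replaces the dicts with the convolutional network described in section 1 and the per-transition update with a gradient step on the loss above, but the structure of the loop is the same.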

3 Conclusion

DQN uses a DNN to store the policy π, a mapping from states to actions.

 

3.2 Why use a DNN

It solves the problem of high-dimensional inputs (here, raw image pixels).

 
