Reinforcement Learning (4): Actor-Critic Methods

Main idea:


Policy Network (Actor)

Value Network (Critic):

An intuitive analogy:


Train the Neural Networks

Steps:

Update value network q using TD
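
As a rough illustration of this step, the sketch below applies one TD update to an action-value estimate q. It rests on assumptions that are not in the original post: a lookup table `q[s][a]` stands in for the value network (one weight per state-action pair), the next action `a_next` is the one sampled from the policy, and the names `td_update`, `alpha`, and `gamma` are hypothetical.

```python
def td_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One TD step on a tabular action-value estimate q[s][a].

    Assumption: the table replaces the value network, so the gradient of
    q[s][a] with respect to its own entry is 1.
    """
    # TD target: observed reward plus discounted estimate at the next step.
    td_target = r + gamma * q[s_next][a_next]
    # TD error: current estimate minus the target.
    td_error = q[s][a] - td_target
    # Gradient descent on 0.5 * td_error**2 with respect to q[s][a].
    q[s][a] -= alpha * td_error
    return td_error


# Two states, two actions; values chosen only for illustration.
q = [[0.0, 0.0], [1.0, 2.0]]
delta = td_update(q, s=0, a=1, r=1.0, s_next=1, a_next=1)
```

With a real network the last line of the update would become a gradient step on the network weights, scaled by the same TD error.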

Update policy network π using policy gradient
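
A minimal sketch of this step, under assumptions that are not in the original post: the policy is a softmax over a table of logits `theta[s]` rather than a neural network, the critic's score for the chosen action is passed in as `q_value`, and the names `softmax`, `policy_gradient_update`, and `beta` are hypothetical. For a softmax policy, the derivative of log π(a|s) with respect to the logit of action b is 1[b = a] − π(b|s).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_update(theta, s, a, q_value, beta=0.1):
    """One gradient-ascent step on log pi(a|s) weighted by the critic's score.

    Assumption: theta[s] holds one logit per action (a tabular softmax policy).
    """
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        # d log pi(a|s) / d theta[s][b] for a softmax policy.
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += beta * q_value * grad_log_pi
    return softmax(theta[s])


# One state, two actions, uniform initial policy.
theta = [[0.0, 0.0]]
new_probs = policy_gradient_update(theta, s=0, a=1, q_value=2.0)
```

Because the critic's score is positive here, the update raises the probability of the action that was taken; a negative score would lower it.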


Actor-Critic Method

Summary of Algorithm


Summary

Policy Network and Value Network

Training
