
Main idea:

Policy Network (Actor)

Value Network (Critic)
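
Taken together, the actor and the critic are commonly used to approximate the state-value function. The relation below is a sketch of that standard formulation, not a reproduction of the original figure. The parameter names \theta (policy network) and w (value network) are assumptions; only \pi and q appear in these notes.

```latex
\begin{align}
V_\pi(s) &= \sum_{a} \pi(a \mid s)\, Q_\pi(s, a) \\
         &\approx \sum_{a} \pi(a \mid s; \theta)\, q(s, a; w)
\end{align}
```

Here \pi(a \mid s; \theta) plays the role of the actor, which chooses actions, and q(s, a; w) plays the role of the critic, which scores them.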

An intuitive analogy:

Train the Neural Networks

Detailed steps:

Update value network q using TD
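
A minimal sketch of one TD update for the critic may help here. It assumes a lookup table `q[s][a]` as a stand-in for the value network, so the gradient with respect to the selected entry is simply 1. The function name `td_update`, the learning rate `alpha`, and the discount `gamma` are hypothetical and chosen only for illustration.

```python
def td_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One TD step on a tabular stand-in for the value network q (assumption)."""
    td_target = r + gamma * q[s_next][a_next]  # bootstrapped target
    td_error = q[s][a] - td_target             # prediction minus target
    q[s][a] -= alpha * td_error                # gradient step on 0.5 * td_error**2
    return td_error


# Two states, two actions, all estimates start at zero.
q = [[0.0, 0.0], [0.0, 0.0]]
delta = td_update(q, s=0, a=1, r=1.0, s_next=1, a_next=0)
```

With a real neural network the last step would multiply the TD error by the gradient of q with respect to the network weights instead of updating a single table entry.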

Update policy network π using policy gradient
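
The actor update can be sketched in the same spirit. This example assumes a softmax policy over a table of logits `theta[s][a]` instead of a neural network, and it uses the critic's score `q_value` to weight the gradient of log π. The names `policy_gradient_update` and `beta` (the actor's learning rate) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_update(theta, s, a, q_value, beta=0.1):
    """Gradient ascent on q_value * log pi(a | s) for a tabular softmax policy."""
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        # d log pi(a|s) / d theta[s][b] = 1[a == b] - pi(b|s)
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += beta * q_value * grad_log_pi
    return softmax(theta[s])


# One state, two actions, uniform initial policy.
theta = [[0.0, 0.0]]
new_probs = policy_gradient_update(theta, s=0, a=1, q_value=2.0)
```

Because the critic gave action 1 a positive score, the update raises the probability of that action; a negative score would lower it.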

Actor-Critic Method




Summary of Algorithm
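
One iteration of the method is usually written as the sequence below. This is a sketch of the common textbook form, under the assumption that the critic is trained with a SARSA-style TD target. The symbols \theta, w, \alpha, \beta, \gamma and \tilde{a}_{t+1} are not defined in these notes and are introduced here only for illustration.

```latex
\begin{align*}
&\text{1. Observe } s_t \text{ and sample } a_t \sim \pi(\cdot \mid s_t; \theta_t). \\
&\text{2. Perform } a_t; \text{ observe the reward } r_t \text{ and the new state } s_{t+1}. \\
&\text{3. Sample } \tilde{a}_{t+1} \sim \pi(\cdot \mid s_{t+1}; \theta_t) \text{ (not performed).} \\
&\text{4. Evaluate } q_t = q(s_t, a_t; w_t), \quad q_{t+1} = q(s_{t+1}, \tilde{a}_{t+1}; w_t). \\
&\text{5. TD error: } \delta_t = q_t - (r_t + \gamma\, q_{t+1}). \\
&\text{6. Critic: } w_{t+1} = w_t - \alpha\, \delta_t\, \frac{\partial q(s_t, a_t; w)}{\partial w}\Big|_{w = w_t}. \\
&\text{7. Actor: } \theta_{t+1} = \theta_t + \beta\, q_t\, \frac{\partial \log \pi(a_t \mid s_t; \theta)}{\partial \theta}\Big|_{\theta = \theta_t}.
\end{align*}
```

Some presentations use \delta_t in place of q_t in step 7, with the sign adjusted accordingly, which acts as a baseline and tends to reduce variance.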


Summary
Policy Network and Value Network


Training
