First, let's review the action-value function.
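The definition below uses the usual notation: return U_t, discount factor γ ∈ [0, 1], and policy π. These symbol names are the conventional ones and are assumed here. Q_π(s_t, a_t) is the expected discounted return when the agent takes action a_t in state s_t and then follows π. The expectation is over future actions drawn from π and over the environment's state transitions.

```latex
\begin{aligned}
U_t &= R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k}, \\
Q_\pi(s_t, a_t) &= \mathbb{E}\left[\, U_t \mid S_t = s_t,\ A_t = a_t \,\right].
\end{aligned}
```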

Reinforcement Learning(二):Value-Based

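The value-based approach is built on the optimal action-value function Q*. Q* scores every action available in the current state, and the agent takes the highest-scoring one.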
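Under the standard definition Q*(s, a) = max_π Q_π(s, a), acting value-based means choosing a_t = argmax_a Q*(s_t, a). The following is a minimal sketch of that greedy rule. It assumes a tiny discrete problem in which Q* happens to be known as a lookup table; the states, actions, and numbers are hypothetical.

```python
# Hypothetical Q* table: Q_STAR[state][action] -> optimal expected discounted return.
Q_STAR = {
    "s0": {"left": 1.0, "right": 3.5, "up": 2.0},
    "s1": {"left": 4.0, "right": 0.5, "up": 1.5},
}


def greedy_action(state: str) -> str:
    """Return argmax_a Q*(state, a), i.e. the action with the largest value."""
    action_values = Q_STAR[state]
    return max(action_values, key=action_values.get)


if __name__ == "__main__":
    for s in Q_STAR:
        print(s, "->", greedy_action(s))
```

In practice Q* is never available as a table like this, which is exactly the problem the next paragraph addresses.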

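In general, however, we have no way of obtaining Q* exactly. Instead, we approximate it with a neural network. When the input is an image such as a game screen, this is typically a convolutional network.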
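In symbols, let w denote the network's parameters; this notation is assumed here. The network stands in for Q*, and the agent acts greedily with respect to the approximation:

```latex
\begin{aligned}
Q(s, a; \mathbf{w}) &\approx Q^\star(s, a), \\
a_t &= \arg\max_{a} Q(s_t, a; \mathbf{w}).
\end{aligned}
```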

Deep Q-Network (DQN)

Approximate the Q Function

Deep Q-Network (DQN)
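The essential interface of a DQN is: state in, one Q-value per action out. The sketch below shows that shape in plain Python. For brevity it assumes a small fully connected network on a state vector rather than a convolutional network on screen pixels. The class name, layer sizes, and initialization are hypothetical.

```python
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases): an n_out x n_in weight matrix and a zero bias vector."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map: weights @ x + biases."""
    weights, biases = layer
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


class TinyQNetwork:
    """Hypothetical MLP stand-in for Q(s, a; w): one output per action."""

    def __init__(self, state_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(state_dim, hidden_dim, rng)
        self.out = init_layer(hidden_dim, num_actions, rng)

    def q_values(self, state):
        """Return [Q(s, a_0; w), Q(s, a_1; w), ...]."""
        return dense(relu(dense(state, self.hidden)), self.out)

    def act(self, state):
        """Greedy action: index of the largest Q-value."""
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    net = TinyQNetwork(state_dim=4, hidden_dim=8, num_actions=3)
    s = [0.1, -0.2, 0.3, 0.0]
    print(net.q_values(s), net.act(s))
```

Producing all action values in a single forward pass is what makes argmax over actions cheap.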

Apply DQN to Play Games
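Playing a game with DQN follows a simple loop. Observe s_t, pick a_t = argmax_a Q(s_t, a; w), and let the environment return r_t and s_{t+1}. The sketch below runs that loop on a hypothetical one-dimensional "corridor" game. A hand-written function stands in for a trained network; both the game and `q_fn` are illustrative assumptions.

```python
class Corridor:
    """Hypothetical toy game: walk from cell 0 to cell length - 1."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        """Action 0 = left, 1 = right. Returns (next_state, reward, done)."""
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def q_fn(state, action):
    """Stand-in for a trained Q(s, a; w); here it simply prefers moving right."""
    return 1.0 if action == 1 else 0.0


def play_episode(env, q, num_actions=2, max_steps=100):
    """Run one greedy episode; return (total_reward, steps_taken)."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = max(range(num_actions), key=lambda a: q(state, a))  # a_t = argmax_a Q(s_t, a)
        state, reward, done = env.step(action)                       # environment gives r_t, s_{t+1}
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


if __name__ == "__main__":
    print(play_episode(Corridor(5), q_fn))
```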


Temporal Difference (TD) Learning

A Simple Example
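The setting is a travel-time prediction example, updated only after the whole trip is finished. The cities and minutes below are illustrative assumptions, not necessarily the numbers in the original figure. The model is a lookup table q(origin; w) = w[origin] predicting total minutes to the destination, so ∂q/∂w[origin] = 1. After arriving, it takes one gradient step on L = ½(q − y)², where y is the observed time.

```python
def full_trip_update(w, origin, actual_minutes, lr):
    """One gradient step on L = 0.5 * (q - y)^2 after completing the trip.

    Hypothetical tabular model: q(origin; w) = w[origin], so dq/dw[origin] = 1.
    """
    q = w[origin]            # prediction made before departure
    y = actual_minutes       # only known after arriving
    grad = q - y             # dL/dq * dq/dw[origin]
    w[origin] = q - lr * grad
    return w[origin]


if __name__ == "__main__":
    w = {"NYC": 1000.0}      # illustrative: predicted NYC -> Atlanta time in minutes
    print(full_trip_update(w, "NYC", actual_minutes=860.0, lr=0.5))  # 930.0
```

The drawback is that nothing can be learned until the trip is over.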

So is there a way to update the model without finishing the whole trip?
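TD learning is one such way. Drive only to an intermediate city and observe how long that leg took. The TD target is then that observed time plus the model's own prediction for the rest of the route. The sketch reuses the hypothetical tabular model and illustrative numbers: 1000 min predicted from NYC, 600 min predicted from DC, and 300 min actually spent driving NYC to DC.

```python
def td_update(w, origin, waypoint, observed_minutes, lr):
    """Update w[origin] after driving only as far as `waypoint`.

    TD target = observed time for the leg driven + model's estimate from the waypoint.
    The target is treated as a constant when taking the gradient.
    """
    q = w[origin]
    td_target = observed_minutes + w[waypoint]   # partly observed, partly predicted
    td_error = q - td_target
    w[origin] = q - lr * td_error
    return td_target, td_error, w[origin]


if __name__ == "__main__":
    w = {"NYC": 1000.0, "DC": 600.0}             # illustrative predictions (minutes)
    print(td_update(w, "NYC", "DC", observed_minutes=300.0, lr=0.5))  # (900.0, 100.0, 950.0)
```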

Why does TD learning work?
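The same illustrative numbers show why the TD target is more trustworthy than the original prediction: part of it (300 min) is observed rather than guessed. The TD error can also be read as a statement about the leg that was actually driven. The model implicitly claimed NYC to DC takes 1000 − 600 = 400 min, but it really took 300 min.

```latex
\begin{aligned}
\text{TD target} &= \underbrace{300}_{\text{observed}} + \underbrace{600}_{\text{predicted}} = 900, \\
\delta &= 1000 - 900 = 100 \\
       &= \underbrace{(1000 - 600)}_{\text{implied NYC}\to\text{DC}} \;-\; \underbrace{300}_{\text{actual NYC}\to\text{DC}}.
\end{aligned}
```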


TD Learning for DQN
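Applied to DQN, the same idea compares the prediction at time t with a TD target built from one observed reward plus the network's own estimate one step later. The terms correspond to the travel example as follows: Q(s_t, a_t; w) plays the role of the 1000-minute prediction, r_t the observed 300 minutes, and γ · max_a Q(s_{t+1}, a; w) the 600-minute estimate for the rest of the way.

```latex
\underbrace{Q(s_t, a_t; \mathbf{w})}_{\text{prediction}}
\;\approx\;
\underbrace{r_t + \gamma \cdot \max_{a} Q(s_{t+1}, a; \mathbf{w})}_{\text{TD target}}
```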

Didn't follow? Don't worry; here is a brief derivation.
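A sketch of the standard argument, in the notation of the action-value definition: the discounted return satisfies a one-step recursion. Taking expectations under optimal behaviour gives the Bellman optimality equation for Q*. Replacing Q* with the network and the expectation with a single observed transition (r_t, s_{t+1}) yields the TD target.

```latex
\begin{aligned}
U_t &= R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \cdots \\
    &= R_t + \gamma \left( R_{t+1} + \gamma R_{t+2} + \cdots \right) \\
    &= R_t + \gamma\, U_{t+1}, \\[4pt]
Q^\star(s_t, a_t) &= \mathbb{E}\left[\, R_t + \gamma \max_{a} Q^\star(S_{t+1}, a) \;\middle|\; S_t = s_t,\ A_t = a_t \,\right], \\[4pt]
Q(s_t, a_t; \mathbf{w}) &\approx r_t + \gamma \max_{a} Q(s_{t+1}, a; \mathbf{w}).
\end{aligned}
```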

Train DQN using TD learning
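One TD training iteration has four steps:

1. Predict q_t = Q(s_t, a_t; w).
2. Form the TD target y_t = r_t + γ · max_a Q(s_{t+1}, a; w).
3. Compute the TD error δ_t = q_t − y_t.
4. Take a gradient step w ← w − α · δ_t · ∂Q(s_t, a_t; w)/∂w.

The sketch below assumes a linear model Q(s, a; w) = w[a] · s in place of the neural network, so the gradient is simply s and can be written by hand. The terminal-state (`done`) handling is also an added assumption. All names are hypothetical.

```python
from typing import List

Vector = List[float]


def q_value(w: List[Vector], state: Vector, action: int) -> float:
    """Linear stand-in for Q(s, a; w): dot product of w[action] and the state features."""
    return sum(wi * si for wi, si in zip(w[action], state))


def td_step(w: List[Vector], state: Vector, action: int, reward: float,
            next_state: Vector, done: bool, gamma: float = 0.9, lr: float = 0.1) -> float:
    """Apply one TD update to w in place; return the TD error delta_t = q_t - y_t."""
    q_t = q_value(w, state, action)                          # 1. prediction
    if done:
        y_t = reward                                         # no future value after a terminal state
    else:
        y_t = reward + gamma * max(q_value(w, next_state, a) for a in range(len(w)))  # 2. TD target
    delta = q_t - y_t                                        # 3. TD error
    # 4. Gradient of 0.5 * delta^2 w.r.t. w[action], treating y_t as a constant;
    #    for this linear model dQ/dw[action] is just the state vector.
    w[action] = [wi - lr * delta * si for wi, si in zip(w[action], state)]
    return delta


if __name__ == "__main__":
    w = [[0.0, 0.0], [0.0, 0.0]]                             # 2 actions, 2 state features
    delta = td_step(w, [1.0, 0.0], 0, 1.0, [0.0, 1.0], done=False, gamma=0.9, lr=0.5)
    print(delta, w)
```

With a real network, step 4 would come from automatic differentiation instead of the hand-written `state` gradient.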


Summary

Value-Based Reinforcement Learning

Temporal Difference (TD) Learning
