1. Deep Reinforcement Learning: AI = RL + DL
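2. Comparison: Supervised vs. Reinforcement Learning
Supervised: learn from a teacher.
Reinforcement: learn from experience, maximizing the expected reward (e.g., AlphaGo, which plays against itself: supervised + reinforcement).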
3. Applications:
Space Invaders
• Play it yourself: http://www.2600online.com/spaceinvaders.htm
• How a machine plays: https://gym.openai.com/evaluations/eval_Eduozx4HRyqgTCVk9ltw
4. Difficulties of Reinforcement Learning
- Reward delay
- Agent’s actions affect the subsequent data it receives
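
One common way to cope with reward delay is to credit every step with the discounted sum of the rewards that follow it. The sketch below is illustrative only: the function name `discounted_returns`, the discount factor `gamma`, and the example rewards are assumptions, not part of these notes.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: the only reward arrives at the final step.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(episode_rewards, gamma=0.5)
print(returns)
```

Earlier actions still receive some credit for the final reward, scaled down by how far in the past they occurred.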
5. Asynchronous Advantage Actor-Critic (A3C)
Introduction: policy-based and value-based approaches (e.g., AlphaGo: policy-based + value-based + model-based)
6. Policy-based Approach: learning an actor
---------- Step 1 ----------
---------- Step 2 ----------
---------- Step 3 ----------
Add a Baseline:
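
A minimal sketch of what subtracting a baseline does to a policy-gradient estimate, under simplifying assumptions that are not in these notes: a stateless softmax actor over three actions, one-step episodes, and the mean sampled return used as the baseline `b`. All function names and sample values are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, action):
    """Gradient of log softmax(logits)[action] with respect to the logits."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def policy_gradient(logits, samples, baseline=0.0):
    """Average of (R - b) * grad log p(a) over sampled (action, return) pairs."""
    grad = [0.0] * len(logits)
    for action, ret in samples:
        weight = ret - baseline
        g = grad_log_prob(logits, action)
        for i in range(len(grad)):
            grad[i] += weight * g[i] / len(samples)
    return grad


logits = [0.0, 0.0, 0.0]        # uniform actor over three actions
samples = [(0, 3.0), (1, 1.0)]  # hypothetical samples; action 2 was never tried
baseline = sum(r for _, r in samples) / len(samples)  # mean return

grad_plain = policy_gradient(logits, samples)
grad_baseline = policy_gradient(logits, samples, baseline)
print(grad_plain)
print(grad_baseline)
```

With all-positive returns and no baseline, the never-sampled action 2 gets a negative gradient simply because it was not sampled. With the baseline, the weights become signed and action 2 is left unchanged in this example.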
7. Value-based Approach: learning a critic
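
A critic estimates how much reward to expect from a state. As a hedged illustration, the sketch below uses an every-visit Monte Carlo average over a table of states; the function name, the `(state, reward)` episode format, and the sample episodes are assumptions made for this example, and a learned critic would typically be a neural network rather than a table.

```python
from collections import defaultdict


def monte_carlo_values(episodes, gamma=1.0):
    """Estimate V(s) as the average return observed after visiting s."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk backwards so g is the return that follows each visited state.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            totals[state] += g
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Hypothetical episodes: each step is (state, reward received after acting there).
episodes = [
    [("a", 0.0), ("b", 1.0)],
    [("a", 0.0), ("b", 0.0)],
    [("b", 1.0)],
]
values = monte_carlo_values(episodes)
print(values)
```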
8. Actor-Critic
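
In an actor-critic setup, the critic's value estimates can replace the sampled return minus a baseline with a one-step advantage, r + gamma * V(s') - V(s). The sketch below is a simplified tabular illustration; the function names, learning rate, and numbers are assumptions, and it omits the parallel asynchronous workers that give A3C its name.

```python
def td_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate: r + gamma * V(s') - V(s)."""
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def critic_update(value_s, advantage, lr=0.1):
    """Move V(s) a small step toward the one-step target."""
    return value_s + lr * advantage


# Hypothetical transition s0 -> s1 with reward 1.0.
values = {"s0": 0.5, "s1": 1.0}
advantage = td_advantage(1.0, values["s0"], values["s1"], gamma=0.5)
values["s0"] = critic_update(values["s0"], advantage)
print(advantage, values["s0"])
```

The actor would then scale its log-probability gradient for the chosen action by `advantage`, so actions that turned out better than the critic expected become more likely.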
A3C DeepMind Demo 1: https://www.youtube.com/watch?v=nMR5mjCFZCw
A3C DeepMind Demo 2: https://www.youtube.com/watch?v=0xo1Ldx3L5Q