1. Deep Reinforcement Learning: AI = RL + DL

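2. Comparison: Supervised vs. Reinforcement

Supervised: learning from a teacher.

Reinforcement: learning from experience, maximizing the expected reward. (e.g. AlphaGo, where machines play against each other: supervised + reinforcement)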

3. Applications:

[Machine Learning: Hung-yi Lee] 28. Deep Reinforcement Learning

Space Invaders

  • Play yourself: http://www.2600online.com/spaceinvaders.htm
  • How about the machine: https://gym.openai.com/evaluations/eval_Eduozx4HRyqgTCVk9ltw


4. Difficulties of Reinforcement Learning

  • Reward delay
  • The agent's actions affect the subsequent data it receives

5. Asynchronous Advantage Actor-Critic (A3C)

Introduction: policy-based and value-based approaches (e.g. AlphaGo: policy-based + value-based + model-based)


6. Policy-based Approach: learning an actor


---------------------- Step 1 ----------------------


---------------------- Step 2 ----------------------


---------------------- Step 3 ----------------------


Add a Baseline
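
The slides for this part did not survive, so here is a minimal sketch of what adding a baseline to a policy gradient update can look like. It assumes a single-state problem (a multi-armed bandit) with a softmax actor, and it uses the batch mean reward as the baseline b. The function names, the bandit setup, and the choice of baseline are all illustrative assumptions, not the lecture's own code.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy_gradient_estimate(theta, actions, rewards, baseline=0.0):
    """Monte Carlo estimate of the gradient of the expected reward.

    theta:    logits of a softmax policy over discrete actions (single state)
    actions:  sampled action indices, shape (N,)
    rewards:  reward obtained for each sampled action, shape (N,)
    baseline: value b subtracted from every reward (assumed constant here)
    """
    probs = softmax(theta)
    grad = np.zeros_like(theta)
    for a, r in zip(actions, rewards):
        grad_log_p = -probs.copy()   # d/dtheta log p(a) = onehot(a) - probs
        grad_log_p[a] += 1.0
        grad += (r - baseline) * grad_log_p
    return grad / len(actions)

def train(true_rewards, steps=500, batch=64, lr=0.1, seed=0):
    """Gradient ascent on a hypothetical bandit with deterministic rewards."""
    true_rewards = np.asarray(true_rewards, dtype=float)
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(true_rewards))
    for _ in range(steps):
        probs = softmax(theta)
        actions = rng.choice(len(theta), size=batch, p=probs)
        rewards = true_rewards[actions]
        baseline = rewards.mean()    # assumption: batch mean used as b
        theta += lr * policy_gradient_estimate(theta, actions, rewards, baseline)
    return softmax(theta)
```

Without the baseline, every reward here would be positive, so every sampled action would have its probability pushed up, and actions that happen not to be sampled would lose probability only by normalization. Subtracting b makes below-average actions receive a negative weight.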

7. Value-based Approach: learning a critic
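
As a rough illustration of what a critic can be, the sketch below assumes the critic is a state value function V(s), meaning the expected cumulative reward after visiting state s, and estimates it by averaging observed returns (a Monte Carlo estimate). The episode format and the function name are hypothetical, chosen only for this example.

```python
from collections import defaultdict

def mc_state_values(episodes, gamma=1.0):
    """Every-visit Monte Carlo estimate of V(s).

    episodes: list of episodes; each episode is a list of (state, reward)
              pairs, where reward is what the agent receives after
              acting in that state.
    gamma:    discount factor (1.0 means plain cumulative reward).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk backwards so g is the return from each state onwards.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            totals[state] += g
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}
```

For example, `mc_state_values([[("a", 0.0), ("b", 1.0)], [("a", 0.0), ("b", 3.0)]])` averages the two observed returns for each state.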


8. Actor-Critic
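
One common way to combine the two, sketched here under assumptions, is to let the critic's value estimates weight the actor's update through a one-step advantage r + gamma * V(s') - V(s). The function name and the default gamma are illustrative, not taken from the lecture.

```python
def one_step_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """Advantage of the action just taken, judged by the critic.

    reward:     reward received for the transition s -> s'
    value_s:    critic's estimate V(s)
    value_next: critic's estimate V(s')
    done:       True if s' is terminal, in which case V(s') is ignored
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s
```

A positive advantage means the action did better than the critic expected from state s, so the actor would raise its probability; a negative one would lower it.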


A3C DeepMind Demo 1: https://www.youtube.com/watch?v=nMR5mjCFZCw

A3C DeepMind Demo 2: https://www.youtube.com/watch?v=0xo1Ldx3L5Q

