A little bit of probability theory

Random Variable

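Random variable: a quantity whose value is unknown and depends on the outcome of a random event. Random variables are written with uppercase letters and observed values with lowercase letters. Note that an observed value carries no randomness.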

Reinforcement Learning(一):introduction

Probability Density Function (PDF)

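The PDF gives the relative likelihood that the random variable takes a given value. Examples include the Gaussian distribution (continuous) and discrete distributions.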

A PDF satisfies the following properties:

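As a sketch, the standard non-negativity and normalization conditions can be written as follows. The symbols are assumptions made here for illustration: p is the density (or probability mass function in the discrete case) of a random variable X, and 𝒳 is its domain.

```latex
\[
p(x) \ge 0 \quad \text{for all } x \in \mathcal{X},
\qquad
\int_{\mathcal{X}} p(x)\,dx = 1 \ \ \text{(continuous)},
\qquad
\sum_{x \in \mathcal{X}} p(x) = 1 \ \ \text{(discrete)}
\]
```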

Expectation

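For a discrete random variable X with probabilities p(x), the expectation of f(X) is usually defined as the sum of p(x) · f(x) over all x (an integral replaces the sum in the continuous case). The sketch below computes this exactly and compares it with an average over sampled observations. The distribution `pmf` and the helper names are made up for illustration.

```python
import random

# Hypothetical discrete distribution: value -> probability (sums to 1).
pmf = {1: 0.2, 3: 0.5, 7: 0.3}

def expectation(pmf, f=lambda x: x):
    """E[f(X)] = sum over x of p(x) * f(x) for a discrete random variable X."""
    return sum(p * f(x) for x, p in pmf.items())

def sample(pmf, rng):
    """Draw one observed value x of the random variable X."""
    values, probs = zip(*pmf.items())
    return rng.choices(values, weights=probs, k=1)[0]

# Exact expectation: 0.2*1 + 0.5*3 + 0.3*7
exact = expectation(pmf)

# Monte Carlo estimate: average of many observed values.
rng = random.Random(0)
n_samples = 100_000
estimate = sum(sample(pmf, rng) for _ in range(n_samples)) / n_samples
```

The uppercase X is the random quantity described by `pmf`; each call to `sample` returns a lowercase x, an observed value with no randomness left in it.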


Terminologies 

The Mario game is used as the running example.

state and action

  • agent: the entity that makes decisions and takes actions
  • state: the current state
  • action: the action taken by the agent

policy

The policy π is a probability density function: π(a | s) is the probability of taking action a given the current state s.

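A minimal sketch of a policy as a conditional distribution over actions. The state names, action names, and probabilities in `POLICY` are hypothetical values chosen for illustration, not taken from the original figure.

```python
import random

# Hypothetical policy table: state -> {action: probability}.
POLICY = {
    "enemy_ahead": {"left": 0.2, "right": 0.1, "up": 0.7},
    "clear_path": {"left": 0.1, "right": 0.8, "up": 0.1},
}

def pi(action, state):
    """pi(a | s): probability of taking `action` in `state`."""
    return POLICY[state][action]

def sample_action(state, rng):
    """Draw a random action A ~ pi(. | state)."""
    actions = list(POLICY[state])
    weights = [pi(a, state) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Empirical check: the sampled frequencies approach the policy probabilities.
rng = random.Random(0)
counts = {"left": 0, "right": 0, "up": 0}
for _ in range(10_000):
    counts[sample_action("enemy_ahead", rng)] += 1
```

For a fixed state the probabilities over all actions sum to 1, and the action actually taken is a random draw from them.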

reward

state transition

The state transition function is usually unknown; it is determined by the environment.

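In the usual notation (assumed here, not taken from the original), the state transition function is a conditional probability of the new state S' given the old state S and the action A:

```latex
\[
p(s' \mid s, a) = \mathbb{P}\left(S' = s' \mid S = s,\ A = a\right)
\]
```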

agent environment interaction

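One way to picture the interaction loop is the sketch below: the agent observes a state, picks an action, and the environment answers with a reward and a new state. `ToyEnv`, `random_policy`, and `run_episode` are hypothetical names, and the corridor layout and the 0.8 success probability are arbitrary choices for illustration.

```python
import random

class ToyEnv:
    """Hypothetical 1-D corridor with states 0..4; reaching state 4 ends the episode."""

    def __init__(self, rng):
        self.rng = rng
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Random state transition: the intended move succeeds with probability 0.8,
        # otherwise the agent moves in the opposite direction.
        move = action if self.rng.random() < 0.8 else -action
        self.state = min(4, max(0, self.state + move))
        done = self.state == 4
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def random_policy(state, rng):
    """Random action: step left (-1) or right (+1) with equal probability."""
    return rng.choice([-1, +1])

def run_episode(env, policy, rng, max_steps=1000):
    """Run the loop and record the trajectory of (state, action, reward) triples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state, rng)                   # agent acts
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory
```

Calling `run_episode(ToyEnv(rng), random_policy, rng)` with `rng = random.Random(0)` returns one trajectory. Both the action choice and the state transition draw random numbers, so a different seed generally gives a different trajectory.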


Randomness in Reinforcement Learning

Actions have randomness

State transitions have randomness


Play the game using AI


Rewards and Returns

Return

Future rewards are generally worth less than present rewards, so the discounted return is introduced:

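Assuming the usual definition, the discounted return at time t is U_t = R_t + γ·R_{t+1} + γ²·R_{t+2} + …, where the discount rate γ is a hyperparameter between 0 and 1, and γ = 1 gives the plain (undiscounted) return. A minimal sketch for a finite list of observed rewards follows; the function name and the reward values are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """U_t = R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ... for a finite reward list."""
    u = 0.0
    # Accumulate backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        u = r + gamma * u
    return u

rewards = [1.0, 0.0, 2.0, 3.0]  # hypothetical rewards r_t, r_{t+1}, ...

undiscounted = discounted_return(rewards, gamma=1.0)  # 1 + 0 + 2 + 3
discounted = discounted_return(rewards, gamma=0.5)    # 1 + 0 + 0.25*2 + 0.125*3
```

With γ = 0.5 the later rewards contribute much less: the last reward of 3 adds only 0.375 to the return.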

Randomness in Returns


Value Functions

Action-Value Function Q(s,a)

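In the standard formulation (notation assumed here), the action-value function of a policy π is the expected discounted return U_t given the current state and action, and the optimal action-value function takes the best value over all policies:

```latex
\[
Q_\pi(s_t, a_t) = \mathbb{E}\left[\, U_t \mid S_t = s_t,\ A_t = a_t \,\right],
\qquad
Q^\star(s_t, a_t) = \max_{\pi} Q_\pi(s_t, a_t)
\]
```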

State-Value Function V(s)

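In the standard formulation (notation assumed here), the state-value function averages the action-value function Q_π over the actions the policy π would take in state s_t. The sum below is for discrete actions; an integral replaces it for continuous actions.

```latex
\[
V_\pi(s_t) = \mathbb{E}_{A \sim \pi(\cdot \mid s_t)}\left[\, Q_\pi(s_t, A) \,\right]
           = \sum_{a} \pi(a \mid s_t)\, Q_\pi(s_t, a)
\]
```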

Understanding the Value Functions 


Play games using reinforcement learning

How does AI control the agent?

There are two approaches:

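Given the policy and value functions defined earlier, the two approaches are presumably the policy-based one (sample the action from π(· | s)) and the value-based one (pick the action that maximizes Q*(s, a)). The sketch below contrasts them. `toy_pi` and `toy_q_star` are hypothetical stand-ins for a learned policy and a learned optimal action-value function, with made-up numbers.

```python
import random

ACTIONS = ["left", "right", "up"]

def act_with_policy(pi, state, rng):
    """Policy-based control: sample a ~ pi(. | state)."""
    probs = pi(state)
    return rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS], k=1)[0]

def act_with_q(q_star, state):
    """Value-based control: choose a = argmax_a Q*(state, a)."""
    return max(ACTIONS, key=lambda a: q_star(state, a))

# Hypothetical stand-in for a learned policy; it ignores the state.
def toy_pi(state):
    return {"left": 0.2, "right": 0.1, "up": 0.7}

# Hypothetical stand-in for a learned optimal action-value function.
def toy_q_star(state, action):
    return {"left": 120.0, "right": -30.0, "up": 370.0}[action]

rng = random.Random(0)
sampled_action = act_with_policy(toy_pi, "some_state", rng)  # random draw
greedy_action = act_with_q(toy_q_star, "some_state")         # deterministic
```

The policy-based choice is random, while the value-based choice is deterministic once Q* is known.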


OpenAI Gym

Gym is a toolkit for developing and comparing reinforcement learning algorithms. https://gym.openai.com/

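A minimal sketch of one episode with random actions. It assumes the `gym` package is installed and that the `CartPole-v1` environment is available. The return values of `reset` and `step` differ between Gym versions, so the code checks their shape instead of assuming one version. `run_random_episode` is a name made up for this example.

```python
def run_random_episode(env_name="CartPole-v1", max_steps=200):
    """Play one episode with uniformly random actions and return the total reward."""
    import gym  # assumption: the `gym` package is installed

    env = gym.make(env_name)
    reset_out = env.reset()
    # Newer versions return (observation, info); older ones return only the observation.
    state = reset_out[0] if isinstance(reset_out, tuple) else reset_out

    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()  # random action, no learning involved
        step_out = env.step(action)
        if len(step_out) == 5:
            # Newer API: (obs, reward, terminated, truncated, info)
            state, reward, terminated, truncated, _info = step_out
            done = terminated or truncated
        else:
            # Older API: (obs, reward, done, info)
            state, reward, done, _info = step_out
        total_reward += reward
        if done:
            break

    env.close()
    return total_reward

if __name__ == "__main__":
    print("total reward:", run_random_episode())
```

A learned policy would replace `env.action_space.sample()` with an action chosen from the current `state`.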

 


Summary


We are going to study…
