Painless Reinforcement Learning Notes, Lesson 1

Table of Contents

state and action
transition function
reward function
Markov Decision Process
policy
The basic workflow of reinforcement learning
Optimality Criteria and Discounting
Value Functions and Bellman Equations
greedy policy

Book: Reinforcement Learning State-of-the-Art
About the author: a student

state and action

transition function

reward function

Markov Decision Process

policy

The basic workflow of reinforcement learning

Optimality Criteria and Discounting

Before we can talk about algorithms for computing optimal policies, we have to define what that means. That is, we have to define what the model of optimality is.
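
As a rough illustration of the discounted criterion named in the heading, the sketch below computes the discounted return of a finite reward sequence, i.e. the sum of gamma**t * r_t. The function name `discounted_return`, the discount factor `gamma`, and the sample rewards are assumptions made for this example; they do not come from the book.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum(gamma**t * r_t) for a finite list of rewards.

    `gamma` is an assumed discount factor in [0, 1].
    """
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequence, chosen only for illustration.
sample_rewards = [1.0, 0.0, 2.0]
sample_return = discounted_return(sample_rewards, gamma=0.5)
print(sample_return)
```

With gamma = 0 only the immediate reward counts. As gamma approaches 1, later rewards weigh almost as much as early ones.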

Value Functions and Bellman Equations

A value function represents an estimate of how good it is for the agent to be in a certain state (or how good it is to perform a certain action in that state). The notion of "how good" is expressed in terms of an optimality criterion, i.e. in terms of the expected return.
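
To make the idea of a state value concrete, here is a minimal sketch that estimates V(s) for a fixed policy by repeatedly applying the Bellman backup V(s) <- R(s) + gamma * sum over s' of P(s'|s) * V(s'). The two-state chain, the names `evaluate_policy`, `transitions`, and `rewards`, and the choice of a state-based reward are all assumptions made for this example, not the book's notation.

```python
def evaluate_policy(transitions, rewards, gamma=0.9, sweeps=200):
    """Estimate state values under a fixed policy by repeated Bellman backups.

    transitions[s] maps each next state to its probability under the policy.
    rewards[s] is the (assumed) expected immediate reward in state s.
    """
    values = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        # Build the new estimates from the previous sweep's values.
        values = {
            state: rewards[state]
            + gamma * sum(prob * values[nxt] for nxt, prob in transitions[state].items())
            for state in transitions
        }
    return values


# Hypothetical chain: A always moves to B, and B loops on itself forever.
transitions = {"A": {"B": 1.0}, "B": {"B": 1.0}}
rewards = {"A": 0.0, "B": 1.0}
values = evaluate_policy(transitions, rewards, gamma=0.5)
print(values)
```

In this toy chain the estimates can be checked by hand: V(B) satisfies V(B) = 1 + 0.5 * V(B), so V(B) = 2, and V(A) = 0 + 0.5 * V(B) = 1.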

greedy policy
