Book: Reinforcement Learning: State-of-the-Art
About the author: a student

state and action

Painless Introduction to Reinforcement Learning: Notes, Lesson 1

transition function


reward function


Markov Decision Process


policy


The basic reinforcement learning workflow


Optimality Criteria and Discounting

Before we can talk about algorithms for computing optimal policies, we have to define what "optimal" means. That is, we have to define the model of optimality.
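As a minimal sketch of one common model of optimality, the discounted criterion, the snippet below computes the discounted return of a finite reward sequence, i.e. the sum of gamma**t * r_t. The function name `discounted_return`, the example rewards, and the value gamma = 0.9 are illustrative assumptions, not taken from the book or these notes.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Hypothetical three-step episode.
    print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))
```

With gamma close to 0 the agent effectively cares only about the immediate reward; with gamma close to 1 later rewards count almost as much as early ones.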

Value Functions and Bellman Equations

A value function represents an estimate of how good it is for the agent to be in a certain state (or how good it is to perform a certain action in that state). The notion of "how good" is expressed in terms of an optimality criterion, i.e., in terms of the expected return.
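To make the idea concrete, here is a minimal sketch of iterative policy evaluation, which repeatedly applies the Bellman expectation backup V(s) = sum over s' of P(s'|s,a) * (R + gamma * V(s')) until the values stop changing. The dictionary layout of `transitions`, the function name `evaluate_policy`, and the two-state toy MDP are all hypothetical choices made for illustration. The sketch assumes a deterministic policy and gamma < 1 (or an absorbing zero-reward state) so that the iteration converges.

```python
from typing import Dict, List, Tuple

# transitions[state][action] is a list of (probability, next_state, reward).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def evaluate_policy(
    transitions: Transitions,
    policy: Dict[str, str],
    gamma: float = 0.9,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Iterate the Bellman expectation backup for a deterministic policy."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state in transitions:
            action = policy[state]
            # Expected immediate reward plus discounted value of the successor.
            new_value = sum(
                prob * (reward + gamma * values[next_state])
                for prob, next_state, reward in transitions[state][action]
            )
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < tol:
            return values


if __name__ == "__main__":
    # Hypothetical two-state MDP: "end" is absorbing with zero reward.
    toy = {
        "start": {"go": [(1.0, "end", 1.0)], "stay": [(1.0, "start", 0.0)]},
        "end": {"stay": [(1.0, "end", 0.0)]},
    }
    print(evaluate_policy(toy, {"start": "go", "end": "stay"}))
```

Comparing the values obtained under different policies (for example "go" versus "stay" in the start state) is what lets the agent rank those policies under the chosen optimality criterion.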

greedy policy

