cs294-RL introduction

Types of reinforcement learning algorithms
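
All of the algorithm families below optimize the same objective. The goal is to choose policy parameters θ that maximize the expected total reward of trajectories τ generated by the policy. The standard finite-horizon form is shown here. The symbols are the usual textbook ones and are an assumption, since they are not taken from the original figure.

```latex
\theta^\star = \arg\max_\theta \; \mathbb{E}_{\tau \sim p_\theta(\tau)}\left[\sum_{t=1}^{T} r(\mathbf{s}_t, \mathbf{a}_t)\right],
\qquad
p_\theta(\tau) = p(\mathbf{s}_1) \prod_{t=1}^{T} \pi_\theta(\mathbf{a}_t \mid \mathbf{s}_t)\, p(\mathbf{s}_{t+1} \mid \mathbf{s}_t, \mathbf{a}_t)
```

The families differ in which part of this expression they work with:

- Model-based methods learn p(s_{t+1} | s_t, a_t).
- Value-based methods estimate V(s) or Q(s, a) and act greedily.
- Policy gradient methods differentiate the objective with respect to θ directly.
- Actor-critic methods use a learned value function to improve the policy gradient.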

model-based RL

Value functions
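
Here Q^π(s, a) is the expected return after taking action a in state s and then following π, and V^π(s) is the average of Q^π(s, a) over a ~ π(a|s). Once Q^π is known, acting greedily with respect to it gives a policy that is at least as good as π.

The sketch below illustrates both ideas under stated assumptions:

- The environment is the same hypothetical 5-state chain, with known dynamics.
- Returns are discounted with γ = 0.9.
- The policy being evaluated is uniformly random.
- Q^π is computed by repeated Bellman expectation backups.

Names are illustrative only.

```python
N_STATES, ACTIONS, GAMMA = 5, (0, 1), 0.9  # hypothetical chain; 0 = left, 1 = right

def step(s, a):
    """Known toy dynamics: reward 1 for landing in the last state."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

def evaluate_q(pi, n_iters=500):
    """Iterate Q(s,a) <- r(s,a) + gamma * sum_a' pi(a'|s') Q(s',a') to a fixed point."""
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(n_iters):
        new_Q = [[0.0, 0.0] for _ in range(N_STATES)]
        for s in range(N_STATES):
            for a in ACTIONS:
                s2, r = step(s, a)
                v_next = sum(pi[s2][a2] * Q[s2][a2] for a2 in ACTIONS)  # V^pi(s')
                new_Q[s][a] = r + GAMMA * v_next
        Q = new_Q
    return Q

uniform = [[0.5, 0.5] for _ in range(N_STATES)]  # pi(a|s) = 0.5 for both actions
Q = evaluate_q(uniform)
V = [sum(uniform[s][a] * Q[s][a] for a in ACTIONS) for s in range(N_STATES)]
# Policy improvement: pick argmax_a Q^pi(s, a) in every state.
greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

Even though the evaluated policy is random, the greedy policy derived from its Q-function already moves right everywhere.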

policy gradient
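
Policy gradient methods estimate the gradient of expected reward as E[∇θ log πθ(a|s) · return] and take gradient ascent steps on θ.

Below is a minimal REINFORCE sketch with these assumptions:

- The environment is a hypothetical 3-armed bandit with fixed rewards, so every episode is a single step.
- The policy is a softmax over one logit per arm, for which ∂ log π(a) / ∂θ_k = 1[k = a] − π(k).
- No baseline is used.

Names and hyperparameters are illustrative.

```python
import math
import random

ARM_REWARDS = [0.2, 0.5, 1.0]  # hypothetical bandit: fixed reward per arm

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce(n_steps=3000, lr=0.1, seed=0):
    """REINFORCE with pi_theta(a) = softmax(theta)[a] and one-step episodes."""
    rng = random.Random(seed)
    theta = [0.0] * len(ARM_REWARDS)
    for _ in range(n_steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample a ~ pi_theta
        r = ARM_REWARDS[a]
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * r * grad_log_pi  # gradient ascent on expected reward
    return softmax(theta)

final_probs = reinforce()
```

The sampled gradient is noisy but unbiased, so probability mass gradually shifts to the best arm. Subtracting a baseline would reduce the variance without changing the expected gradient.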

actor-critic: value function plus policy gradients
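
An actor-critic method keeps two learned pieces. The critic is a learned value function, and it supplies an advantage estimate A = r + γV(s') − V(s). The actor is the policy, and it follows the policy gradient weighted by that advantage instead of by the raw return.

Below is a minimal one-step, tabular sketch with these assumptions:

- The environment is a hypothetical episodic 5-state chain that ends on reaching the last state.
- Per-state softmax logits serve as the actor.
- A TD(0) value table serves as the critic.

All names and learning rates are illustrative, not taken from the lecture.

```python
import math
import random

N_STATES, GOAL, GAMMA = 5, 4, 0.9  # hypothetical chain; reaching GOAL ends the episode

def step(s, a):
    """Toy dynamics: action 1 moves right, action 0 moves left (clipped at 0)."""
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def actor_critic(n_episodes=500, lr_actor=0.1, lr_critic=0.1, max_len=50, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: per-state softmax logits
    V = [0.0] * N_STATES                            # critic: tabular estimate of V(s)
    for _ in range(n_episodes):
        s = 0
        for _ in range(max_len):
            probs = softmax(theta[s])
            a = rng.choices((0, 1), weights=probs)[0]
            s_next, r, done = step(s, a)
            # Critic: one-step advantage estimate A = r + gamma * V(s') - V(s).
            target = r if done else r + GAMMA * V[s_next]
            advantage = target - V[s]
            V[s] += lr_critic * advantage  # TD(0) update of the value function
            # Actor: policy-gradient step, advantage * grad log pi(a|s).
            for k in (0, 1):
                theta[s][k] += lr_actor * advantage * ((1.0 if k == a else 0.0) - probs[k])
            if done:
                break
            s = s_next
    return theta, V

theta, V = actor_critic()
```

Using the critic's advantage instead of the full sampled return lowers the variance of each update. The cost is some bias while the critic is still inaccurate.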

Why are there so many RL algorithms?

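Different trade-offs: sample efficiency vs. stability
Different assumptions: stochastic or deterministic? continuous or discrete? episodic or infinite horizon?
Different things are easy or hard in different settings: is it easier to represent the policy, or the model?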

Sample efficiency: on-policy vs. off-policy
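
An on-policy method needs fresh samples from the current policy every time that policy changes. An off-policy method can reuse data collected by a different behavior policy. Importance sampling is one building block some off-policy methods use for this: it reweights each old sample by π(a)/μ(a).

The sketch below compares the two ways of estimating a target policy's expected reward. It uses a hypothetical 3-action bandit with fixed rewards, and the two policies are made up for illustration.

```python
import random

ARM_REWARDS = [0.2, 0.5, 1.0]    # hypothetical deterministic reward per action
target_pi = [0.1, 0.2, 0.7]      # policy we want to evaluate
behavior_mu = [1/3, 1/3, 1/3]    # policy that actually collected the old data

def sample_actions(policy, n, rng):
    return rng.choices(range(len(policy)), weights=policy, k=n)

def on_policy_estimate(n, rng):
    """Needs n brand-new samples drawn from the target policy itself."""
    actions = sample_actions(target_pi, n, rng)
    return sum(ARM_REWARDS[a] for a in actions) / n

def off_policy_estimate(actions):
    """Reuses existing samples from behavior_mu, reweighted by pi(a) / mu(a)."""
    weighted = (target_pi[a] / behavior_mu[a] * ARM_REWARDS[a] for a in actions)
    return sum(weighted) / len(actions)

rng = random.Random(0)
true_value = sum(p * r for p, r in zip(target_pi, ARM_REWARDS))  # about 0.82
old_data = sample_actions(behavior_mu, 100_000, rng)
on_est = on_policy_estimate(100_000, rng)
off_est = off_policy_estimate(old_data)
```

Both estimates land close to the true value. The off-policy estimate never drew a sample from target_pi, but its importance weights (up to 2.1 here) give it noticeably higher variance. That variance is the usual price for reusing old data.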

Comparison of sample efficiency across algorithms:

Specific algorithms:
