Homepage

Warm up

Schedule

 

| Date | Topics | Lecturer | Readings | Additional Material |
| --- | --- | --- | --- | --- |
| Wed Jan 18 | Course Introduction | Katerina | | |
| Mon Jan 23 | Intro to MDPs, POMDPs | Katerina | Sutton & Barto Ch 3 | |
| Wed Jan 25 | Solving known MDPs: Dynamic Programming, Value Iteration, Policy Iteration, Policy Evaluation | Katerina | Sutton & Barto Ch 4 | |
| Mon Jan 30 | Monte Carlo Learning: value function estimation and optimization | Russ | Sutton & Barto Ch 5 | |
| Wed Feb 1 | Temporal Difference Learning: value function estimation and optimization, Q learning, SARSA | Russ | Sutton & Barto Ch 6 | |
| Mon Feb 6 | Planning and Learning (1): Tabular methods, Dyna, Monte Carlo Tree Search | Katerina | Sutton & Barto Ch 8 | A Survey of Monte Carlo Tree Search Methods http://www.cameronius.com/cv/mcts-survey-master.pdf |
| Wed Feb 8 | Value function approximation, Deep Learning, Convnets, backpropagation | Russ | | |
| Mon Feb 13 | Value function approximation, Deep Learning, Convnets, backpropagation | Russ | | |
| Wed Feb 15 | Deep Q Learning: Double Q learning, replay memory | Russ | | |
| Mon Feb 20 | Policy Gradients (1): REINFORCE, Natural Policy Gradients, Variance reduction in gradient estimation, Actor-Critic, Deep Actor-Critic, TRPO | Russ | Sutton & Barto Ch 13 | |
| Wed Feb 22 | Policy Gradients (2) | Russ | | |
| Mon Feb 27 | Policy Gradients (3) | Russ | | |
| Wed Mar 1 | Closer look at Continuous Actions, Variational Autoencoders, multimodal stochastic policies | Russ | | |
| Mon Mar 6 | Exploration (1) | Katerina | | Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models https://arxiv.org/abs/1507.00814, Variational Information Maximizing Exploration https://arxiv.org/abs/1605.09674, visitation counts, hashing |
| Wed Mar 8 | Imitation learning (1): mimicking experts, behaviour cloning | Katerina | | An Invitation to Imitation http://www.ri.cmu.edu/publication_view.html?pub_id=7891, Generative adversarial imitation learning https://arxiv.org/abs/1606.03476 |
| Mon Mar 13 | Spring break! | | | |
| Wed Mar 15 | Spring break! | | | |
| Mon Mar 20 | Imitation learning (2): Learning reward functions from demonstration, IOC, IRL | | | A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning http://www.jmlr.org/proceedings/papers/v15/ross11a/ross11a.pdf, Generative adversarial imitation learning https://arxiv.org/abs/1606.03476, Maximum entropy inverse reinforcement learning http://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf, Learning to search: Functional gradient techniques for imitation learning http://www.ri.cmu.edu/publication_view.html?pub_id=6410 |
| Wed Mar 22 | Intro to optimal control, Differential Dynamic Programming, LQR, iterative-LQR | Katerina | | |
| Mon Mar 27 | Imitation learning (3): learning from optimal controllers, self trials | Katerina | | End-to-End Training of Deep Visuomotor Policies https://arxiv.org/pdf/1504.00702.pdf, PLATO: Policy Learning using Adaptive Trajectory Optimization https://arxiv.org/pdf/1603.00622v3.pdf |
| Wed Mar 29 | Planning and Learning (2): Learning Forward/Backward Models from experience, Planning with learned forward models, simulation to real world adaptation | Katerina | | SE3-Nets: Learning Rigid Body Motion using Deep Neural Networks https://arxiv.org/pdf/1606.02378v2.pdf |
| Mon Apr 3 | Planning and Learning (3) | | | |
| Wed Apr 5 | Case studies: AlphaGo, deep math | Katerina | | |
| Mon Apr 10 | Modular / Hierarchical RL (1): compositionality, temporal abstraction | | | |
| Wed Apr 12 | Modular / Hierarchical RL (2): Multi-task learning, curriculum learning | Russ | | |
| Mon Apr 17 | Exploration (2): Learning and exploration in 3D environments, Long Term Memory | Russ | | |
| Wed Apr 19 | Learning Motor Control: inspiration from Psychology | | Sutton & Barto Ch 14, 15 | |
| Mon Apr 24 | Frontiers/Open Problems | Katerina | | |
| Wed Apr 26 | Project Presentations | | | |
| Mon May 1 | Project Presentations | | | |
| Wed May 3 | Project Presentations | | | |

Log

Week 1:

Jan 18 - Introduction

Week 2:

Jan 23 - Intro to MDPs, POMDPs

  • Slide
  • Sutton & Barto Ch 3
    • 3.1, 3.2, 3.3: 1/23/2017;

Jan 25 - Solving known MDPs: Dynamic Programming, Value Iteration, Policy Iteration, Policy Evaluation

  • Slide
  • Sutton & Barto Ch 4
    • 4.1: 1/25/2017;
  • Implement Markov Decision Processes in Python
    • AIMA Python file: mdp.py (code taken from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig)
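
As a companion to the Jan 25 topics, here is a minimal value iteration sketch on a known MDP. It is not the AIMA mdp.py code and does not use its API. The two-state MDP, its rewards, and the names TRANSITIONS, q_value, value_iteration, and greedy_policy are assumptions made up for illustration.

```python
# Hypothetical two-state MDP (illustrative numbers, not from the course).
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "s0": {"stay": [(1.0, "s0", 0.0)], "go": [(1.0, "s1", 1.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)], "go": [(1.0, "s0", 0.0)]},
}


def q_value(transitions, values, state, action, gamma):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in transitions[state][action]
    )


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the largest change is below tol."""
    values = {state: 0.0 for state in transitions}
    while True:
        new_values = {
            state: max(
                q_value(transitions, values, state, action, gamma)
                for action in transitions[state]
            )
            for state in transitions
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {
        state: max(
            transitions[state],
            key=lambda action: q_value(transitions, values, state, action, gamma),
        )
        for state in transitions
    }


values = value_iteration(TRANSITIONS)
policy = greedy_policy(TRANSITIONS, values)
```

With gamma = 0.9, staying in s1 forever is worth 2 / (1 - 0.9) = 20, so the values should approach 19 for s0 (reward 1 for moving, then 0.9 * 20) and 20 for s1, with the greedy policy choosing "go" in s0 and "stay" in s1.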

 
