Reinforcement Learning

Course Information

Overview

Reinforcement learning (RL) is a powerful learning paradigm through which machines learn to make (sequential) decisions. It has played a pivotal role in advancing artificial intelligence, with notable successes including mastering the game of Go and enhancing large language models.

This course focuses on the design principles of RL algorithms. As in statistical learning, a central challenge in RL is to generalize learned capabilities to unseen environments. However, RL also faces additional challenges, such as the exploration-exploitation tradeoff, credit assignment, and the distribution mismatch between behavior and target policies. Throughout the course, we will delve into various solutions to these challenges and provide theoretical justifications.
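As a first taste of the exploration-exploitation tradeoff, here is a minimal epsilon-greedy sketch for a Bernoulli multi-armed bandit, one of the first topics in the schedule below. All function names and parameter values here are illustrative, not course code:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, horizon=5000, seed=0):
    """Run epsilon-greedy on a Bernoulli multi-armed bandit (illustrative sketch).

    With probability epsilon, explore a uniformly random arm;
    otherwise, exploit the arm with the highest empirical mean reward.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the empirical mean of the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

# Toy instance: three arms; the third has the highest mean reward (0.8).
est, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough pulls, the empirical estimates concentrate around the true means and the greedy step settles on the best arm; the constant-epsilon exploration is exactly the cost this tradeoff imposes, which algorithms such as UCB and Thompson sampling (covered in the course) manage more adaptively.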

Prerequisites

This course is mathematically demanding. Students are expected to have strong foundations in probability, linear algebra, and calculus. A basic understanding of machine learning and convex optimization will be beneficial. Proficiency in Python programming is required.

Grading

Platforms

Discussions: Piazza
Homework submissions: Gradescope

Schedule

Date | Topics | Slides and Recommended Reading | Notes
1/17 | Introduction | Slides |
1/22 | Multi-armed bandits: explore-then-commit, epsilon-greedy, Boltzmann exploration, UCB, Thompson sampling | Slides; Ch. 2 of FR; Ch. 6, 7, 8, 36 of LS |
1/24 | Linear contextual bandits: LinUCB, linear Thompson sampling | Slides; Ch. 3 of FR; Ch. 18, 19, 20 of LS; Shipra Agrawal’s talk |
1/29 | General contextual bandits: UCB for logistic bandits, RegCB, SquareCB | Slides; Ch. 3 of FR; Dylan Foster’s talk |
1/31 | | | Last day to enroll
2/5 | Adversarial online learning: exponential weight algorithm, projected gradient descent | Slides; Ch. 28 of LS; 5.5-5.11 of Constantine Caramanis’s channel |
2/7 | Adversarial multi-armed bandits: Exp3 | Slides; Haipeng Luo’s talk |
2/12 | Adversarial linear bandits: one-point gradient estimator + projected gradient descent, doubly robust estimator | Slides; Ch. 5, 6 of L |
2/14 | | | Project proposal due on 2/16
2/19 | Basics of Markov decision processes: Bellman (optimality) equations, reverse Bellman equations, value iteration | Slides; Ch. 1.1-1.3 of AJKS; Ch. 3 of SB |
2/21 | | | HW1 due on 2/23
2/26 | | |
2/28 | | |
3/4 | Spring recess | |
3/6 | Spring recess | |
3/11 | | |
3/13 | | |
3/18 | | |
3/20 | | |
3/25 | | |
3/27 | | |
4/1 | | |
4/3 | | |
4/8 | | |
4/10 | | |
4/15 | Student presentation | |
4/17 | Student presentation | |
4/22 | Student presentation | |
4/24 | Student presentation | |
4/29 | | |
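To preview the dynamic-programming side of the course (the 2/19 lecture on Markov decision processes), here is a minimal value iteration sketch on a toy MDP. The MDP, its rewards, and all names below are illustrative assumptions, not course material:

```python
# Toy 2-state, 2-action MDP (illustrative only).
# P[s][a] is a list of (next_state, probability) pairs; R[s][a] is the reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 0.8), (0, 0.2)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Apply the Bellman optimality operator repeatedly until convergence."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {
            s: max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s]
            )
            for s in P
        }
        # The operator is a sup-norm contraction, so this loop terminates.
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V_star = value_iteration(P, R, gamma)
# Greedy policy with respect to the (approximately) optimal value function.
policy = {
    s: max(
        P[s],
        key=lambda a: R[s][a] + gamma * sum(p * V_star[s2] for s2, p in P[s][a]),
    )
    for s in P
}
```

On this toy instance, both states prefer the higher-reward action, and the fixed point solves the Bellman optimality equations exactly; the course's 2/19 reading (Ch. 1.1-1.3 of AJKS, Ch. 3 of SB) develops the theory behind this contraction argument.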

Books and Lecture Notes

Previous RL Courses at UVA