Reinforcement Learning (Spring 2024)

Course Information

Instructor: Chen-Yu Wei
TA: Haolin Liu (srs8rh at virginia.edu)
Time: MW 9:30-10:45
Location: Rice Hall 340
Office Hours (Instructor): Th 15:30-16:30 at Rice 409
Office Hours (TA): M 11:00-12:00 at Rice 336

Overview

Reinforcement learning (RL) is a powerful learning paradigm through which machines learn to make (sequential) decisions. It has played a pivotal role in advancing artificial intelligence, with notable successes including mastering the game of Go and enhancing large language models.

This course focuses on the design principles of RL algorithms. As in statistical learning, a central challenge in RL is to generalize learned capabilities to unseen environments. However, RL also faces additional challenges, such as the exploration-exploitation tradeoff, credit assignment, and the distribution mismatch between behavior and target policies. Throughout the course, we will study various solutions to these challenges and provide theoretical justifications for them.

Prerequisites

This course is mathematically demanding. Students are expected to have strong foundations in probability, linear algebra, and calculus. A basic understanding of machine learning and convex optimization will be beneficial. Proficiency in Python programming is required.

Grading

(60%) Assignments: 4 problem sets, each consisting of theoretical questions and programming tasks.
(35%) Final project: see the project specification for details.
(5%) Class participation

Platforms

Discussions: Piazza
Homework submissions: Gradescope

Schedule

Date	Topics	Slides and Recommended Reading	Notes
1/17	Introduction	Slides
1/22	Multi-armed bandits: explore-then-commit, epsilon-greedy, Boltzmann exploration, UCB, Thompson sampling	Slides; Ch. 2 of FR; Ch. 6, 7, 8, 36 of LS
1/24	Linear contextual bandits: LinUCB, linear Thompson sampling	Slides; Ch. 3 of FR; Ch. 18, 19, 20 of LS; Shipra Agrawal’s talk
1/29	General contextual bandits: UCB for logistic bandits, RegCB, SquareCB	Slides; Ch. 3 of FR; Dylan Foster’s talk
1/31			Last day to enroll
2/5	Adversarial online learning: exponential weights algorithm, projected gradient descent	Slides; Ch. 28 of LS; 5.5-5.11 of Constantine Caramanis’s channel
2/7	Adversarial multi-armed bandits: Exp3	Slides; Haipeng Luo’s talk
2/12	Adversarial linear bandits: one-point gradient estimator + projected gradient descent, doubly robust estimator	Slides; Ch. 5, 6 of L
2/14			Project proposal due on 2/16
2/19	Basics of Markov decision processes: Bellman (optimality) equations, reverse Bellman equations, value iteration, (modified) policy iteration, performance difference lemma	Slides; Ch. 1.1-1.3 of AJKS; Ch. 3 of SB
2/21			HW1 due on 2/23
2/26
2/28
3/4	Spring recess
3/6	Spring recess
3/11	Approximate value iteration: least-squares value iteration (LSVI), Watkins’s Q-learning, deep Q-learning, prioritized replay, double Q-learning	Slides; Ch. 3, 7 of AJKS; Lec. 7, 8 of Sergey Levine’s course
3/13			HW2 due on 3/17
3/18	Policy evaluation: least-squares policy evaluation (LSPE), temporal difference (TD) learning, Monte Carlo estimation, TD(λ)	Slides; Ch. 5.1, 5.2, 5.5, 6.1-6.3, 9.1-9.4, 11.1-11.3, 12.1-12.5 of SB
3/20
3/25	Policy-based learning methods: least-squares policy iteration (LSPI), policy gradient, natural policy gradient (NPG)	Slides; Notes of J on 3/24-3/31; Lec. 5, 6, 9 of Sergey Levine’s course; W. van Heeswijk’s paper
3/27			Project milestone due on 3/29
4/1
4/3	Actor-critic methods: advantage actor-critic (A2C), proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), twin-delayed DDPG (TD3), soft actor-critic (SAC)	Slides; Algorithms Docs in Spinning Up; references in the slides	HW3 due on 4/7
4/8
4/10
4/15	Student presentation
4/17	Student presentation
4/22	Student presentation
4/24	Student presentation
4/29	Summary	Slides	HW4 due on 5/10

Books and Lecture Notes

Bandit Algorithms by Tor Lattimore and Csaba Szepesvari (LS)
Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto (SB)
Reinforcement Learning: Theory and Algorithms by Alekh Agarwal, Nan Jiang, Sham Kakade, and Wen Sun (AJKS)
Statistical Reinforcement Learning and Decision Making: Course Notes by Dylan Foster and Sasha Rakhlin (FR)

Deep Reinforcement Learning by Sergey Levine
Reinforcement Learning by Emma Brunskill
RL Lecture Series by Hado van Hasselt
Introduction to Reinforcement Learning by Lucas Janson and Sham Kakade
Introduction to Reinforcement Learning and Foundations of Reinforcement Learning by Wen Sun
Topics in Bandits and Reinforcement Learning Theory by Chicheng Zhang
Foundations of Reinforcement Learning by Chi Jin
Reinforcement Learning and Statistical Reinforcement Learning by Nan Jiang
Theoretical Foundations of Reinforcement Learning by Csaba Szepesvari
Theory of Reinforcement Learning by Ambuj Tewari
Theory of Multi-armed Bandits and Reinforcement Learning by Jiantao Jiao
Statistical Reinforcement Learning and Decision Making by Dylan Foster and Sasha Rakhlin
Introduction to Online Optimization/Learning by Haipeng Luo

Previous RL Courses at UVA

Topics in Reinforcement Learning by Shangtong Zhang
Reinforcement Learning by Hongning Wang