Course Information
- Time: Monday, Wednesday, and Friday 11:00-11:50 AM
- Location: Thornton Hall D223
- Instructor and office hours: Chen-Yu Wei, Monday 3:30-4:30 PM @ Rice 409
- TA and office hours: Braham Snyder, Wednesday 4:00-5:00 PM @ Rice 442
Overview
Reinforcement learning (RL) is a powerful learning paradigm through which machines learn to make (sequential) decisions. It has played a pivotal role in advancing artificial intelligence, with notable successes including mastering the game of Go and enhancing large language models.
This course focuses on the design principles of RL algorithms. As in statistical learning, a central challenge in RL is to generalize learned capabilities to unseen environments. However, RL also faces additional challenges, such as the exploration-exploitation tradeoff, credit assignment, and the distribution mismatch between behavior and target policies. Throughout the course, we will delve into various solutions to these challenges and provide theoretical justifications.
Prerequisites
Probability, linear algebra, calculus, machine learning, Python programming.
Platforms
- Piazza: Discussions
- Gradescope: Homework submissions
Grading
- (60%) Assignments
- (35%) Final project
- (5%) Participation
Late policy for assignments: 10 free late days can be used across all assignments. Each additional late day will result in a 10% deduction in the semester’s assignment grade. No assignment can be submitted more than 7 days after its deadline.
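The arithmetic of the late policy can be sketched as follows. This is only an illustrative reading of the rule, not an official grade calculator: it assumes late days are counted in whole days, that the 10% deductions add up as percentage points, and that the function name `late_penalty_percent` and its input format are hypothetical. Check with the instructor for how partial days are counted.

```python
FREE_LATE_DAYS = 10         # free late days shared across all assignments (from the policy)
MAX_DAYS_LATE = 7           # no assignment accepted more than 7 days late (from the policy)
PENALTY_PER_EXTRA_DAY = 10  # percent deducted per additional late day (from the policy)


def late_penalty_percent(days_late_per_assignment):
    """Return the assumed total deduction, in percentage points, applied to
    the semester's assignment grade.

    `days_late_per_assignment` is a list of whole late days, one entry per
    assignment (an assumed input format, for illustration only).
    """
    for days in days_late_per_assignment:
        if days < 0:
            raise ValueError("late days cannot be negative")
        if days > MAX_DAYS_LATE:
            # Under the policy, such a submission is not accepted at all.
            raise ValueError("submission is more than 7 days past its deadline")

    total_late_days = sum(days_late_per_assignment)
    extra_days = max(0, total_late_days - FREE_LATE_DAYS)
    # Assumption: deductions are additive (10 points per extra day).
    return extra_days * PENALTY_PER_EXTRA_DAY
```

For example, under these assumptions, submitting three assignments 5, 5, and 2 days late uses 12 late days in total, which is 2 beyond the free budget.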
Schedule
One file of slides may be used for multiple lectures.
Date | Topics | Materials | Assignments |
---|---|---|---|
8/27 | Introduction | Slides, Recording | |
8/29 | | Recording | |
9/1 | Labor Day (no class) | ||
9/3 | Value-based bandits: Explore-then-commit, ε-greedy | Slides, Recording | |
9/5 | Boltzmann exploration, IGW | Recording, Supp-IGW | |
9/8 | CB with regression, UCB | Recording | |
9/10 | UCB, TS | Recording | |
9/12 | Policy-based bandits: Exponential weights | Slides, Recording | HW1 out |
9/15 | |||
9/17 | |||
9/19 | |||
9/22 | |||
9/24 | |||
9/26 | | | HW1 due on 9/28 |
9/29 | |||
10/1 | | | Project proposal due on 10/1 |
10/3 | |||
10/6 | |||
10/8 | |||
10/10 | |||
10/13 | Reading Day (no class) | ||
10/15 | |||
10/17 | |||
10/20 | |||
10/22 | |||
10/24 | |||
10/27 | |||
10/29 | |||
10/31 | |||
11/3 | |||
11/5 | |||
11/7 | |||
11/10 | |||
11/12 | |||
11/14 | |||
11/17 | |||
11/19 | |||
11/21 | |||
11/24 | |||
11/26 | Thanksgiving recess (no class) | ||
11/28 | Thanksgiving recess (no class) | ||
12/1 | |||
12/3 | |||
12/5 | |||
12/8 | |||
Resources
- Reinforcement Learning: A Comprehensive Overview by Kevin Murphy
- Bandit Algorithms by Tor Lattimore and Csaba Szepesvári
- Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto
- Reinforcement Learning: Theory and Algorithms by Alekh Agarwal, Nan Jiang, Sham Kakade, and Wen Sun
- Statistical Reinforcement Learning and Decision Making: Course Notes by Dylan Foster and Sasha Rakhlin
- Reinforcement Learning: Foundations by Shie Mannor, Yishay Mansour, and Aviv Tamar