Course Information
- Time: Tuesday & Thursday 2:00-3:15PM
- Location: Olsson Hall 005
- Instructor and office hours: Chen-Yu Wei, Monday 3:30-4:30PM @ Rice 409
- TAs and office hours:
  - Fengyu Gao, Tuesday 3:30-4:30PM @ Rice 328
  - Xinyu Liu, Wednesday 4:00-5:00PM @ Rice 442
  - Yufeng Gao, Thursday 3:30-4:30PM @ Rice 442
Overview
Reinforcement learning (RL) is a powerful learning paradigm through which machines learn to make (sequential) decisions. It has played a pivotal role in advancing artificial intelligence, with notable successes including mastering the game of Go and enhancing large language models.
This course focuses on the design principles of RL algorithms. As in statistical learning, a central challenge in RL is generalizing learned capabilities to unseen environments. RL also faces additional challenges, such as the exploration-exploitation tradeoff, credit assignment, and the distribution mismatch between behavior and target policies. Throughout the course, we will study solutions to each of these challenges.
Prerequisites
Probability, linear algebra, calculus, machine learning, Python programming
Platforms
- Piazza: Discussions
- Gradescope: Homework submissions
Grading
- (70%) Assignments: 5-6 assignments
- (30%) Exams: midterm (12%) and final (18%)
Assignments policy: Students have 12 free late days that may be used across all assignments. After the free late days are exhausted, each additional late day incurs a 10% deduction from the semester assignment grade. No assignment can be submitted more than 7 days after its deadline. Late days are counted by rounding up (e.g., 1 hour late counts as 1 day late).
Exams policy: All exams are in person; no online option is available. At most one make-up session may be arranged for each of the midterm and the final. If you miss the midterm due to extenuating circumstances (e.g., illness, family emergency), the final exam may be used to replace the midterm score. If you miss the final exam due to extenuating circumstances, you may request an incomplete grade and complete the exam after the semester.
Bonus points: Completing the course evaluation at semester end earns 3 bonus points.
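A minimal sketch of how the weights above combine, assuming every component is scored on a 0-100 scale and the 3 bonus points are added to that 100-point total. The function name is hypothetical; passing `midterm=None` models an excused midterm absence, where the final exam score is substituted for the midterm score (the policy says this "may" be done, so it is an assumption here).

```python
from typing import Optional

# Weights as integer percentages of the course score, from the syllabus.
ASSIGNMENT_PCT, MIDTERM_PCT, FINAL_PCT = 70, 12, 18
EVAL_BONUS = 3  # bonus points for completing the course evaluation


def course_score(assignments: float, midterm: Optional[float], final: float,
                 completed_evaluation: bool = False) -> float:
    """Weighted course score on an assumed 0-100 scale, plus any bonus."""
    if midterm is None:
        # Assumed excused absence: the final exam score replaces the midterm.
        midterm = final
    weighted = (ASSIGNMENT_PCT * assignments
                + MIDTERM_PCT * midterm
                + FINAL_PCT * final) / 100
    return weighted + (EVAL_BONUS if completed_evaluation else 0)


print(course_score(80, 70, 90))                               # regular case
print(course_score(80, None, 90, completed_evaluation=True))  # excused midterm
```

This computes only the raw score; the mapping to letter grades is separate and may be adjusted based on the score distribution.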
The mapping from scores to final grades does NOT always follow the default scale. It may be adjusted in students' favor (towards a better grade) based on the score distribution.
Schedule
A single slide file may cover multiple lectures. Check Piazza for the recording passcode.
| Date | Topics | Slides | Recordings | Assignments |
|---|---|---|---|---|
| 1/13 | Introduction | Slides | No recording | |
| 1/15 | Value-based bandit algorithms: Explore-then-commit, ε-greedy | Slides | Recording | |
| 1/20 | Boltzmann exploration | | Recording | |
| 1/22 | Contextual bandits with regression | | Recording | HW1 out |
| 1/27 | | | | |
| 1/29 | Policy-based bandit algorithms | | | |
| 2/3 | | | | HW1 due on 2/4 |
| 2/5 | | | | |
| 2/10 | | | | |
| 2/12 | Markov decision processes | | | |
| 2/17 | | | | |
| 2/19 | | | | |
| 2/24 | | | | |
| 2/26 | Midterm Exam (in class) | | | |
| 3/3 | Spring recess (no class) | | | |
| 3/5 | Spring recess (no class) | | | |
| 3/10 | | | | |
| 3/12 | | | | |
| 3/17 | Value-based RL algorithms | | | |
| 3/19 | | | | |
| 3/24 | | | | |
| 3/26 | Policy-based RL algorithms | | | |
| 3/31 | | | | |
| 4/2 | | | | |
| 4/7 | RL with models/simulators | | | |
| 4/9 | | | | |
| 4/14 | | | | |
| 4/16 | Imitation learning | | | |
| 4/21 | | | | |
| 4/23 | | | | |
| 4/28 | Last lecture | | | |
| 5/1 | Final Exam (9AM-12PM) | | | |
Resources
- Deep Reinforcement Learning by Sergey Levine
- Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto
- Reinforcement Learning: A Comprehensive Overview by Kevin Murphy
- Bandit Algorithms by Tor Lattimore and Csaba Szepesvári