Reinforcement Learning (Spring 2026)

Course Information

Overview

Reinforcement learning (RL) is a powerful learning paradigm through which machines learn to make (sequential) decisions. It has been playing a pivotal role in advancing artificial intelligence, with notable successes including mastering the game of Go and enhancing large language models.

This course focuses on the design principles of RL algorithms. As in statistical learning, a central challenge in RL is to generalize learned capabilities to unseen environments. However, RL also faces additional challenges, such as the exploration-exploitation tradeoff, credit assignment, and distribution mismatch between behavior and target policies. Throughout the course, we will examine various solutions to these challenges.

Prerequisites

Probability, linear algebra, calculus, machine learning, and Python programming

Platforms

Grading

Assignments policy: Students have 12 free late days that may be used across all assignments. After the free late days are exhausted, each additional late day incurs a 10% deduction from the semester assignment grade. No assignment can be submitted more than 7 days after its deadline. Late days are counted by rounding up (e.g., 1 hour late counts as 1 day late).
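
As an illustration only, the late-day arithmetic above can be sketched in Python. The function names and the input format (hours late per assignment) are hypothetical, and the sketch assumes that each extra late day removes 10 percentage points from the semester assignment grade; the instructor's interpretation of the policy takes precedence.

```python
import math

FREE_LATE_DAYS = 12               # from the policy text
MAX_DAYS_PER_ASSIGNMENT = 7       # from the policy text
PENALTY_POINTS_PER_EXTRA_DAY = 10 # assumption: percentage points per extra day


def late_days(hours_late: float) -> int:
    """Round lateness up to whole days (1 hour late counts as 1 day)."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def semester_penalty(hours_late_per_assignment):
    """Return (total late days, days beyond the free allowance, penalty points).

    Raises ValueError if any single assignment is more than 7 days late,
    since such a submission would not be accepted.
    """
    total = 0
    for hours in hours_late_per_assignment:
        days = late_days(hours)
        if days > MAX_DAYS_PER_ASSIGNMENT:
            raise ValueError("submission is more than 7 days past its deadline")
        total += days
    extra = max(0, total - FREE_LATE_DAYS)
    return total, extra, extra * PENALTY_POINTS_PER_EXTRA_DAY


# Hypothetical semester: assignments submitted 1, 49, 168, and 100 hours late.
print(semester_penalty([1, 49, 168, 100]))  # (16, 4, 40)
```

In this hypothetical semester the four submissions count as 1, 3, 7, and 5 late days, so 4 days fall outside the 12 free days.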

Exams policy: All exams are in person; no online option is available. For both the midterm and the final, one (and at most one) make-up exam session may be arranged. If you miss the midterm due to extenuating circumstances (e.g., illness, family emergency), the final exam may be used to replace the midterm score. If you miss the final exam due to extenuating circumstances, you may request an incomplete grade and complete the exam after the semester.

Bonus points: Completing the course evaluation at the end of the semester earns 3 bonus points.

The mapping from scores to final grades does NOT always follow the default scale. It may be adjusted (towards a better grade) based on the score distribution.

Schedule

A single slide file may cover multiple lectures. Check Piazza for the recording passcode.

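| Date | Topics | Slides | Recordings | Assignments |
|---|---|---|---|---|
| 1/13 | Introduction | Slides | No recording | |
| 1/15 | Value-based bandit algorithms: Explore-then-commit, ε-greedy | Slides | Recording | |
| 1/20 | Boltzmann exploration | | Recording | |
| 1/22 | Contextual bandits with regression | | Recording | HW1 out |
| 1/27 | | | | |
| 1/29 | Policy-based bandit algorithms | | | |
| 2/3 | | | | HW1 due on 2/4 |
| 2/5 | | | | |
| 2/10 | | | | |
| 2/12 | Markov decision processes | | | |
| 2/17 | | | | |
| 2/19 | | | | |
| 2/24 | | | | |
| 2/26 | Midterm Exam (in class) | | | |
| 3/3 | Spring recess (no class) | | | |
| 3/5 | Spring recess (no class) | | | |
| 3/10 | | | | |
| 3/12 | | | | |
| 3/17 | Value-based RL algorithms | | | |
| 3/19 | | | | |
| 3/24 | | | | |
| 3/26 | Policy-based RL algorithms | | | |
| 3/31 | | | | |
| 4/2 | | | | |
| 4/7 | RL with models/simulators | | | |
| 4/9 | | | | |
| 4/14 | | | | |
| 4/16 | Imitation learning | | | |
| 4/21 | | | | |
| 4/23 | | | | |
| 4/28 | Last Lecture | | | |
| 5/1 | Final Exam (9AM-12PM) | | | |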

Resources