Reinforcement Learning (Fall 2025)




Course Information

Overview

Reinforcement learning (RL) is a learning paradigm through which machines learn to make (sequential) decisions. It has played a pivotal role in advancing artificial intelligence, with notable successes such as mastering the game of Go and enhancing large language models.

This course focuses on the design principles of RL algorithms. As in statistical learning, a central challenge in RL is generalizing learned capabilities to unseen environments. RL also faces additional challenges, such as the exploration-exploitation tradeoff, credit assignment, and the distribution mismatch between behavior and target policies. Throughout the course, we will study various solutions to these challenges and provide theoretical justifications for them.
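To make the exploration-exploitation tradeoff concrete, here is a minimal sketch of ε-greedy (one of the value-based bandit strategies on the 9/3 schedule) on a Bernoulli multi-armed bandit. The function name `epsilon_greedy`, the arm means, and the Bernoulli reward model are hypothetical choices for illustration, not course material.

```python
import random


def epsilon_greedy(arm_means, num_rounds, epsilon, seed=0):
    """Run epsilon-greedy on a Bernoulli multi-armed bandit (illustrative sketch).

    arm_means: hypothetical success probabilities, hidden from the learner.
    Returns (pull count per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # how many times each arm has been pulled
    estimates = [0.0] * k   # empirical mean reward of each arm
    total_reward = 0
    for _ in range(num_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest empirical mean
            # (max returns the lowest index among ties).
            arm = max(range(k), key=lambda a: estimates[a])
        # Draw a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < arm_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, total_reward


counts, total = epsilon_greedy([0.2, 0.5, 0.8], num_rounds=10_000, epsilon=0.1)
print(counts, total)
```

With `epsilon=0.0` the learner is purely greedy: all estimates start at 0, so it pulls arm 0 forever and never discovers the better arms (`epsilon_greedy([0.2, 0.5, 0.8], 1000, 0.0)` returns counts `[1000, 0, 0]`). A small amount of random exploration is what lets it find and then exploit the best arm.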

Prerequisites

Probability, linear algebra, calculus, machine learning, Python programming

Platforms

Grading

Late policy for assignments: 10 free late days can be used across all assignments. Each additional late day will result in a 10% deduction in the semester’s assignment grade. No assignment can be submitted more than 7 days after its deadline.
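One way to read the late policy as arithmetic is sketched below. The helper `late_penalty_percent` is hypothetical, and it assumes that lateness is counted in whole days and that "a 10% deduction" means 10 percentage points off the semester's assignment grade per late day beyond the free 10; the policy text does not spell out either detail (a multiplicative reading is also possible).

```python
def late_penalty_percent(late_days_per_assignment, free_late_days=10,
                         penalty_percent_per_day=10, max_late_days=7):
    """Percentage points deducted from the semester's assignment grade.

    Hypothetical helper; assumes whole late days and an additive
    10-point deduction per late day beyond the free allowance.
    """
    for days in late_days_per_assignment:
        if days < 0:
            raise ValueError("late days cannot be negative")
        if days > max_late_days:
            # The policy does not accept submissions more than 7 days late.
            raise ValueError("submission more than 7 days late is not accepted")
    used = sum(late_days_per_assignment)
    extra_days = max(0, used - free_late_days)  # late days beyond the free 10
    return min(100, extra_days * penalty_percent_per_day)  # cap at the full grade


print(late_penalty_percent([3, 4, 5]))  # 12 late days, 2 beyond the allowance -> 20
print(late_penalty_percent([2, 2]))     # within the allowance -> 0
```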

Schedule

One file of slides may be used for multiple lectures.

Date | Topics | Materials | Assignments
--- | --- | --- | ---
8/27 | Introduction | Slides, Recording | 
8/29 |  | Recording | 
9/1 | Labor Day (no class) |  | 
9/3 | Value-based bandits: Explore-then-commit, ε-greedy | Slides, Recording | 
9/5 | Boltzmann exploration, IGW | Recording, Supp-IGW | 
9/8 | CB with regression, UCB | Recording | 
9/10 | UCB, TS | Recording | 
9/12 | Policy-based bandits: Exponential weights | Slides, Recording | HW1 out
9/15 | EXP3 | Recording | 
9/17 | EXP3 | Recording | 
9/19 | PPO | Recording | 
9/22 | NPG, PG | Recording | 
9/24 | Continuous actions: gradient ascent | Slides, Recording | 
9/26 | Gradient estimator | Recording | HW1 due on 9/28
9/29 | PG | Recording | 
10/1 | Markov Decision Process | Slides, Recording | HW2 out
10/3 | Value iteration | Recording | Project proposal due on 10/3
10/6 | Value iteration | Recording | 
10/8 | Value iteration | Recording | 
10/10 | Policy iteration | Recording | 
10/13 | Reading Day (no class) |  | HW2 due on 10/14
10/15 | Generalized policy iteration | Recording | 
10/17 | Performance difference lemma | Recording | 
10/20 | Performance difference lemma | Recording | 
10/22 | Value-iteration-based algorithms: DQN | Slides, Recording | 
10/24 |  |  | 
10/27 |  |  | 
10/29 |  |  | 
10/31 |  |  | 
11/3 |  |  | 
11/5 |  |  | 
11/7 |  |  | 
11/10 |  |  | 
11/12 |  |  | 
11/14 |  |  | 
11/17 |  |  | 
11/19 |  |  | 
11/21 |  |  | 
11/24 |  |  | 
11/26 | Thanksgiving recess (no class) |  | 
11/28 | Thanksgiving recess (no class) |  | 
12/1 |  |  | 
12/3 |  |  | 
12/5 |  |  | 
12/8 |  |  | 

Resources

Previous Offerings