Reinforcement Learning (Fall 2025)

Course Information

Time: Monday, Wednesday, and Friday, 11:00-11:50 AM
Location: Thornton Hall D223
Instructor and office hours: Chen-Yu Wei, Monday 3:30-4:30 PM @ Rice 409
TA and office hours: Braham Snyder, Wednesday 4:00-5:00 PM @ Rice 442

Overview

Reinforcement learning (RL) is a powerful learning paradigm through which machines learn to make (sequential) decisions. It has played a pivotal role in advancing artificial intelligence, with notable successes including mastering the game of Go and enhancing large language models.

This course focuses on the design principles of RL algorithms. As in statistical learning, a central challenge in RL is generalizing learned capabilities to unseen environments. However, RL also faces additional challenges, such as the exploration-exploitation tradeoff, credit assignment, and the distribution mismatch between behavior and target policies. Throughout the course, we will delve into various solutions to these challenges and provide theoretical justifications for them.

Prerequisites

Probability, linear algebra, calculus, machine learning, Python programming

Platforms

Piazza: Discussions
Gradescope: Homework submissions

Grading

(60%) Assignments
(35%) Final project (Instructions)
(5%) Participation

Late policy for assignments: 10 free late days can be used across all assignments. Each additional late day will result in a 10% deduction in the semester’s assignment grade. No assignment can be submitted more than 7 days after its deadline.

Schedule

A single slide deck may cover multiple lectures.

Date	Topics	Materials	Assignments
8/27	Introduction	Slides, Recording
8/29		Recording
9/1	Labor Day (no class)
9/3	Value-based bandits: Explore-then-commit, ε-greedy	Slides, Recording
9/5	Boltzmann exploration, IGW	Recording, Supp-IGW
9/8	CB with regression, UCB	Recording
9/10	UCB, TS	Recording
9/12	Policy-based bandits: Exponential weights	Slides, Recording	HW1 out
9/15	EXP3	Recording
9/17	EXP3	Recording
9/19	PPO	Recording
9/22	NPG, PG	Recording
9/24	Continuous actions: gradient ascent	Slides, Recording
9/26	Gradient estimator	Recording	HW1 due on 9/28
9/29	PG	Recording
10/1	Markov Decision Process	Slides, Recording	HW2 out
10/3	Value iteration	Recording	Project proposal due on 10/3
10/6	Value iteration	Recording
10/8	Value iteration	Recording
10/10	Policy iteration	Recording
10/13	Reading Day (no class)		HW2 due on 10/14
10/15	Generalized policy iteration	Recording
10/17	Performance difference lemma	Recording
10/20	Performance difference lemma	Recording
10/22	Value-iteration-based algorithms: DQN	Slides, Recording
10/24	DQN, DDQN, CQL	Recording
10/27	Residual gradient	Recording
10/29	Policy-iteration-based algorithms: TD	Slides, Recording	HW3 out
10/31	GAE	Recording
11/3	GAE	Recording
11/5	PPO, PG, A2C	Recording
11/7	RL with continuous actions: DDPG, TD3	Slides, Recording
11/10	SAC	Recording
11/12	Model-based algorithms: Dyna, MCTS	Slides, Recording
11/14	Model ensemble	Recording	HW3 due on 11/16
11/17	Exploration in MDPs: UCBVI	Slides, Recording
11/19	Randomized VI, information-directed sampling, count-based bonus	Recording
11/21	ICM, RND, Bootstrapped DQN	Recording
11/24	No class		HW4 out
11/26	Thanksgiving recess (no class)
11/28	Thanksgiving recess (no class)
12/1	Imitation Learning: Behavior cloning, DAgger	Slides, Recording
12/3	NeurIPS (no class)
12/5	NeurIPS (no class)
12/8	DPO, Inverse RL, RLHF	Recording	Final presentation due on 12/12; HW4 due on 12/14; Final report due on 12/17

Resources

Reinforcement Learning: A Comprehensive Overview by Kevin Murphy
Bandit Algorithms by Tor Lattimore and Csaba Szepesvári
Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto
Reinforcement Learning: Theory and Algorithms by Alekh Agarwal, Nan Jiang, Sham Kakade, and Wen Sun
Statistical Reinforcement Learning and Decision Making: Course Notes by Dylan Foster and Sasha Rakhlin
Reinforcement Learning: Foundations by Shie Mannor, Yishay Mansour, and Aviv Tamar

Previous Offerings

CS 6501 Reinforcement Learning (Spring 2025)
CS 6501 Reinforcement Learning (Spring 2024)
CS 6501 Topics in Reinforcement Learning (Fall 2022) by Prof. Shangtong Zhang