Reinforcement Learning (Spring 2025)

Course Information

Time: Monday & Wednesday 9:30-10:45AM
Location: Rice 340
Instructor and office hours: Chen-Yu Wei, Tuesday 3-4PM @ Rice 409
TA and office hours: Braham Snyder, Friday 4-5PM @ Rice 442

Overview

Reinforcement learning (RL) is a powerful learning paradigm through which machines learn to make (sequential) decisions. It has been playing a pivotal role in advancing artificial intelligence, with notable successes including mastering the game of Go and enhancing large language models.

This course focuses on the design principles of RL algorithms. Similar to statistical learning, a central challenge in RL is to generalize learned capabilities to unseen environments. However, RL also faces additional challenges such as exploration-exploitation tradeoff, credit assignment, and distribution mismatch between behavior and target policies. Throughout the course, we will delve into various solutions to these challenges and provide theoretical justifications.

Prerequisites

This course is mathematically demanding. Students are expected to have strong foundations in probability, linear algebra, and calculus. A basic understanding of machine learning and convex optimization will be beneficial. Proficiency in python programming is required.

Topics

Bandits, online learning, dynamic programming, Q-learning, policy evaluation, policy gradient.

Platforms

Piazza: Discussions
Gradescope: Homework submissions

Grading

(70%) Assignments
(30%) Final project (spec)

Late policy for assignments: 10 free late days can be used across all assignments. Each additional late day will result in a 10% deduction in the semester’s assignment grade. No assignment can be submitted more than 7 days after its deadline.

Schedule

One slide deck may be used for multiple lectures.

Date	Topics	Materials	Assignments
1/13	Introduction	Slides, Recording	HW0 (no submission needed)
1/15	Value-based bandits: Explore-then-exploit, ε-greedy	Slides, Recording
1/20	MLK Holiday
1/22	Boltzmann exploration, Inverse gap weighting, Reduction	Recording, Supp-IGW
1/27	UCB, TS	Recording	HW1 out
1/29	Policy-based bandits: Exponential weights (full-information)	Slides, Recording
2/3	EXP3	Recording
2/5	PPO	Recording	HW1 due on 2/7
2/10	NPG, PG	Recording
2/12	Bandits with continuous actions: Gradient ascent	Slides, Recording	HW2 out
2/17	One-point gradient estimators	Recording
2/19	PG, PPO	Recording
2/24	Markov decision process	Slides, Recording
2/26	Dynamic programming	Recording	HW2 due on 2/28
3/3	Dynamic programming	Recording
3/5	Dynamic programming	Recording
3/10	Spring recess
3/12	Spring recess
3/17	Performance difference lemma	Recording	HW3 out
3/19	Performance difference lemma	Recording
3/24	Value-based RL: DQN	Slides, Recording
3/26	DQN, DDQN, CQL	Recording	Midterm report due on 3/28 HW3 due on 3/30
3/31	Watkin’s Q-learning, residual gradient	Recording
4/2	Policy evaluation: TD, Monte Carlo estimation	Slides, Recording
4/7	GAE	Recording
4/9	Policy-based RL: PPO, A2C, PG	Recording	HW4 out
4/14	RL with continuous actions: DDPG, TD3, SAC	Slides, Recording
4/16	Exploration in MDPs: UCBVI, Randomized VI	Slides, Recording
4/21	IDS, Count-based methods, ICM	No Recording
4/23	RND, Bootstrapped DQN, VIME	Recording
4/28	Summary	Slides, Recording	The last lecture
4/30			Project video due
5/5			Project report due
5/8			HW4 due

Resources

Reinforcement Learning: A Comprehensive Overview by Kevin Murphy
Bandit Algorithms by Tor Lattimore and Csaba Szepesvari
Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto
Reinforcement Learning: Theory and Algorithms by Alekh Agarwal, Nan Jiang, Sham Kakade, and Wen Sun
Statistical Reinforcement Learning and Decision Making: Course Notes by Dylan Foster and Sasha Rakhlin
Reinforcement Learning: Foundations by Shie Mannor, Yishay Mansour, and Aviv Tamar

Previous Offerings

CS 6501 Reinforcement Learning (Spring 2024)
CS 6501 Topics in Reinforcement Learning (Fall 2022) by Prof. Shangtong Zhang