Course Overview
An introductory course designed for advanced undergraduates and beginning graduate students covering the fundamentals of reinforcement learning. Students will learn core concepts, implement basic algorithms, and understand when and how to apply RL to real-world problems.
Course Description
This course provides a comprehensive introduction to reinforcement learning, covering both theoretical foundations and practical implementation. Students will learn how agents can learn to make sequential decisions through interaction with their environment, gaining hands-on experience through coding assignments and a final project.
Learning Objectives
By the end of this course, students will be able to:
- Understand the mathematical foundations of MDPs and RL
- Implement core RL algorithms from scratch
- Recognize appropriate use cases for RL vs. other ML approaches
- Debug and tune RL algorithms for different problems
- Critically evaluate RL research and applications
- Apply RL techniques to small-scale real-world problems
Tentative Course Outline
Part I: Foundations (Weeks 1-4)
Week 1: Introduction to RL
- What is reinforcement learning?
- Comparison with supervised and unsupervised learning
- Key challenges in RL
- Real-world applications and case studies
Week 2: Markov Decision Processes
- States, actions, rewards, transitions
- Policies and value functions
- Bellman equations
- Optimal policies and optimality
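As a preview of this week's central result, the Bellman expectation and optimality equations take the standard form (notation follows Sutton & Barto):

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma V^{\pi}(s')\bigr]

V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma V^{*}(s')\bigr]
```

The only difference is the max over actions in place of the expectation under the policy; everything in Weeks 3-6 builds on these two recursions.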
Week 3: Dynamic Programming
- Policy evaluation
- Policy iteration
- Value iteration
- Limitations and computational considerations
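To give a flavor of the algorithms in this part, here is a minimal value-iteration sketch for a tabular MDP. The array layout (`P` as an S×A×S transition tensor, `R` as an S×A expected-reward matrix) is one convenient convention for illustration, not a required format for assignments.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Value iteration on a tabular MDP.

    P: array of shape (S, A, S) with transition probabilities.
    R: array of shape (S, A) with expected immediate rewards.
    Returns the optimal value function and a greedy policy.
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * V(s')
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < theta:
            break
        V = V_new
    return V_new, Q.argmax(axis=1)
```

The loop illustrates the computational consideration above: each sweep costs O(S²A), which is exactly why function approximation (Week 7) becomes necessary for large state spaces.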
Week 4: Monte Carlo Methods
- Monte Carlo prediction
- Monte Carlo control
- On-policy vs. off-policy learning
- Importance sampling
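First-visit Monte Carlo prediction, the first topic above, can be sketched in a few lines. The episode representation here (a list of `(state, reward)` pairs, with the reward received after leaving the state) is an illustrative choice, not a fixed assignment interface.

```python
from collections import defaultdict

def first_visit_mc_prediction(episodes, gamma=0.9):
    """Estimate V(s) as the average of first-visit returns.

    episodes: list of trajectories, each a list of (state, reward)
    pairs, where reward is received after leaving the state.
    """
    returns = defaultdict(list)
    for episode in episodes:
        # Compute the discounted return from every time step, backwards.
        G = 0.0
        gains = [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = reward + gamma * G
            gains[t] = G
        # Record the return only at each state's first visit.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        for state, t in first_visit.items():
            returns[state].append(gains[t])
    return {s: sum(g) / len(g) for s, g in returns.items()}
```

Unlike the dynamic programming methods of Week 3, this estimator needs no model of the transition probabilities, only sampled episodes.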
Part II: Core Algorithms (Weeks 5-9)
Week 5: Temporal-Difference Learning
- TD prediction (TD(0))
- Advantages of TD learning
- TD vs. Monte Carlo vs. Dynamic Programming
- n-step TD methods
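The tabular TD(0) update contrasts nicely with Monte Carlo: it bootstraps from the current estimate of the next state instead of waiting for a complete return. A one-line sketch (`V` is a dict mapping states to values; names are illustrative):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(s) toward the bootstrapped target r + gamma * V(s')."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return V
```

The TD error term is the quantity to watch when debugging: if it fails to shrink over training, either the learning rate or the targets are off.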
Week 6: Q-Learning and SARSA
- Q-learning algorithm
- SARSA and on-policy control
- Expected SARSA
- Implementation and debugging strategies
Week 7: Function Approximation
- Why function approximation?
- Linear function approximation
- Feature engineering
- Convergence considerations
Week 8: Deep Q-Networks (DQN)
- Neural networks for value approximation
- Experience replay
- Target networks
- Common pitfalls and solutions
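Of the DQN ingredients above, experience replay is the easiest to prototype in isolation. A minimal buffer sketch (capacity and interface here are illustrative choices, not the exact DeepMind design):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay buffer, as used in DQN.

    Storing transitions and sampling minibatches uniformly breaks the
    temporal correlation between consecutive experiences.
    """
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

A `deque` with `maxlen` gives first-in-first-out eviction for free, so the buffer always holds the most recent experience.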
Week 9: Policy Gradient Methods
- REINFORCE algorithm
- Policy gradient theorem
- Baseline techniques
- Actor-critic methods
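The central result of this week, the policy gradient theorem, underlies the REINFORCE update; with an optional baseline b(s) it reads (standard notation):

```latex
\nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}\!\left[\sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\,\bigl(G_t - b(s_t)\bigr)\right]
```

Subtracting a baseline leaves the gradient unbiased but can reduce its variance substantially; actor-critic methods go one step further and replace the sampled return G_t with a learned critic's estimate.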
Part III: Advanced Topics & Applications (Weeks 10-13)
Week 10: Exploration vs. Exploitation
- Multi-armed bandits
- Epsilon-greedy, UCB, Thompson sampling
- Exploration in deep RL
- Curiosity-driven learning
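The simplest of the exploration strategies above, epsilon-greedy, fits in a few lines (function name and interface are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a uniformly random arm; otherwise be greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

UCB and Thompson sampling replace the uniform random branch with exploration that is directed by uncertainty, which is why they enjoy much better regret guarantees on bandit problems.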
Week 11: Reward Shaping & Design
- Reward engineering challenges
- Reward shaping techniques
- Inverse reinforcement learning (intro)
- Common reward specification mistakes
Week 12: RL in Practice
- Hyperparameter tuning
- Debugging RL agents
- Benchmarking and evaluation
- Sim-to-real transfer considerations
Week 13: Case Studies
- Robotics applications
- Game playing (Chess, Go, video games)
- Recommendation systems
- Resource management
Part IV: Final Projects (Weeks 14-16)
Weeks 14-15: Project Development
- Work on final projects
- In-class project consultations
- Peer feedback sessions
Week 16: Final Presentations
- Student project presentations
- Course wrap-up and future directions
Prerequisites
Required:
- Programming: Proficient in Python
- Mathematics:
  - Linear algebra (vectors, matrices)
  - Probability (random variables, expectations)
  - Calculus (derivatives, gradients)
- Machine Learning: Basic ML concepts (supervised learning, neural networks)
Recommended:
- Previous coursework in algorithms and data structures
- Experience with NumPy, PyTorch, or TensorFlow
- Exposure to optimization methods
Course Format
- Lectures: 2x per week (75 minutes each)
  - Mix of theory, examples, and live coding
  - Interactive discussions and Q&A
- Lab Sessions: Weekly (90 minutes)
  - Guided implementation exercises
  - Algorithm debugging practice
  - Office hours for project help
- Programming Assignments: 5 assignments
  - Implement core RL algorithms
  - Apply to provided environments
  - Analyze and report results
- Final Project: Team-based (2-3 students)
  - Propose an RL application
  - Implement and evaluate solution
  - Written report and presentation
Assessment
- Programming Assignments (40%): 5 assignments, 8% each
- Midterm Exam (20%): Covering weeks 1-7
- Final Project (30%): Report (20%) + Presentation (10%)
- Participation (10%): Class engagement and lab attendance
Textbooks & Resources
Primary Textbook
- Sutton & Barto: “Reinforcement Learning: An Introduction” (2nd ed)
- Free online: http://incompleteideas.net/book/the-book-2nd.html
Supplementary Resources
- OpenAI Spinning Up in Deep RL
- DeepMind x UCL RL Lecture Series
- Selected research papers on key topics
Software & Tools
- Python 3.8+
- OpenAI Gym / Gymnasium
- PyTorch or TensorFlow
- Weights & Biases (for experiment tracking)
Course Policies
Collaboration Policy
- Assignments: Discuss concepts but write your own code
- Projects: Full collaboration within teams
- Exams: Individual work only
Late Policy
- Assignments: 20% penalty per day late (max 3 days)
- Projects: No late submissions (teams present at their scheduled time)
Academic Integrity
- Cite all external resources and code
- Do not share code with other students
- Plagiarism will result in course failure
Expected Workload
- Lectures: 2.5 hours/week
- Lab sessions: 1.5 hours/week
- Assignments: 5-8 hours/week
- Project: 10-15 hours/week (final 3 weeks)
- Total: ~9-12 hours/week (higher during the final project weeks)
Coming Soon
Detailed assignment specifications, starter code, and lecture slides will be made available once the course is scheduled.