NOC: Reinforcement Learning


Lecture 1 - Tutorial 1 - Probability Basics 1


Lecture 2 - Tutorial 1 - Probability Basics 2


Lecture 3 - Tutorial 2 - Linear algebra - 1


Lecture 4 - Tutorial 2 - Linear algebra - 2


Lecture 5 - Introduction to RL


Lecture 6 - RL Framework and applications


Lecture 7 - Introduction to Immediate RL


Lecture 8 - Bandit Optimalities


Lecture 9 - Value function based methods


Lecture 10 - UCB 1


Lecture 11 - Concentration Bounds


Lecture 12 - UCB 1 Theorem


Lecture 13 - PAC Bounds


Lecture 14 - Median Elimination


Lecture 15 - Thompson Sampling


Lecture 16 - Policy Search


Lecture 17 - REINFORCE


Lecture 18 - Contextual Bandits


Lecture 19 - Full RL Introduction


Lecture 20 - Returns, Value Functions and MDPs


Lecture 21 - MDP Modelling


Lecture 22 - Bellman Equation


Lecture 23 - Bellman Optimality Equation


Lecture 24 - Cauchy Sequence and Green's Equation


Lecture 25 - Banach Fixed Point Theorem


Lecture 26 - Convergence Proof


Lecture 27 - L_π Convergence


Lecture 28 - Value Iteration


Lecture 29 - Policy Iteration


Lecture 30 - Dynamic Programming


Lecture 31 - Monte Carlo


Lecture 32 - Control in Monte Carlo


Lecture 33 - Off Policy MC


Lecture 34 - UCT


Lecture 35 - TD(0)


Lecture 36 - TD(0) Control


Lecture 37 - Q-Learning


Lecture 38 - Afterstate


Lecture 39 - Eligibility Traces


Lecture 40 - Backward View of Eligibility Traces


Lecture 41 - Eligibility Trace Control


Lecture 42 - Thompson Sampling Recap


Lecture 43 - Function Approximation


Lecture 44 - Linear Parameterization


Lecture 45 - State Aggregation Methods


Lecture 46 - Function Approximation and Eligibility Traces


Lecture 47 - LSTD and LSTDQ


Lecture 48 - LSPI and Fitted Q


Lecture 49 - DQN and Fitted Q-Iteration


Lecture 50 - Policy Gradient Approach


Lecture 51 - Actor Critic and REINFORCE


Lecture 52 - REINFORCE (cont'd)


Lecture 53 - Policy Gradient with Function Approximation


Lecture 54 - Hierarchical Reinforcement Learning


Lecture 55 - Types of Optimality


Lecture 56 - Semi Markov Decision Processes


Lecture 57 - Options


Lecture 58 - Learning with Options


Lecture 59 - Hierarchical Abstract Machines


Lecture 60 - MAXQ


Lecture 61 - MAXQ Value Function Decomposition


Lecture 62 - Option Discovery


Lecture 63 - POMDP Introduction


Lecture 64 - Solving POMDP