Motivational examples; Markov decision processes, policies, long-term rewards, the goal of RL; types of RL methods: value-based, policy-based, actor-critic; value functions; exploration vs. exploitation; tabular RL methods; on-policy and off-policy methods; experience replay
Content:
– Reinforcement learning;
– Motivational examples;
– MDPs: the elements of an MDP, the Markov condition;
– Policies;
– Long-term rewards;
– The goal of RL;
– The types of RL:
  – Value-based;
  – Policy-based;
  – Actor-critic;
– Value functions;
  – Recursiveness, Bellman equations (see the equations sketched after this list);
– Exploration vs. exploitation;
  – Greedy, ε-greedy, softmax (see the action-selection sketch after this list);
– Tabular methods:
  – Dynamic programming;
  – Monte Carlo learning;
  – Temporal difference learning;
  – SARSA and Q-learning: the difference between on-policy and off-policy methods (see the update sketch after this list);
– Experience replay (see the replay-buffer sketch after this list);
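As a compact reference for the value functions and Bellman equations listed above, a standard formulation is sketched below in LaTeX; the notation (policy π, discount factor γ, transition kernel p) follows common textbook conventions and is an assumption of this sketch, not something fixed by the outline.

```latex
% State-value function of a policy \pi and its Bellman (expectation) equation
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s\right]
           \;=\; \sum_{a} \pi(a \mid s) \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[r + \gamma\, V^{\pi}(s')\bigr]

% Bellman optimality equation for the action-value function
Q^{*}(s, a) \;=\; \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[r + \gamma \max_{a'} Q^{*}(s', a')\bigr]
```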
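A minimal sketch of the greedy, ε-greedy and softmax action-selection strategies mentioned above; the function names and the temperature parameter are illustrative choices, not part of the course material.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore (random action); otherwise exploit (greedy action)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))           # epsilon = 0 recovers the purely greedy strategy

def softmax_action(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                      # shift for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))

rng = np.random.default_rng(0)
q = [0.1, 0.5, 0.2]
print(epsilon_greedy(q, epsilon=0.1, rng=rng), softmax_action(q, temperature=0.5, rng=rng))
```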
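A sketch contrasting the SARSA (on-policy) and Q-learning (off-policy) tabular updates from the list above, assuming the Q-table is a NumPy array indexed by (state, action); α is the learning rate and γ the discount factor.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: the target uses a_next, the action actually selected
    in s_next by the behaviour policy (e.g. an epsilon-greedy one)."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy TD update: the target uses the greedy (max) action in s_next,
    regardless of which action the behaviour policy will actually take."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```

The only difference between the two updates is the bootstrap term in the target, which is exactly the on-policy/off-policy distinction mentioned in the list.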
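A minimal sketch of an experience replay buffer: transitions are stored in a fixed-size deque and sampled uniformly in mini-batches; the class and method names are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are discarded automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniformly sample a mini-batch of stored transitions."""
        batch = random.sample(list(self.buffer), batch_size)
        return tuple(zip(*batch))              # (states, actions, rewards, next_states, dones)

    def __len__(self):
        return len(self.buffer)
```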
A set of Colab notebooks covering, in particular, the following topics:
– The OpenAI Gym interface (see the interaction-loop sketch after this list);
– Illustration of the basic tabular methods using gridworld examples;
– Experience replay;
– ...
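A minimal sketch of the OpenAI Gym interaction loop referenced above, written against the classic pre-0.26 Gym API (newer gym/gymnasium releases also return an info dict from reset() and separate terminated/truncated flags from step()); the environment ID is only an example.

```python
import gym

env = gym.make("FrozenLake-v1")        # any discrete-state environment would do
obs = env.reset()                      # start a new episode
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()               # a random policy, purely for illustration
    obs, reward, done, info = env.step(action)       # apply the action, observe the outcome
    total_reward += reward
print("episode return:", total_reward)
```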
The estimated additional time required for studying the material independently, using the lecture videos/slides and, as necessary, other literature and resources. This independent study helps ensure a correct understanding of the material.
This activity also includes the time required for review before exams.
Quiz activities intended to provide students with quick, ungraded feedback on their grasp of the material.