Machine Learning School

M18-RL: Reinforcement Learning

11h 0min
Motivational examples; Markov decision processes, policies, long-term rewards, the goal of RL; types of RL methods: value-based, policy-based, actor-critic; value functions; exploration vs. exploitation; tabular RL methods; on-policy and off-policy methods; experience replay

Main Content
Acquisition
1 Lecture content
Content:
– Reinforcement learning;
– Motivational examples;
– MDPs: the elements of an MDP, the Markov condition;
– Policies;
– Long-term rewards;
– The goal of RL;
– The types of RL:
   – Value-based;
   – Policy-based;
   – Actor-critic;
– Value functions;
   – Recursiveness, Bellman equations;
– Exploration vs. exploitation;
   – Greedy, ε-greedy, softmax;
– Tabular methods:
   – Dynamic programming;
   – Monte Carlo learning;
   – Temporal difference learning;
   – SARSA and Q-learning: the difference between on-policy and off-policy methods;
– Experience replay;
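
As an illustration of two of the listed topics (ε-greedy action selection, and the on-policy vs. off-policy distinction between SARSA and Q-learning), here is a minimal sketch. It is not taken from the lecture material: the function names, the list-based Q-value representation, and the parameter names `alpha`, `gamma`, and `epsilon` are assumptions chosen for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sarsa_target(reward, next_q_values, next_action, gamma):
    # On-policy: bootstrap from the action actually selected in the next state.
    return reward + gamma * next_q_values[next_action]


def q_learning_target(reward, next_q_values, gamma):
    # Off-policy: bootstrap from the greedy action in the next state,
    # regardless of which action the behaviour policy takes there.
    return reward + gamma * max(next_q_values)


def td_update(q_sa, target, alpha):
    # Move the current estimate Q(s, a) a fraction alpha towards the target.
    return q_sa + alpha * (target - q_sa)


if __name__ == "__main__":
    next_q = [1.0, 3.0]  # hypothetical Q-values in the next state
    explored_action = 0  # suppose the behaviour policy explored here
    print(sarsa_target(1.0, next_q, explored_action, gamma=0.5))
    print(q_learning_target(1.0, next_q, gamma=0.5))
```

The two targets differ only when the next action is not the greedy one, which is exactly the case where exploration makes the on-policy and off-policy estimates diverge.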

2h 0min
Practice
2 Colab Notebooks
A set of Colab notebooks covering, in particular, the following topics: – The OpenAI Gym interface; – Illustration of the basic tabular methods using gridworld examples; – Experience replay; – ...

3h 0min
Investigation
3 Independent study time + review
The estimated additional time required to study the material independently, using the lecture videos and slides and consulting other literature and material as necessary, so that the material is understood correctly. This activity also includes the time required for review before exams.

5h 30min
Assessment
4 Quiz activities
Short quizzes that give students quick, ungraded feedback on their grasp of the material.

30 min