Contents:
– Reinforcement learning;
– Motivating examples;
– Markov decision processes (MDPs): the elements of an MDP, the Markov property;
– Policies;
– Long-term rewards;
– The goal of RL;
– The types of RL:
    – Value-based;
    – Policy-based;
    – Actor-critic;
– Value functions;
– Recursive structure of value functions: the Bellman equations;
– Exploration vs. exploitation;
– Action selection: greedy, ε-greedy, softmax;
– Tabular methods:
    – Dynamic programming;
    – Monte Carlo learning;
    – Temporal difference learning;
– SARSA and Q-learning: the difference between on-policy and off-policy methods;
– Experience replay.