Value function representation: tabular, shallow approximation, deep learning; deep value-based methods (DQN); policy gradient methods (REINFORCE); actor-critic methods (A2C, A3C, DDPG, PPO, SAC)
Content:
– Value function representation;
  – Tabular;
  – Approximation using shallow models;
  – Deep learning;
– DQN;
– Policy gradient methods;
  – With shallow models;
  – With deep models;
– Actor-critic:
  – REINFORCE, A3C, A2C;
  – PPO;
  – DDPG; SAC;
A set of Colab notebooks covering these topics in particular:
– DQN applied to the Lunar Lander;
– SAC applied to the inverted pendulum;
– SAC applied to AntBullet;
– ...
The estimated additional time required for studying the material independently, using the lecture videos and slides and consulting other literature and material as necessary. Independent study supports a correct understanding of the material.
This activity also includes the time required for review before exams.
Ungraded quiz activities meant to give students quick feedback on their grasp of the material.