6 EC
Semester 1, period 1
5204RELE6Y
| Owner | Master Artificial Intelligence |
| Coordinator | dr. H.C. van Hoof |
| Part of | Master Artificial Intelligence |
Reinforcement learning is a general framework for studying sequential decision-making problems. In such problems, an action must be chosen at every time step to optimize long-term performance. This is a very wide class of problems that includes robotic control and game playing, but also human and animal behavior. Reinforcement learning methods can be applied when no training labels for the optimal action are available and good actions have to be discovered through trial and error.
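As a rough illustration of discovering good actions by trial and error without labels, the sketch below runs an epsilon-greedy agent on a multi-armed bandit, the simplest one-state special case of such problems (it is not a full sequential task). The function name `run_bandit`, the success probabilities, and the values of `epsilon` and `steps` are all assumptions chosen for illustration, not course material.

```python
import random


def run_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli bandit (illustrative sketch).

    Returns the estimated value of each action and how often it was tried.
    """
    rng = random.Random(seed)
    n_actions = len(success_probs)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # number of times each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment only returns a reward, never the "correct" action.
        reward = 1.0 if rng.random() < success_probs[action] else 0.0

        # Incremental update of the sample-average estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Hypothetical bandit with three actions; the agent does not see these numbers.
estimates, counts = run_bandit([0.1, 0.5, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", [round(v, 2) for v in estimates])
print("times each action was tried:", counts)
print("best action found:", best_action)
```

The agent is never told which action is optimal; it only observes rewards and gradually concentrates its choices on the action with the highest estimated value.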
In this course, we will discuss properties of reinforcement learning problems and algorithms to solve them. In particular, we will look at
Reinforcement Learning: An Introduction. R. S. Sutton & A. G. Barto
Second Edition. Available: http://incompleteideas.net/book/RLbook2020.pdf. We will cover or partially cover chapters 1-11, 13, 16, and 17.
A Survey on Policy Search for Robotics. M. P. Deisenroth, G. Neumann, J. Peters. We will cover this survey partially. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.436.44&rep=rep1&type=pdf
We will study new developments in the field of RL through recent publicly available papers. Links will be distributed through the class website.
In the lectures, the theoretical background will be covered, coupled with building up an intuitive understanding of how methods relate to each other through examples and explanation.
The practical sessions focus on applying and practicing the techniques from the lecture.
| Activity | Hours | |
| Lecture | 28 | |
| Laptop session | 14 | |
| Exam | 3 | |
| Tutorial | 14 | |
| Self study | 109 | |
| Total | 168 | (6 EC x 28 hours) |
This programme does not have requirements concerning attendance (OER part B).
Additional requirements for this course:
Attendance to lectures and tutorial sessions is strongly encouraged but not required.
| Item and weight | Details |
| Final grade | |
| 1 (100%) Exam | |
A resit is possible for the exam only.
A result of at least 5.0 on the exam is necessary to pass the course. In addition, the weighted average of assignments and exam needs to be at least 5.5 to pass the course.
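The two passing conditions can be sketched as a small check. This is only an illustration: the weights of the assignments and the exam are not listed on this page, so `exam_weight` is a hypothetical parameter, and any official rounding rules are not modelled.

```python
def passes_course(exam_grade, assignment_grade, exam_weight):
    """Return (final_grade, passed) under the two conditions on this page.

    exam_weight is a hypothetical parameter in [0, 1]; the real weights
    are not given here. Rounding of grades is not modelled.
    """
    final_grade = exam_weight * exam_grade + (1.0 - exam_weight) * assignment_grade
    # Both conditions must hold: exam at least 5.0 and weighted average at least 5.5.
    passed = exam_grade >= 5.0 and final_grade >= 5.5
    return final_grade, passed


# exam_weight=0.6 is a made-up value, used for illustration only.
print(passes_course(4.8, 9.0, exam_weight=0.6))  # exam below 5.0: not passed
print(passes_course(5.5, 6.0, exam_weight=0.6))  # both conditions met: passed
```

Under these assumptions, a high assignment grade cannot compensate for an exam grade below 5.0.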
The final grade is a weighted average of assignments and the exam as follows:
Assignments handed in after the deadline without permission might not be graded or might not be awarded full points. Please see the course policy in the syllabus on Canvas.
An announcement about inspecting exam grades will be made on Canvas. To inspect assignment grades, ask your TA after the grade is announced.
Both graded and ungraded assignments are provided. Only the graded assignments should be handed in. They are clearly marked as 'homework'.
The answers to ungraded assignments will be provided one week after the assignment was scheduled so that students can check their own work.
Homework assignments, coding assignments and the reproducibility report can be done in groups of 2 or individually. Feedback will be given in Canvas and additional feedback can be given on request by the TA during tutorial sessions.
The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl
| Lecture | Topic | Literature |
| T 1 | Set-up programming environment, entry quiz | Ex. 0.1 & 0.2 |
| L 1 | Introduction. Recap MDP & Bandit | RL:AI 1, 2.1-2.4, 2.6, 2.7, 3.1-3.3 |
| T 2 | | Ex. 1.1-1.3 |
| L 2 | Dynamic programming | RL:AI 3.4-3.8, 4 |
| T 3 | | Ex. 2.1-2.3, HW 2.4 & 2.5 |
| L 3 | Monte-Carlo methods | RL:AI 5.1-5.7 |
| T 4 | | Ex. 3.1-3.4, HW 3.5 |
| 13/9 | Hand in HW 1! (HW always due on Wednesday at 19:00) | |
| L 4 | Temporal difference methods | RL:AI 6.1-6.6 |
| T 5 | | Ex. 4.1-4.3, HW 4.4 |
| L 5 | Advanced TD methods | RL:AI 6.7, 7.1-7.3 |
| T 6 | | Ex. 5.1-5.2, HW 5.3 & 5.4 |
| 20/9 | Hand in HW 2! | |
| L 6 | Prediction with approximation | RL:AI 9.1-9.8 |
| T 7 | | Ex. 6.1-6.5, HW 6.6 |
| L 7 | Control with approximation | RL:AI 10.1, 10.2, 11.1-11.7 & 16.5; DQN paper |
| T 8 | | Ex. 7.1-7.3, HW 7.4 & 7.5 |
| 27/9 | Hand in HW 3! | |
| L 8 | Policy gradient methods: REINFORCE; approximations | RL:AI 13.1-13.5, 13.7; Survey 1 - 2.4.1.2 |
| T 9 | | Ex. 8.1-8.4, HW 8.5 |
| L 9 | Policy gradient methods: PGT, DPG & evaluation | See lecture 8, RL that matters paper; DPG paper |
| T 10 | | Ex. 9.1-9.3, HW 9.4 & 9.5 |
| 4/10 | Hand in HW 4! | |
| L 10 | Advanced PS methods: NPG & TRPO | Survey 2.4.1.3, TRPO paper |
| T 11 | | Ex. 10.1-10.2, HW 10.3 |
| L 11 | Planning and learning | RL:AI 8.1-8.8, 8.13 |
| T 12 | | Ex. 11.1 & 11.2 |
| 11/10 | Hand in HW 5! | |
| L 12 | Guest lecture by Erman Acar | |
| T 13 | | Work on RR assignment |
| L 13 | Partial observability | RL:AI 17.3 |
| T 14 | FAQ session (exam and reproducible research assignment) | Ex. 13.1, RR assignment |
| 18/10 | Hand in HW 6! | |
| L 14 | Recap & Exam FAQ | |
| 26/10 | Exam! | Note: data was changed. |
The schedule for this course is published on DataNose.
For questions regarding assignment to tutorial groups, please see the Canvas announcement and contact Seethu Christopher.
Questions about the content of lectures or exercises can be asked on the course Piazza page (see link on Canvas).
For sensitive or private questions please contact the course coordinator using e-mail or a message on Canvas.