6 EC
Semester 1, period 1
5204RELE6Y
Owner | Master Artificial Intelligence |
Coordinator | dr. H.C. van Hoof |
Part of | Master Artificial Intelligence, |
Reinforcement learning is a general framework for studying sequential decision-making problems. In such problems, an action must be chosen at every time step to optimize long-term performance. This is a very broad class of problems that includes robotic control and game playing, as well as human and animal behavior. Reinforcement learning methods can be applied when no training labels for the optimal action are available and good actions have to be discovered through trial and error.
In this course, we will discuss properties of reinforcement learning problems and algorithms to solve them. In particular, we will look at
Reinforcement Learning: An Introduction. R. S. Sutton & A. G. Barto. Second Edition. Available: http://incompleteideas.net/book/RLbook2020.pdf. We will cover or partially cover chapters 1-11, 13, 16, and 17.
A Survey on Policy Search for Robotics. M. P. Deisenroth, G. Neumann, J. Peters. We will cover this survey partially. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.436.44&rep=rep1&type=pdf
We will study new developments in the field of RL through recent publicly available papers. Links will be distributed through the class website.
Activity | Hours | |
Lecture | 28 | |
Computer lab session | 14 | |
Exam | 3 | |
Tutorial | 14 | |
Self study | 109 | |
Total | 168 | (6 EC x 28 hours) |
This programme does not have requirements concerning attendance (OER part B).
Additional requirements for this course:
Participation in the peer feedback meeting counts towards the final grade.
Item and weight | Details |
Final grade | |
0.6 (60%) Exam | Must be ≥ 5, NAP if missing |
0.16 (16%) Homeworks | |
1 (25%) Homework 1 | |
1 (25%) Homework 2 | |
1 (25%) Homework 3 | |
1 (25%) Homework 4 | |
0.1 (10%) Programming | |
1 (20%) Lab 1 - Dynamic Programming | |
1 (20%) Lab 2 - Monte Carlo | |
1 (20%) Lab 3 - Temporal Difference | |
1 (20%) Lab 4 - DQN | |
1 (20%) Lab 5 - Policy Gradient | |
0.14 (14%) Reproducible research assignment | |
A resit is possible for the exam only.
A result of at least 5.0 on the exam is necessary to pass the course. In addition, the weighted average of the assignments and the exam must be at least 5.5.
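As an illustration of how the weights in the assessment table combine with the two passing conditions, here is a minimal sketch in Python. The function and variable names are hypothetical, and the sketch assumes that homeworks and labs are each averaged with equal weight, as the table suggests. It applies no rounding and does not model missing items, since this guide does not specify either.

```python
# Component weights as listed in the assessment table.
# All names here are illustrative and not part of any official tool.
WEIGHTS = {
    "exam": 0.60,
    "homework": 0.16,
    "programming": 0.10,
    "research": 0.14,
}


def mean(grades):
    """Unweighted average; homeworks and labs count equally within their group."""
    return sum(grades) / len(grades)


def final_grade(exam, homeworks, labs, research):
    """Return (weighted_grade, passed) for one student.

    Assumption: no rounding is applied, because the guide states no rounding rule.
    """
    components = {
        "exam": exam,
        "homework": mean(homeworks),
        "programming": mean(labs),
        "research": research,
    }
    weighted = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    # Both conditions must hold: exam at least 5.0 and weighted average at least 5.5.
    passed = exam >= 5.0 and weighted >= 5.5
    return weighted, passed


if __name__ == "__main__":
    grade, passed = final_grade(
        exam=6.0,
        homeworks=[7.0, 7.0, 7.0, 7.0],
        labs=[8.0, 8.0, 8.0, 8.0, 8.0],
        research=7.0,
    )
    print(f"weighted grade: {grade:.2f}, passed: {passed}")
```

Note that a high weighted average does not compensate for an exam grade below 5.0: in this sketch an exam grade of 4.5 fails the course even if every assignment is graded 10.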
An announcement about inspecting exam grades will be made on Canvas. To inspect assignment grades, ask your TA after the grade has been announced.
The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl
Week number | Topics | Study material |
1 | ||
2 | ||
3 | ||
4 | ||
5 | ||
6 | ||
7 | ||
8 |
The schedule for this course is published on DataNose.