Course manual 2021/2022

Course content

Reinforcement learning is a general framework for studying sequential decision-making problems. In such problems, an action must be chosen at every time step to optimize long-term performance. This is a very wide class of problems that includes robotic control and game playing, but also human and animal behavior. Reinforcement learning methods can be applied when no training labels for the optimal action are available, and good actions have to be discovered through trial and error.

In this course, we will discuss properties of reinforcement learning problems and algorithms to solve them. In particular, we will look at

Solving problems with discrete action sets using so-called value-based methods. We will cover approximate methods that can be employed in problems with large state spaces.
Solving problems with continuous actions using approximate policy-based methods. These methods have applications in, among others, robotics and control.
The use of multi-layer neural networks as function approximators in reinforcement learning algorithms, yielding so-called deep reinforcement learning methods.
We will also briefly cover advanced & frontier topics in reinforcement learning.

Study materials

Literature

Reinforcement Learning: An Introduction. R. S. Sutton & A. G. Barto
Second Edition. Available: http://incompleteideas.net/book/RLbook2020.pdf. We will cover or partially cover chapters 1-11, 13, 16, and 17.
A Survey on Policy Search for Robotics. M. P. Deisenroth, G. Neumann, J. Peters. We will cover this survey partially. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.436.44&rep=rep1&type=pdf
We will study new developments in the field of RL through recent publicly available papers. Links will be distributed through the class website.

Objectives

The student is able to describe the main algorithms in Monte Carlo, temporal difference, model-based, and policy-based methods.
The student is able to describe the main differences between on-policy and off-policy learning, TD and Monte Carlo methods, value-based and policy-based methods, and tabular and approximate methods, and to categorise reinforcement learning methods according to these properties.
The student is able to implement a reinforcement learning algorithm and to analyse its performance (or lack thereof) in a given environment. The student is able to apply the learned update rules to given data sets.
The student is able to compare reinforcement learning algorithms and point out their main advantages and disadvantages. The student is able to select a reinforcement learning algorithm based on the characteristics of the given environment.
The student is able to design experiments to compare reinforcement learning techniques on a given environment. The student is able to critically evaluate reinforcement learning experiments and point out their strong and weak points.

Teaching methods

Lecture
Seminar

Learning activities

Activity	Hours
Lecture	28
Laptop session	14
Exam	3
Seminar	14
Self study	109
Total	168	(6 EC x 28 hours)

Attendance

This programme does not have requirements concerning attendance (OER part B).

Additional requirements for this course:

Participation in the peer feedback meeting counts towards the final grade.

Assessment

Item and weight	Details
Final grade
0.6 (60%) Exam	Must be ≥ 5, NAP if missing
0.16 (16%) Homeworks
1 (25%) Homework 1
1 (25%) Homework 2
1 (25%) Homework 3
1 (25%) Homework 4
0.1 (10%) Programming
1 (20%) Lab 1 - Dynamic Programming
1 (20%) Lab 2 - Monte Carlo
1 (20%) Lab 3 - Temporal Difference
1 (20%) Lab 4 - DQN
1 (20%) Lab 5 - Policy Gradient
0.14 (14%) Reproducible research assignment

A resit is possible for the exam only.

A result of at least 5.0 on the exam is necessary to pass the course. In addition, the weighted average of the assignments and the exam must be at least 5.5.
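
As an illustration of how the weights and the two pass conditions combine, the calculation can be sketched as below. This is a minimal sketch, not the official grading procedure: the function name, the example grades, the 1-10 scale, the equal weighting within the homework and lab groups, and the absence of rounding are all assumptions, and the "NAP if missing" rule is not modelled.

```python
# Weights taken from the assessment table; everything else is an assumption.
WEIGHTS = {"exam": 0.60, "homework": 0.16, "programming": 0.10, "research": 0.14}


def final_grade(exam, homeworks, labs, research):
    """Return (weighted_average, passed) for one student.

    Assumes grades on a 1-10 scale, equal weighting within the homework
    and lab groups, and no intermediate rounding (the manual does not
    specify a rounding rule).
    """
    homework_avg = sum(homeworks) / len(homeworks)  # 4 homeworks, 25% each
    lab_avg = sum(labs) / len(labs)                 # 5 labs, 20% each
    average = (
        WEIGHTS["exam"] * exam
        + WEIGHTS["homework"] * homework_avg
        + WEIGHTS["programming"] * lab_avg
        + WEIGHTS["research"] * research
    )
    # Both conditions must hold: exam >= 5.0 and weighted average >= 5.5.
    passed = exam >= 5.0 and average >= 5.5
    return average, passed


# Hypothetical grades, for illustration only.
grade, passed = final_grade(
    exam=6.0, homeworks=[7, 8, 6, 7], labs=[8, 8, 7, 9, 8], research=7.5
)
print(f"{grade:.2f}", passed)
```

Note that under these assumptions a student with an exam grade below 5.0 does not pass, even if the weighted average exceeds 5.5.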

Inspection of assessed work

An announcement about inspecting exam grades will be made on Canvas. To inspect assignment grades, ask your TA after the grade is announced.

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Week number	Topics	Study material
1
2
3
4
5
6
7
8

Owner	Master Artificial Intelligence
Coordinator	dr. H.C. van Hoof
Part of	Master Artificial Intelligence,

Reinforcement Learning