Reinforcement Learning

6 EC

Semester 1, period 1

5204RELE6Y

Owner Master Artificial Intelligence
Coordinator dr. H.C. van Hoof
Part of Master Artificial Intelligence,

Course manual 2023/2024

Course content

Reinforcement learning is a general framework for studying sequential decision-making problems. In such problems, an action must be chosen at every time step to optimize long-term performance. This is a very broad class of problems that includes robotic control and game playing, but also human and animal behavior. Reinforcement learning methods can be applied when no training labels for the optimal action are available and good actions have to be discovered through trial and error.

In this course, we will discuss properties of reinforcement learning problems and algorithms to solve them. In particular, we will look at:

  • Solving problems with discrete action sets using so-called value-based methods. We will cover approximate methods that can be employed in problems with large state spaces.
  • Solving problems with continuous actions using approximate policy-based methods. These methods have applications in, among others, robotics and control.
  • The use of multi-layer neural networks as function approximators in reinforcement learning algorithms, yielding so-called deep reinforcement learning methods.
  • A brief treatment of advanced and frontier topics in reinforcement learning.
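To make the value-based, trial-and-error idea concrete, tabular Q-learning on a toy corridor environment might look like the sketch below. This is purely illustrative and not part of the course materials; the environment, parameter values, and function names are all invented here.

```python
import random

# Toy corridor: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.

N_STATES = 5
ACTIONS = (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Toy deterministic dynamics: move left or right along the corridor."""
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def greedy(q, state):
    """Greedy action with random tie-breaking."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=2000, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, occasionally explore.
            action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(q, state)
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next-state value.
            target = reward if done else reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q

q = train()
# After training, the greedy policy moves right in every non-terminal state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

No training labels are given anywhere: the agent only ever observes rewards, and the learned action values decay by the discount factor for each step away from the goal.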

Study materials

Literature

Objectives

  • The student is able to describe the main algorithms in Monte Carlo, temporal-difference, model-based, and policy-based methods.
  • The student is able to describe the main differences between on- and off-policy learning, TD and Monte Carlo methods, value-based and policy-based methods, and tabular and approximate methods; and to categorise reinforcement learning methods according to these properties.
  • The student is able to implement a reinforcement learning algorithm and to analyse its performance (or lack thereof) in a given environment. The student is able to apply the learned update rules to given data sets.
  • The student is able to compare reinforcement learning algorithms and point out their main advantages and disadvantages, and to select a reinforcement learning algorithm based on the characteristics of a given environment.
  • The student is able to critically evaluate reinforcement learning experiments and point out their strong and weak points.

Teaching methods

  • Lecture
  • Seminar

In the lectures, the theoretical background will be covered, coupled with building up an intuitive understanding of how the methods relate to each other through examples and explanation.

The practical sessions focus on applying and practicing the techniques from the lectures.

Learning activities

Activity | Hours
Lecture (hoorcollege) | 28
Laptop session (laptopcollege) | 14
Exam (tentamen) | 3
Tutorial (werkcollege) | 14
Self study | 109
Total | 168 (6 EC x 28 hours)

Attendance

This programme does not have requirements concerning attendance (OER part B).

Additional requirements for this course:

Attendance to lectures and tutorial sessions is strongly encouraged but not required. 

Assessment

Item and weight | Details
Final grade (100%) | Exam (tentamen)

A resit is possible for the exam only.

A result of at least 5.0 on the exam is necessary to pass the course. In addition, the weighted average of assignments and exam needs to be at least 5.5 to pass the course.

The final grade is a weighted average of assignments and the exam as follows: 

  • Exam 65%
  • 5 homework assignments, 4% each
  • 5 coding assignments, 2% each
  • Reproducibility report, 5%
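As an illustration only, assuming the final grade is a plain weighted average of these components (the individual grades below are invented, and the authoritative rounding rules are in the syllabus on Canvas), the computation might look like:

```python
# Weights from the grading scheme: exam 65%, 5 homeworks at 4%,
# 5 coding assignments at 2%, reproducibility report 5%.
WEIGHTS = {"exam": 0.65, "homework": 5 * 0.04, "coding": 5 * 0.02, "report": 0.05}

def final_grade(exam, homeworks, codings, report):
    """Weighted average on a 1-10 scale, averaging within each category."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # the weights cover 100%
    hw_avg = sum(homeworks) / len(homeworks)
    code_avg = sum(codings) / len(codings)
    return (WEIGHTS["exam"] * exam + WEIGHTS["homework"] * hw_avg
            + WEIGHTS["coding"] * code_avg + WEIGHTS["report"] * report)

# Exam 6.0 (above the required 5.0) with strong assignment grades:
grade = final_grade(exam=6.0, homeworks=[8, 7, 9, 8, 8], codings=[9, 8, 9, 9, 8], report=8.0)
# grade = 0.65*6.0 + 0.20*8.0 + 0.10*8.6 + 0.05*8.0 = 6.76, which passes (>= 5.5)
```

Note how a modest exam grade can be lifted above 5.5 by strong assignment work, as long as the exam itself is at least 5.0.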

Assignments handed in after the deadline without permission may not be graded and may not be awarded full points. Please see the course policy in the syllabus on Canvas.

Inspection of assessed work

An announcement will be made on Canvas about inspecting exam grades. For inspecting assignment grades, ask your TA after the grades are announced.

Assignments

Both graded and ungraded assignments are provided. Only the graded assignments, clearly marked as 'homework', should be handed in.

The answers to ungraded assignments will be provided one week after the assignment was scheduled so that students can check their own work. 

Homework assignments, coding assignments and the reproducibility report can be done in groups of 2 or individually. Feedback will be given in Canvas and additional feedback can be given on request by the TA during tutorial sessions. 

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Lecture | Topic | Literature

T 1 | Set-up programming environment, entry quiz | Ex. 0.1 & 0.2 (exercise sheets will be provided on Canvas)
L 1 | Introduction; recap MDPs & bandits | RL:AI 1, 2.1-2.4, 2.6, 2.7, 3.1-3.3 (RL:AI refers to Reinforcement Learning: An Introduction; full details under 'Study materials')
T 2 | - | Ex. 1.1-1.3
L 2 | Dynamic programming | RL:AI 3.4-3.8, 4
T 3 | - | Ex. 2.1-2.3, HW 2.4 & 2.5
L 3 | Monte Carlo methods | RL:AI 5.1-5.7
T 4 | - | Ex. 3.1-3.4, HW 3.5
13/9 | Hand in HW 1! (HW is always due on Wednesday at 19:00) | -
L 4 | Temporal difference methods | RL:AI 6.1-6.6
T 5 | - | Ex. 4.1-4.3, HW 4.4
L 5 | Advanced TD methods | RL:AI 6.7, 7.1-7.3
T 6 | - | Ex. 5.1-5.2, HW 5.3 & 5.4
20/9 | Hand in HW 2! | -
L 6 | Prediction with approximation | RL:AI 9.1-9.8
T 7 | - | Ex. 6.1-6.5, HW 6.6
L 7 | Control with approximation | RL:AI 10.1, 10.2, 11.1-11.7 & 16.5; DQN paper
T 8 | - | Ex. 7.1-7.3, HW 7.4 & 7.5
27/9 | Hand in HW 3! | -
L 8 | Policy gradient methods: REINFORCE; approximations | RL:AI 13.1-13.5, 13.7; Survey 1 - 2.4.1.2
T 9 | - | Ex. 8.1-8.4, HW 8.5
L 9 | Policy gradient methods: PGT, DPG & evaluation | See lecture 8; "RL that matters" paper; DPG paper
T 10 | - | Ex. 9.1-9.3, HW 9.4 & 9.5
4/10 | Hand in HW 4! | -
L 10 | Advanced PS methods: NPG & TRPO | Survey 2.4.1.3; TRPO paper
T 11 | - | Ex. 10.1-10.2, HW 10.3
L 11 | Planning and learning | RL:AI 8.1-8.8, 8.13
T 12 | - | Ex. 11.1 & 11.2; work on RR assignment
11/10 | Hand in HW 5! | -
L 12 | Guest lecture by Erman Acar: "Causality and Reinforcement Learning" | -
T 13 | - | Work on RR assignment
L 13 | Partial observability | RL:AI 17.3
T 14 | FAQ session (exam and reproducible research assignment) | Ex. 13.1; RR assignment
18/10 | Hand in HW 6! | -
L 14 | Recap & exam FAQ | -
26/10 | Exam! | -

Note: the exam date was changed.

Timetable

The schedule for this course is published on DataNose.

Contact information

Coordinator

  • dr. H.C. van Hoof

For questions regarding assignment to tutorial groups please see the Canvas announcement and contact Seethu Christopher. 

Questions about the content of lectures or exercises can be asked on the course Piazza page (see link on Canvas).

For sensitive or private questions please contact the course coordinator using e-mail or a message on Canvas.