Course manual 2021/2022

Course content

In data science and real-world machine learning, many issues arise that are often neglected in standard machine learning courses. In this course we will focus on two of them:

(i) many tasks are inherently trying to answer causal questions and gather actionable insights, even when there is not enough data to draw causal conclusions;

(ii) data is often missing not at random, heterogeneous, or not i.i.d.

For the first issue, we will focus on formulating the correct causal questions and assumptions needed to solve the real-world task at hand. For example, a strong correlation between two variables X and Y is not enough to decide on a policy in which we change X and expect to see an increase in Y (i.e. “correlation is not causation”). On the other hand, if we measure another variable Z that we know causes X, but has no effect on Y other than through X (i.e. an instrumental variable), we can, under certain assumptions, establish that X is a cause of Y, even if we haven’t performed any experiment. In the course we will learn about causal discovery, which extends this case to multiple variables and multiple observational and experimental datasets, and about causal effect estimation, which describes the type of causal effect a variable X has on another variable Y.
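The instrumental variable idea can be illustrated with a small simulation. The following is a minimal sketch and not part of the course materials: the linear data-generating process, the coefficients, and the function name `simulate_iv` are all assumptions chosen for illustration. An unobserved confounder U affects both X and Y, so the naive regression slope of Y on X is biased, while the ratio cov(Z, Y) / cov(Z, X) recovers the true effect under the assumed linear model.

```python
import numpy as np


def simulate_iv(n=100_000, true_effect=2.0, seed=0):
    """Compare a naive regression slope with an IV estimate on simulated data.

    Hypothetical linear model (an assumption for this sketch):
        X = Z + U + noise
        Y = true_effect * X + 3 * U + noise
    U is an unobserved confounder; Z is the instrument.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)  # unobserved confounder of X and Y
    z = rng.normal(size=n)  # instrument: causes X, affects Y only through X
    x = z + u + rng.normal(size=n)
    y = true_effect * x + 3.0 * u + rng.normal(size=n)

    # Naive estimate: slope of the regression of Y on X (biased by U).
    naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # IV estimate: ratio of covariances with the instrument.
    iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    return naive, iv


naive, iv = simulate_iv()
print(f"naive regression slope: {naive:.2f}")
print(f"IV estimate:            {iv:.2f}")
```

In this assumed model the naive slope converges to about 3, overstating the true effect of 2, while the IV estimate converges to 2. The result depends on the instrument assumptions holding; if Z affected Y directly, the IV estimate would also be biased.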

In particular, we will discuss how to interpret the output of existing methods and their assumptions, as well as the concept of identifiability, i.e. when one can answer the relevant causal questions with the data at hand, and which new data or experiments may be required when one cannot.

To address the second issue, we will look into data fusion methods based on causal graphs, showing that they can correctly represent different distributions without leading to wrong conclusions. In particular, we will show how one can apply these methods to transfer learning and domain adaptation tasks.

While the lectures will provide the theoretical foundations, the course project will allow small teams of students to apply these concepts in a simplified real-world setting, with additional practical guidance in terms of existing tools during the lab assignments.

Study materials

Syllabus

Objectives

  • Understand the high impact and potential of causal inference, as well as its limitations
  • Identify the correct causal questions and assumptions in a given data science task
  • Analyze and interpret the outputs of existing causal inference tools
  • Understand which questions can and which cannot be answered with the current data (identifiability) and which experiments/further data sources could help
  • Combine different tools in order to solve more complex causal questions
  • [Optionally] implement simple extensions to existing algorithms in causal discovery, causal effect estimation and applications of causality to machine learning

Teaching methods

  • Lecture
  • Computer lab session/practical training
  • Self-study
  • Presentation/symposium
  • Working independently on e.g. a project or thesis

Learning activities

Activity Hours
Lectures 24
Practicals 14
Presentation 4
Self study 126
Total 168 (6 EC x 28 hours)

Attendance

In TER part B of this programme no requirements regarding attendance are mentioned.

Assessment

Item and weight Details

Final grade

The assessment of the course consists of three parts:

  • Quizzes on theory and in-class participation (10%)
  • Course project presentation (20%) - group grade
  • Course project report (70%) - individual grade

The final grade will be a weighted average of the grades in each part. The passing grade is a final grade >= 5.5.

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' apply to this course, and compliance will be monitored carefully. Upon suspicion of fraud or plagiarism, the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Week number Topic
1 Introduction and Probability Recap
2 Causal graphs and Interventions
3 Covariate adjustment
4 Potential outcomes
5 Causal Discovery
6 Advanced topics (causality-inspired ML)
7 Presentations of projects
8 No class - exam week
1st April 2022 Paper deadline

Timetable

The schedule for this course is published on DataNose.

Contact information

Coordinator

  • dr. Sara Magliacane

Staff