6 EC
Semester 2, period 4
5294CADS6Y
Owner  Master Information Studies 
Coordinator  dr. Sara Magliacane 
Part of  Master Information Studies, track Data Science, year 1; Master Information Studies, track Information Systems, year 1
In data science and real-world machine learning, many issues arise that are often neglected in standard machine learning courses. This course focuses on two of them:
(i) many tasks inherently try to answer causal questions and gather actionable insights, even when there is not enough data to draw causal conclusions;
(ii) data is often missing not at random, heterogeneous, or not i.i.d.
For the first issue, we will focus on formulating the correct causal questions and the assumptions needed to solve the real-world task at hand. For example, a strong correlation between two variables X and Y is not enough to decide on a policy in which we change X and expect to see an increase in Y (i.e. “correlation is not causation”). On the other hand, if we measure another variable Z that we know causes X but has no effect on Y other than through X (i.e. an instrumental variable), we can establish, under certain assumptions, that X is a cause of Y, even if we have not performed any experiment. In the course we will learn about causal discovery, which extends this case to multiple variables and multiple observational and experimental datasets, and about causal effect estimation, which characterizes the causal effect that a variable X has on another variable Y.
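The instrumental variable argument above can be illustrated with a small simulation. The sketch below is not course material: it assumes a linear model with Gaussian noise, an unobserved confounder U, and arbitrarily chosen coefficients, and the function names (simulate, ols_slope, iv_slope) are hypothetical. It compares the naive regression slope of Y on X with the ratio estimator cov(Z, Y) / cov(Z, X).

```python
import numpy as np


def simulate(n=50_000, true_effect=2.0, seed=0):
    """Draw samples from an assumed linear model with a hidden confounder."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)  # unobserved confounder of X and Y
    z = rng.normal(size=n)  # instrument: causes X, affects Y only through X
    x = 1.0 * z + 1.5 * u + rng.normal(size=n)
    y = true_effect * x + 3.0 * u + rng.normal(size=n)
    return z, x, y


def ols_slope(x, y):
    """Naive slope of Y regressed on X; biased when X and Y share a confounder."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


def iv_slope(z, x, y):
    """Instrumental variable (ratio) estimator: cov(Z, Y) / cov(Z, X)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


z, x, y = simulate()
print("naive regression slope:", ols_slope(x, y))
print("instrumental variable estimate:", iv_slope(z, x, y))
```

Under these simulated assumptions the naive slope is pushed well above the true effect of 2.0 by the confounder, while the instrumental variable estimate lands close to it. The estimate is only meaningful if Z really is a valid instrument, which the data alone cannot confirm.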
In particular, we will discuss how to interpret the output of existing methods and their assumptions, as well as the concept of identifiability, i.e. when one can answer the relevant causal questions with the data at hand, or which new data or experiments may be required.
To address the second issue, we will look into data fusion methods based on causal graphs, showing that they can correctly represent different distributions without leading to wrong conclusions. In particular, we will show how to apply these methods to transfer learning and domain adaptation tasks.
While the lectures will provide the theoretical foundations, the course project will allow small teams of students to apply these concepts in a simplified real-world setting. The lab assignments will provide additional practical guidance on existing tools.
Activity  Hours
Lectures  24
Practicals  14
Presentation  4
Self study  126
Total  168 (6 EC x 28 hours)
In TER part B of this programme no requirements regarding attendance are mentioned.
Item and weight  Details 
Final grade 
The assessment of the course consists of three parts.
The final grade is the weighted average of the grades for each part. A final grade of at least 5.5 is required to pass.
The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl
Week number  Topic 
1  Introduction and Probability Recap 
2  Causal graphs and Interventions 
3  Covariate adjustment 
4  Potential outcomes 
5  Causal Discovery 
6  Advanced topics (causality-inspired ML)
7  Presentations of projects 
8  No class (exam week)
1st April 2022  Paper deadline 
The schedule for this course is published on DataNose.