6 EC
Semester 2, period 4
5294CADS6Y
Owner | Master Information Studies |
Coordinator | dr. Sara Magliacane |
Part of | Master Information Studies, track Data Science, year 1; Master Information Studies, track Information Systems, year 1 |
In data science and real-world machine learning, there are many issues that are often neglected in standard machine learning courses. In this course we will focus on two of them:
(i) many tasks are inherently trying to answer causal questions and gather actionable insights, even when there is not enough data to draw causal conclusions;
(ii) data is often missing not at random, heterogeneous or not i.i.d.
For the first issue, we will focus on formulating the correct causal questions and the assumptions needed to solve the real-world task at hand. For example, a strong correlation between two variables X and Y is not enough to justify a policy in which we change X and expect to see an increase in Y (i.e. “correlation is not causation”). On the other hand, if we measure another variable Z that we know causes X but affects Y only through X (i.e. an instrumental variable), we can discover under certain assumptions that X is a cause of Y, even if we haven’t performed any experiment. In the course we will learn about causal discovery, which extends this case to multiple variables and multiple observational and experimental datasets, and about causal effect estimation, which quantifies the causal effect a variable X has on another variable Y.
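The instrumental variable argument above can be illustrated with a small simulation. The following is a minimal sketch and not course material: it assumes a linear model with a hidden confounder U, and the coefficients, sample size and function names (`simulate`, `ols_slope`, `iv_slope`) are all hypothetical choices made for illustration. It compares the naive regression slope of Y on X with the simple ratio (Wald) instrumental variable estimate.

```python
import numpy as np


def simulate(n=200_000, true_effect=2.0, seed=0):
    """Draw data from an assumed linear model with a hidden confounder U."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)  # unobserved confounder of X and Y
    z = rng.normal(size=n)  # instrument: causes X, affects Y only through X
    x = z + u + rng.normal(size=n)
    y = true_effect * x + 3.0 * u + rng.normal(size=n)
    return z, x, y


def ols_slope(x, y):
    """Slope of the regression of y on x: Cov(x, y) / Var(x)."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


def iv_slope(z, x, y):
    """Ratio (Wald) IV estimate: Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


z, x, y = simulate()
naive = ols_slope(x, y)  # biased by U: the population value is 3, not 2
iv = iv_slope(z, x, y)   # consistent for the true effect of 2 in this model
print(f"naive slope: {naive:.2f}, IV estimate: {iv:.2f}")
```

In this assumed model the correlation between X and Y overstates the effect of changing X, because U drives both variables. The instrument Z recovers the effect only because it is independent of U and influences Y solely through X; if either assumption fails, the ratio estimate is no longer valid.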
In particular, we will discuss how to interpret the output of existing methods and their assumptions, as well as the concept of identifiability, i.e. when one can answer the relevant causal questions with the data at hand, and which new data or experiments may be required otherwise.
To address the second issue, we will look into data fusion methods based on causal graphs, which can correctly represent different distributions without leading to wrong conclusions. In particular, we will show how these methods can be applied to transfer learning and domain adaptation tasks.
While the lectures will provide the theoretical foundations, the course project will allow small teams of students to apply these concepts in a simplified real-world setting, with additional practical guidance on existing tools during the lab assignments.
Activity | Hours |
Lectures | 24 |
Practicals | 14 |
Presentation | 4 |
Self study | 126 |
Total | 168 (6 EC x 28 hours) |
In TER part B of this programme no requirements regarding attendance are mentioned.
Item and weight | Details |
Final grade |
The assessment of the course consists of three parts.
The final grade will be a weighted average of the grades in each part. The passing grade is a final grade >= 5.5.
The 'Regulations governing fraud and plagiarism for UvA students' apply to this course. Compliance will be monitored carefully. Upon suspicion of fraud or plagiarism, the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl
Week number | Topic |
1 | Introduction and Probability Recap |
2 | Causal graphs and Interventions |
3 | Covariate adjustment |
4 | Potential outcomes |
5 | Causal Discovery |
6 | Advanced topics (causality-inspired ML) |
7 | Presentations of projects |
8 | No class - exam week |
1st April 2022 | Paper deadline |
The schedule for this course is published on DataNose.