Course manual 2024/2025

Course content

In this course we study techniques for ensuring the interpretability of modern AI systems, and for generating explanations of the classifications, decisions and predictions that these systems make. We consider applications of these techniques in diverse subfields of AI, ranging from machine vision to translation, and from speech recognition to automated reasoning. An important focus will be on post hoc interpretation techniques, which take an existing model (e.g., a deep learning model for object recognition, machine translation or music recommendation) and attempt to interpret its intermediate representations using visualization, attribution or probing methods. A second thread will be ways to a priori constrain or bias such models towards more interpretable solutions, e.g. by encouraging sparse representations or by generating explanations as a secondary objective. Finally, we will consider approaches that are inherently explainable, including models with rich symbolic backbones such as neurosymbolic models. A common theme throughout the course will be the comparison of symbolic and deep learning models, and we will see that classic results from symbolic AI sometimes find new relevance in helping to interpret deep learning systems, or in identifying their shortcomings.
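
To make the probing idea concrete, here is a minimal sketch (synthetic data stands in for a real model's activations; all names are invented, this is not course code): a linear "diagnostic" probe is trained on frozen hidden representations to test whether a property of interest is linearly decodable from them.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in for frozen hidden states of some model (n_examples, hidden_dim).
    rng = np.random.default_rng(0)
    H = rng.normal(size=(500, 64))
    # A binary property that is, by construction, linearly encoded in H.
    y = (H[:, :4].sum(axis=1) > 0).astype(int)

    # Fit a linear probe on one split and evaluate on the other; high held-out
    # accuracy suggests the property is linearly decodable from the representations.
    probe = LogisticRegression(max_iter=1000).fit(H[:400], y[:400])
    print("probe accuracy:", probe.score(H[400:], y[400:]))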

Core concepts covered in the course: probing (information-theoretic/counterfactual/Pareto-optimal), occlusion, Guided Backpropagation, Deconvolution, saliency maps, (Deep)LIFT, Layer-wise Relevance Propagation, Integrated Gradients, Shapley values, Contextual Decomposition, (Deep)SHAP, attention flow, attention-as-explanation, challenge sets, Influence Functions, neurosymbolic models (semantic loss, DeepProbLog, Logic Tensor Networks), Symbolic Regression, and non-parametric Bayesian methods (including Dirichlet and Pitman-Yor processes, for image and text parsing).
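
As a flavour of the attribution methods listed above, the following minimal sketch illustrates occlusion-based attribution (a toy illustration with an invented stand-in "model", not course code): patches of the input are masked out one at a time, and the drop in the model's score per masked patch is used as an importance map.

    import numpy as np

    def occlusion_map(model, image, patch=8, baseline=0.0):
        """Occlusion attribution: score drop when each input patch is masked.

        model    -- callable mapping a 2D image to a scalar class score
        image    -- 2D array to be explained
        patch    -- side length of the square occlusion window
        baseline -- value used to mask out a patch
        """
        h, w = image.shape
        reference = model(image)  # unperturbed score
        heat = np.zeros((h // patch, w // patch))
        for i in range(0, h - patch + 1, patch):
            for j in range(0, w - patch + 1, patch):
                occluded = image.copy()
                occluded[i:i + patch, j:j + patch] = baseline
                # Importance = how much the score drops without this patch.
                heat[i // patch, j // patch] = reference - model(occluded)
        return heat

    # Toy usage: a hypothetical "classifier" that just sums a bright region.
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))
    img[8:16, 8:16] += 2.0                     # the planted evidence
    score = lambda x: x[8:16, 8:16].sum()
    print(occlusion_map(score, img).round(1))  # largest drop at the evidence patch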

Study materials

Literature

  • Original research papers, made available through Canvas

Software

  • Python

Objectives

  • Understanding the main goals and challenges of explainability and interpretability research in AI
  • Being able to implement and apply the main post hoc interpretability tools in computer vision, natural language processing and other AI fields
  • Being able to implement and apply some of the main whiteboxing, 'explainable-by-design' and constrained deep learning approaches in vision and language
  • Being able to design research to evaluate the faithfulness and usefulness of various interpretability and explainability techniques

Teaching methods

  • Lecture
  • Presentation/symposium
  • Self-study
  • Computer lab session/practical training
  • Seminar
  • Working independently on e.g. a project or thesis

Lectures and seminars given by the lecturers and guest lecturers lay the foundation for, and define the scope of, the theoretical content. Students engage with these notions closely in extensive hands-on workshops during the lab sessions, whose results are to be delivered at the end of each week as a technical report; this will require a considerable amount of self-study. Every lecture has associated reading, which forms the content of the weekly quizzes. This likewise requires self-study of the material, in addition to the material presented in class, and will help students internalise these notions and develop the skills to communicate them successfully.

Learning activities

Activity                 Hours
Lecture                  12
Computer lab session     14
Presentation             12
Self-study               130
Total                    168

(6 EC x 28 hours)

Attendance

This programme does not have requirements concerning attendance (OER part B).

Additional requirements for this course:

This is an on-campus course; however, attendance will be taken only during the presentation sessions.

Assessment

Item                         Weight
Journal club presentation    0.12 (12%)
Presence                     0.04 (4%)
Quiz 1                       0.08 (8%)
Quiz 2                       0.08 (8%)
Quiz 3                       0.08 (8%)
Report – Week 1              0.10 (10%)
Report – Week 2              0.15 (15%)
Report – Week 3              0.15 (15%)
Final mini-project           0.20 (20%)

All items count towards the final grade; the weights sum to 100%.

Presentation sessions address the corresponding week's reading list, which is related to the lectures and the workshop content. Quizzes are online and multiple choice.

Inspection of assessed work

Once an assignment has been submitted and graded, it will be uploaded to Canvas together with feedback, and students can inspect the grade and the accompanying feedback.

Assignments

  • Technical reports, delivered weekly as the outcome of the workshops (40% in total)
  • Journal club group presentations (attendance + presentation: 16% in total)
  • Weekly quizzes (24% in total)
  • Final mini-group project and poster presentation (20% in total)
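
For concreteness, the final grade is the weighted sum of the component grades. The sketch below combines the weights listed above with made-up component grades (the 1-10 grading scale is an assumption):

    # Weights taken from the assignments list above; the component grades are invented.
    weights = {"reports": 0.40, "journal_club": 0.16, "quizzes": 0.24, "mini_project": 0.20}
    grades = {"reports": 7.5, "journal_club": 8.0, "quizzes": 6.5, "mini_project": 9.0}

    assert abs(sum(weights.values()) - 1.0) < 1e-9  # the weights sum to 100%
    final = sum(weights[k] * grades[k] for k in weights)
    print(round(final, 2))  # 7.64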

All assignments are done individually, except the presentations and the final mini-project.

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Week  Topics
1     Introduction, Post hoc Interpretability & Transformer Interpretability
2     Attribution Methods & Interpretable by Design I (neurosymbolic systems)
3     Guest Lecture I (Mechanistic Interpretability) & Interpretable by Design II
4     Guest Lecture II (Mechanistic Interpretability), Summary

Additional information

For this course's website, and websites of other courses of the ILLC's 'Natural Language Processing & Digital Humanities' group, see: https://cl-illc.github.io/teaching.html

Contact information

Coordinator

  • dr. rer. nat. Erman Acar

Staff

  • Dr. A. Lucic (Lecturer)
  • Melika Davood Zadeh (Senior TA)
  • Satchit Chatterji MSc (TA)
  • A.C. Lumadjeng MSc (TA)
  • Angela van Sprang MSc (TA)
  • Despoina Touska BSc (TA)
  • Adrian Sauter BSc (TA)
  • Marcel Vélez Vásquez MSc (TA)