Course manual 2024/2025

Course content

In this course we study techniques for ensuring the interpretability of modern AI systems, and for generating explanations of the classifications, decisions and predictions that these systems make. We consider applications of these techniques in diverse subfields of AI, ranging from machine vision to translation, and from speech recognition to automated reasoning. An important focus will be on post hoc interpretation techniques, which take an existing model (e.g., a deep learning model for object recognition, machine translation or music recommendation) and attempt to interpret its intermediate representations using visualization, attribution or probing methods. A second thread will be ways to a priori constrain or bias such models towards more interpretable solutions, e.g. by encouraging sparse representations or by generating explanations as a secondary objective. Finally, we will consider approaches that are inherently explainable, including models with rich symbolic backbones such as neurosymbolic models. A common theme throughout the course will be the comparison of symbolic and deep learning models, and we will see that classic results from symbolic AI sometimes find new relevance in helping to interpret deep learning systems, or in identifying their shortcomings.
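
To make the probing idea concrete, here is a minimal sketch (synthetic data stands in for a real model's activations; all names are invented, this is not course code): a linear "diagnostic" probe is trained on frozen hidden representations to test whether a property of interest is linearly decodable from them.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in for frozen hidden states of some model (n_examples, hidden_dim).
    rng = np.random.default_rng(0)
    H = rng.normal(size=(500, 64))
    # A binary property that is, by construction, linearly encoded in H.
    y = (H[:, :4].sum(axis=1) > 0).astype(int)

    # Fit a linear probe on one split and evaluate on the other; high held-out
    # accuracy suggests the property is linearly decodable from the representations.
    probe = LogisticRegression(max_iter=1000).fit(H[:400], y[:400])
    print("probe accuracy:", probe.score(H[400:], y[400:]))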

Core concepts covered in the course: probing (information-theoretic/counterfactual/Pareto-optimal), occlusion, Guided Backpropagation, Deconvolution, saliency maps, (Deep)LIFT, Layer-wise Relevance Propagation, Integrated Gradients, Shapley values, Contextual Decomposition, (Deep)SHAP, attention flow, attention-as-explanation, challenge sets, Influence Functions, neurosymbolic models (semantic loss, DeepProbLog, Logic Tensor Networks), Symbolic Regression, and non-parametric Bayesian methods (including Dirichlet and Pitman-Yor processes, for image and text parsing).
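
As a flavour of the attribution methods listed above, the following minimal sketch illustrates occlusion-based attribution (a toy illustration with an invented stand-in "model", not course code): patches of the input are masked out one at a time, and the drop in the model's score per masked patch is used as an importance map.

    import numpy as np

    def occlusion_map(model, image, patch=8, baseline=0.0):
        """Occlusion attribution: score drop when each input patch is masked.

        model    -- callable mapping a 2D image to a scalar class score
        image    -- 2D array to be explained
        patch    -- side length of the square occlusion window
        baseline -- value used to mask out a patch
        """
        h, w = image.shape
        reference = model(image)  # unperturbed score
        heat = np.zeros((h // patch, w // patch))
        for i in range(0, h - patch + 1, patch):
            for j in range(0, w - patch + 1, patch):
                occluded = image.copy()
                occluded[i:i + patch, j:j + patch] = baseline
                # Importance = how much the score drops without this patch.
                heat[i // patch, j // patch] = reference - model(occluded)
        return heat

    # Toy usage: a hypothetical "classifier" that just sums a bright region.
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))
    img[8:16, 8:16] += 2.0                     # the planted evidence
    score = lambda x: x[8:16, 8:16].sum()
    print(occlusion_map(score, img).round(1))  # largest drop at the evidence patch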

Study materials

Literature

  • Original research papers, made available through Canvas

Software

  • Python

Objectives

  • Understanding the main goals and challenges of explainability and interpretability research in AI
  • Being able to implement and apply the main post hoc interpretability tools in computer vision, natural language processing and other AI fields
  • Being able to implement and apply some of the main whiteboxing, 'explainable-by-design' and constrained deep learning approaches in vision and language
  • Being able to design research to evaluate the faithfulness and usefulness of various interpretability and explainability techniques

Teaching methods

  • Lecture
  • Presentation/symposium
  • Self-study
  • Computer lab session/practical training
  • Seminar
  • Working independently on e.g. a project or thesis

Lectures and seminars given by the lecturers and guest lecturers lay the foundation for, and define the scope of, the theoretical content. Students engage with these notions closely in extensive hands-on workshops during the lab sessions, whose results are to be delivered at the end of each week as a technical report; this will require a considerable amount of self-study. Every lecture has associated reading, which forms the content of the weekly quizzes. This likewise requires self-study of the material, in addition to the material presented in class, and will help students internalise these notions and develop the skills to communicate them successfully.

Learning activities

Activity                 Hours
Lecture                  12
Computer lab session     14
Presentation             12
Self-study               130
Total                    168

(6 EC x 28 hours)

Attendance

This programme does not have requirements concerning attendance (OER part B).

Additional requirements for this course:

This is an on-campus course; however, attendance will be taken only during the presentation sessions.

Assessment

Item                         Weight
Journal club presentation    0.12 (12%)
Presence                     0.04 (4%)
Quiz 1                       0.08 (8%)
Quiz 2                       0.08 (8%)
Quiz 3                       0.08 (8%)
Report – Week 1              0.10 (10%)
Report – Week 2              0.15 (15%)
Report – Week 3              0.15 (15%)
Final mini-project           0.20 (20%)

All items count towards the final grade; the weights sum to 100%.

Presentation sessions address the corresponding week's reading list, which is related to the lectures and the workshop content. Quizzes are online and multiple choice.

Inspection of assessed work

Once an assignment has been submitted and graded, it will be uploaded to Canvas together with feedback, and students can inspect the grade and the accompanying feedback.

Assignments

  • Technical reports, delivered weekly as the outcome of the workshops (40% in total)
  • Journal club group presentations (attendance + presentation: 16% in total)
  • Weekly quizzes (24% in total)
  • Final mini-group project and poster presentation (20% in total)
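
For concreteness, the final grade is the weighted sum of the component grades. The sketch below combines the weights listed above with made-up component grades (the 1-10 grading scale is an assumption):

    # Weights taken from the assignments list above; the component grades are invented.
    weights = {"reports": 0.40, "journal_club": 0.16, "quizzes": 0.24, "mini_project": 0.20}
    grades = {"reports": 7.5, "journal_club": 8.0, "quizzes": 6.5, "mini_project": 9.0}

    assert abs(sum(weights.values()) - 1.0) < 1e-9  # the weights sum to 100%
    final = sum(weights[k] * grades[k] for k in weights)
    print(round(final, 2))  # 7.64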

All assignments are done individually, except the presentations and the final mini-project.

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Week  Topics
1     Introduction, Post hoc Interpretability & Transformer Interpretability
2     Attribution Methods & Interpretable by Design I (neurosymbolic systems)
3     Guest Lecture I (Mechanistic Interpretability) & Interpretable by Design II
4     Guest Lecture II (Mechanistic Interpretability), Summary

Additional information

For this course's website, and websites of other courses of the ILLC's 'Natural Language Processing & Digital Humanities' group, see: https://cl-illc.github.io/teaching.html

Contact information

Coordinator

  • dr. rer. nat. Erman Acar

Staff

  • Dr. A. Lucic (Lecturer)
  • Melika Davood Zadeh (Senior TA)
  • Satchit Chatterji MSc (TA)
  • A.C. Lumadjeng MSc (TA)
  • Angela van Sprang MSc (TA)
  • Despoina Touska BSc (TA)
  • Adrian Sauter BSc (TA)
  • Marcel Vélez Vásquez MSc (TA)