Course manual 2024/2025

Course content

This highly technical course focuses on the preparation and life-cycle management of data for production machine learning deployments. The course starts by recapping fundamentals of relational data processing and dataflow systems. Subsequently, students learn about encoding, storing and managing vectorised feature representations of heterogeneous input data sources for machine learning applications, and about the architecture of current state-of-the-art systems for this task, such as Google’s TensorFlow Extended (TFX) platform. Concurrently, students will be exposed to foundational theory for this problem space, such as incremental view maintenance for relational data, fine-grained data provenance tracking via provenance semirings, and differential computation.
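To give a flavour of one of these topics, the following is a minimal sketch of incremental view maintenance in plain Python. It is an illustration only and not taken from the course materials: the class `SumByKeyView`, the helper `recompute` and the sample rows are hypothetical names and data chosen for this example. The view corresponds to a grouped sum (`SELECT key, SUM(value) ... GROUP BY key`) and is updated from inserted and deleted rows rather than recomputed from the full base data.

```python
from collections import defaultdict


class SumByKeyView:
    """Hypothetical materialised view for: SELECT key, SUM(value) FROM base GROUP BY key."""

    def __init__(self):
        self.sums = defaultdict(int)    # key -> running sum of values
        self.counts = defaultdict(int)  # key -> number of contributing base rows

    def apply_delta(self, inserted=(), deleted=()):
        """Update the view from changed rows only, without rescanning the base data."""
        for key, value in inserted:
            self.sums[key] += value
            self.counts[key] += 1
        for key, value in deleted:
            self.sums[key] -= value
            self.counts[key] -= 1
            if self.counts[key] == 0:
                # No base rows left for this group, so it leaves the view.
                del self.sums[key]
                del self.counts[key]

    def result(self):
        return dict(self.sums)


def recompute(rows):
    """Reference implementation: rebuild the view from scratch."""
    out = defaultdict(int)
    for key, value in rows:
        out[key] += value
    return dict(out)


# Initial load of the (hypothetical) base data.
base = [("alice", 10), ("bob", 5), ("alice", 7)]
view = SumByKeyView()
view.apply_delta(inserted=base)

# A later change: one row is added and one is removed.
view.apply_delta(inserted=[("bob", 3)], deleted=[("alice", 10)])
base_after = [("bob", 5), ("alice", 7), ("bob", 3)]

# The incrementally maintained view matches a full recomputation.
assert view.result() == recompute(base_after)
```

The per-group row count is kept so that a group can be dropped from the view once its last contributing row is deleted. Production systems generalise this idea to joins and more complex queries.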

In addition, students will learn to identify, quantify and address common quality issues with respect to the completeness and consistency of the data. Furthermore, they will learn about technical challenges in complying with regulations for private data, such as the “right to be forgotten” from GDPR. Students will also be exposed to ongoing research efforts in this space, such as ML pipeline debugging and error detection techniques from data-centric AI. Finally, they will have the opportunity to discuss the practical implications of the covered technologies with invited industry experts.

Study materials

Literature

  • Scientific papers

  • Book chapters

  • Presentation slides

Syllabus

  • Detailed information about the course and grading will be discussed in the first lecture

Practical training material

  • Programming assignments

  • Example code

Software

  • Java and Python-based open source software

Objectives

  • Describe the lifecycle of data in systems employing machine learning and predictive analytics
  • Describe and employ fundamental methods for creating and maintaining datasets such as incremental view maintenance, fine-grained data provenance and differential computation
  • Implement efficient data preparation programs using state-of-the-art relational and dataflow processing systems
  • Identify and potentially correct data issues related to data quality, privacy violations or technical bias
  • Design and validate a scalable data architecture for preparing and maintaining data for predictive analytics

Teaching methods

  • Lecture
  • Working independently on e.g. a project or thesis
  • Presentation/symposium
  • Self-study
  • Computer lab session/practical training

Learning activities

Activity                          Hours
Lecture (hoorcollege)                12
Laptop seminar (laptopcollege)        8
Presentations                         6
Self-study                          120

Attendance

Programme's requirements concerning attendance (TER-B):

  • In the case of a practical training, the student must attend at least 100% of the practical sessions. Should the student attend less than 100%, the student must repeat the practical training, or the Examinations Board may have one or more supplementary assignments issued.
  • In the case of a tutorial, the student must attend at least 100% of the tutorial sessions. Should the student attend less than 100%, the student must repeat the tutorial, or the Examinations Board may have one or more supplementary assignments issued.

Additional requirements for this course:

Participation will be measured. Attendance in the lab sessions is needed in order to attain the programming skills and background required for the assignments and the project.

Assessment

Item and weight Details

Final grade

Details for the grading of the assignments and project will be made available during the course.

Assignments

  • Three individual programming assignments
  • Group project (3-4 students per group) with a presentation and a paper

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Week number    Topics    Study materials

Contact information

Coordinator

  • dr. H. Harmouch

Staff

  • Antonios Georgakopoulos
  • D.I. Jackson MSc
  • Yichun Wang MSc