Course manual 2025/2026

Course content

This highly technical course focuses on the preparation and life-cycle management of data for production machine learning deployments. The course starts by recapping the fundamentals of relational data processing and dataflow systems. Subsequently, students learn about encoding, storing and managing vectorised feature representations of heterogeneous input data sources for machine learning applications, and about the architecture of current state-of-the-art systems for this task, such as Google’s TensorFlow Extended (TFX) platform. Concurrently, students will be exposed to foundational theory for this problem space, such as incremental view maintenance for relational data, fine-grained data provenance tracking via provenance semirings, and differential computation.
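As an informal illustration of incremental view maintenance, the sketch below keeps a grouped-sum view up to date by applying only the inserted and deleted rows instead of recomputing it from scratch. It is a minimal example and not course material: the function names `apply_delta` and `build_view` and the `(key, amount)` row layout are assumptions made for illustration.

```python
def apply_delta(view, inserted=(), deleted=()):
    """Update a SUM(amount) GROUP BY key view in place from changed rows only.

    The view maps key -> (total, row_count); the row count tells us when a
    group has become empty and must be removed from the view.
    """
    for sign, batch in ((1, inserted), (-1, deleted)):
        for key, amount in batch:
            total, count = view.get(key, (0, 0))
            total, count = total + sign * amount, count + sign
            if count == 0:
                view.pop(key, None)
            else:
                view[key] = (total, count)
    return view


def build_view(rows):
    """Build the view from scratch (hypothetical helper, used for comparison)."""
    return apply_delta({}, inserted=rows)


base = [("a", 10), ("b", 5), ("a", 7)]
view = build_view(base)

# Apply a small change: one row inserted, one row deleted.
apply_delta(view, inserted=[("b", 3)], deleted=[("a", 10)])

# The incrementally maintained view matches a full recomputation.
recomputed = build_view([("b", 5), ("a", 7), ("b", 3)])
```

The cost of `apply_delta` depends on the size of the change, not on the size of the base data, which is the property that makes incremental maintenance attractive for large relational inputs.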

In addition, students will learn to identify, quantify and address common data quality issues concerning the completeness and consistency of the data. They will also learn about the technical challenges of complying with regulations for private data, such as the “right to be forgotten” in the GDPR. Finally, students will be exposed to ongoing research efforts in this space, such as ML pipeline debugging and error detection techniques from data-centric AI, and will have the opportunity to discuss the practical implications of the covered technologies with invited industry experts.
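To make "quantifying completeness" concrete, the sketch below computes, per column, the fraction of records that carry a non-missing value. It is only one simple way to measure completeness; the function name `completeness`, the treatment of `None` and empty strings as missing, and the sample records are assumptions made for illustration.

```python
def completeness(records, columns):
    """Return the fraction of records with a non-missing value, per column.

    A value counts as missing if the column is absent, None, or an empty string.
    """
    total = len(records)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for r in records if r.get(col) not in (None, "")) / total
        for col in columns
    }


# Hypothetical sample data with some missing values.
records = [
    {"name": "Ada", "age": 36, "city": "London"},
    {"name": "Linus", "age": None, "city": "Helsinki"},
    {"name": "Grace", "age": 45, "city": ""},
    {"name": "Edsger", "age": 72},
]

scores = completeness(records, ["name", "age", "city"])
```

A column whose score falls below an agreed threshold can then be flagged for imputation or for inspection of the upstream data source.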

Study materials

Literature

Scientific papers
Book chapters
Presentation slides

Syllabus

Detailed information about the course and grading will be discussed in the first lecture.

Practical training material

Programming assignments
Example code

Software

Java and Python-based open source software

Objectives

Describe the lifecycle of data in systems employing machine learning and predictive analytics.
Implement efficient data preparation programs using state-of-the-art relational and dataflow processing systems.
Identify and potentially correct data issues related to data quality, privacy violations or technical bias.
Design and validate a scalable data architecture for preparing and maintaining data for predictive analytics.

Teaching methods

Lecture
Working independently on e.g. a project or thesis
Presentation/symposium
Self-study
Computer lab session/practical training

Learning activities

Activity	Hours
Lecture	12
Laptop seminar	8
Presentations	6
Self-Study	120

Attendance

Some course components require compulsory attendance. If compulsory attendance applies, this will be indicated in the Course Catalogue which can be consulted via the UvA-website. The rationale for and implementation of this compulsory attendance may vary per course and, if applicable, is included in the Course Manual.

Additional requirements for this course:

Participation will be measured. Attendance in the lab sessions is needed in order to attain the programming skills and background required for the assignments and the project.

Assessment

Item and weight	Details
Final grade

Details for the grading of the assignments and project will be made available during the course.

Assignments

Three individual programming assignments
Project with presentation and paper to be conducted in groups of 3-4 students

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Week number	Topics	Study material

Owner	Master Computer Science (joint degree)
Coordinator	dr. H. Harmouch
Part of	Master Computer Science (joint degree), track Foundations of Computing and Concurrency, Master Computer Science (joint degree), track Software Engineering & Green IT, Master Computer Science (joint degree), track Big Data Engineering, Master Computer Science (joint degree), track Computer Systems and Infrastructure, year 1
Links	Visible Learning Trajectories

Data Preparation