Machine Learning in Chemistry

6 EC

Semester 2, period 5

5254MLIC6Y

Owner Master Chemistry (joint degree)
Coordinator prof. dr. ir. B. Ensing
Part of Master Chemistry (joint degree), track Molecular Sciences
Links Visible Learning Trajectories

Course manual 2025/2026

Course content

The course Machine Learning in Chemistry provides a broad understanding of current deep learning methodologies and their application in chemical research. Rather than a formal treatment, it takes a hands-on approach tailored to students interested in applying deep learning to (molecular) scientific problems. The course is targeted at a broad audience: from theoretical chemists who wish to dive into data-driven science, to experimental chemists keen on integrating machine learning in their work.

The course will first briefly review foundational concepts from probability and information theory, together with an overview of machine learning basics (treated in more detail in the chemistry bachelor course AI for Science). We will then focus on a range of popular deep learning techniques that are particularly useful in chemistry, including graph neural networks, diffusion and flow models, transformers and large language models, and Bayesian optimisation. The deep learning models will be illustrated with relevant chemical applications, such as structure-property prediction, generation of molecules with specific properties, and guiding autonomous lab experiments.

The course is provided as a lecture series (two 2-hour lectures per week) plus hands-on laptop sessions (one 2-hour session per week). The theoretical aspects of deep learning and generative AI for molecular science, taught in the lectures, will be applied in programming assignments during the laptop sessions. In the first weeks, the laptop assignments consist of deep learning exercises provided as Jupyter notebooks, which run in an internet browser or in Microsoft Visual Studio Code on the laptop and contain information, open questions and (to be completed) Python code. In the second part, the students will work in pairs on one larger deep learning project, in which they develop and implement a deep learning algorithm for a molecular science application. The final Jupyter notebook, together with a presentation of the project in the last week, counts for 25% of the final grade. The other 75% of the grade is determined by a written exam.

Proficiency in Python programming is a very strong advantage.
Having passed the Chemistry or STI Bachelor course "AI for Science", or similar, is also advantageous, but not essential.

 

Study materials

Literature

  • Book: "Deep Generative Modeling" by Jakub M. Tomczak

Software

  • Python, Jupyter notebooks, Microsoft Visual Studio Code

  • Relevant libraries: NumPy, scikit-learn, PyTorch, PyTorch Geometric, BoTorch

Other

  • Keynote lecture slides

Objectives

  • Summarize founding concepts of modern machine learning techniques.
  • Analyze popular deep learning schemes used in molecular science (e.g., variational auto-encoders, diffusion models, flow models, large language models, Bayesian optimisation models).
  • Evaluate applicability of deep learning to specific chemistry applications (e.g., for structure-property prediction, inverse design of molecules and materials, experiment automation).
  • Integrate machine learning libraries and test performance in own code.

Teaching methods

  • Lecture
  • Computer lab session/practical training
  • Self-study

Learning activities

Activity         Hours
Lecture          28
Laptop session   14
Exam             2
Tutorial         0
Self study       124
Total            168 (6 EC x 28 hours)

Attendance

This programme does not have requirements concerning attendance (TER part B).

Additional requirements for this course:

None of the lectures is mandatory. However, as this course does not (yet) have a course syllabus and instead refers to (rather formal) sections of books and scientific articles without a clear connection between them, it is highly recommended to attend the lectures. The lecturer aims to record all the lectures and put the videos online on Canvas, but this is not guaranteed.

Assessment

Final grade

Weight       Item
0.25 (25%)   Final Project Assignment and Presentation
0.75 (75%)   Exam

Assignments

In the first three weeks, we will work on three Jupyter notebook assignments (during two hands-on laptop sessions and as homework), in which the basics of the machine learning workflow, neural networks (using PyTorch) and graph neural networks are practiced. These initial assignments serve as preparation for the larger (4-week) project in the second half of the course.

Since Jupyter notebooks can nowadays largely be completed using LLMs such as Copilot, the notebooks are not graded. Instead, at the beginning of two computer classes (in weeks 16 and 17), we will hold a short (max. 10-minute) "pre-test" consisting of multiple choice questions, to test whether the student has adequately studied the notebooks and acquired the relevant skills from the assignments. A maximum of 1.0 bonus point can be gained with these two tests; the bonus only applies if the final grade from the exam (75%) plus project (25%) is higher than 5.5 (in other words, the bonus cannot be used to pass the course if the total grade from the other components is insufficient). The purpose of the two "pre-tests" is (1) to encourage students from the beginning to focus on understanding their algorithms and Python code, and (2) to give students early feedback on their learning performance.

The final project (which can be carried out alone or in a team of at most 2 students) takes place during the last 4 weeks, is graded, and counts as 25% of the final grade (before bonus). During this final assignment, students are also allowed to use Copilot or other AI tools to develop their machine learning solution. Note, however, that the ability of the student(s) to explain their code in the final presentation counts heavily towards the grade for the project.
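The grading arithmetic described above can be sketched as follows. This is only an illustration of one reading of the rules, not an official grade calculator: the function name `final_grade`, the 1 to 10 grading scale, the cap at 10.0, and the strict "higher than 5.5" comparison are assumptions made for the example.

```python
def final_grade(exam, project, bonus=0.0):
    """Combine exam (75%) and project (25%) grades, then add the pre-test bonus.

    Assumptions (not stated in the course manual): grades are on a 1-10 scale
    and the result is capped at 10.0.
    """
    # Weighted grade from the two graded components.
    weighted = 0.75 * exam + 0.25 * project

    # The two pre-tests together yield at most 1.0 bonus point.
    bonus = min(max(bonus, 0.0), 1.0)

    # The bonus only applies if the weighted grade is higher than 5.5,
    # so it cannot turn an insufficient grade into a pass.
    if weighted > 5.5:
        return min(weighted + bonus, 10.0)
    return weighted


if __name__ == "__main__":
    print(final_grade(exam=6.0, project=8.0, bonus=0.5))  # bonus applies
    print(final_grade(exam=5.0, project=6.0, bonus=1.0))  # bonus does not apply
```

In the second call the weighted grade stays below the threshold, so the bonus is ignored and the grade remains insufficient.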

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Week   Topics                                              Study material
1      Intro machine learning                              Lecture slides
2      Generative AI                                       Lecture slides
3      Graphs and GNNs                                     Lecture slides
4      Diffusion and Flow models                           Lecture slides
5      Transformers and language models                    Lecture slides
6      Bayesian optimisation and self-driving labs         Lecture slides
7      MLIPs and surrogate models for molecular modeling   Lecture slides
8      Exam

Additional information

A modern laptop is needed for the computer practicals. At least one week before the first lecture, information on preparing the laptop will be made available on the Canvas website. In particular, installation of (mini)conda or mamba with an environment of Python packages is required to take part in the computer practicals. Windows users may need to set up WSL with a Linux distribution, such as Ubuntu, as explained on the Canvas site.
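Once the environment is installed, a quick check such as the one below may help confirm that the libraries listed under Software can be found. This is a minimal sketch and not part of the official Canvas instructions: the helper name `check_packages` is hypothetical, and the list uses the usual import names of the libraries (`sklearn` for scikit-learn, `torch_geometric` for PyTorch Geometric).

```python
from importlib.util import find_spec

# Import names of the libraries listed under "Software" in this manual.
COURSE_PACKAGES = ["numpy", "sklearn", "torch", "torch_geometric", "botorch"]


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is installed."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages(COURSE_PACKAGES).items():
        print(f"{name:16s} {'found' if ok else 'MISSING'}")
```

Run it inside the activated conda or mamba environment; any package reported as missing should be installed following the instructions on Canvas.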

Contact information

Coordinator

  • prof. dr. ir. B. Ensing

Staff

  • Bernd Ensing
  • Leontii Shtokolenko