Natural Language Processing (Natuurlijke Taalverwerking)

6 EC

Semester 2, period 4

5082NATA6Y

Owner: Bachelor Artificial Intelligence (Kunstmatige Intelligentie)
Coordinator: dr. S. Pezzelle
Part of: Minor Logic and Computation, year 1; Bachelor Artificial Intelligence, year 2

Course guide 2025/2026

Course content

Natural language is the main channel of communication between humans, and much of human knowledge is represented in the form of natural language. Enabling computers to understand it is an extremely important task and one of the core problems of artificial intelligence. Although full understanding remains a distant goal, robust methods have been developed for shallower forms of processing; these methods and the corresponding formalisms are the focus of this course.

In this course, you will learn formalisms and techniques for assigning probabilities to (parts of) sentences (language modelling) and for performing basic forms of syntactic and semantic processing. These techniques are the foundation of current data-driven computational linguistics and provide building blocks for speech recognition, language understanding, text summarisation, and machine translation systems.

Study materials

Literature

  • Daniel Jurafsky & James H. Martin, 'Speech and Language Processing' (3rd Edition), Pearson Prentice Hall, 2020 (digital version available).
  • Lecture notes provided by the lecturers.

Learning objectives

  • The student can recognise real-world applications of natural language processing (NLP) technology.
  • The student can recognise challenges in the statistical analysis of linguistic data.
  • The student can outline the statistical approach to NLP.
  • The student can explain the assumptions that underlie the most important NLP models (e.g., naive Bayes classifiers, generalised linear models of text classification and regression, neural text classifiers, Markov models, hidden Markov models, and neural autoregressive models).
  • The student can implement classic and neural parameterisations of the most important models in NLP using a programming language.
  • The student can analyse and predict properties of text using a statistical (classic or neural) model of language (e.g., probability, syntactic structure, sentiment/opinion, topics).
  • The student can compare NLP models in terms of their intrinsic properties, theoretical capabilities, practical feasibility, and empirical performance.
  • The student can discuss the pros and cons of a model in light of the specific NLP problem and availability of data and computational resources.
  • The student can design models and techniques to analyse linguistic data for novel NLP problems.
  • The student acknowledges the social impact of NLP technology, including ethical considerations that arise in the deployment of NLP technology (e.g., demographic misrepresentation, bias confirmation, and privacy).

Teaching formats

  • Tutorial (werkcollege)
  • Lecture (hoorcollege)
  • Computer lab (laptopcollege)
  • Self-study (zelfstudie)

Here's how the different kinds of live sessions are used:

  • Lectures (hoorcollege). These sessions cover the theory (from the textbook) and put it into practice through pen-and-paper exercises or interactive quizzes.
  • Tutorials (werkcollege). These sessions involve working through sets of exercises that help build a deep understanding of the material and are representative of the level at which students will be assessed in the exams.
  • Computer labs (laptopcollege). These sessions involve coding tutorials and programming assignments using Jupyter notebooks.

Distribution of learning activities

Activity                         Hours
Exams (deeltoets)                    5
Lectures (hoorcollege)              24
Computer labs (laptopcollege)       12
Tutorials (werkcollege)             10
Self-study (zelfstudie)            117
Total                              168

(6 EC × 28 hours)

Attendance

Programme attendance requirements (OER-B Article B-4.10):

  • Attendance is mandatory for some course components. If attendance is mandatory, this is stated in the course catalogue (studiegids). The rationale for, and implementation of, this attendance requirement may differ per course and is described in the course guide (studiewijzer). If students do not meet this attendance requirement, the component cannot be completed with a passing grade.

Additional requirements for this course:

Attendance is not mandatory but highly encouraged. The course activities are interconnected and are designed on the premise that students engage actively with both self-study and the live sessions.

Assessment

Component and weighting

Final grade
  • Exam component: 0.8 (80%). Must be ≥ 5.
  • Homework: 0.2 (20%).

The final grade is 20% homework and 80% exam component (the average of the midterm and final exams). Both components (exam and homework) are graded on a scale from 0 to 10.

It is necessary, though not sufficient, to obtain a grade of at least 5 on the exam component in order to pass the course. If you receive a grade below 5 on this component, or if the weighted average of your homework and exam grades is insufficient to pass the course, you are eligible to resit the exam component. In that case, the resit grade fully replaces the original exam grade.

Dutch scaling

In compliance with official UvA regulations, your final grade will be between 1 and 10, with half-point precision for grades between 1 and 5 and between 6 and 10. Canvas Final Grades rounds your grade to the nearest half point, or to the nearest whole point if it falls between 5 and 6 (so a final grade of 5.5 is not possible).

Passing the course

To pass the course, your final grade (after Dutch scaling) must be 6.0 or more. In addition, as described above, you are required to score at least 5.0 on the exam component.
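The grading rules above amount to a short calculation. The following is a minimal Python sketch, not official course tooling: the function names are hypothetical, and it assumes that exact ties round upward (the tie rule applied by Canvas Final Grades is not stated here), that the "at least 5" requirement is checked on the unrounded exam component, and that a resit grade replaces the midterm/final average.

```python
import math
from typing import Optional, Tuple


def round_half_up(x: float, step: float) -> float:
    """Round x to the nearest multiple of `step`; exact ties go up (assumed tie rule)."""
    return math.floor(x / step + 0.5) * step


def scale_grade(raw: float) -> float:
    """Dutch scaling: clamp to [1, 10], then round to the nearest whole point
    if the grade lies strictly between 5 and 6, otherwise to the nearest half point."""
    raw = min(max(raw, 1.0), 10.0)
    if 5.0 < raw < 6.0:
        return round_half_up(raw, 1.0)  # no 5.5: grades in (5, 6) become 5 or 6
    return round_half_up(raw, 0.5)


def final_grade(homework: float, midterm: float, final_exam: float,
                resit: Optional[float] = None) -> Tuple[float, bool]:
    """Return (scaled final grade, passed) for component grades on a 0-10 scale."""
    # Exam component: average of midterm and final, unless a resit replaces it (assumption).
    exam = resit if resit is not None else (midterm + final_exam) / 2
    # Weighted average: 20% homework, 80% exam component.
    raw = 0.2 * homework + 0.8 * exam
    grade = scale_grade(raw)
    # Pass only if the scaled grade is at least 6.0 AND the exam component is at least 5.0
    # (assumed to be checked on the unrounded exam component).
    passed = grade >= 6.0 and exam >= 5.0
    return grade, passed


print(final_grade(7.0, 6.0, 5.0))   # (6.0, True): raw grade 5.8 rounds up to 6
print(final_grade(10.0, 4.8, 5.0))  # (6.0, False): exam component 4.9 is below 5
```

The second example shows why a final grade of 6 is not sufficient on its own: strong homework can lift the weighted average, but the exam component must still reach 5.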

Review of assessments

Normally, graded assignments will be available on a platform such as ANS, which supports feedback and discussion of the assessment.

Assignments

This course makes use of ungraded (formative) assessment in the form of quizzes and exercises available on Canvas or ANS, as well as graded (summative) assessment in the form of exam-like exercises on ANS and programming assignments. 

For ungraded exercises, personalised feedback is usually not available, but a detailed answer model is provided for self-assessment. Students are also welcome to seek feedback from a TA at an appropriate moment (e.g., during a werkcollege).

Personalised feedback on graded exercises is generally possible: graded assignments are typically hosted on a platform such as ANS, which supports personalised feedback and discussion of assessments.

Feedback on programming assignments is provided by the TAs through Canvas and/or in person, but answer models are not made available to students.

Fraud and plagiarism

This course applies the UvA's general 'Fraud and Plagiarism Regulations' (Fraude- en plagiaatregeling), and compliance is checked carefully. In case of suspected fraud or plagiarism, the programme's Examinations Board is notified. See the UvA Fraud and Plagiarism Regulations: http://student.uva.nl

Weekly schedule

Week  Topics                                 Study material
1     Introduction to NLP
2     Text classification
3     Feature learning
4     Midterm exam
5     Language modelling
6     Sequence-to-sequence models
7     Self-supervised pretraining & recap
8     Final exam

Additional information

The course will be taught in English.

Prerequisite skills: Basic probability theory, basic statistics, and programming in Python. 

Contact information

Coordinator

  • dr. S. Pezzelle

The course will be taught by Dr. S. Pezzelle and Dr. W. Aziz.