Natuurlijke Taalmodellen en Interfaces

Natural Language Models and Interfaces

6 EC

Semester 2, period 4

5082NTIT6Y

Owner: Bachelor Kunstmatige Intelligentie (Artificial Intelligence)
Coordinator: W. Ferreira Aziz
Part of:
  • Minor Logic and Computation, year 1
  • Bachelor Kunstmatige Intelligentie, year 2
  • Bachelor Future Planet Studies, major Kunstmatige Intelligentie, year 3

Course manual 2022/2023

Course content

Natural language is the main channel of communication between humans, and much of human knowledge is represented
in the form of natural language. Enabling computers to understand it is an extremely important task, and is one of the
core problems of artificial intelligence. Though full understanding still remains a remote goal, robust methods have been
developed for more shallow forms of processing, and these methods and corresponding formalisms are the focus of this
course.

In this course you learn about formalisms and techniques to assign probabilities to (parts of) sentences (language
modelling) and to perform basic forms of syntactic and semantic processing. These techniques are the foundation of
current data-driven computational linguistics and provide building blocks for speech recognition, language
understanding, text summarisation, and machine translation systems.

Study materials

Literature

  • Daniel Jurafsky & James H. Martin, 'Speech and Language Processing' (3rd Edition) Pearson Prentice Hall, 2020.

     

    Digital version

Learning objectives

  • The student can recognise real-world applications of natural language processing (NLP) technology
  • The student can recognise challenges in the statistical analysis of linguistic data
  • The student can outline the statistical approach to NLP
  • The student can explain the assumptions that underlie statistical models such as Markov models, hidden Markov models, naive Bayes classifiers, logistic regression, and conditional random fields
  • The student can implement NLP models using a programming language
  • The student can use statistical methods to analyse and predict properties of text (e.g., probability, syntactic structure, sentiment/opinion, topics)
  • The student can compare statistical models in terms of their intrinsic properties, theoretical capabilities, practical feasibility, and their empirical performance
  • The student can discuss the pros and cons of a statistical model in light of the specific NLP problem and availability of data and computational resources
  • The student can design models and techniques to analyse linguistic data for novel NLP problems
  • The student acknowledges the social impact of NLP technology including ethical considerations that arise in the deployment of NLP technology (e.g., demographic misrepresentation, bias confirmation, and privacy)

Teaching methods

  • Hoorcollege (lecture)
  • Werkcollege (tutorial)
  • Zelfstudie (self-study)
  • Laptopcollege (computer lab session)

The course employs ideas from flipped learning. Self-study is required before, during, and after live sessions. Live sessions are of three kinds (hoorcollege, werkcollege, and laptopcollege).

Prior to a live session, students work individually or in groups to complete some amount of self-study. For example, they read some pre-specified material or work through a pre-specified tutorial.

In a live session, and with the help of instructors (e.g., teachers in hoorcollege, TAs in werkcollege or laptopcollege), students work individually or in groups to deepen their understanding of the subject matter.

After a live session, students work individually or in groups to complete some amount of self-study. This usually focuses on practising and on continuously assessing the students' own understanding of the subject.

Here's how the different kinds of live sessions are used:

  • Hoorcollege. These sessions will focus on the theory (textbook) and on practice through pen-and-paper exercises or other interactive formats such as quizzes.
  • Werkcollege. These sessions will involve solving lists of exercises that help achieve deep understanding of the material and are representative of the level at which students will be assessed in exams.
  • Laptopcollege. These sessions will involve coding tutorials and programming assignments using Jupyter notebooks.

 

Distribution of learning activities

A 6 EC course is a commitment of 6 x 28 = 168 hours. We spread these over 8 weeks, where 6 weeks are content weeks and 2 weeks are exam weeks. Here is a breakdown of the expected commitment per learning activity:

Learning activity    Hours per week
Hoorcollege          4
Werkcollege          2
Laptopcollege        2
Self-study           12
Total                20

This amounts to 120 hours over the six content weeks. The remaining 48 hours will normally go to exam preparation and other deadlines.
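The arithmetic behind these numbers can be checked with a short Python sketch. The variable names are our own choice for illustration; the figures are the ones stated in this section.

```python
# Workload figures as stated in this course manual.
EC = 6                # credits for the course
HOURS_PER_EC = 28     # hours of work per credit
CONTENT_WEEKS = 6     # weeks with classes (the other 2 weeks are exam weeks)

# Expected hours per content week, per learning activity.
hours_per_week = {
    "Hoorcollege": 4,
    "Werkcollege": 2,
    "Laptopcollege": 2,
    "Self-study": 12,
}

total_hours = EC * HOURS_PER_EC                 # full course commitment
weekly_hours = sum(hours_per_week.values())     # commitment per content week
content_hours = weekly_hours * CONTENT_WEEKS    # hours spent in content weeks
remaining_hours = total_hours - content_hours   # exam preparation and deadlines

print(total_hours, weekly_hours, content_hours, remaining_hours)
```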

Attendance

Programme attendance requirements (OER-B):

  • Attendance is compulsory for practicals and tutorial group meetings with assignments. How this attendance requirement is implemented may differ per course and is specified in the course manual. Students who do not meet this attendance requirement cannot complete the component with a passing grade.

Additional requirements for this course:

We do not monitor attendance, but live classes may cover exercises that contribute to the grade. 

If hybrid classes are needed due to COVID-19 restrictions, we will livestream them, but be warned that we will not necessarily record them.

Assessment

Component and weighting    Details
Final grade
1 (100%)                   Deeltoets 1 (partial exam 1)

The grade will be 40% homework (weighted average of assignments: graded exercises and programming assignments) and 60% exams (average of midterm and final). Both components (exam and homework) are initially graded on a scale from 0 to 10 and they must each be at least 5. 

You are eligible for a resit of the exam component, in which case the resit grade fully replaces that component.

Dutch scaling

According to official UvA regulations your final grade has to be between 1 and 10. To avoid confusion, this is how we compute your final grade: 1 + 0.9 * (0.6 * exams + 0.4 * homework). This grade is rounded to the closest half point, or to the closest point if it falls between 5 and 6. 
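As a minimal sketch, the scaling and rounding described above could be written as follows. The function names are hypothetical and not part of any official grading tool. Two points are our own assumptions, since the text does not state them: "between 5 and 6" is read as strictly between, and ties (a grade exactly halfway between two allowed values) are rounded up.

```python
import math


def scaled_grade(exams: float, homework: float) -> float:
    """Dutch-scaled grade before rounding: 1 + 0.9 * (0.6 * exams + 0.4 * homework).

    Both inputs are component grades on the 0 to 10 scale.
    """
    return 1 + 0.9 * (0.6 * exams + 0.4 * homework)


def round_grade(grade: float) -> float:
    """Round to the closest half point, or to the closest whole point between 5 and 6.

    Assumption (not stated in the course manual): ties are rounded up.
    """
    if 5 < grade < 6:
        # Only whole points are allowed strictly between 5 and 6.
        return float(math.floor(grade + 0.5))
    # Elsewhere, round to the nearest multiple of 0.5.
    return math.floor(grade * 2 + 0.5) / 2


# Example with hypothetical component grades.
final = round_grade(scaled_grade(exams=7.0, homework=8.0))
print(final)
```

The scaling maps component grades of 0 to a final grade of 1 and component grades of 10 to a final grade of 10, which is what keeps the result inside the required range.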

Passing the course

To pass the course your final grade (after Dutch scaling) has to be at least 6.0. Additionally, your homework and exam components must each be at least 5.0.
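The passing rule combines three conditions, which can be sketched as a single check. The function name `passes` is hypothetical; the thresholds are the ones stated above.

```python
def passes(final_grade: float, homework: float, exams: float) -> bool:
    """Return True if all passing conditions from the course manual hold.

    final_grade: final grade after Dutch scaling (1 to 10 scale)
    homework, exams: component grades on the 0 to 10 scale
    """
    return final_grade >= 6.0 and homework >= 5.0 and exams >= 5.0


# A high final grade does not compensate for a component below 5.0.
print(passes(final_grade=7.5, homework=4.9, exams=9.0))
```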

Inspection of assessed work

Normally, graded assignments will be available on a platform such as ANS, which supports feedback and discussion of the assessment.

Assignments

This course makes use of ungraded (formative) assessment in the form of quizzes and exercises available on Canvas or ANS, as well as graded (summative) assessment in the form of exam-like exercises on ANS and programming assignments. 

For ungraded exercises, personalised feedback is usually not available, but a detailed answer model is provided for self-assessment. Moreover, the student is welcome to seek feedback from a TA at an appropriate moment (e.g., werkcollege).

Personalised feedback on graded exercises is generally possible. Answer models are generally not made available to students for these exercises. Graded assignments are typically hosted on a platform such as ANS, which supports personalised feedback and discussion of assessments.

Feedback on programming assignments is provided by the TAs through Canvas and/or in person, but answer models are not made available to students.

Fraud and plagiarism

This course applies the general Fraud and Plagiarism Regulations ('Fraude- en plagiaatregeling') of the UvA. Compliance is checked carefully. If fraud or plagiarism is suspected, the Examinations Board of the programme is called in. See the UvA Fraud and Plagiarism Regulations: http://student.uva.nl

Weekly schedule

Week number Subjects
1 Introduction to NLP and statistics for NLP
2 Generative models of text classification
3 Discriminative models of text analysis (classification and regression)
4 Midterm
5 Neural models of text analysis
6 Language modelling
7 Sequence labelling
8 Final exam

Timetable

The timetable for this course can be viewed on DataNose.

Additional information

The course will be taught in English.

Prerequisite skills: basic probability theory, basic statistics, programming in Python.

Incorporating student feedback

These are changes motivated by student feedback:

  • Mid-term (since 2019)
  • Ungraded tutorial-style programming notebooks are available in addition to graded programming assignments (since 2021)
  • Exam-like exercises in werkcolleges (new in 2022)
  • No written technical report (new in 2022)
  • Perusall is not part of the grade (new in 2022)

Contact information

Coordinator

  • W. Ferreira Aziz