Linguistics and Language Processing
6 EC
Semester 2, period 5
5082TATA6Y
Our ability to use natural language to communicate and record information is one of the main features that make us intelligent. However, while we use language effortlessly in our everyday lives, computers have a hard time processing natural languages such as English or Dutch. Computational linguistics is a subfield of artificial intelligence at the interface of linguistic theory and computer science that aims to endow computers with the ability to process natural language. The ultimate goal is to develop artificial agents that can automatically acquire information from text or communicate with humans via intelligent interfaces or human-robot interaction.
This course introduces students to some of the core topics in computational linguistics and natural language processing. We will focus on foundational aspects, paying special attention to rule-based methods. The course provides background for the second-year course Natuurlijke Taalmodellen en Interfaces, which focuses on data-driven probabilistic methods.
The course covers the following key topics in language processing at an introductory level:
- Automata, Transducers, and Morphology
- Formal Grammars and Syntax
- From Parsing to Computational Semantics
- Compositional Semantics
- Lexical Semantics
- Distributional Semantics, Information Retrieval, and N-gram Models
Daniel Jurafsky and James H. Martin, Speech and Language Processing (2nd Edition), Pearson Prentice Hall, 2009. Only around seven or eight chapters will be covered in this course.
Note: you do not necessarily need to buy the book. Indeed, you can find the 3rd edition available online for free.
Other online materials will be pointed out during the course. Slides will be available on Canvas after each lecture.
The course consists of lectures (hoorcolleges), where the theoretical material is explained and discussed, and practical sessions (werkcolleges and laptopcolleges). In the practical sessions, students work in pairs on exercises related to the material introduced in the lectures.
| Activity | Hours |
| Lectures (hoorcolleges) | 24 |
| Werkcolleges | 12 |
| Laptopcolleges | 12 |
| Exams | 4 |
| Self-study | 116 |
Programme attendance requirements (OER-B):
Additional requirements for this course:
Attending a minimum of 70% of practical sessions (werkcolleges and laptopcolleges) is obligatory. If the course has, for example, 11 practical sessions, students can miss no more than 3 sessions. Failure to meet this requirement will result in the student being ineligible to participate in the Resit (hertentamen).
Allowed absences do not need to be reported but should be used wisely. No additional absences will be granted, except in highly exceptional circumstances that have been discussed with your study advisor during the course.
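The attendance arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `max_absences` is hypothetical, and it assumes that a fractional number of required sessions is rounded up, which matches the example of 11 sessions and at most 3 absences.

```python
def max_absences(total_sessions: int, required_percent: int = 70) -> int:
    """Return how many practical sessions can be missed while still
    attending at least `required_percent` of them.

    Assumption: a fractional number of required sessions is rounded up.
    """
    if total_sessions < 0:
        raise ValueError("total_sessions must be non-negative")
    # Integer ceiling of required_percent * total_sessions / 100;
    # staying in integers avoids floating-point rounding surprises.
    required = (required_percent * total_sessions + 99) // 100
    return total_sessions - required


if __name__ == "__main__":
    # The example from the text: 11 practical sessions.
    print(max_absences(11))  # 3
```

The actual number of practical sessions may differ from 11; the official count is the one communicated during the course.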
| Component and weighting | Details |
| Final grade | |
| 0.6 (60%) Exams | Must be ≥ 5 |
| 25% 22-04-2025 Exam (tentamen) | |
| 35% 28-05-2025 Exam (tentamen) | |
| 0.4 (40%) Assignments | Must be ≥ 5 |
| 1 (20%) Homework #1 | |
| 1 (20%) Homework #2 | |
| 1 (20%) Homework #3 | |
| 1 (20%) Homework #4 | |
| 1 (20%) Homework #5 | |
The final course grade is computed from two components: an Exams component worth 60% of the final course grade, and an Assignments component worth 40% of the final course grade. To pass the course, it is necessary (but not sufficient) to obtain a grade ≥ 5 in each of these two components.
The Exams component (60%) includes a midterm exam and a final exam, which are worth 25% and 35% of the final course grade, respectively. The Assignments component (40%) includes 5 homework assignments, each weighted equally.
If you get a grade lower than 5 on the Exams component and you have met the attendance requirement, you are eligible to take the Resit (hertentamen), which is worth 60% of the final course grade. The resit covers all the course materials. As such, the resit grade replaces the combined result of the two exams, whatever your original results may have been. It is necessary (but not sufficient) to obtain a grade of 5 or more in the Resit to pass the course.
The homework component cannot be substituted. Therefore, if you score less than 5 on it, you will fail the course.
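As a rough illustration of the weighting described above, the following Python sketch combines the two components. It is not an official calculator. The function name `course_grade` and the returned keys are hypothetical, and two things are assumptions because the text does not spell them out: the Exams component grade is taken to be the 25:35 weighted average of the two exams, and no rounding is applied. The overall pass mark is not stated here, so the sketch only checks the per-component minimum of 5.

```python
from statistics import mean

MIDTERM_WEIGHT = 0.25      # share of the final course grade
FINAL_EXAM_WEIGHT = 0.35   # share of the final course grade
EXAMS_WEIGHT = 0.60        # midterm + final exam, or the resit
ASSIGNMENTS_WEIGHT = 0.40  # five equally weighted homework assignments
COMPONENT_MINIMUM = 5.0


def course_grade(midterm, final_exam, homeworks, resit=None):
    """Combine exam and homework grades into a final course grade.

    Assumption: the Exams component is the weighted average of the two
    exams (25:35); a resit grade, if given, replaces that average.
    """
    if len(homeworks) != 5:
        raise ValueError("expected exactly five homework grades")
    if resit is not None:
        exams = resit
    else:
        exams = (MIDTERM_WEIGHT * midterm
                 + FINAL_EXAM_WEIGHT * final_exam) / EXAMS_WEIGHT
    assignments = mean(homeworks)
    return {
        "exams": exams,
        "assignments": assignments,
        "final_grade": EXAMS_WEIGHT * exams + ASSIGNMENTS_WEIGHT * assignments,
        # Necessary, but not sufficient, for passing the course.
        "meets_component_minimums": (exams >= COMPONENT_MINIMUM
                                     and assignments >= COMPONENT_MINIMUM),
    }


if __name__ == "__main__":
    result = course_grade(midterm=6, final_exam=7, homeworks=[8, 8, 8, 8, 8])
    print(round(result["final_grade"], 2), result["meets_component_minimums"])
```

Passing `resit=...` models the rule that the resit replaces the combined result of both exams; the homework grades are used unchanged in either case.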
There will be weekly homework assignments, which may be completed in pairs. Students will receive feedback from their TAs on these assignments during werkcolleges and laptopcolleges. Assignments handed in up to 24 hours after the deadline will still be accepted but will incur a grade penalty. Students who have not submitted on time but intend to do so within 24 hours should always communicate this to their TA.
This course applies the UvA's general Fraud and Plagiarism Regulations (Fraude- en plagiaatregeling). Compliance is checked carefully. If fraud or plagiarism is suspected, the Examinations Board of the programme will be called in. See the UvA Fraud and Plagiarism Regulations: http://student.uva.nl
| Week | Topics | Study material |
| 1 | Automata, Transducers, and Morphology | |
| 2 | Formal Grammars and Syntax | |
| 3 | From Parsing to Computational Semantics | |
| 4 | Midterm exam (tentamen 1) | |
| 5 | Compositional Semantics | |
| 6 | Lexical Semantics | |
| 7 | Distributional Semantics, Information Retrieval, and N-gram Models | |
| 8 | Final exam (tentamen 2) | |
While the same attendance criteria apply to any student enrolled in the course, honours students can contact Frank Wildenburg and/or Nienke Reints (TA coordinators) and request to be assigned to another group in case the one assigned by default overlaps with other activities.
The lectures (hoorcolleges) will be given in English, and the homework and exam questions will also be in English. Werkcolleges will be conducted in Dutch. Students may answer written questions in either English or Dutch. Basic knowledge of Python and first-order logic is assumed; no prior knowledge of linguistics is required.
The TAs should be the first point of contact for day-to-day issues related to the course. For unusual or extreme circumstances, e.g., exam time conflicts, contact the course coordinator.