Taaltheorie en Taalverwerking

Linguistics and Language Processing

6 EC

Semester 2, period 5

5082TATA6Y

Owner: Bachelor Kunstmatige Intelligentie
Coordinator: dr. J.A. Burgoyne
Part of: Bachelor Kunstmatige Intelligentie, year 1; Minor Kunstmatige Intelligentie, year 1; Bachelor Bèta-gamma, major Kunstmatige Intelligentie, year 2

Course guide 2016/2017

Course overview

Our ability to use natural language to communicate with each other and to record information is one of the main features that make us intelligent. However, while we use language effortlessly in our everyday life, it is not easy to program computers to correctly process natural languages such as English or Dutch. Computational linguistics is a subfield of artificial intelligence at the interface of linguistic theory and computer science, which aims at endowing computers with the ability to process natural language. The ultimate goal is to develop artificial agents that can automatically acquire information from text or that can communicate with humans via intelligent interfaces or in human-robot interaction.

This course introduces students to some of the core topics in computational linguistics and natural language processing. We will focus on foundational aspects, paying special attention to rule-based methods. The course provides background for the second-year course Natuurlijke Taalmodellen en Interfaces, which focuses on data-driven probabilistic methods.

To follow the course adequately, you need to understand first-order logic (up to the level achieved in the courses Logisch Programmeren en Zoektechnieken and Inleiding Logica). These skills are taken for granted and will not be taught in this course. During the course, you will learn, amongst other things, how Python and first-order logic can be used to analyse and process the structure and meaning of natural language expressions.

Study materials

Literature

  • The main resource for the course is the following textbook: Jurafsky, Daniel and James H. Martin. 2009. Speech and Language Processing. 2nd ed. Upper Saddle River, NJ: Pearson.

    This book provides a comprehensive and in-depth overview of the field. In this course we will cover only around two thirds of the book. The book is also used in the course Natuurlijke Taalmodellen en Interfaces (second year BSc AI).

Other

  • The material on the Blackboard site of Practicum Academische Vaardigheden (www.practicumav.nl)
  • Other online materials will be pointed out during the course. Slides will be available on Blackboard after each lecture.

Learning objectives

By the end of the course, students should be capable of the following, divided over four topics.

1. Formal languages and automata

  • Describe the properties of regular expressions and automata.

  • Apply formalisms and automata to represent formal and natural languages.

  • Compare languages, automata, and formal grammars with different levels of complexity.
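
As an informal illustration of this first topic (a sketch, not part of the official course material), the Python code below encodes a small deterministic finite automaton and checks that it accepts the same strings as an equivalent regular expression. The example language, the "sheep language" baa!, baaa!, baaaa!, ..., and all names in the code (TRANSITIONS, dfa_accepts, regex_accepts) are chosen here purely for illustration.

```python
import re

# Transition table of a deterministic finite automaton (DFA) for the
# illustrative "sheep language": b, then at least two a's, then !.
# Keys are (state, input symbol); values are the next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of additional a's
    (3, "!"): 4,
}
START = 0
ACCEPTING = {4}


def dfa_accepts(string):
    """Return True if the DFA ends in an accepting state after reading string."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:          # no transition defined: reject
            return False
    return state in ACCEPTING


def regex_accepts(string):
    """The same language written as a regular expression."""
    return re.fullmatch(r"baa+!", string) is not None


if __name__ == "__main__":
    for s in ["baa!", "baaaa!", "ba!", "baa", "abaa!"]:
        print(repr(s), dfa_accepts(s), regex_accepts(s))
```

Both functions describe the same regular language. The automaton makes the states explicit, while the regular expression is the more compact notation.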

2. Syntactic structure and syntactic parsing

  • Apply a formal grammar to analyse the syntactic structure of natural language sentences.

  • Implement grammars using NLTK in Python to account for syntactic facts.

  • Describe the properties of different parsing methods and explain how they work.
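
To give a rough idea of what a parsing method does, the following sketch implements a CKY-style recogniser in plain Python for a toy grammar in Chomsky normal form. The grammar, the vocabulary, and the function name cky_recognise are assumptions made for this illustration only. The course itself uses NLTK for implementing grammars, which is deliberately not used here so that the sketch stays self-contained.

```python
from itertools import product

# Toy grammar in Chomsky normal form (illustrative only).
# Binary rules map a pair of right-hand-side categories to possible parents.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules map a word to its possible categories.
LEXICAL_RULES = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}


def cky_recognise(words, start="S"):
    """Return True if the toy grammar derives the given list of words."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that span words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICAL_RULES.get(word, ()))
    # Fill in longer spans from shorter ones, trying every split point k.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY_RULES.get((left, right), set())
    return start in chart[0][n]


if __name__ == "__main__":
    print(cky_recognise("the dog chased the cat".split()))
    print(cky_recognise("the dog chased".split()))
```

The recogniser only answers whether a sentence is grammatical according to the toy grammar. Recovering the actual syntactic structure would additionally require storing backpointers in the chart.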

3. Logic-based compositional semantics

  • Represent the meaning of natural language sentences with logic-based formulas.

  • Derive these semantic formulas systematically and compositionally.

  • Implement compositional semantic grammars using NLTK in Python to account for semantic facts.

4. Word meaning and semantic similarity

  • Describe and give examples of lexical semantic relations.
  • Apply and compare different types of word sense disambiguation algorithms.
  • Apply and compare different types of semantic similarity measures.
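
As a small illustration of what a semantic similarity measure can look like, the sketch below computes one common path-based measure, 1 / (1 + length of the shortest path), over a hand-made hypernym taxonomy. The taxonomy, its single-parent (tree) structure, and the names HYPERNYM, path_to_root, path_length, and path_similarity are all assumptions made for this example. They do not come from the course material.

```python
# Hand-made toy taxonomy (illustrative only): each concept has at most
# one hypernym, so the taxonomy is a tree rooted in "animal".
HYPERNYM = {
    "dog": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "bird": "animal",
}


def path_to_root(concept):
    """Return the chain of concepts from concept up to the root."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_length(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    # The first ancestor of a that is also an ancestor of b is their
    # lowest common ancestor, because both chains only move upwards.
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("concepts have no common ancestor")


def path_similarity(a, b):
    """Path-based similarity: 1 for identical concepts, lower when further apart."""
    return 1 / (1 + path_length(a, b))


if __name__ == "__main__":
    print(path_similarity("dog", "canine"))
    print(path_similarity("dog", "cat"))
```

In this toy taxonomy "dog" comes out as more similar to "canine" than to "cat", because fewer edges separate the two concepts.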

Other learning objectives concern the enhancement of your presentation skills. These are described in the section on Academic Skills and the Practicum Academische Vaardigheden (PAV).

Teaching formats

  • Lecture (hoorcollege)
  • Tutorial (werkcollege)
  • Laptop session (laptopcollege)
  • Presentation/symposium
  • Self-study

Lectures

Please arrive early enough to be settled by the beginning of each lecture (9.00). If for some reason you are late, please enter and take your seat as quietly and non-disruptively as possible. Always switch off your mobile phone before entering a classroom.

Students tend to recall lecture material better if they take notes with pen and paper, but laptops are allowed. If you do choose to take notes on a laptop, be careful not to let messages and other online temptations distract you: few people multitask as well as they think they do (the instructor included), and there is very little material on Facebook that is worth failing a class. However you choose to take notes, don’t panic if you miss a detail: the slides will become available after each lecture. (Note-taking is also more effective when you focus on high-level concepts instead of details.)

Students are strongly advised to attend every lecture. The core material is also available through the slides and the assigned literature, so you will be able to cope in case you have to miss a session. However, many of the subtleties of the material can only be communicated in person. If you do have to miss a session, ask a fellow student what was covered; please do not ask the instructor or TAs.

Practical Sessions

The main purpose of the practical sessions is to give you the opportunity to work on exercises (both practice exercises and homework exercises to be submitted for grading) under supervision, with direct access to expert help. The practical sessions will be run by the TAs. The TAs may start these sessions by giving feedback on previous homework assignments, going over some of the main concepts introduced in the lecture that day, and proposing some practice exercises related to the new material.

You should use part of the practical sessions to work on the homework assignments. This way you will have no problems finishing the assignments by the deadline. The practical sessions are also the time to approach your TA in case you have questions regarding your graded homework or in case you have difficulties understanding some of the concepts covered in class.

Distribution of learning activities

The average student should devote 21 hours of work to this course every week (8 hours attending lectures and labs, approximately 9 hours reading and studying material such as lecture slides, and approximately 5 additional hours completing homework exercises). Students who are unable to make this time commitment should not expect to do well.

Academic skills

In this course, presentation skills are addressed. In the last teaching week of the course you will give a presentation with a fellow student on a topic of your choice (out of a given set of topics). The main aim of this exercise is to practice your presentation skills while reflecting on how the concepts covered in the course bear on practical natural language processing applications. Learning objectives concern the content and structure of your presentation, your (non-)verbal presentation skills, and the use of slides.

The requirements and evaluation criteria for the duo presentation will be specified in separate documents on Blackboard.

In the PAV, students will prepare their presentation step by step. This includes a computer lab session on presentation slides and one practice run of your presentation, on which you will receive feedback. The PAV is an obligatory part of this course only for BSc KI students, although all students in the course are welcome and encouraged to attend. BSc KI students who completed the course Tutoraat BSc KI 1 (50821TKI0Y) and do not wish to follow PAV should contact Anja Ruhland during the first week of the course (a.m.ruhland@uva.nl) in order to receive an exemption; non-KI students who do wish to follow PAV this term should also contact Anja Ruhland for more information. See the Studiewijzer PAV (Blackboard Portfolio Academische Vaardigheden) for more information.

Attendance

Programme attendance requirements (OER-B):

  • Attendance is mandatory for practicals and tutorial meetings with assignments. How this attendance requirement is implemented may differ per course and is specified in the course guide. Students who do not meet the attendance requirement cannot complete the component with a passing grade.

Additional requirements for this course:

To pass the course, you may miss no more than three of the practical sessions. This policy includes absences for illness and emergencies: don’t use up your absences early in the block and expect an exception if something happens later! Barring illness and emergencies, most students should expect to attend every practical session.

Assessment

Final grade (components and weighting):

  • 32.5%: Exam 1 (tentamen 1)
  • 32.5%: Exam 2 (tentamen 2)
  • 20%: Homework assignments
  • 15%: Duo presentation

In order to pass the course, you must score at least a 5 (weighted average) in both the practical part and the theoretical part, and at least a 5.5 overall. If you attend both exams but fail one or both of them, and have at least a 5 in the practical part, then you may sit the resit exam (hertentamen). The hertentamen will replace the combined results of the two exams, whatever your original results may have been. If you intend to sit the hertentamen, you must let the instructor know in June. The hertentamen should be a last resort: students who rely on it often get unpleasant surprises.
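
The pass rule can be made concrete with a short calculation. The Python sketch below is only an illustration under stated assumptions: it takes the theoretical part to be the two exams and the practical part to be the homework assignments plus the duo presentation, and it applies no rounding. Neither the split nor any rounding rule is spelled out in this guide, and the names WEIGHTS, weighted_average, and course_result are made up for the example.

```python
# Weights of the graded components, as listed in the assessment overview.
WEIGHTS = {
    "exam1": 0.325,
    "exam2": 0.325,
    "homework": 0.20,
    "presentation": 0.15,
}

# ASSUMPTION: which components count as "theoretical" and "practical".
THEORY = ["exam1", "exam2"]
PRACTICAL = ["homework", "presentation"]


def weighted_average(grades, parts):
    """Weighted average of the given components, renormalised to those parts."""
    total_weight = sum(WEIGHTS[p] for p in parts)
    return sum(WEIGHTS[p] * grades[p] for p in parts) / total_weight


def course_result(grades):
    """Return (final grade, passed) under the assumptions stated above."""
    theory = weighted_average(grades, THEORY)
    practical = weighted_average(grades, PRACTICAL)
    final = weighted_average(grades, list(WEIGHTS))
    # Pass rule: at least a 5 in each part and at least a 5.5 overall.
    passed = theory >= 5.0 and practical >= 5.0 and final >= 5.5
    return final, passed


if __name__ == "__main__":
    example = {"exam1": 4.0, "exam2": 5.0, "homework": 9.0, "presentation": 9.0}
    print(course_result(example))
```

In the example call, the overall average is above 5.5, but the weighted exam average is 4.5, so the course is not passed under these assumptions.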

The examinable material consists of all slides used during class, all other material given as assigned reading, and all concepts covered in any of the exercises assigned throughout the course. The first exam will cover the material from the first three teaching weeks and the second exam will cover the material from the remaining four teaching weeks (however, in practice you need to have mastered some of the material from the first part to be able to work on questions covering the second part).

The exams will consist of a set of questions (between 6 and 10) that test whether you have achieved the learning objectives specified above. The questions will be of the same kind as the exercises you will encounter in the weekly assignments (including some NLTK questions where you will be asked to produce simple programs with pen and paper). If you are able to solve the assignment exercises on your own (both the homework and the self-study exercises) then you should have no problem passing the exams.

Assignments

Each week, a number of exercises will be made available via Blackboard. Some of these will be graded and will need to be submitted by the deadline indicated (these are called the homework assignments), while others are only intended for self study. Although nobody will check explicitly that you have completed the self-study exercises, the lectures, practical sessions, and exams will assume that you have.

Simplified sample solutions will be made available after the homework submission deadline. If you have questions, please ask the TAs for feedback during the practical sessions. Understanding how to arrive at a solution with the guidance of your TA is much more useful than simply looking at a sample solution.

You are encouraged to work on the assignments in pairs and to submit one set of homework answers per pair. Of course, you can work on your own if you wish, but it is often useful to collaborate with a fellow student. Note that working in pairs does not mean working in groups or relying on the work of others. It is never acceptable to copy directly from others, and it is also not okay to search the web (or the literature) systematically for solutions to the exercises. You are expected to be able to explain your solutions fully, whether you work with a fellow student or not. The purpose of homework is to understand your own level of ability, and if you cut corners, you will have serious difficulties passing the exams.

All homework must be submitted via Blackboard. Unless instructed otherwise, prepare your homework answers using LaTeX (you will be given a template) and submit a single PDF file. If you work on your homework together with a fellow student, each of you should submit exactly the same file (twice). Your file should have the following name: “...pdf”.

All deadlines are strict. If you are late submitting your homework, even by just a couple of minutes, you score 0 points for that week. If at some point you feel there are special circumstances that warrant an exception to this rule, then please get in touch with the instructor. If those circumstances are of a medical nature, then please bring appropriate paperwork documenting your case to the studieadviseur, who will advise on a final decision. The TAs are not authorised to grant any kind of extension; please do not put them into a difficult situation by approaching them about such issues.

Some homework assignments will include programming exercises. In such cases, you need to submit an additional IPython notebook. Your IPython notebook should have exactly the same name as your PDF file, but with an IPython notebook extension: “...ipynb”. The programming assignments will themselves be uploaded as IPython notebooks, and you should add your answers directly to these notebooks.

Make sure your notebooks run without problems. If your solution does not run, you will receive no points for the relevant questions.

Fraud and plagiarism

This course applies the UvA's general 'Fraude- en plagiaatregeling' (Fraud and Plagiarism Regulations). Compliance is checked carefully. If fraud or plagiarism is suspected, the programme's Examinations Board will be notified. See the UvA Fraud and Plagiarism Regulations: www.uva.nl/plagiaat

Weekly schedule

In block 5, courses run over a total of nine weeks (due to the many holidays in May). The 4th and the 9th weeks are reserved for exams. The remaining teaching weeks all have a similar structure:

  • two lectures (hoorcolleges),
  • two practical sessions (werkcolleges or computer labs),
  • homework (due Sunday at 23:59 unless announced otherwise), and
  • one PAV session (for all KI students and any other interested students).

Please consult the official schedule at http://datanose.nl/ for the exact times of the lectures and the practical sessions and for details regarding rooms.

Timetable

The timetable for this course can be viewed on DataNose.

Additional information

Information regarding the course is available on a dedicated Blackboard site accessible from http://blackboard.uva.nl. There you will find detailed information on obligatory and recommended readings for each week, as well as the slides used in the lectures (which will be uploaded after each lecture). The homework assignments will be made available via this site and you are also required to use it for submitting your answers to the assignments.

This course is taught in English. For your homework, which often will consist of technical exercises, you are encouraged to use English as well, as this is the language of the textbook and the lingua franca of research everywhere in the world. However, if you feel you cannot express yourself precisely enough in English, please write in Dutch. For the exams, you may prefer to write in Dutch, but using English or alternating between the two languages is also fine.

Contact information

Coordinator

  • dr. J.A. Burgoyne

John Ashley Burgoyne is the instructor for this course (j.a.burgoyne@uva.nl, room F2.07 at Science Park 107). To get to Science Park 107, you need to enter via the NIKHEF building at Science Park 105 and tell the receptionist that you want to visit the ILLC. Office hours are Thursdays from 14.00 to 16.00, during which any student may drop in without an appointment to discuss questions or other matters related to the course.

There are five teaching assistants who run the practical sessions and grade the homework:

  • Verna Dankers: verna_dankers@hotmail.com
  • Mara Fennema: maradfennema@gmail.com
  • Adriaan de Vries: adriaan.de.vries@hotmail.com
  • Douwe van der Wal: douwev.dwal@live.nl
  • Nick de Wolf: n.j.g.dewolf@uva.nl

Each student will be allocated to the group of one TA. Your TA will grade your homework and is the first person to contact for all questions regarding the course. Please note that the instructor is normally only able to respond to teaching-related e-mails during business hours on Tuesdays, Wednesdays, and Thursdays.

If we need to contact you, e.g., in case of questions regarding your homework, we will use the e-mail address stored for you in the UvA system (or any other address that you have ever used to contact any of us in the past). Please check your e-mail regularly.