Course manual 2023/2024

Course content

The underlying question behind this course is: How do search engines work? To answer this question we dive into the details of information retrieval, the field that deals with search. During the course we discuss the various parts of search engines:

- Retrieval models: how do we retrieve relevant documents for a given query? And how do we rank these documents in the right order?

- Evaluation: given a working retrieval system, how do we determine its performance and how can we compare it to other systems?

Besides these two basics of information retrieval we explore other frequently used techniques, theories, and models (e.g., relevance feedback, learning to rank, and semantic search).
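The evaluation question above (how to determine a system's performance and compare it to another system) can be illustrated with a small sketch. It uses precision at k and average precision, two standard IR metrics; the manual does not state which metrics the course uses, so this choice is an assumption, as are the function names, document ids, and relevance judgments below.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranking[:k]
    return sum(1 for doc in top_k if doc in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical relevance judgments and system outputs for a single query.
relevant = {"d1", "d3"}
system_a = ["d1", "d2", "d3", "d4"]
system_b = ["d2", "d4", "d1", "d3"]

for name, ranking in [("A", system_a), ("B", system_b)]:
    print(name, precision_at_k(ranking, relevant, 2), average_precision(ranking, relevant))
```

Both systems retrieve the same documents, but system A places the relevant ones earlier and therefore scores higher on both metrics. In practice such scores would be averaged over many queries before comparing systems.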

During the course, students are required to perform IR experiments. The goal of these experiments is to become acquainted with IR experimental methodology, gain hands-on experience with open-source retrieval systems and large datasets, and learn to apply and adjust theoretical models to fit the task at hand. Besides running the experiments, evaluating and analyzing the results is an important part of the practical side of this course.

Study materials

Literature

  • Selected chapters from: C.D. Manning, P. Raghavan, H. Schütze. 'Introduction to Information Retrieval', Cambridge University Press, 2008. Available as PDF from http://nlp.stanford.edu/IR-book/
  • Selected chapters from: J. Lin, R. Nogueira, A. Yates. 'Pretrained Transformers for Text Ranking.' Available from https://arxiv.org/pdf/2010.06467.pdf

  • For topics that the books above do not cover in sufficient detail, or for which their content is outdated, we use additional conference or journal papers.

Objectives

  • Transform theoretical models to a working system
  • Describe, explain, and compare algorithms
  • Use different evaluation methods to perform experiments
  • Analyze results, compare different algorithms, and draw conclusions
  • Make decisions and adapt model parameters on the basis of experimentation
  • Modify algorithms to suit other information access tasks

Teaching methods

  • Lecture
  • Computer lab session/practical training

The course consists mainly of lectures. Some lab sessions may be held to enable students to put theory into practice.

Learning activities

Activity Number of hours
Self-study 128

Attendance

This programme does not have requirements concerning attendance (OER part B).

Assessment

Item and weight Details

Final grade

1 (100%)

Digital exam

The final grade is 50% assignments and 50% exam. Passing requires a minimum of 5.5/10 on the assignment average and a minimum of 5.5/10 on the exam. If the resit exam is taken, the resit score replaces the original exam score.

Assignments

Assignment 1

  • Graded assignment made in a small group

Assignment 2

  • Graded assignment made in a small group

Assignment 3

  • Graded assignment made in a small group

Assignments contribute 50% of the final grade (with the remaining 50% coming from the exam).

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' apply to this course, and compliance will be monitored carefully. Upon suspicion of fraud or plagiarism, the Examinations Board of the programme will be informed. For the regulations, see: www.student.uva.nl

Course structure

Week number Topics Study materials
1
2
3
4

Additional information

Recommended prior knowledge:

  1. Machine learning
    • A good understanding of machine learning algorithms is necessary (e.g. linear/logistic regression, naive Bayes, SVMs, neural networks, decision trees, and possibly ensemble methods)
    • An understanding of, and practical experience with, estimation and optimization methods (e.g. expectation-maximization) is also necessary
  2. Natural language processing 
    • Most of the course uses shallow NLP methods. However, entity identification and disambiguation, as well as distributional semantics (e.g. word2vec), are important in the course.
  3. Software engineering skills 
    • The course does not require you to build large software systems or write large amounts of code; however, you should be ready to write code that, for instance, efficiently processes a set of web pages and creates a dictionary of words and the web pages in which they are found.
    • Python is the preferred language and IPython notebooks are used for submitting assignments, but any other language is also fine.
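As an indication of the expected coding level, the task mentioned under software engineering skills (building a dictionary of words and the web pages in which they are found) can be sketched in a few lines of Python. This is a minimal illustration, not course material: the function names, the tokenization rule, and the example pages are assumptions, and the page text is assumed to have already been extracted from its HTML.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each word to the sorted list of page ids that contain it.

    `pages` is a dict {page_id: page_text}; this structure is an
    assumption made for the sketch.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)  # a set avoids duplicate page ids
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical example pages.
pages = {
    "a.html": "Search engines rank documents",
    "b.html": "Documents are indexed before search",
}
index = build_inverted_index(pages)
print(index["search"])
print(index["rank"])
```

Each page is read once and every word lookup is a single dictionary access, which is the kind of efficiency the requirement refers to. A real collection would additionally need HTML parsing and more careful tokenization.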

Contact information

Coordinator

  • dr. Andrew Yates

Staff