Course manual 2023/2024

Course content

The underlying question behind this course is: How do search engines work? To answer this question we dive into the details of information retrieval, the field that deals with search. During the course we discuss the various parts of search engines:

- Retrieval models: how do we retrieve relevant documents for a given query? And how do we rank these documents in the right order?

- Evaluation: given a working retrieval system, how do we determine its performance and how can we compare it to other systems?

Besides these two basics of information retrieval we explore other frequently used techniques, theories, and models (e.g., relevance feedback, learning to rank, and semantic search).

During the course, students are required to perform IR experiments. The goal of these experiments is to get acquainted with IR experimental methodology, to get hands-on experience with open-source retrieval systems and large datasets, and to be able to apply and adjust theoretical models to fit the task at hand. Besides running the experiments, evaluating and analyzing the results is an important part of the practical side of this course.

Study materials

Literature

Selected chapters from: C.D. Manning, P. Raghavan, H. Schütze. 'Introduction to Information Retrieval', Cambridge University Press, 2008. Available as PDF from http://nlp.stanford.edu/IR-book/
Selected chapters from: J. Lin, R. Nogueira, A. Yates. 'Pretrained Transformers for Text Ranking.' Available from https://arxiv.org/pdf/2010.06467.pdf
For topics that are not discussed in sufficient detail in the book, we use additional conference or journal papers. This can also happen when the book's content is outdated for a particular topic.

Objectives

Transform theoretical models to a working system
Describe, explain, and compare algorithms
Use different evaluation methods to perform experiments
Analyze results, compare different algorithms, and draw conclusions
Make decisions and adapt model parameters on the basis of experimentation
Modify algorithms to suit other information access tasks

Teaching methods

Lecture
Computer lab session/practical training

The course is taught through lectures. Some lab sessions may be held to enable students to put theory into practice.

Learning activities

Activity	Number of hours
Self-study	128

Attendance

This programme does not have requirements concerning attendance (OER part B).

Assessment

Item and weight	Details
Final grade
1 (100%) Digital exam

The final grade is 50% assignments and 50% exam. Passing requires a minimum of 5.5/10 on the assignment average and a minimum of 5.5/10 on the exam. If the resit exam is taken, the resit score replaces the original exam score.

Assignments

Assignment 1

Graded assignment made in a small group

Assignment 2

Graded assignment made in a small group

Assignment 3

Graded assignment made in a small group

Assignments contribute 50% of the final grade (with the remaining 50% coming from the exam).

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Week number	Topics	Study materials
1
2
3
4

Additional information

Recommended prior knowledge:

Machine learning
- A good understanding of machine learning algorithms is necessary (e.g. linear/logistic regression, naive Bayes, SVMs, neural nets, trees, and possibly ensemble methods)
- An understanding of, and practical experience with, estimation and optimization methods (e.g. expectation-maximization) is also necessary
Natural language processing
- For most of the course, shallow NLP methods are used. However, entity identification and disambiguation, and distributional semantics (e.g. word2vec), are important in the course.
Software engineering skills
- The course does not require you to build large software systems or write immense amounts of code; however, you should be ready to write code that, for instance, efficiently processes a set of web pages and creates a dictionary of words and the web pages they are found in.
- Python is the preferred language and IPython Notebooks are used for submitting assignments, but any other language is also fine.
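
As a rough indication of the expected coding level, the word-to-pages dictionary mentioned above (an inverted index) could be sketched as follows. This is a minimal illustration, not course material: the function names, the simple regular-expression tokenizer, and the input format (a dict from page id to plain text) are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each word to the sorted list of page ids in which it occurs.

    `pages` is a dict of page id -> plain text (hypothetical input format).
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            # A set avoids storing the same page twice for a repeated word.
            index[token].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Toy example with two made-up pages.
pages = {
    "page1": "Search engines rank documents.",
    "page2": "Documents are indexed before search.",
}
index = build_inverted_index(pages)
print(index["search"])
print(index["rank"])
```

Each page is scanned once and lookups are dictionary accesses, which is the kind of efficiency the requirement above refers to; a real retrieval system would add steps such as stemming and stop-word handling.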

Contact information

Coordinator

dr. Andrew Yates

Owner	Master Artificial Intelligence
Coordinator	dr. Andrew Yates
Part of	Master Artificial Intelligence; Master Information Studies, track Data Science, year 1; Master Information Studies, track Information Systems, year 1; Master Forensic Science, year 2

Information Retrieval 1