Elementary knowledge of machine learning, probability theory, linear algebra (vector spaces), and data structures is recommended.
Search engines, the internet and cheap powerful hardware have drastically changed the way humans deal with information. Whereas thirty years ago librarians were still classifying books and articles using subject codes, nowadays search technology has become ubiquitous on desktop computers and mobile devices. This course covers both the theory and practice of the field of Information Retrieval, with a focus on textual content (the courses 4343AUDIO and 4343MMIRL focus on audiovisual content).
This course covers the following aspects:
1. How can we formalize search for information and how can we evaluate search systems?
2. Which document features (e.g. term statistics) could be used to associate a ‘meaning’ with a text?
3. How can we extend the notion of relevance by looking at context and learn from interaction?
4. How can these elements be combined to classify a text or to perform relevance ranking in order to build a search engine?
5. Which data structures and techniques are essential for computational efficiency?
6. Advanced topics such as personalization, recommender systems, learning to rank and responsible information retrieval
1. Introduction and Boolean retrieval
2. Evaluation and test collections
3. Indexing and compression
4. Vector space model
5. Neural IR
6. Probabilistic IR
7. Language Modeling for IR
8. (student presentations: critical review of a research paper)
9. Learning to rank
10. Web search
11. Query and session analysis
12. Responsible IR
13. Conversational search and domain specific IR
By the end of the course, the student should have a thorough understanding of:
the foundations of information retrieval models
evaluation methods for IR systems
efficient data structures and complexity of search and indexing algorithms
machine learning for ranking
technologies and relevance models for web search
analysis of query and session logs
how to conduct responsible IR in research and practice
applications and challenges of IR
reviewing a scientific information retrieval publication
In addition, the student should have some practical experience with information retrieval experiments (PyTerrier, ElasticSearch).
The most recent timetable can be found at the Computer Science (MSc) student website.
You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.
MyTimetable allows you to integrate your timetable with calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).
For more information, watch the video or go to the 'help-page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.
Mode of instruction
Lectures (2h / week) and literature
Homework (weekly): getting more acquainted with the new lecture material through exercises, mostly taken from the course book.
- Critical review of a recent IR research paper (presentation and report)
- Applying lecture concepts to a real-world dataset (report)
There is no lab session.
The course grade will be computed as follows:
- Homework (weekly exercises, individual) – 10%
- Critical review of a scientific paper (in groups) – 10%
- Practical assignment (in groups) – 20%
- Final written exam (closed book) – 60%
The grade of the homework exercises is based on the number of completed exercises (n_completed / n_total × 10).
The grade for the exam needs to be at least 5.5 to pass the course. The exam has a regular written re-sit opportunity. The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.
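As an illustration of the weighting above, the following is a minimal sketch of how the components could be combined. It assumes that every component is graded on a 10-point scale and that no rounding is applied; neither is stated here. The function and variable names are hypothetical, and the official calculation may differ.

```python
# Illustrative sketch only: names, the 10-point scale for all components,
# and the absence of rounding are assumptions, not official course rules.

# Weights in percent, as listed in the assessment scheme.
WEIGHTS = {"homework": 10, "review": 10, "practical": 20, "exam": 60}
EXAM_MINIMUM = 5.5  # minimum exam grade needed to pass the course


def homework_grade(n_completed: int, n_total: int) -> float:
    """Homework grade: n_completed / n_total * 10."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_completed <= n_total:
        raise ValueError("n_completed must lie between 0 and n_total")
    return n_completed / n_total * 10


def course_grade(homework: float, review: float,
                 practical: float, exam: float) -> float:
    """Weighted average of the four components (unrounded)."""
    parts = {"homework": homework, "review": review,
             "practical": practical, "exam": exam}
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS) / 100


def exam_requirement_met(exam: float) -> bool:
    """True if the exam grade reaches the stated minimum of 5.5."""
    return exam >= EXAM_MINIMUM


if __name__ == "__main__":
    hw = homework_grade(n_completed=8, n_total=10)
    total = course_grade(homework=hw, review=7.0, practical=7.5, exam=6.0)
    print(f"homework: {hw:.1f}, course grade: {total:.2f}, "
          f"exam requirement met: {exam_requirement_met(6.0)}")
```

Note that the exam minimum is checked separately from the weighted average: a high average does not compensate for an exam grade below 5.5.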
Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan: Introduction to information retrieval, 2008, Cambridge University Press. ISBN: 978-0521865715 https://nlp.stanford.edu/IR-book/
Pretrained Transformers for Text Ranking: BERT and Beyond by Jimmy Lin, Rodrigo Nogueira, and Andrew Yates (University of Waterloo, University of Campinas, University of Amsterdam). Morgan & Claypool https://arxiv.org/abs/2010.06467
Additional literature will be distributed on Brightspace.
From the academic year 2022-2023 onward, every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.
Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.
Extensive FAQs on MyStudyMap can be found here.
Prof. dr. W. Kraaij