Prospectus

Reinforcement Learning

Course
2025-2026

Admission requirements

Assumed prior knowledge
1. Bachelor-level knowledge of classical machine learning (training, testing, loss functions, overfitting, bias/variance trade-off), for example from a course on Artificial Intelligence, Machine Learning, Data Science, or Data Mining.
2. Bachelor-level knowledge of deep learning (neural networks, backpropagation, convolutional neural networks, PyTorch, WandB), for example from an introductory course on deep learning.
3. A mathematical background appropriate for the Computer Science Master's programme (discrete mathematics, computational complexity, graph theory, probability theory).
4. Bachelor-level proficiency in the Python programming language (design, implementation, debugging) and common packages (NumPy, Matplotlib).

Be aware that the Reinforcement Learning course moves fast; it is not an introductory course. If your knowledge of the above is limited, you should remedy this before starting the course.
Although this course is listed as "mandatory" for some Master tracks, students can always ask the Board of Examiners to change their study programme, for example when their prior knowledge does not match the requirements; experience has shown that the Board tends to look favourably on such requests.

Description

Deep reinforcement learning is a field of Artificial Intelligence that has attracted much attention following impressive achievements in robotics and in games such as StarCraft and Go, where human world champions were defeated by computer players. These results build on a combination of deep learning and the rich history of reinforcement learning research.
This course teaches the field of deep reinforcement learning: how does it work, why does it work, and what are the reinforcement learning methods on which the successes in robotics and of AlphaGo are based? By the end of the course you should have acquired a good understanding of the field of deep reinforcement learning.

The defining characteristic of reinforcement learning is that agents learn through interaction with an environment, not unlike the way humans learn. Instead of being told which action to take, the agent works out for itself which actions maximize a reward signal. Reinforcement learning is a powerful technique for solving sequential decision problems.
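
The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not course material: the `ChainEnv` environment, its reward of 1 at the right end of a corridor, and all hyperparameter values are hypothetical choices made for this example. The update rule is standard tabular Q-learning, one of the topics listed for the course.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: a corridor of n states.

    The agent starts at the left end (state 0) and receives a reward of 1
    only when it reaches the right end, which also ends the episode.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clip the move.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1,
          max_steps=1000, seed=0):
    """Learn a Q-table purely from interaction with the environment."""
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy: explore sometimes, and break ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Q-learning target: reward plus discounted best next value.
            best_next = 0.0 if done else max(q[next_state])
            target = reward + gamma * best_next
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the preferred action (0 = left, 1 = right) for each state."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nobody tells the agent that moving right is correct. It only observes states and rewards, and the learned Q-values end up favouring the action that leads to the reward in every non-terminal state.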

The defining characteristic of deep learning is that the model generalizes: it builds a hierarchy of abstract features from its inputs.

Prominent reinforcement learning problems occur in games and robotics, among other areas. In this course you will learn the theory needed to apply reinforcement learning to realistic problems from the field of computer game playing.
The following topics and algorithms are planned to be discussed:

  • Sequential Decision Problems and Markov Decision Processes

  • Tabular Value-based Reinforcement Learning, such as Q-learning

  • Deep Value-based Reinforcement Learning, such as DQN

  • Policy-based Reinforcement Learning, Robotics

  • Model-based Reinforcement Learning, World models

  • Two-Agent Self-Play, AlphaGo

  • Multi-Agent Reinforcement Learning (Poker, StarCraft)

  • Reinforcement Learning in (Reasoning) Large Language Models

This is a hands-on course, in which you will be challenged to build working programs with different reinforcement learning methods. It is a challenging course in which proficiency in Python and deep learning libraries (such as Keras and PyTorch) is important.
All assignments are done in Python. You must have Python programming experience; please see the assumed prior knowledge above.

Course objectives

After completing the reinforcement learning course, the students should be able to:

  • Create and apply deep reinforcement learning methods to model sequential decision problems, with a deep understanding of the field, including how reinforcement learning differs from supervised and unsupervised learning.

  • Analyse and apply the Markov Decision Process formalism.

  • Evaluate and proficiently apply model-free and model-based, value-based and policy-based approaches to single-agent, two-agent, and multi-agent problems.

  • Relate their knowledge to applications in games and robotics, including breakthrough applications such as AlphaGo, and understand how reinforcement learning is used in recent applications such as large language models and how meta-learning helps generalization.

Timetable

The most recent timetable can be found at the Computer Science (MSc) student website.

In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.

Additionally, you can easily link MyTimetable to a calendar app on your phone, and schedule changes will be automatically updated in your calendar. You can also choose to receive email notifications about schedule changes. You can enable notifications in Settings after logging in.

Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.

Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.

Mode of instruction

  • Literature (see below). The relevant chapters should be read before the corresponding lecture.

  • Lectures

  • Computer lab

Course load
Hours of study: 168 hrs (= 6 EC)
Lectures: 26:00 hrs
Seminars: 26:00 hrs
Practical assignments: 70:00 hrs
Examination and preparation: 46:00 hrs

Assessment method

Two to four assignments (formative and summative) check the ability to create and apply RL algorithms, and a theory exam checks understanding and evaluation.
A grading rubric for the assignments is provided on Brightspace.

The final grade is a combination of grades for: (1) the written exam (50%, mandatory) and (2) the reports for the assignments (50%, mandatory).
Completed assignments are valid for one year. Failing the course means redoing all assignments the following year.
There is no retake for assignments; there is one for the exam. Exceptions can be discussed with the teacher.
A no-show for an assignment yields a grade of 0 for that assignment, which must be compensated by the other assignments or the exam.

Some assignments are in groups of three. All students should contribute equally.
Academic integrity is fundamental in this course; all sources should be academically referenced. The use of writing aids may improve your texts. You may be invited to explain your report in person to check the depth of your understanding of your own report. Deviations may be reported to the Board of Examiners, which may decide on a sanction.

The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.

Reading list

Mandatory:

  • Deep Reinforcement Learning. Aske Plaat. Springer 2022. Freely available here.

  • Reasoning with LLM, a survey. Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back. arXiv 2407.11511, 2024. Freely available here.

Optional:

  • Reinforcement Learning: An Introduction, Second Edition. R. Sutton and A. Barto. MIT Press 2018. Freely available here.

Course Website

  • https://rl.liacs.nl

Registration

As a student, you are responsible for enrolling on time through MyStudyMap.

In this short video, you can see step-by-step how to enrol for courses in MyStudyMap.
Extensive information about the operation of MyStudyMap can be found here.

There are two enrolment periods per year:

  • Enrolment for the fall opens in July

  • Enrolment for the spring opens in December

See this page for more information about deadlines and enrolling for courses and exams.

Note:

  • It is mandatory to enrol for all activities of a course that you are going to follow.

  • Your enrolment is only complete when you submit your course planning in the ‘Ready for enrolment’ tab by clicking ‘Send’.

  • Not being enrolled for an exam/resit means that you are not allowed to participate in the exam/resit.

Contact

Remarks

There is limited space for students who are not enrolled in the Computer Science programme or one of the Data Science specialisations (Data Science: Computer Science and Astronomy and Data Science). Please contact the programme coordinator/study advisor (mastercs@liacs.leidenuniv.nl) if you are an external student.

Software
All required software is open source. Linux is preferred; macOS may work.