
Reinforcement Learning


Admission requirements

Assumed prior knowledge

  1. At least one Bachelor-level course in Artificial Intelligence, Machine Learning, Data Science, or Data Mining.
  2. Bachelor-level proficiency in the Python programming language.
  3. Some familiarity with combinatorial games such as Chess, Checkers, Go, and Othello is highly advisable.


Reinforcement learning is a field of Artificial Intelligence that has attracted much attention since its impressive achievements in Backgammon, Atari games, and, most recently, Go, where computer players defeated human world champions. These results build upon a rich history of reinforcement learning research.
This course uses AlphaGo as a motivating example for teaching reinforcement learning: how does it work, why does it work, and which reinforcement learning methods is AlphaGo’s success based on? By the end of the course you should understand in detail how and why AlphaGo works, and have acquired a good understanding of the field of reinforcement learning. The course focuses on reinforcement learning in traditional two-player combinatorial games such as Chess, Checkers, and Go.
The defining characteristic of reinforcement learning is that agents learn through interaction with an environment, much as humans learn by doing. Instead of being told which action to take, the agent must itself determine which actions to take so as to maximize a reward signal. Reinforcement learning is a powerful technique for solving sequential decision problems.
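To make the interaction loop concrete, here is a minimal sketch of tabular Q-learning, the first algorithm on the topic list. It is an illustration under stated assumptions, not course material: the five-state corridor environment and the names `step`, `greedy`, `q_learning`, `alpha`, `gamma`, and `epsilon` are hypothetical choices. The agent is never told to move right; it discovers this from the reward it receives on reaching the goal.

```python
import random

# Hypothetical toy environment: a corridor of states 0..4. Reaching state 4 ends
# the episode with reward +1; every other step gives reward 0.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise exploit current estimates.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Move Q(s, a) towards reward + discounted best value of the next state.
            if done:
                target = reward
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: move right in every non-terminal state
```

The random tie-breaking in `greedy` matters here: with all estimates starting at zero, always picking the first action would send the agent left and make the first episodes needlessly long.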
Prominent applications of reinforcement learning include games and robotics. In this course you will learn the theory needed to apply reinforcement learning to realistic problems from the field of computer game playing.
The following topics and algorithms will be discussed:

  • Reinforcement Learning as Markov Decision Process – Q-learning

  • Heuristic Planning – alpha-beta

  • Adaptive Sampling – Monte Carlo Tree Search

  • Function Approximation – DQN Deep Reinforcement Learning

  • Self-Play – AlphaGo
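Heuristic planning searches the game tree ahead and backs up position scores with minimax; alpha-beta pruning skips branches that provably cannot change the result. The following is a minimal, generic sketch rather than the course's assignment code: the `children` and `evaluate` callbacks and the nested-list toy tree are assumptions for illustration. In a real game program, `evaluate` would be a heuristic board evaluation applied at the depth limit.

```python
import math


def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax value of `node` with alpha-beta pruning.

    `children(node)` returns successor positions (empty for terminal nodes);
    `evaluate(node)` scores a position from the maximizing player's point of view.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)  # evaluation at a terminal node or the search horizon
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # cutoff: the minimizing player will never allow this line
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:
            break  # cutoff: the maximizing player already has a better option
    return value


# Hypothetical toy game tree as nested lists: MAX moves at the root, MIN at the
# next level, and the integers are leaf scores.
tree = [[3, 5], [6, 9], [1, 2]]
visited = []


def evaluate(leaf):
    visited.append(leaf)  # record which leaves the search actually examined
    return leaf


value = alphabeta(tree, 10, -math.inf, math.inf, True,
                  lambda n: n if isinstance(n, list) else [], evaluate)
print(value, visited)  # 6 [3, 5, 6, 9, 1]
```

Alpha-beta returns the same value as plain minimax (6) but never looks at the last leaf: once MIN finds 1 in the third subtree, that subtree is worth at most 1 to MAX, which is worse than the 6 MAX can already secure.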

In addition, the course covers some history of computer game playing in Chess, Checkers, Backgammon, Go, and Hex; touches upon developments in Poker and StarCraft; and discusses the role of reinforcement learning in artificial intelligence and its relation to psychology (human learning).
This is a hands-on course, in which you will be challenged to build working game-playing programs with different reinforcement learning methods: heuristic planning, adaptive sampling, function approximation, and self-play. This is a challenging course in which proficiency in Python is important.
The field of reinforcement learning is highly active, with many algorithms available. Because Python is prominent in this field of research, the course is based on Python, and all assignments must be completed in Python. The assignment on function approximation uses TensorFlow and Keras.
If you are interested in reinforcement learning, the highly popular course Deep Learning and Neural Networks by Wojtek Kowalczyk may be a great match, as may the course Modern Game AI Algorithms by Mike Preuss.

Course objectives

After completing the reinforcement learning course, the students should be able to:

  • Understand the key features and components of reinforcement learning;

  • Demonstrate knowledge of the theoretical foundations of basic and advanced reinforcement learning techniques;

  • Understand the scientific state of the art in the field of reinforcement learning, with applications in games;

  • Understand four main reinforcement learning methods and their practical use in a simple game: heuristic planning, adaptive sampling, function approximation and self-play.


The schedule can be found on the student website.

Mode of instruction

  • Literature (see below). The relevant chapters should be read before the corresponding lecture.

  • Lectures

  • Computer lab

Course load

Hours of study: 168 hrs (= 6 EC)
Lectures: 26:00 hrs
Seminars: 26:00 hrs
Practical assignments: 70:00 hrs
Examination and preparation: 46:00 hrs

Assessment method

The final grade is a weighted combination of: (1) the written exam (20%, mandatory) and (2) the four reports, including Python source code, on the four practical assignments (20% each, mandatory, 80% in total).
Completed assignments are valid for one year. Failing the course means redoing all assignments the following year.

The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.

Reading list


  • A. Plaat, Learning to Play: Reinforcement Learning and Games, Leiden, 2019, pre-print. Freely available online.


  • R. Sutton and A. Barto, Reinforcement Learning: An Introduction, second edition, MIT Press, 2018. Freely available online.


Registration

  • You have to sign up for courses and exams (including retakes) in uSis. Check this link for information about how to register for courses.

  • There is limited space for students who are not enrolled in the Computer Science programme or one of the Data Science specialisations (Data Science: Computer Science and Astronomy and Data Science). Please contact the programme coordinator/study advisor if you are an external student.