Reinforcement Learning, 2021-2022 - Prospectus

Admission requirements

Assumed prior knowledge

At least one Bachelor-level course in Artificial Intelligence, Machine Learning, Data Science, or Data Mining.
Bachelor-level proficiency in the Python programming language.
Some familiarity with combinatorial games such as Chess, Checkers, Go, and Othello is highly advisable.

Description

Reinforcement learning is a field of Artificial Intelligence that has attracted much attention since impressive achievements in Backgammon, Atari, and most recently Go, where human world champions were defeated by computer players. These results build upon a rich history of reinforcement learning research.
This course uses AlphaGo as a motivating example for teaching the field of reinforcement learning: How does it work, why does it work, and what are the reinforcement learning methods on which AlphaGo’s success is based? By the end of the course you should understand in detail how and why AlphaGo works, and have acquired a good understanding of the field of reinforcement learning. The course focuses on reinforcement learning for traditional two-player combinatorial games such as Chess, Checkers, and Go.
The defining characteristic of reinforcement learning is that agents learn through interaction with an environment, much as humans learn by doing. Instead of being told which action to take, the agent works out for itself which actions maximize a reward signal. Reinforcement learning is a powerful technique for solving sequential decision problems.
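As a minimal sketch of this interaction loop, the example below applies tabular Q-learning (one of the algorithms in the topic list) to a hypothetical five-cell corridor in which the agent is rewarded only for reaching the last cell. The environment, the function names, and the hyperparameter values are illustrative assumptions and are not taken from the course material.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# cell 0 and receives reward 1 on reaching cell 4, which ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Return (next_state, reward, done) for one move in the corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice(
                    [a for a in ACTIONS if q[(state, a)] == best]
                )
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for every non-terminal cell according to the Q-table."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The agent is never told that moving right is correct; it discovers this from the reward signal alone, and the learned greedy policy moves right in every cell.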
Prominent reinforcement learning problems occur, amongst others, in games and robotics. In this course you will learn the necessary theory to apply reinforcement learning to realistic problems from the field of computer game playing.
The following topics and algorithms are planned to be discussed:

Reinforcement Learning as Markov Decision Problem – Q-learning
Heuristic Planning and Adaptive Sampling – Monte Carlo Tree Search
Function Approximation – DQN Deep Reinforcement Learning, Policy-based algorithms
Self-Play – AlphaGo

In addition, some history on computer game playing in Chess, Checkers, Backgammon, Go and Hex will be covered; developments in Poker and StarCraft will be touched upon; and the role of reinforcement learning in artificial intelligence and the relation with psychology will be discussed (human learning).
This is a hands-on course, in which you will be challenged to build working game-playing programs with different reinforcement learning methods: heuristic planning, adaptive sampling, function approximation, and self-play. The course is demanding, and proficiency in Python is important.
The field of reinforcement learning is highly active, with many algorithms available. Because of the prominence of Python in this field of research, this course is based on Python. All assignments must be written in Python. The assignment on function approximation uses TensorFlow and Keras.
If you are interested in reinforcement learning, the highly popular course on Deep Learning and Neural Networks by Wojtek Kowalczyk may be a great match, as may the course on Modern Game AI Algorithms by Mike Preuss.

Course objectives

After completing the reinforcement learning course, the students should be able to:

Understand the key features and components of reinforcement learning;
Know the theoretical foundations of basic and advanced reinforcement learning techniques;
Understand the scientific state-of-the-art in the field of reinforcement learning, with application in games;
Understand four main reinforcement learning methods and their practical use in a simple game: heuristic planning, adaptive sampling, function approximation and self-play.

Timetable

The most recent timetable can be found at the Computer Science (MSc) student website.

Mode of instruction

Literature (see below). The relevant chapters should be read before the corresponding lecture.
Lectures
Computer lab

Course load

Hours of study: 168 hrs (= 6 EC)
Lectures: 26:00 hrs
Seminars: 26:00 hrs
Practical assignments: 70:00 hrs
Examination and preparation: 46:00 hrs

Assessment method

The final grade combines (1) the written exam (20%, mandatory) and (2) four reports, each including Python source code, on the four practical assignments (20% each, 80% in total, mandatory).
Completed assignments remain valid for one year. Failing the course means redoing all assignments the following year.

The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.

Reading list

Mandatory:

A. Plaat, Learning to Play: Reinforcement Learning and Games, Springer 2020. Freely available here. (A new book is in preparation.)

Optional:

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, Second Edition, MIT Press 2018. Freely available here.

Registration

You have to sign up for courses and exams (including retakes) in uSis. Check this link for information about how to register for courses.
There is limited space for students who are not enrolled in the Computer Science programme or one of the Data Science specialisations (Data Science: Computer Science and Astronomy and Data Science). Please contact the programme coordinator/study advisor (mastercs@liacs.leideuniv.nl) if you are an external student.

Contact

Lecturer: prof. dr. Aske Plaat
Teaching assistants: Matthias Müller-Brockhausen, Thomas Moerland, Mike Huisman
