# Statistical learning

Course
2017-2018

## Description

This course gives an overview of techniques for automated learning from ill-understood data for which it is hard or impossible to formulate a model that is even approximately correct. Here “learning” means: “finding structure, patterns, regularities” and using these patterns to predict future data. The field is very similar to an area within computer science called “machine learning”, since many contributions in this field have their origin in computer science (pattern recognition, artificial intelligence).

The main topics of the course are (1) supervised learning (regression and classification, with a strong focus on the latter); (2) model selection; (3) basic clustering; and (4) basic optimization. The methods discussed include various classical and state-of-the-art classification methods: LDA (1930s), naive Bayes, perceptrons (1960s), decision trees (1980s), logistic regression, boosting and support vector machines (2000s), neural networks and deep learning. We explain the interrelations between these methods and analyze their behaviour. For model selection, we again consider both classical and state-of-the-art methods, including various forms of cross-validation, Ridge regression, and the Lasso and other L1 methods. For clustering, we consider the classic k-means and EM methods. For optimization, we cover stochastic gradient descent, which is the most widely used method to train neural networks.

See www.timvanerven.nl/teaching/statlearn2017/ for detailed course information.

## Prerequisites

• Familiarity with least squares linear regression

• Ability to program in R or Python

## Course objectives

An introduction to Statistical Learning

## Time Table

For the course days, course location and class hours, check the Time Table under the tab “StatSci Students -> Program Schedule” at http://www.math.leidenuniv.nl/statisticalscience

## Mode of Instruction

Lectures and computer practicals.

## Assessment method

• A written open-book exam (50%)

• Two assignments (each 25%)

It is required to have a passing score both for the assignments and for the exam. This means at least a 5.5 average for the assignments and a 5.5 for the exam.

Both homework assignments involve setting up some experiments in R or Python, experimenting, and writing a short report about the results. Discussing the problems with other students is encouraged, but every participant must do their own experiments and write a report on their own.

The dates of the exam and the resit can be found in the Time Table PDF document under the tab “Masters Programme” at http://www.math.leidenuniv.nl/statscience. The room and building for the exam will be announced on the electronic billboard opposite the entrance; its content can also be viewed at http://info.liacs.nl/math/.

## Literature

• T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd edition, 2009.

• Handouts of a few papers and of material on optimization

## Registration

Enroll in Blackboard for the course materials and course updates.

To obtain a grade and the ECTS credits for the course, sign up for the (re-)exam in uSis at least ten calendar days before the (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme and therefore to register for and take the first exam opportunity.

Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.

## Contact information

Tim van Erven: tim@timvanerven.nl

## Remarks

• This is an elective course in the Master’s programme of the specialisation Statistical Science for the Life & Behavioural sciences.