This course gives an overview of techniques for automated learning from ill-understood data for which it is hard or impossible to formulate a model that is even approximately correct. Here “learning” means: “finding structure, patterns, regularities” and using these patterns to predict future data. The field is also known as “machine learning”, since many contributions in this field have their origin in computer science areas (pattern recognition, artificial intelligence).
Main topics in the course will be (1) supervised learning (regression and classification, but with a strong focus on the latter); (2) model selection and model averaging; (3) predictive analysis, including sequential prediction. The methods discussed will include various classical and state-of-the-art classification methods: naive Bayes, perceptrons (1960s), neural networks, decision trees (1980s), logistic regression, boosting, support vector machines, Gaussian processes and other kernel approaches (2000s). We explain interrelations between these methods and analyze their large-sample behaviour. As for model selection and averaging, we again consider both classical and state-of-the-art methods, including AIC, BIC, Bayes factor model averaging, Minimum Description Length (MDL), Structural Risk Minimization (SRM), shrinkage, the Lasso and other L1 methods. We explain how all these methods are related to Bayesian and non-Bayesian methods for combining predictors, and again we analyze their large-sample behaviour.
An introduction to Statistical Learning
For the course days, course location and class hours check the Time Table 2013-14 under the tab “Masters Programme” at http://www.math.leidenuniv.nl/statscience
Mode of Instruction
Lectures and practicals (partly computer practicals, partly exercises).
A written open-book exam (50%)
Two assignments (each 25%)
Both homework assignments involve setting up and running experiments in R and writing a short report about the results. Discussing the problems in the group is encouraged, but every participant must carry out the experiments and write the report individually.
The written exam is scheduled for January 8, 2014, from 14.00 to 17.00; the resit is scheduled for February 14, 2014, from 10.00 to 13.00.
- T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd edition, 2009.
Handouts of some (very few) papers
Besides registration for the (re-)exam in uSis, course registration via Blackboard is compulsory.
Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.
avdvaart [at] math [dot] leidenuniv [dot] nl
Visit for possible changes, updates etc.
This is an elective course in the Master’s programme of the specialisation Statistical Science for the Life & Behavioural sciences.