# Foundations of Statistics and Machine Learning

Vak
2021-2022

Please note this is the course description from last year

Prerequisites are basic probability theory (including laws of large numbers, central limit theorem) and statistics (maximum likelihood, least squares). Knowledge of machine learning or more advanced probability/statistics may be useful but is not essential. In particular, all stochastic process/martingale theory that is needed will be developed from scratch.

## Description

A large fraction (some claim > 1/2) of published research in top journals in applied sciences such as medicine and psychology is irreproduceable. In light of this 'replicability crisis', classical statistical methods, most notably testing based on p-values, have recently come under intense scrutiny. Indeed, p-value based tests but also other methods like confidence intervals and Bayesian methods have mostly been developed in the 1930s - and they are not really suitable at all for many 21st century applications of statistics. Most importantly, they do not deal well with situations in which new data can keep coming in. For example, based on the results of existing trials, one decides to do a new study of the same medication in a new hospital; or: whenever you type in new search terms, google can adjust the model that decides what advertisements to show to you.

In this class we first review the classical approaches to statistical testing, estimation and uncertainty quantification (confidence) and discuss what each of them can and cannot achieve. These include Fisherian testing, Neyman-Pearson testing, Jeffreys-Bayesian (all from the 1930s), sequential testing (1940s) and pure likelihood-based (1960s) approaches. From the confidence perspective, it includes classical (Neyman-Pearson) confidence intervals and Bayesian posteriors. For each of these we treat the mathematical results underlying them (such as complete class theorems and the 'law of likelihood') and we give examples of common settings in which they are mis-used. All these approaches, while quite different and achieving different goals, have difficulties in the modern age, in which "optional continuation" is the rule rather than the exception. We will also treat approaches from the 1980s and 1990s based on data-compression ideas.

We will then treat the one approach which seems more suitable for the modern context: the always-valid-confidence sets of Robbins, Darling and Lai (late 1960s), which has its roots in sequential testing (Wald, 1940s). The always-valid-approach has recently been re-invigorated and extended. The mathematics behind it involves martingale-based techniques such as Doob's optional stopping theorem, advanced concentration inequalities such as a finite-time law of the iterated logarithm and information-theoretic concepts such as the relative entropy.

The central organizing principle in our treatment is the concept of likelihood and its generalization, nonnegative supermartingales.

## Course Objectives

• Understand the notions of likelihood and its application in the classical statistical paradigms (frequentist, Bayesian, sequential)

• Understand the notion of nonnegative test martingale and its application in always-valid testing and estimation

• Understand the powers and limitations of existing statistical methods

## Mode of instruction

Weekly lectures. Bi-weekly exercise sessions in which homework of type (a) is discussed.
Homework consisting of (a) math exercises and (b) a project involving doing a few experiments with an R package.

## Assessment method

The final grade consists of homework (40%) and a written (retake) exam (60%). To pass the course, the grade for the (retake) exam should be at least 5 and the (unrounded) weighted average of the two partial grades at least 5.5. No minimum grade is required for the homework in order to take the exam or to pass the course. The homework counts as a practical and there is no retake for it; it consists of at least 5 written assignments, of which the lowest grade is dropped, as well as a small programming assignment.

## Literature

Parts of

• R. Royall, Statistical Evidence: a likelihood paradigm ( Chapman & Hall/CRC, 1999)

• P. Grünwald, the Minimum Description Length Principle (MIT Press, 2007, freely available on internet)

• handouts that will be made available during the lectures

## Website

The course is present on Brightspaca and we will use it heavily.

## Contact

By email: pdg[at]cwi.nl