## Admission requirements

It is recommended that students are familiar with linear models, e.g. regression and analysis of variance, and generalized linear models, e.g. logistic regression for binary data. Students should also be familiar with matrix and vector algebra. Within this master's programme this prior knowledge can be acquired from the courses 'Mathematics for statisticians' and 'Linear and generalized linear models', and it is recommended that students take these courses first.

## Course Description

The linear model, e.g. analysis of variance or linear regression, and the generalized linear model, e.g. logistic regression for binary data or log-linear models for count data, are widely used to analyze data in a variety of applications. However, these models are only appropriate for independent data, e.g. data that can be considered randomly sampled from some population. In many fields of application dependent data may occur, for instance because observed animals are housed in the same pen, fertility trends create dependence between plants at close distance, individuals are from the same family, or data are collected repeatedly in time for the same subjects or individuals.

Introduction of random effects in the linear or generalized linear model is a simple and constructive expedient to generate feasible dependence structures. The extended classes of models are referred to as linear mixed models (LMMs) and generalized linear mixed models (GLMMs). The use of such models is the subject of this course. Competing models, where dependence is not modeled by introduction of extra random effects, will be discussed as well. Part of this course will focus upon analysis of repeated measurements or longitudinal data.
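As a minimal illustration of how a random effect generates dependence, consider a random intercept model for observation j in group i (e.g. animal j in pen i). The notation below is an assumption chosen for this sketch and is not taken from the course notes.

```latex
\begin{align*}
y_{ij} &= \mu + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
  \quad \text{(all independent)} \\
\operatorname{Var}(y_{ij}) &= \sigma_b^2 + \sigma^2 \\
\operatorname{Cov}(y_{ij}, y_{ik}) &= \sigma_b^2 \qquad (j \neq k) \\
\operatorname{Corr}(y_{ij}, y_{ik}) &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2} \qquad (j \neq k)
\end{align*}
```

Observations that share the same random effect are positively correlated, while observations from different groups remain independent. Without the random effect, i.e. when the variance of the random intercept is zero, the model reduces to the ordinary linear model for independent data.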

Inferential techniques comprise restricted (or residual) maximum likelihood (REML), a modified version of maximum likelihood, and also generalized estimating equations (GEE), which require less stringent model assumptions.

In this course, emphasis will be on gaining an understanding of the models and the kind of data that can be analyzed with these models. Different inferential techniques will be discussed, but without undue emphasis on mathematical rigor.

## Course objectives

Broadly, students, when confronted with practical data, should be able (1) to decide whether there is a need to model dependence between the data, (2) to decide upon a model with an appropriate dependence structure, and (3) to perform a proper analysis.

The course is in two parts. Part 1 of the course is about the linear mixed model (LMM) and the generalized linear mixed model (GLMM). Broadly, the LMM is about analysis of dependent continuous data (assumed to be normally distributed), and the GLMM is about analysis of dependent discrete data (0-1 data, proportions and percentages, counts). Part 2 of the course is about longitudinal data (temporal data, repeated measurements), i.e. data where units are repeatedly measured in time, e.g. patients who repeatedly visit a health care centre where relevant variables are measured. Below are more detailed goals for each of the two parts of this course.

After part 1 of this course a student should:

Be able to identify, for a practical problem, which factors and variables should be in the model and whether they should be represented by fixed or random effects.

Be able to interpret fixed and random effects in terms of population means and dependence structures.

Be able to explain in general terms how expected mean squares are used for estimation of components of variance and construction of F-tests with the ANOVA method for balanced data.

Have a global understanding, through the ANOVA method for balanced data, of how different random main effects and interactions in the model affect inference about fixed effects, and what their implications are for model selection.

In particular, be aware of the ins and outs of the analysis of a split-plot model for balanced data, including the use of fractional (non-integer) degrees of freedom in some pairwise comparisons.

Be aware of the issues involved in representation of blocks by fixed or random effects in the analysis of data collected according to e.g. a block design when data are unbalanced (recovery of inter-block information).

Be able to motivate the use of restricted maximum likelihood (REML) for unbalanced data.

Know about the relationship between the ANOVA method and REML for balanced data.

Be able to decide what kind of test is required for testing fixed effects (Kenward-Roger approximate F-test) or dispersion parameters (likelihood ratio test) for unbalanced data.

Be aware of possible boundary problems and remedies in testing dispersion parameters with the likelihood ratio test.

Have an understanding of BLUEs, BLUPs and the mixed model equations (MME). Understand the difference between the normal equations and MMEs when effects of a factor are either introduced as fixed or as random effects in the model.

Be aware of the differences between conditional and marginal means in a GLMM. In particular be aware of shrinkage for binary data and a shift for count data in deriving marginal means.

Be able to decide whether the mean or mode is the proper measure for location in a GLMM for a particular practical problem.

Be able to motivate penalized quasi-likelihood (PQL) for a GLMM by iteratively reweighted REML.

Be able to broadly explain the pros and cons of a population averaged approach versus a subject specific approach in modelling dependent discrete data.
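The difference between conditional and marginal means listed among the goals above can be made concrete with a small numerical sketch. It assumes a GLMM with a single normally distributed random intercept; the values of `eta` and `sigma` and the helper `marginal_mean` are hypothetical and chosen only for illustration. Python's standard library is used here to keep the sketch self-contained, whereas the course itself works with R.

```python
import math


def expit(x):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_mean(eta, sigma, inverse_link, n_points=2001, width=8.0):
    """Average inverse_link(eta + b) over b ~ N(0, sigma^2).

    Uses the trapezoidal rule on [-width*sigma, width*sigma].
    (Hypothetical helper written for this illustration only.)
    """
    if sigma == 0:
        return inverse_link(eta)
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        b = lo + k * step
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_points - 1) else 1.0
        total += weight * inverse_link(eta + b) * density * step
    return total


# Assumed values: linear predictor for a "typical" unit (b = 0)
# and the standard deviation of the random intercept.
eta, sigma = 1.0, 1.5

# Binary data, logit link: the marginal probability is shrunk towards 0.5.
cond_prob = expit(eta)
marg_prob = marginal_mean(eta, sigma, expit)

# Count data, log link: the marginal mean is shifted upwards;
# for a normal random intercept it equals exp(eta + sigma^2 / 2).
cond_count = math.exp(eta)
marg_count = marginal_mean(eta, sigma, math.exp)
closed_form = math.exp(eta + sigma ** 2 / 2.0)

print("binary: conditional", cond_prob, "marginal", marg_prob)
print("counts: conditional", cond_count, "marginal", marg_count,
      "closed form", closed_form)
```

In this sketch the conditional mean refers to a unit with random effect equal to zero, while the marginal mean averages over the distribution of the random effect. The two coincide only when the random-effect variance is zero.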

After part 2 of this course a student should:

Be able to identify which of the methods presented can be used for the analysis of normally and non-normally distributed longitudinal measurements.

Be able to identify which of the methods are appropriate in studies with unbalanced measurements, long follow-up times or missing data.

Be able, in the presence of missing data, to identify the different mechanisms that generate missing data and which of the discussed methods in this course give valid inference under the different mechanisms.

Be able to identify the hypotheses of interest, the model parameters involved in these hypotheses, and the appropriate tests.

Be able to decide upon a proper strategy for model building.

Be able to validate model assumptions.

Be able to use software, e.g. R, to perform an analysis with an LMM or GLMM, by the ANOVA method, ML, REML, or GEE.

Be able to interpret output from the software in terms of the practical problem.

Know the strengths and limitations of various procedures for statistical inference with a mixed model.

## Time Table

See the Leiden University students' website for the Statistical Science programme -> Schedules

## Mode of Instruction

The course will consist of a mix of lectures and practicals. Students will discuss some practical case studies in groups in class with respect to choice of model and aspects of statistical inference. PowerPoint presentations are the primary course material, and notes are supplied that provide details, worked case studies, R programs and output, and exercises with solutions for self-study. Some books are suggested (optional) for further details. Study material, including data sets for the case studies mentioned, is available from Blackboard.

About halfway through the course students will start working in groups on case studies that are handed out, under supervision of the teachers. Each group of students will hand in a written report about their case study. This report will be graded, and together with the grade for the written exam it determines the final grade of an individual student.

## Assessment method

A written exam (2/3) with open questions, presentation of the case study and case study report (1/3) (pass / fail). The case study report and the written exam should each be assessed with a minimum grade of 5 to obtain the course credits. The final grade should be at least 5.5 (which will be rounded to 6) to get a pass. Students may take a written re-exam following the university rules. Unless the student decides to follow the course again in a subsequent year, the final grade for the case study is binding. The date for handing in the case study report will be agreed upon during the course.

The dates of the exam and resit can be found in the Time Table pdf document under the tab “Masters Programme” at http://www.math.leidenuniv.nl/statscience. The room and building for the exam will be announced on the electronic billboard opposite the entrance; its content can also be viewed at http://info.liacs.nl/math/.

## Reading List

The following books are occasionally referred to for further reading, but they are not compulsory reading for the exam.

Faraway (2006). Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. Chapman & Hall/CRC.

Fitzmaurice, Laird & Ware (2004). Applied longitudinal analysis. John Wiley & Sons.

McCulloch, Searle & Neuhaus (2008). Generalized, linear and mixed models. Wiley Blackwell.

The first two books are indicative of the applied level of this course. The third book is more technical and intended as a reference book. The Faraway book is relevant for the course about linear and generalized linear models as well.

## Registration

Enroll in Blackboard for the course materials and course updates.

To be able to obtain a grade and the EC for the course, sign up for the (re-)exam in uSis ten calendar days before the actual (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme and are therefore expected to register for and take the first exam opportunity.

Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.

## Contact information

bas [dot] engel [at] wur [dot] nl

## Remarks

- This is a compulsory course of the Master Statistical Science for the Life and Behavioural sciences / Data Science.