In the study of the effect of one or more explanatory variables on a response variable, linear regression and analysis of variance are important techniques. In linear regression we study how a quantitative variable, like the dose of a medicine, influences a quantitative response variable, like blood pressure. In analysis of variance we compare different groups with respect to a quantitative response, e.g. comparing the yields of different corn varieties. The statistical models that underlie these techniques are special cases of linear models. In this course we discuss linear models with a thorough treatment of the matrix algebra.
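As a small illustration of the matrix algebra referred to above, the following sketch fits a simple linear regression by solving the normal equations X'X b = X'y for the design matrix X = [1, x]. It is not taken from the course materials: the function name `fit_simple_regression` and the dose and blood-pressure values are hypothetical and chosen only for illustration.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x via the normal equations X'X b = X'y."""
    n = len(x)
    # Entries of X'X and X'y for the design matrix X = [1, x]
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Determinant of the 2x2 matrix X'X = [[n, sx], [sx, sxx]]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be equal")
    # b = (X'X)^{-1} X'y, written out for the 2x2 case
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical data lying exactly on the line pressure = 1 + 2 * dose
dose = [1.0, 2.0, 3.0, 4.0]
pressure = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_regression(dose, pressure)
print(b0, b1)  # prints 1.0 2.0
```

With more than one explanatory variable the same equations hold, but X'X is larger and is normally solved with a linear algebra routine rather than by hand.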
Although linear models are widely used, sometimes alternatives are preferred. Therefore, we discuss how to check the assumptions underlying the linear model: independent errors, with a normal distribution and constant variance. When the assumptions of normality and constant variance are violated, the wider class of generalized linear models may be employed. Examples are logistic regression for a binary response (assuming a binomial distribution), or log-linear models for counts (using a Poisson distribution). Data are still assumed to be independent. Analysis of dependent data will be discussed in the course on mixed and longitudinal modeling. Emphasis will be on gaining understanding of the models, the kind of data that can be analyzed with these models, and the statistical analysis of empirical data itself.
Students should understand the basic concepts of linear models (regression, ANOVA, ANCOVA) and generalized linear models, and the proper statistical inference methods. Students, when confronted with practical data for a linear or generalized linear model assuming independence, should be able to (1) understand the statistical analysis of the empirical data itself, (2) check for violations of the assumptions, and (3) perform a proper data analysis. Students should acquaint themselves with the basics of linear algebra, especially the matrix algebra that is needed to understand linear models.
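To make the step from linear to generalized linear models concrete, the sketch below shows how the linear predictor is mapped to the mean of the response through an inverse link function: the inverse logit for logistic regression and the exponential for a log-linear (Poisson) model. The function names and the coefficient values are hypothetical and serve only as an illustration; the coefficients are not estimated from any data.

```python
import math


def linear_predictor(beta, x):
    """Linear predictor eta = x'beta for one observation."""
    return sum(b * xi for b, xi in zip(beta, x))


def inv_logit(eta):
    """Inverse logit link: maps eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_log(eta):
    """Inverse log link: maps eta to a positive expected count."""
    return math.exp(eta)


# Hypothetical coefficients (intercept, slope) and one observation (1, x)
beta = [-1.0, 0.5]
x = [1.0, 2.0]
eta = linear_predictor(beta, x)  # -1.0 + 0.5 * 2.0 = 0.0

print(inv_logit(eta))  # success probability under a logistic model: 0.5
print(inv_log(eta))    # expected count under a log-linear model: 1.0
```

In both cases the model remains linear on the scale of the link function, which is why much of the linear model machinery carries over.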
Mode of Instruction
Lectures and practicals (partly computer practicals, partly exercises).
See the Leiden University students' website for the Statistical Science programme -> Schedules 2020-2021
Assessment of a student will be based on a written exam (2/3), a case study report (1/3), and an oral presentation of the case study report (pass/fail).
Case study report: In week four, students will be asked to analyze a practical data set or study a theoretical topic. A report should be handed in, and the student will give a short 15-minute oral presentation on the topic of his or her report.
The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.
Fox (2008). Applied Regression Analysis and Generalized Linear Models. Sage.
Faraway. Practical Regression and ANOVA using R. Text available as PDF at http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
Faraway (2006). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman & Hall/CRC.
Enroll in Blackboard for the course materials and course updates.
To obtain a grade and the EC for the course, sign up for the (re-)exam in uSis ten calendar days before the (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme and therefore to register for and use the first exam opportunity.
Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.
gerrit [dot] gort [at] wur.nl
- This is a compulsory course of the Master Statistical Science for the Life & Behavioural sciences / Data Science.