The student should be familiar with Chapters 1-8 and 10 of “The Art of R Programming” by Norman Matloff (2011), and with linear regression at the level of Faraway, “Practical Regression and ANOVA using R” (available as a PDF).
Bring a laptop to each lecture with SPSS version 25 or higher and the latest versions of R and RStudio installed (for details, see Blackboard).
This course covers a wide range of methods for multivariate analysis and multidimensional data analysis. The first part deals with the analysis of measurements for N objects (persons) on P variables (attributes), where we typically wish to understand the relationships between those objects and variables. The data are usually given in one or more multivariate data matrices. The course extends classical approaches to multivariate analysis in two ways. We will deal not only with numeric data, but also with categorical (both nominal and ordinal) multivariate data. In addition, we will be able to deal with nonlinear relationships between variables. Both extensions are part of the same optimal quantification/nonlinear transformation framework. Key concepts are dimension reduction and visualization (in principal components analysis and multiple correspondence analysis), and prediction and regularization (in regression analysis).
The second part of the course is about a very important group of multidimensional techniques for the analysis of proximity data between objects (given in one or more N by N matrices). For the analysis of proximities we use the term multidimensional scaling. Here dimension reduction and visualization are of utmost importance by definition, while nonlinear transformations also play an important part.
The third part of the course will focus on supervised and unsupervised classification methods. Here the interest is primarily in the question of whether we can predict the class an object (subject, person) belongs to, given a set of explanatory variables. Three methods will be presented in detail: discriminant analysis, multinomial logistic regression and cluster analysis. Students will also learn how to program some of these methods in R.
In addition to R, the first two parts of the course will use the IBM-SPSS package CATEGORIES, which was developed in Leiden.
See the Leiden University students' website for the Statistical Science programme -> Schedules
Mode of Instruction
The course consists of two course days per week. Each course day comprises a two-hour lecture and a two-hour practical.
Assessment will be based on a written exam (60%) and 4 home assignments (40%). The minimum required grade for the written exam is a 5.
The dates of the exam and the resit can be found in the Time Table.
The written exam is a closed-book exam. Books, laptops, internet access and any other sources of external information are not allowed during the exam.
Reading material will be announced at the start of the course.
Enroll in Blackboard for the course materials and course updates.
To obtain a grade and the EC for the course, sign up for the (re-)exam in uSis ten calendar days before the (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme and therefore to register for and use the first exam opportunity.
Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.
s.j.w.willems [at] math [dot] leidenuniv [dot] nl
- This is a compulsory course of the Master Statistical Science for the Life and Behavioural sciences / Data Science.