Prospectus

Multivariate analysis and multidimensional data analysis

Course
2020-2021

Admission Requirements

The student should be acquainted with Chapters 1-8 and 10 of the book “The Art of R Programming” by Norman Matloff (2011). Furthermore, the student should be acquainted with linear regression at the level of Faraway, “Practical Regression and ANOVA using R”.

Make sure you have a laptop available during each lecture, with SPSS version 25 or higher and the latest versions of R and RStudio installed (for details see Blackboard).

Description

This course covers a large variety of methods for multivariate analysis and multidimensional data analysis. The first part deals with the analysis of measurements for N objects (persons) on P variables (attributes), where we typically wish to understand the relationships between those objects and variables. The data are usually given in one or more multivariate data matrices. The course extends classical approaches to multivariate analysis in various ways. We will deal not only with numeric but also with categorical (both nominal and ordinal) multivariate data. In addition, we will be able to deal with nonlinear relationships between variables. Both extensions are part of the same optimal quantification/nonlinear transformation framework. Key concepts are dimension reduction and visualization (in principal components analysis and multiple correspondence analysis), and prediction and regularization (in regression analysis).

The second part of the course is about a very important group of multidimensional techniques for the analysis of proximity data between objects (given in one or more N by N matrices). For the analysis of proximities we use the term multidimensional scaling. Here dimension reduction and visualization are of utmost importance by definition, while nonlinear transformations also play an important part.

The third part of the course will focus on supervised and unsupervised classification methods. Here the interest is primarily in the question of whether we can predict the class an object (subject, person) belongs to, given a set of explanatory variables. Three methods will be presented in detail: discriminant analysis, multinomial logistic regression and cluster analysis. Students will also learn how to program some of these methods in R.

In addition to R, the first two parts of the course will also use the IBM-SPSS package CATEGORIES, which was developed in Leiden.

Course objectives

In this course we focus on both the theoretical understanding of the discussed techniques and their application to data.

At the end of the course, students are (at least) able:

  • to enumerate the features of the multivariate and multidimensional data analysis techniques discussed during the course, and to explain the differences between them.

  • to describe for each of the multivariate and multidimensional data analysis techniques

  • to explain the concepts and tools of the discussed multivariate analysis techniques

  • to notate mathematically and understand the optimization functions of the multivariate and multidimensional data analysis techniques discussed during the course.

  • to program some of the algorithms and models.

  • to apply the methods and report on the results.

Mode of instruction

Originally, the mode of instruction was lectures with exercise classes. However, the teachers are considering using the many videos that were made during the coronavirus crisis to make the course more interactive. More information will be given at the start of the course.

Assessment method

Assessment will be based on a written exam (2/3 of the final grade) and four home assignments (1/3 of the final grade). The minimum required grade for the written exam is a 5.

The written exam is a closed-book exam. Books, laptops, internet access and any other sources of external information are not allowed during the exam.

Literature

The reading material consists of scientific articles, which will be announced at the start of the course.

Brightspace/website

Enroll in Brightspace for the course materials and course updates.

To obtain a grade and the EC for the course, sign up for the (re-)exam in uSis no later than ten calendar days before the (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme and therefore to register for and use the first exam opportunity. Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.

Contact information

s.j.w.willems [at] math [dot] leidenuniv [dot] nl

Remarks

This is a compulsory course of the Master's programme Statistical Science for the Life and Behavioural Sciences / Data Science.