Studiegids

Exploratory Data Analysis

Course
2025-2026

Admission requirements

There are no entry requirements for the course. However, we assume that students are acquainted with the contents of the following courses of the Statistics & Data Science program:

  • Linear algebra

  • Linear and generalized linear models

  • Statistical computing with R

Description

Studying the relationship between two (or maybe three) variables is easy: you can visualise them in two-dimensional (or maybe three-dimensional) graphs. However, when you are interested in the relationships among more than three variables, the human brain often falls short, and you need specialised methods to gain more insight into the dependencies between either cases or variables, even more so if the relations are not linear.

Many of these specialised methods include the option to transform the data in order to explore non-linear relationships and to reduce the dimensionality. Additionally, these methods allow for a mix of different types of variables, continuous or categorical. The methods are generally data-driven and therefore descriptive in nature, but some statistical inference is possible.

The techniques that will be covered in class are:
a) Multidimensional scaling analysis (including Sammon mapping)
b) Nonlinear dimension reduction (including t-SNE and UMAP)
c) Clustering (including k-means and hierarchical clustering)
d) Multiple correspondence analysis
e) Linear and optimal scaling principal components analysis (including catpca)
f) Linear and optimal scaling regression analysis (including catreg)
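
To give a flavour of item c), the following is a minimal sketch of k-means clustering (Lloyd's algorithm). It is an illustration only and not course material: the course itself uses R and SPSS, Python is used here purely for compactness, and the function names, the toy data and the choice of starting centres are all assumptions made for this example. The returned `loss` is the within-cluster sum of squared distances, which is the quantity k-means tries to minimise.

```python
# Minimal k-means (Lloyd's algorithm) in pure Python.
# Illustrative sketch; all names and data below are hypothetical.

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def assign(points, centers):
    """Label each point with the index of its nearest centre."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]

def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update steps until the centres stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            new_centers.append(mean_point(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    # Loss: within-cluster sum of squared distances to the assigned centre.
    loss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, loss

# Toy data: two visibly separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centers, loss = kmeans(points, centers=[points[0], points[3]])
print(labels, centers, loss)
```

On this toy data the two groups of three points end up in separate clusters. Real k-means implementations typically use several random starts, because the result depends on the initial centres.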

Note: This course is related to the course Data Visualization, but the two cover different topics. In Data Visualization, the focus is mostly on visualizing raw data to explore individual variables and to find associations between two or three variables. In this Exploratory Data Analysis course, the focus is on using methods and techniques, such as dimension reduction, to explore more complicated structures in data, for example non-linear structures or the associations between many variables.

Course objectives

By the end of the course, students can:
1. justify which technique is suitable to explore or answer a research question about a particular dataset;
2. discuss the differences in the assumptions and objectives of the techniques covered in the course;
3. identify the different parts of the loss functions of techniques covered in the course;
4. program some of the algorithms in R;
5. analyse data using the various techniques discussed in the course and evaluate the results.

Timetable

In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.

Additionally, you can easily link MyTimetable to a calendar app on your phone, and schedule changes will be automatically updated in your calendar. You can also choose to receive email notifications about schedule changes. You can enable notifications in Settings after logging in.

Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.

Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.

Mode of Instruction

The course consists of one course day per week, during which we combine lectures with exercise classes to understand the workings of the various techniques. Students are expected to prepare for each exercise class by, for example, reading the literature and completing preparatory exercises.

Make sure you bring a laptop to each lecture with SPSS version 27 or higher and the latest versions of R and RStudio installed (for details see Brightspace).

Assessment method

  • Exam: counts for 2/3 of the final grade; the exam grade must be at least 5.0 to pass the course;

  • Assignments: the mean of the assignment grades counts for 1/3 of the final grade and must be at least 5.0 to pass the course.

Resit opportunities:

  • Resit Exam

  • There is no resit for the assignments, but students can get the opportunity to improve one of their assignments. The maximum grade for this resit opportunity is 6.0.

Reading List

Reading material will be announced at the start of the course via Brightspace and is available via Leiden University Library.

Registration

As a student, you are responsible for enrolling on time through MyStudyMap.

In this short video, you can see step-by-step how to enrol for courses in MyStudyMap.
Extensive information about the operation of MyStudyMap can be found here.

There are two enrolment periods per year:

  • Enrolment for the fall opens in July

  • Enrolment for the spring opens in December

See this page for more information about deadlines and enrolling for courses and exams.

Note:

  • It is mandatory to enrol for all activities of a course that you are going to follow.

  • Your enrolment is only complete when you submit your course planning in the ‘Ready for enrolment’ tab by clicking ‘Send’.

  • Not being enrolled for an exam/resit means that you are not allowed to participate in the exam/resit.

Contact

Until October 1st, contact Dr. S.M.H. Huisman (s.m.h.huisman@fsw.leidenuniv.nl).
After October 1st, contact Dr. Sanne Willems (s.j.w.willems@fsw.leidenuniv.nl).

Remarks

Software
Since the 2024/2025 academic year, the Faculty of Science has used the software distribution platform Academic Software. Through this platform, you can access the software needed for specific courses in your studies. For some software, your laptop must meet certain system requirements, which are specified with the software. It is important to install the software before the start of the course. More information about the laptop requirements can be found on the student website.