Prospectus

Information Theoretic Data Mining

Course
2022-2023

Admission requirements

Recommended prior knowledge

Advanced course, recommended for second-year students. It is recommended that students have knowledge of data mining, algorithms, probability theory, and statistics.

Description

How can we gain insight from data? How can we discover and explain structure in data if we don't know what to expect? What is the optimal model for our data? How do we develop principled algorithms for exploratory data mining? To answer these questions, we study and discuss the state of the art in the research area of information theoretic data mining. We focus on theory, problems, and algorithms rather than on implementation and experimentation.

Over the last decade, information theoretic methods for selecting the best model have become popular in the academic data mining community. This course provides an overview of the use of information theory for exploratory data mining, with a focus on pattern-based modelling. It covers the theoretical foundations, modelling and model selection, and algorithms.

In particular, the course covers concepts from Shannon's information theory, such as entropy and mutual information, as well as more advanced topics from algorithmic information theory (AIT), such as Kolmogorov complexity. We show how the Minimum Description Length (MDL) principle and the Maximum Entropy (MaxEnt) principle can be used for exploratory data analysis, and we discuss recently proposed problems, models, and algorithms.

This advanced course will have one two-hour meeting per week. The first part of the course will consist of both regular lectures and seminars, in which we discuss the material covered in the lectures and additional reading material (scientific articles). During the second part, students will write a scientific essay on an assigned topic (based on scientific articles) and give a presentation. In this phase, students will have the opportunity for individual tutoring.

Note that there is a strict limit on the capacity of this course; see Remarks for the details.

Course objectives

At the end of the course, students:

  • Have a clear understanding of information theory (both algorithmic and Shannon's).

  • Have a clear understanding of advanced algorithms for data analysis based on information theory.

  • Have an overview of the state of the art in information theoretic data mining.

  • Are able to study, understand, and critically assess scientific publications from top journals and proceedings in the field of data mining.

  • Are able to write a scientific essay discussing and interpreting the state of the art in the literature.

Timetable

The most recent timetable can be found at the Computer Science (MSc) student website.

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with your calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the help page in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

  • Lectures

  • Seminar

  • Tutoring

  • Essay

  • Presentation

Assessment method

Attendance of all course meetings is mandatory. The final grade is computed as the weighted average of the grades of

  • three assignments (5% each, together 15%)

  • presentation (including Q&A) (25%)

  • scientific essay (60%)

If an assignment is not completed, it receives a grade of 0. There will be no retakes for the assignments and the presentation. The final grade can only be sufficient if (1) the presentation and essay have both been completed, and (2) the grade for the scientific essay is at least 5.5.
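To illustrate how the weights and conditions above combine, here is a minimal, non-authoritative Python sketch. The function name `final_grade` and the use of `None` for a component that was not completed are hypothetical choices for illustration. It assumes that the weighted average itself must also reach 5.5 to count as sufficient (the standard Dutch pass mark, not stated explicitly above), and it does not model any rounding of the final grade.

```python
from typing import List, Optional, Tuple

# Weights as listed in the assessment method (three assignments at 5% each).
ASSIGNMENT_WEIGHT = 0.05
PRESENTATION_WEIGHT = 0.25
ESSAY_WEIGHT = 0.60
ESSAY_MINIMUM = 5.5  # the essay must be graded at least 5.5
PASS_MARK = 5.5      # assumption: the overall weighted average must also be >= 5.5


def final_grade(assignments: List[Optional[float]],
                presentation: Optional[float],
                essay: Optional[float]) -> Tuple[float, bool]:
    """Return (weighted average, whether the grade can be sufficient).

    Hypothetical convention: None means the component was not completed.
    Rounding of the final grade is not modelled.
    """
    if len(assignments) != 3:
        raise ValueError("expected exactly three assignment grades")

    # An assignment that is not completed is graded 0.
    assignment_total = sum(g if g is not None else 0.0 for g in assignments)

    # For the average only, treat a missing presentation or essay as 0;
    # the completion check below makes such a result insufficient anyway.
    p = presentation if presentation is not None else 0.0
    e = essay if essay is not None else 0.0

    grade = (ASSIGNMENT_WEIGHT * assignment_total
             + PRESENTATION_WEIGHT * p
             + ESSAY_WEIGHT * e)

    sufficient = (
        presentation is not None       # condition (1): presentation completed
        and essay is not None          # condition (1): essay completed
        and essay >= ESSAY_MINIMUM     # condition (2): essay at least 5.5
        and grade >= PASS_MARK         # assumed overall pass mark
    )
    return grade, sufficient
```

For example, `final_grade([8, 7, None], 7, 6)` gives a weighted average of about 6.1 (0.75 + 1.75 + 3.6) and counts as sufficient, whereas a student who scores 10 on everything except a 5.0 for the essay reaches 7.0 on average but does not pass, because the essay is below 5.5.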

The lecturer will inform students how the inspection and follow-up discussion of the essays will take place.

Reading list

The literature list, including mandatory and optional reading material, and lecture slides will be made available on Brightspace.

Registration

From the academic year 2022-2023 onwards, every student must register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.

Extensive FAQs on MyStudyMap can be found here.

Contact

Lecturer: dr. Matthijs van Leeuwen
Website: Website ITDM

Remarks

Important: because of the format of the course, there is a strict limit on the number of participants: at most 20 students can participate in this course.