Students should have basic knowledge of data mining, algorithms, and probability theory, e.g., by having successfully completed BSc courses on Data Mining and Artificial Intelligence.
How can we gain insight from data? How can we discover and explain structure in data if we don't know what to expect? What is the optimal model for our data? How do we develop principled algorithms for exploratory data mining? To answer these questions, we study and discuss the state of the art in the relatively young research area of information-theoretic data mining. We focus on theory, problems, and algorithms, not on implementation and experimentation.
Over the last decade, information-theoretic methods for model selection have steadily gained popularity in the international data mining community. This course provides an overview of the use of information theory for exploratory data mining, with a focus on pattern-based modelling. This includes the theoretical foundations, modelling and model selection, and algorithms.
In particular, the course covers concepts from Shannon's information theory, such as entropy and mutual information, and more advanced topics from algorithmic information theory (AIT), such as Kolmogorov complexity. We show how the Minimum Description Length (MDL) principle and the Maximum Entropy (MaxEnt) principle can be used for exploratory data analysis, and we discuss problems, models, and algorithms that have recently been proposed in the literature.
This advanced course has one two-hour meeting per week. The first part of the course consists of both regular lectures and seminars, in which we discuss the material covered in the lectures and additional reading material (scientific articles). During the second part, students write an essay on an assigned topic (based on scientific articles) and give a presentation. In this phase, students have the opportunity for individual tutoring by the lecturer.
Note that there is a strict limit on the capacity of this course; see Registration for the details.
At the end of the course, students:
Have a clear understanding of information theory (both algorithmic and Shannon's).
Have a clear understanding of advanced algorithms for data analysis based on information theory.
Have an overview of the state of the art in information-theoretic data mining.
Are able to study, understand, and critically assess scientific publications from top journals and proceedings in the field of data mining.
Are able to write an essay discussing and interpreting the state of the art in the literature.
The most recent timetable can be found on the students' website.
Mode of instruction
Attendance at all course meetings is mandatory. The final mark is composed of:
participation in discussions (20%)
presentation (including Q&A) (30%)
The literature list, including mandatory and optional reading material, and lecture slides will be made available on the website of the course.
Signing up for classes and exams
There is limited space for students who are not enrolled in the MSc programme of Computer Science or one of the Data Science programmes. Please contact the study coordinator/study adviser.
Important: because of the format of the course and the fact that it is being offered for the first time, there is a strict limit on the number of participants: at most 18 students can participate in this course. Register by 1) signing up in uSis and 2) sending an e-mail to the lecturer in which you confirm your participation.