Recommended prior knowledge
Advanced course, recommended for second-year students. It is recommended that students have knowledge of data mining, algorithms, probability theory, statistics, and Turing machines.
How can we gain insight from data? How can we discover and explain structure in data if we don't know what to expect? What is the optimal model for our data? How do we develop principled algorithms for exploratory data mining? To answer these questions, we study and discuss the state of the art in the research area of information-theoretic data mining. We focus on theory, problems, and algorithms, not on implementation and experimentation.
Over the last decade, information-theoretic methods for selecting the best model have become popular in the academic data mining community. This course provides an overview of the use of information theory for exploratory data mining, with a focus on pattern-based modelling. This includes the theoretical foundations, modelling and model selection, and algorithms.
In particular, the course covers concepts from Shannon's information theory, such as entropy and mutual information, and more advanced topics from algorithmic information theory (AIT), such as Kolmogorov complexity. We show how the Minimum Description Length (MDL) principle and the Maximum Entropy (MaxEnt) principle can be used for exploratory data analysis and discuss problems, models, and algorithms that have been recently proposed.
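As an informal illustration of two of the Shannon concepts named above (not part of the official course material), the following minimal Python sketch estimates entropy and mutual information from observed samples. The function names `entropy` and `mutual_information` and the toy data are assumptions made for this example only; the estimates are plain plug-in (empirical frequency) estimates.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    joint = list(zip(xs, ys))  # paired observations define the joint distribution
    return entropy(xs) + entropy(ys) - entropy(joint)


# Hypothetical toy data: two fair binary variables observed four times.
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]

print(entropy(xs))                 # 1.0 (one fair bit)
print(mutual_information(xs, xs))  # 1.0 (a variable fully determines itself)
print(mutual_information(xs, ys))  # 0.0 (independent in this sample)
```

The identity I(X;Y) = H(X) + H(Y) - H(X,Y) is used here because it reduces mutual information to three entropy computations over the same samples.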
This advanced course will have one meeting of two hours per week. The first part of the course will have both regular lectures and seminars, in which we discuss the material covered in the lectures and additional reading material (scientific articles). During the second part the students will write a scientific essay on an assigned topic (based on scientific articles) and give a presentation. In this phase students will have the opportunity for individual tutoring.
Note that there is a strict limit on the capacity of this course; see Registration for the details.
At the end of the course, students:
Have a clear understanding of information theory (both algorithmic and Shannon's).
Have a clear understanding of advanced algorithms for data analysis based on information theory.
Have an overview of the state of the art in information-theoretic data mining.
Are able to study, understand, and critically assess scientific publications from top journals and proceedings in the field of data mining.
Are able to write a scientific essay discussing and interpreting the state of the art in the literature.
The most recent timetable can be found at the Computer Science (MSc) student website.
Mode of instruction
Total hours of study: 168 hrs. (= 6 EC)
Lectures 24:00 hrs.
Practical work 141:30 hrs.
Tutoring 1:30 hrs.
Attendance at all course meetings is mandatory. The final mark is composed of:
presentation (including Q&A) (25%)
scientific essay (60%)
The teacher will inform the students how the inspection of and follow-up discussion of the essays will take place.
The literature list, including mandatory and optional reading material, and lecture slides will be made available on Brightspace.
Registration
You have to sign up for courses and exams (including retakes) in uSis.
Important: because of the format of the course, there is a strict limit on the number of participants: at most 20 students can participate in this course. Register by 1) signing up in uSis and 2) sending an e-mail to the lecturer in which you confirm your participation.