Prospectus

Text Mining

Course
2023-2024

Admission requirements

Assumed prior knowledge

A Bachelor's degree in AI or Computer Science is recommended for this course, as well as experience with programming in Python.

Description

Text mining is one of the core application areas of Natural Language Processing (NLP). Key text mining tasks are text classification, information extraction, and sentiment analysis. NLP is a fast-developing field that is grounded in fundamental models for text representation. It has attracted much attention from researchers in other fields and from the general public, especially in recent times with the increasing power of large language models. This course gives an overview of the field from both a theoretical angle (underlying models) and a practical angle (applications, challenges with data). In addition to attending the lectures, students work on practical assignments.

Outline:
week 1. Introduction
week 2. Text processing
week 3. Vector semantics
week 4. Text categorization
week 5. Data collection and annotation
week 6. Neural NLP and transfer learning
week 7. Information extraction
week 8. Topic modelling & text summarization
week 9. Sentiment analysis & Stance detection
week 10. Generative large language models
week 11. Industrial text mining
week 12. Conclusions

Course objectives

After successful completion of this course, students have an understanding, both at the conceptual and the technical level, of natural language processing (NLP) methods for the purpose of text mining. Students can build models for a text mining task using machine learning algorithms and text data, and they can evaluate and report on the data, the developed models and modules. Students also understand, from a theoretical perspective, which models are applicable in which situations, and which real-world challenges (such as domain specificity, language variation, noise in the data, and trustworthiness of models) prevent the application of certain techniques.

Timetable

The most recent timetable can be found at the Computer Science (MSc) student website.

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with calendar apps such as Outlook, Google Calendar, and Apple Calendar, as well as other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help page' in MyTimetable. Please note: Leiden/Delft Joint Degree students have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

Lectures, literature, assignments (no lab sessions).

Assessment method

  • a written individual exam, closed book (50% of course grade)

  • practical assignments in groups (50% of course grade)

    • two assignments (10% each) during the course
    • one more substantial assignment (30%) at the end of the course

To complete the course, the grade for the written exam must be 5.5 or higher, and the weighted average grade for the practical assignments must also be 5.5 or higher. The exam has a regular written re-sit opportunity. If one of the assignments is not submitted, the grade for that assignment is 0. Each assignment has a re-sit opportunity (a later submission); the maximum grade for a re-sit assignment is 6.
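As an informal illustration of how the weights and thresholds above combine, the sketch below computes a course grade in Python. It is not an official calculator: the function and constant names are hypothetical, and it assumes that the final grade is the unrounded weighted sum of the components, since no rounding rule is stated here.

```python
from typing import Optional, Sequence, Tuple

# Weights and pass mark taken from the assessment description above.
EXAM_WEIGHT = 0.50
ASSIGNMENT_WEIGHTS = (0.10, 0.10, 0.30)  # two small assignments, one larger
PASS_MARK = 5.5
RESIT_CAP = 6.0


def resit_grade(grade: float) -> float:
    """Cap the grade of a re-sit (later) submission at 6."""
    return min(grade, RESIT_CAP)


def assignment_average(grades: Sequence[Optional[float]]) -> float:
    """Weighted average of the three assignments; None means not submitted."""
    if len(grades) != len(ASSIGNMENT_WEIGHTS):
        raise ValueError("expected exactly three assignment grades")
    filled = [0.0 if g is None else g for g in grades]  # missing counts as 0
    weighted = sum(w * g for w, g in zip(ASSIGNMENT_WEIGHTS, filled))
    return weighted / sum(ASSIGNMENT_WEIGHTS)


def course_result(exam: float,
                  assignments: Sequence[Optional[float]]) -> Tuple[float, bool]:
    """Return (final grade, passed). Assumption: no rounding is applied."""
    avg = assignment_average(assignments)
    final = EXAM_WEIGHT * exam + (1.0 - EXAM_WEIGHT) * avg
    passed = exam >= PASS_MARK and avg >= PASS_MARK
    return final, passed
```

For example, `course_result(7.0, [8.0, None, 7.0])` treats the missing second assignment as a 0, and `resit_grade(8.5)` gives the capped grade for a later submission. Both thresholds are checked separately, so a high exam grade does not compensate for an assignment average below 5.5.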

The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.

Reading list

The literature will be distributed on Brightspace. The majority of the chapters come from this book: Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd ed), January 2023 https://web.stanford.edu/~jurafsky/slp3/

Registration

From the academic year 2022-2023 on, every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.

Extensive FAQs on MyStudyMap can be found here.

Contact

Lecturer: dr. S. Verberne
Website: Course website

Remarks

Due to limited capacity, external students can only register after consultation with the programme coordinator/study adviser (mastercs@liacs.leidenuniv.nl).