This seminar covers the fundamentals of audio processing and indexing. Applications in speech recognition and understanding, audio synthesis, and content-based audio retrieval are discussed. Participants study and present state-of-the-art work on speech recognition, speech synthesis, and content-based audio retrieval.
The seminar starts with several lectures and accompanying assignments in the form of workshops, followed by literature selection, study, and presentations by all students. It ends with final project demos and presentations.
At the end of the seminar, students:
Have a clear understanding of the fundamentals of audio processing, audio indexing, speech synthesis, and speech recognition and understanding.
Are able to apply basic audio processing algorithms to audio data sets.
Have experienced and studied the general setup of scientific research and experiments in the field of content-based audio retrieval.
Are able to acquire the necessary knowledge of state-of-the-art methods in the field of audio indexing and retrieval by studying scientific publications from journals and proceedings.
Are able to design, implement, execute and report on a scientific audio processing or indexing experiment.
The most recent timetable can be found at the students' website.
Mode of instruction
Hours of study: 168 (= 6 EC)
Practical work: 62 hours
Presentations and project: 60% of the grade. Class discussions, attendance, and workshops: 40% of the grade.
The teacher will inform students how the work can be inspected and how the follow-up discussion will take place.
Lecture slides and further materials will be made available on the website of the course.
List of recommended books:
Fundamentals of Speech Recognition by Lawrence Rabiner and Biing-Hwang Juang (Hardcover, 507 pages; Publisher: Pearson Education POD; ISBN: 0130151572; 1st edition, April 12, 1993)
Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, and Raj Reddy (Hardcover, 980 pages; Publisher: Prentice Hall PTR; ISBN: 0130226165; 1st edition, April 25, 2001)
Automatic Speech Recognition: A Deep Learning Approach by Dong Yu and Li Deng (Signals and Communication Technology series; Publisher: Springer; 2015 edition, November 11, 2014)
You have to sign up for courses and exams (including retakes) in uSis. Check this link for information about how to register for courses.