Due to a relatively recent surge in large-scale text digitization and spoken-language transcription projects (by corpus linguists and digital humanities scholars, but also by commercial companies), an unprecedented amount of naturally occurring (rather than experimentally elicited) linguistic data from a wide range of languages and language varieties is now available to be queried and analyzed. Yet, as these electronic text databases, or ‘corpora’, grow not only in number but also in size, it is no longer feasible to subject them to more traditional (manual) types of data retrieval, annotation and analysis.
In this course, students will be familiarized with a range of computational methods (in R and Python) to collect, process and analyze corpus data. At the same time, students will also be introduced to different strands of computational corpus research and the questions they can answer. To this end, we survey a range of case studies that represent computational corpus analysis in fields such as computational lexicography, variationist sociolinguistics, and historical linguistics. Finally, ethical issues (e.g. copyright, privacy and bias) associated with large-scale corpus analysis will also be discussed.
By the end of this course, students will have gained:
In-depth knowledge of recent developments in computational corpus analysis and digital/computational humanities;
Insight into the importance of ‘found’ (i.e. corpus) data in linguistic theory;
Insight into the relevance of using computational methods in linguistic analysis;
Insight into the various aspects of working with large corpora;
Refined analytic skills;
Practical skills for collecting, processing and analyzing corpus data;
The ability to structure and write a detailed corpus research report.
The timetables are available through My Timetable.
Mode of instruction
The course is assessed by means of a final research paper plus a number of practical assignments throughout the term.
To pass the course, you may miss no more than two sessions during the semester.
To pass this course, a final grade of at least 5.5 must be obtained. Students who score below 5.5 may submit a resit essay.
The final mark is based on the grade for the final paper plus the additional requirement that the practical assignments throughout the term are completed with a sufficient result.
Weighting is as follows: paper 90%; practical assignments throughout the term 10%.
The end-of-term essay can be revised and submitted as a resit essay if the score is between 4.5 and 5.5.
If the end-of-term essay has a score below 4.5, a resit essay should be submitted on a new topic.
The essay resit will constitute 100% of the final grade.
Please note that there is no resit for the practical assignments completed throughout the term.
Inspection and feedback
How and when an exam review will take place will be disclosed together with the publication of the exam results at the latest. If a student requests a review within 30 days after publication of the exam results, an exam review will have to be organized.
Course readings will be made available on Brightspace via open access and Leiden University Library resources.
Enrolment through My Studymap is mandatory
Registration for Studeren à la carte and Contractonderwijs
For questions related to the content of the course, please contact the lecturer; you can find their contact information by clicking on their name in the sidebar.
For questions regarding enrolment, please contact the Education Administration Office Reuvensplaats (e-mail: firstname.lastname@example.org).
For questions regarding your study progress, contact the Coordinator of Studies.