Introduction to Data Science

Course
2019-2020

Description

Data Science emerged at the crossroads of many different fields, including statistics, machine learning, natural language processing, databases, and others. This course serves a dual purpose. On the one hand, several lecturers will introduce students to their experiences in the field, and students will work with practitioners to gain experience selecting plausible models and communicating results from real data. On the other hand, students with a Statistical Sciences background will be made familiar with the Python programming language, which is widely used by data scientists and which supports many useful tools and libraries.

Course objectives

The goal is to gain a better understanding of the very diverse field of data science, to be able to write small but readable and robust Python programs to solve statistical problems, and to get acquainted with some of the popular data analysis libraries that are often used with Python, in particular NumPy, Matplotlib and scikit-learn.

Mode of Instruction

This course combines plenary lectures, lectures by invited speakers, possibly company visits, project work in small groups, and programming homework exercises. The Python lectures are specifically for Statistical Science students and consist of plenary lectures combined with workgroups for the homework; the other parts of the course are followed jointly with the Data Science specialization students from the Master in Computer Science.

Time Table

See the Leiden University students' website for the Statistical Science programme -> Schedules

Assessment method

Completion of the course depends on three factors: (1) attending the guest lectures and any company visits, and participating in the group projects, (2) scoring at least 50/100 on the homework problems and on the written exam, and (3) scoring a final grade of at least 55/100. The final grade is calculated as 0.4 × H + 0.6 × E, where H is the homework grade and E is the exam grade, both on a scale of 0 to 100.
The written exam requires the use of an offline laptop. The dates of the exam and the resit can be found in the time table.
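
As an illustration only, the grading rule can be sketched in Python, the language taught in this course. The function names `final_grade` and `passes` are hypothetical, and the sketch assumes that the 50/100 minimum applies to the homework and the exam separately and that no rounding is applied; the course description does not specify either point.

```python
def final_grade(homework, exam):
    """Weighted final grade on a 0-100 scale: 40% homework, 60% exam."""
    # Integer weights divided by 100 avoid small floating-point errors.
    return (40 * homework + 60 * exam) / 100


def passes(homework, exam, participated):
    """Check the three completion requirements.

    Assumptions (not stated in the course description): the 50/100
    minimum applies to homework and exam separately, and the final
    grade is compared to 55 without rounding.
    """
    return (
        participated                          # (1) lectures, visits, group projects
        and homework >= 50                    # (2) minimum homework score
        and exam >= 50                        # (2) minimum exam score
        and final_grade(homework, exam) >= 55 # (3) minimum final grade
    )


print(final_grade(50, 60))   # 56.0
print(passes(50, 60, True))  # True
print(passes(50, 50, True))  # False: the final grade is 50.0, below 55
```

Under these assumptions, a student with a homework grade of exactly 50 needs an exam grade of at least 59 to reach a final grade of 55.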

Compulsory material will be provided through lecture slides. The following books are optional literature:

• Mathematical Statistics and Data Analysis. John A. Rice. Duxbury Press (3rd ed., 2007)

• The Elements of Statistical Learning. Hastie, Tibshirani & Friedman. Springer Series in Statistics (2009)

Course Registration

Enroll in Blackboard for the course materials and course updates.
To obtain a grade and the ECTS for the course, sign up for the (re-)exam in uSis no later than ten calendar days before the (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme and therefore to register for and take the first exam opportunity.
Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.

Contact information

Steven de Rooij: steven [dot] de [dot] rooij [at] gmail [dot] com

Remarks

This is a compulsory course in the Data Science specialisation of the Master's programme in Statistical Science.