- knowledge of basic linear algebra and linear models
Python and R are among the most frequently used programming languages in data science.
After this course, students will be able to program reproducible analyses in Python.
An analysis consists of data reading, cleaning, modelling and reporting steps.
Implementations will be based on state-of-the-art Python data science libraries.
Data manipulation methods (with a brief reference to the SQL language) will be shown.
Students will practise applying the machine learning algorithms introduced earlier.
Moreover, the relevance of data stewardship and the FAIR principles (Findable, Accessible, Interoperable, Reusable) will be discussed.
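As a rough illustration of the four analysis steps named above (reading, cleaning, modelling, reporting), the following minimal sketch uses only the Python standard library so that it runs anywhere. The data, the function names and the choice of a straight-line fit are all assumptions made for this example; the course itself is expected to rely on dedicated data science libraries instead.

```python
import csv
import io
import statistics

# Hypothetical raw data; in a real analysis this would be read from a file.
RAW_CSV = """x,y
1,2.1
2,3.9
3,
4,8.2
5,9.8
"""


def read_rows(text):
    """Reading: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Cleaning: drop rows with missing values and convert the rest to floats."""
    return [(float(r["x"]), float(r["y"])) for r in rows if r["x"] and r["y"]]


def fit_line(points):
    """Modelling: ordinary least-squares fit of y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def report(points, intercept, slope):
    """Reporting: summarise the fitted model as one line of text."""
    return f"n={len(points)} intercept={intercept:.2f} slope={slope:.2f}"


rows = read_rows(RAW_CSV)
points = clean_rows(rows)
intercept, slope = fit_line(points)
summary = report(points, intercept, slope)
print(summary)
```

Keeping each step in its own function means the whole analysis can be re-run from the raw input with a single script, which is one simple way to keep it reproducible.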
After the course you will be able to:
- write and execute a Python program or a Python notebook script/report
- read and write data stored in standard tabular and hierarchical formats
- perform data manipulation operations (table filtering, merging, wide/long conversion)
- visualise data with histograms, scatter plots, etc.
- execute several machine learning algorithms
- explain the relevance of data stewardship for scientific research
- properly handle research data during the complete data life cycle (planning research, collecting data, processing and analysing data, preserving data, giving access to data, re-using data)
- apply the FAIR principles (Findable, Accessible, Interoperable, Reusable)
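The data manipulation operations listed above (table filtering, merging, wide/long conversion) could look roughly as follows in pandas, one of the libraries named in the reading list. This is a sketch only: the two tables, their column names and their values are hypothetical and do not come from the course material.

```python
import pandas as pd

# Hypothetical example tables (names and values are illustrative only).
measurements = pd.DataFrame({
    "sample": ["s1", "s2", "s3"],
    "gene_a": [1.0, 2.5, 0.3],
    "gene_b": [4.2, 0.1, 3.3],
})
samples = pd.DataFrame({
    "sample": ["s1", "s2", "s3"],
    "group": ["control", "treated", "treated"],
})

# Filtering: keep only rows where gene_a exceeds a threshold.
high_a = measurements[measurements["gene_a"] > 0.5]

# Merging: attach the group annotation to each measurement row.
merged = measurements.merge(samples, on="sample", how="left")

# Wide -> long: one row per (sample, gene) pair.
long_table = measurements.melt(
    id_vars="sample", var_name="gene", value_name="expression"
)

# Long -> wide: back to one column per gene.
wide_table = long_table.pivot(
    index="sample", columns="gene", values="expression"
).reset_index()

print(merged)
print(long_table)
```

The long layout tends to suit grouping and plotting, while the wide layout is often more convenient for modelling, so converting between the two is a common step.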
Mode of instruction
Python tutorial: https://docs.python.org/3/tutorial/index.html
Current tutorials for Python data science libraries: NumPy, SciPy, pandas, Matplotlib, TensorFlow
email: Szymon M. Kiełbasa firstname.lastname@example.org