Welcome to this training course on machine learning (ML). It is available in both R and Python.
ML can be viewed as a collection of statistical algorithms used to
- predict values (supervised ML) or to
- investigate data structure (unsupervised ML).
Our focus is on supervised ML. Depending on whether we predict numbers or classes, we speak of regression or classification.
This lecture is distributed under a Creative Commons license.
Michael Mayer. Introduction to Machine Learning (2024). Web: https://github.com/mayer79/ml_lecture/
The lecture is split into four chapters, each of which is accompanied by an R/Python notebook and exercise solutions. You will find them in the corresponding subfolders.
- Basics and Linear Models
  - Basics
  - Linear regression
  - Generalized Linear Model
- Model Selection and Validation
- Trees
  - Decision trees
  - Random forests
  - Gradient boosting
- Neural Nets
Each chapter will take us about two hours. You will do the exercises on your own.
The lecture notes are available both as Jupyter notebooks (Python) and HTML (R).
To follow the lecture, you should be familiar with
- Descriptive statistics
- Linear regression
- R or Python
To get the material, clone this repository via
git clone https://github.com/mayer79/ml_lecture.git
Python users need Python 3.11 and the packages specified here.
R users need R version >= 4.1 and these packages: tidyverse, FNN, withr, rpart.plot, ranger, xgboost, keras, hstats, MetricsWeighted, insuranceData, lightgbm
For the last chapter, we will use Python with TensorFlow >= 2.15. You can install it by running the R command keras::install_keras(version = "release-cpu"). If the following code works, you are all set. (Some red start-up messages/warnings are okay.)
library(tensorflow)
tf$constant("Hello Tensorflow!")
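Python users may want a comparable quick check. The following sketch only tests whether the interpreter is recent enough and whether a module can be located; it does not verify package versions. The function name `check_environment`, its parameters, and treating 3.11 as a minimum version are assumptions for illustration and not part of the course material.

```python
import sys
from importlib.util import find_spec


def check_environment(min_python=(3, 11), modules=("tensorflow",)):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    # Assumption: Python 3.11 is treated as a lower bound here
    report = {"python": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[name] = find_spec(name) is not None
    return report


for requirement, ok in check_environment().items():
    print(f"{requirement}: {'ok' if ok else 'missing'}")
```

Further top-level module names can be passed via `modules` to check additional packages in the same way.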
- James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning - with Applications in R. New York: Springer.
- James, G., Witten, D., Hastie, T., Tibshirani, R., Taylor, J. (2023). An Introduction to Statistical Learning - with Applications in Python. New York: Springer.
- Hastie, T., Tibshirani, R., Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.
- Wickham, H., Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.
- VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media.
- Chollet, F. (2017). Deep Learning with Python. Manning Publications Co.
- Chollet, F., Allaire, J. J. (2018). Deep Learning with R. Manning Publications Co.