This repository contains the Lecture Notes and Lab Exercises in Python for the course "Data Analysis (for Networks)" taught at Sorbonne University. The lecture slides and the Python exercises were developed by me (Anastasios Giovanidis), with contributions by Maximilien Danisch (L.11) and Lionel Tabourier (L.13, L.14), and closely follow the books:
"An Introduction to Statistical Learning (with applications in R)", by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, (Springer) DOI 10.1007/978-1-4614-7138-7, ISSN 1431-875X, ISBN 978-1-4614-7137-0, see also web-link: http://www-bcf.usc.edu/~gareth/ISL/
"Pattern Recognition and machine Learning", by Christopher M. Bishop, Springer 2006, ISBN 978-0387-31073-2
and the online book:
H. Pishro-Nik, "Introduction to probability, statistics, and random processes", available at https://www.probabilitycourse.com, Kappa Research LLC, 2014.
For Clustering (L.11), the material is based on "Introduction to Data Mining", 2nd Edition, by Tan, Steinbach, Karpatne, and Kumar, Chapter 7.
For Time-series (L.13&14) the material is based on the book by "Peter Brockwell and Richard Davis Introduction to Time Series and Forecasting", ISBN 0-387-95351-5.
In case you want to use the slides, please contact me at [email protected]
The material covered includes:
- (L.1) Probability Basics (Discrete and continuous distributions, moments, limit theorems)
- (L.2) Frequentist Estimation (Sample Mean, Variance, MSE, Maximum Likelihood)
- (L.3) Bayesian Inference (Bayes Rule, MAP vs ML, conjugate priors, sequential learning)
- (L.4a) Hypothesis Tests (Confidence intervals, Neyman-Pearson, Likelihood Ratio Tests)
- (L.4b&5) Regression (Linear, Multi-dimensional)
- (L.6) Model Selection and Cross-Validation (Polynomial regression, CV methods)
- (L.7-8) Classification (KNN, Logistic, LDA, QDA, Naive Bayes)
- (L.9) Tree-based methods (Trees, Random Forest, Boosting)
- (L.10) Clustering (K-means, Gaussian Mixture Model, Hierarchical, etc.)
- (L.11) Principal Component Analysis (SVD, Anomaly Detection applications)
- (L.12) Neural Networks (Stochastic Gradient Descent, Perceptron, Deep Neural Networks, backpropagation)
- (L.13-14) Time-Series (Stationarity, Autocorrelation, noise, decomposition, ARMA)
- (L.15) Support Vector Machines (classification, regression)
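To give a flavor of the style of the lab exercises, here is a minimal sketch of the kind of task covered in the regression lectures (L.4b&5): fitting a one-variable linear regression on synthetic data. The use of NumPy and scikit-learn here is an illustrative assumption and not necessarily what the actual notebooks use.

```python
# Illustrative sketch only: fit a simple linear regression on synthetic data.
# Assumes NumPy and scikit-learn are installed; the repository's lab notebooks
# may rely on different libraries and datasets.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))             # single explanatory variable
y = 2.5 * X[:, 0] + 1.0 + rng.normal(0, 1, 100)   # linear signal plus Gaussian noise

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("R^2:", model.score(X, y))
```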