To build powerful models, I need to understand how machine learning algorithms work and how to tune their hyperparameters. The basic idea is to study each algorithm in An Introduction to Statistical Learning and then implement it. Where I can, I will implement the algorithm without sklearn, because I think that will help me understand how the model works.
I plan to work on the algorithms below:
- Linear Regression (done)
- Logistic Regression (done)
- KNN
- Decision Tree
- Random Forest
- SVM
- K-means
- Dimensionality Reduction
- XGBoost
- Gradient Boosting
- Naive Bayes
I will work on this in three steps:
- Understand the theory behind the algorithm, including:
  - how its formula is derived
  - when to use it
  - its cost function
  - pros and cons
  - how it compares to similar models
  - how it can be computed with MapReduce
  - time complexity
- Implement the algorithm in Python.
- Use the sklearn documentation to practice tuning parameters and improving the model.
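
As a rough sketch of what the second step might look like, here is linear regression fit by ordinary least squares using only NumPy. The function names `fit_linear_regression` and `predict` and the synthetic data are my own assumptions for illustration, not taken from the book or from sklearn.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least-squares fit. Returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept is learned as a weight.
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq finds the beta that minimizes ||X_design @ beta - y||^2.
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta[0], beta[1:]


def predict(X, intercept, coef):
    """Predict targets for the rows of X with a fitted linear model."""
    return intercept + np.asarray(X, dtype=float) @ coef


# Synthetic data (assumed for illustration): y = 3 + 2*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

intercept, coef = fit_linear_regression(X, y)
mse = np.mean((predict(X, intercept, coef) - y) ** 2)
print("intercept:", intercept, "coef:", coef, "train MSE:", mse)
```

For the third step, the same data could be fit with sklearn's `LinearRegression` and the recovered intercept and coefficients compared against the ones above.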