Back to ML

Overfitting: Low Bias + High Variance

Low Bias (Low error on train set) + High Variance (High error on the test set)

  • The model memorizes the training data too well, fitting the noise/error along with the signal, and achieves very high accuracy on the training set.
  • In doing so, it loses the ability to learn the underlying patterns and relationships.
  • The model performs well on the training dataset but does not generalize to new, unseen test data (see the sketch below).
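
A minimal sketch of low bias + high variance, assuming scikit-learn and NumPy; the synthetic sine data and the degree-15 polynomial are illustrative choices, not part of these notes:

```python
# Minimal sketch: a very flexible model memorizes the training points, noise included.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)  # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Degree-15 polynomial regression: flexible enough to chase the noise.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X_train, y_train)

# R^2 is close to 1.0 on the train set (low bias) but typically much lower
# on the test set (high variance).
print("train R^2:", overfit.score(X_train, y_train))
print("test  R^2:", overfit.score(X_test, y_test))
```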

How to identify overfitting?

Train Test Split | Observe Accuracy (Score)

  • Split the dataset into train and test sets (an 80/20 or 70/30 train/test split).
  • Check whether the trained model generalizes well to the new, unseen test dataset.
  • Calculate the accuracy score on the train set with `model.score(X_train, y_train)` and on the test set with `model.score(X_test, y_test)`.
  • If the accuracy on the train set is high but the accuracy on the test set is noticeably lower, it is an overfitting scenario (see the sketch below).
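
A minimal sketch of this check, assuming scikit-learn; the breast-cancer dataset and the unpruned decision tree are illustrative choices:

```python
# Minimal sketch: compare train accuracy vs. test accuracy to spot overfitting.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # 80/20 split

model = DecisionTreeClassifier(random_state=42)  # no depth limit -> prone to memorizing
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on data the model has seen
test_acc = model.score(X_test, y_test)     # accuracy on unseen data

# A large gap (e.g., 1.00 on train vs. noticeably lower on test) signals overfitting.
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```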

How to prevent overfitting?

1. Collect more relevant data for training.

  • With more relevant data (observations), the model can learn the underlying patterns and relationships better.
  • Data Augmentation: artificially expand your dataset by creating new data points through transformations (a sketch follows this list).
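
A minimal augmentation sketch using only NumPy; the array shapes, the horizontal flip, and the noise level are illustrative assumptions (image libraries offer far richer transformations):

```python
# Minimal sketch of data augmentation with plain NumPy.
import numpy as np

def augment(images, labels, seed=0):
    """Expand (N, H, W) image data by adding flipped and slightly noisy copies."""
    rng = np.random.default_rng(seed)
    flipped = images[:, :, ::-1]                          # horizontal flip keeps the label
    noisy = images + rng.normal(0.0, 0.02, images.shape)  # small Gaussian perturbation
    aug_images = np.concatenate([images, flipped, noisy])
    aug_labels = np.concatenate([labels, labels, labels])  # each transform preserves the label
    return aug_images, aug_labels

# Usage: 100 grayscale 28x28 images become 300 augmented training examples.
images = np.random.rand(100, 28, 28)
labels = np.random.randint(0, 10, size=100)
aug_images, aug_labels = augment(images, labels)
print(aug_images.shape, aug_labels.shape)  # (300, 28, 28) (300,)
```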

2. Regularization

  • Slope | Coefficient | Weight | Gradient (all refer to the same quantity here).
  • Regularization discourages the model from overfitting by penalizing large coefficients and encourages it to learn more generalized patterns.
  • Lasso (L1) adds a penalty equal to the sum of the absolute values of the coefficients (it can force coefficients to exactly 0).
  • Lasso simplifies the model by eliminating the features that have no impact on the target.
  • Ridge (L2) adds a penalty equal to the sum of the squared values of the coefficients (it encourages small coefficients).
  • Ridge keeps all the features but shrinks their coefficients, which decreases the complexity of the model (see the sketch after this list).
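
A minimal sketch, assuming scikit-learn; the synthetic regression data and the alpha=1.0 penalty strength are illustrative choices:

```python
# Minimal sketch: compare plain linear regression with Lasso (L1) and Ridge (L2).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# 100 samples, 50 features, but only 5 features actually drive the target.
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("ols", LinearRegression()),
                    ("lasso", Lasso(alpha=1.0)),   # L1 penalty
                    ("ridge", Ridge(alpha=1.0))]:  # L2 penalty
    model.fit(X_train, y_train)
    zero_coefs = np.sum(model.coef_ == 0)  # Lasso drives useless coefficients to exactly 0
    print(f"{name}: test R^2 = {model.score(X_test, y_test):.2f}, "
          f"zero coefficients = {zero_coefs}")
```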

3. Early Stopping

  • Up to a certain number of iterations, additional training iterations improve the accuracy of the model.
  • After that point, the model's ability to generalize to unseen data weakens and it starts overfitting.
  • During training, monitor the model's performance on a validation set.
  • Stop training when the performance on the validation set starts to decline. This prevents the model from memorizing the noise in the training data (see the sketch after this list).
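
A minimal early-stopping sketch using scikit-learn's SGDClassifier, which holds out part of the training data as a validation set and stops once the validation score stops improving; the dataset and parameter values are illustrative:

```python
# Minimal sketch: early stopping based on a held-out validation set.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scaling helps SGD converge; fit the scaler on the train set only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = SGDClassifier(
    max_iter=1000,
    early_stopping=True,      # carve out an internal validation set
    validation_fraction=0.2,  # 20% of the training data is used for monitoring
    n_iter_no_change=5,       # stop after 5 epochs without validation improvement
    random_state=0,
)
model.fit(X_train, y_train)
print("epochs run:", model.n_iter_, "| test accuracy:", round(model.score(X_test, y_test), 2))
```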

4. Feature Selection

  • More observations (rows) generally help model training, but too many features (columns) can confuse the model.
  • Select only the important features; a large number of irrelevant features makes overfitting more likely.
  • Each feature and observation should be independent of the others. Remove irrelevant and redundant features.
  • Remove multicollinear features (e.g., DOB and age express the same information, so one of them can be removed; see the sketch after this list).
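
A minimal sketch of dropping one feature from each highly correlated (multicollinear) pair, assuming pandas and NumPy; the synthetic columns and the 0.9 correlation threshold are illustrative assumptions:

```python
# Minimal sketch: drop one feature from each highly correlated pair.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.integers(20, 60, 200)})
df["years_since_birth"] = df["age"] + rng.normal(0, 0.1, 200)  # nearly a copy of age
df["income"] = rng.normal(50_000, 10_000, 200)

corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # upper triangle only
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]

print("dropping redundant features:", to_drop)  # ['years_since_birth']
df_reduced = df.drop(columns=to_drop)
```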

5. Ensemble Methods

  • Train and combine multiple weak learners into a single strong, accurate predictive model.
  • Bagging trains multiple individual weak learners (e.g., decision trees) in parallel.
  • Boosting trains multiple weak learners in sequence, each one improving on the mistakes of the previous models (see the sketch after this list).
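
A minimal sketch of both ensemble styles, assuming scikit-learn; RandomForestClassifier stands in for bagging of decision trees and GradientBoostingClassifier for boosting, with an illustrative dataset:

```python
# Minimal sketch: bagging (parallel trees on bootstrap samples) vs. boosting (sequential trees).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

bagging = RandomForestClassifier(n_estimators=200, random_state=0)       # bagging of decision trees
boosting = GradientBoostingClassifier(n_estimators=200, random_state=0)  # sequential weak learners

for name, model in [("bagging (random forest)", bagging),
                    ("boosting (gradient boosting)", boosting)]:
    model.fit(X_train, y_train)
    print(f"{name}: train = {model.score(X_train, y_train):.2f}, "
          f"test = {model.score(X_test, y_test):.2f}")
```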

Back to Questions