- We start with some initial configuration of the model and predict the output based on some input.
- Then we compare the predicted value with the target (actual value) and measure the performance.
- Parameters of the model are adjusted iteratively to reach the optimal value of the performance metric.
- Performance metric is a measurable value used to evaluate the model's performance.
- Performance metrics can be used to track progress towards accuracy and identify areas for improvement.
- The model that generalizes best to the new unseen data is finally selected.
- The performance metric allows us to compare different models and choose the best one for a specific task.
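A minimal sketch of this fit-predict-evaluate-compare loop, assuming scikit-learn is available; the synthetic data, the two candidate models, and MAE as the metric are purely illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                                   # hypothetical features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

candidates = {"linear": LinearRegression(), "tree": DecisionTreeRegressor(max_depth=3)}
for name, model in candidates.items():
    model.fit(X_train, y_train)               # adjust parameters on the training data
    preds = model.predict(X_test)             # predict on unseen data
    print(name, "MAE:", mean_absolute_error(y_test, preds))   # compare with the target
# The model with the lower error on unseen data would be selected.
```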
- Predict continuous numeric dependent variables based on one or more independent variables
- The average absolute difference between actual and predicted values.
- MAE works well when errors are small, but it does not penalize large errors any more heavily than small ones.
- MAE is expressed in the same units as the dependent variable.
- MAE is less sensitive towards outliers.
- A lower MAE indicates that the model is making more accurate predictions.
- The average squared difference between actual and predicted values.
- More sensitive towards outliers, since squaring magnifies large deviations.
- MSE can be dominated by a few large errors, and it changes the units/scale of the predicted values.
- MSE is expressed as squared units instead of natural data units.
- Squaring the differences removes negative values, so MSE is always non-negative.
- A lower MSE indicates that the model is making more accurate predictions.
- RMSE is the square root of MSE; it is useful when large errors are undesirable.
- RMSE is a more intuitive measure of error than MSE. Provides an interpretable measure.
- It is measured in the same units as the predicted variable.
- It gives high weight to large errors. RMSE is useful when large errors are undesirable.
- Combines the properties of MAE (same units as the target) and MSE (penalizes large errors more heavily).
- A lower RMSE indicates that the model is making more accurate predictions.
MAE | MSE | RMSE |
---|---|---|
Absolute (Actual - Predicted) | Squared (Actual - Predicted) | Square Root (MSE) |
Treats small and large errors equally | Magnifies large errors | Penalizes large errors, less sharply than MSE |
Units of predicted value remain same | Units get squared | Units remain same |
Less sensitive towards outliers | Most sensitive towards outliers | More sensitive than MAE, less than MSE |
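A short worked sketch of these three metrics, assuming NumPy and scikit-learn; the actual/predicted values are made up for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 7.5, 10.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.5, 7.0, 12.0])   # hypothetical predicted values

mae  = np.mean(np.abs(y_true - y_pred))    # average absolute difference
mse  = np.mean((y_true - y_pred) ** 2)     # average squared difference
rmse = np.sqrt(mse)                        # square root of MSE, back in natural units

print(mae,  mean_absolute_error(y_true, y_pred))   # 0.875
print(mse,  mean_squared_error(y_true, y_pred))    # 1.1875
print(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
```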
- A measure of how well the model fits the data or how well the model makes predictions on new observations.
- Measures how close each data point fits the regression line, i.e. how well the regression line predicts actual values.
- Explains the variance of the data captured by the model (0.7 to 0.9 is a good value for R2)
- If R2 is 0.8 or 80% (Regression line explains 80% of the variance in data)
- A low R2 indicates underfitting, while a suspiciously high R2 on training data can be a sign of overfitting.
- Ideal value for R2 is between 70% and 90% (i.e. the model fits the data very well).
- Helps us to compare the created model with the baseline model (Mean).
- The best-fit line predicts better than the base-fit line (Mean).
- The value of R2 always increases as new features are added to the model, without detecting the significance of the newly added feature.
- A higher R-squared indicates that the model is making more accurate predictions.
- Indicates how much of the variance of the dependent variable can be explained by the independent variables.
- It measures the variability in the dependent variable (Y) that is being explained by the independent variables (x)
- R2 = 0 indicates that the independent variable does not explain any of the variance in the dependent variable.
- R2 = 1 indicates that the independent variable perfectly explains the variance in the dependent variable.
- An improvement over R2 (Adjusted R2 is never higher than R2).
- Adjusted R-squared is a more reliable measure than R-squared.
- Compare models with different numbers of independent features.
- Adjusted R2 increases only if the new independent feature improves the model more than expected.
- Adjusts for the number of independent features, correcting the optimistic bias of R2.
- It is a more accurate measure of the model's fit if many independent variables exist.
MAE or MSE or RMSE | R2 | R2 (Adj) |
---|---|---|
Good Model: Value closer to 0 | Good Model: Value closer to 1 | Increases only if a new term improves the model |
MAE (small errors), RMSE (large errors) | Measures explained variability | Preferred when the dataset has many independent variables |
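A sketch of computing R2 with scikit-learn and deriving Adjusted R2 from it via the standard formula 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of independent features; the data and model below are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=n)   # X[:, 2] is pure noise

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # penalizes the uninformative feature

print(f"R2 = {r2:.3f}, Adjusted R2 = {adj_r2:.3f}")   # Adjusted R2 <= R2
```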
- Predict the class label of a data point based on one or more independent features.
- Depending on the number of class labels in the target variable, it can be a Binary or Multiclass classification.
- The data set should contain a well-balanced class distribution. (e.g. Total Students = 100 : 50 Boys + 50 Girls)
- Good Classifier: 1 or 100% | Bad Classifier < 0.5 or 50%
- A table that summarizes the performance of a classification model.
- Evaluate correct and incorrect classifications on each class label.
True Positive (TP): Predicts 1 when Actual is 1
True Negative (TN): Predicts 0 when Actual is 0
False Positive (FP): Predicts 1 when Actual is 0 | Type I Error | Incorrect positive prediction
False Negative (FN): Predicts 0 when Actual is 1 | Type II Error | Incorrect negative prediction
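A minimal sketch of extracting TP, TN, FP, and FN with scikit-learn's confusion_matrix, using made-up binary labels (1 = positive, 0 = negative):

```python
from sklearn.metrics import confusion_matrix

y_actual = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical actual class labels
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical predicted class labels

# For binary labels, confusion_matrix returns rows = actual, columns = predicted:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_actual, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")   # TP=3, TN=3, FP=1, FN=1
```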
- The metric depends on the specific problem and the relative importance of different types of errors.
- For medical diagnosis, we might prioritize recall to minimize false negatives (FN)
- For the spam filtering problem, we might prioritize precision to minimize false positives (FP)
- The ratio of correct predictions to the total number of predictions.
- Accuracy score is good if the dataset is balanced. It can be misleading in imbalanced datasets.
- Used when all the classes (TP, TN, FP and FN) are equally important.
- Accuracy: (TP + TN) / (TP + TN + FP + FN) (see the worked sketch after this list)
- Measures the correctly identified positive cases (TPs) from all the predicted positive cases.
- Precision: TP / (TP + FP)
- Precision is a crucial metric when minimizing False Positives (FP) is a priority. (e.g. Antivirus, Spam Filtering)
- TP: The number of instances that were correctly classified as positive.
- FP: The number of instances that were incorrectly classified as positive.
- The ratio of true positive predictions (TPs) to the total actual positives.
- Measures the correctly identified positive cases (TPs) from all the actual positive cases.
- Recall: TP / (TP + FN)
- Recall is a crucial metric when minimizing False Negatives (FN) is a priority. (e.g. Medical Diagnosis, Corona, Fraud Detection)
- FN: The number of instances that were incorrectly classified as negative.
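A small sketch applying the accuracy, precision, and recall formulas to the same made-up labels as the confusion-matrix example, cross-checked against scikit-learn:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = 3, 3, 1, 1                    # counts from the confusion matrix above

accuracy  = (tp + tn) / (tp + tn + fp + fn)    # 0.75
precision = tp / (tp + fp)                     # 0.75 (pulled down by the FP)
recall    = tp / (tp + fn)                     # 0.75 (pulled down by the FN)

print(accuracy,  accuracy_score(y_actual, y_pred))
print(precision, precision_score(y_actual, y_pred))
print(recall,    recall_score(y_actual, y_pred))
```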
Example: Medical Diagnosis
- Precision: A FP (Incorrectly diagnosing a healthy person can lead to unnecessary treatment)
- Recall: A FN (Incorrectly diagnosing a sick person as healthy can lead to delayed treatment)
Example: Fraud Detection
- Precision: A FP (Flagging a legitimate transaction as fraudulent can damage customer relationships)
- Recall: A FN (Failing to detect fraudulent claims can result in significant financial losses)
- Measures the incorrectly identified positive cases from all the actual negative cases.
- False Positive Rate: Proportion of negative class that is incorrectly predicted as positive.
- F1 Score is a harmonic mean of precision and recall (Balancing precision and recall).
- Useful for imbalanced datasets (Uneven class distribution) and it also considers FP and FN.
- Accuracy is used when TP and TN are more important.
- Precision is used when FP is crucial (Antivirus is showing that the system is safe, even if it's affected by a virus)
- Recall is used when FN is risky (Medical diagnosis, Covid test is showing negative, even if the patient is affected)
- F1 Score is used when minimizing both FN and FP is crucial.
- F1-score is a better metric to evaluate in real-life applications.
- Best value for F1 Score is 1 | Worst value for F1 Score is 0.
- Precision, Recall and F1 Score are better metrics for imbalanced dataset.
- Support refers to the number of actual occurrences of each class in the dataset.
- It essentially counts how many times each class appears in the true labels of the data.
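A sketch of the F1 Score (harmonic mean of precision and recall) and the per-class support, reusing the same made-up labels; classification_report prints precision, recall, F1, and support for each class:

```python
from sklearn.metrics import f1_score, classification_report

y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall = 0.75, 0.75
f1_manual = 2 * precision * recall / (precision + recall)   # harmonic mean = 0.75

print(f1_manual, f1_score(y_actual, y_pred))
print(classification_report(y_actual, y_pred))   # per-class precision, recall, f1, support
```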
- Plots the True Positive Rate (TPR) on the Y-axis against the False Positive Rate (FPR) on the X-axis at different classification thresholds.
- The ROC curve helps to select the optimal threshold for a classifier.
- A threshold closer to 1.0 (100%) means the model predicts the positive class only when it is very confident, reducing false positives at the cost of more false negatives.
- Helps to understand the performance of a classification model across all the classification thresholds.
Score | Classifier |
---|---|
AUC = 1.0 | Perfect Classifier |
0.75 < AUC < 1.0 | Good Classifier |
0.5 < AUC <= 0.75 | Poor Classifier |
AUC <= 0.5 | No better than (or worse than) random guessing |
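A sketch of the ROC curve points and the AUC score for hypothetical predicted probabilities, using scikit-learn's roc_curve and roc_auc_score:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]   # predicted probability of class 1

fpr, tpr, thresholds = roc_curve(y_actual, y_scores)   # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_actual, y_scores)

for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.1f}  FPR={f:.2f}  TPR={t:.2f}")
print("AUC =", round(auc, 3))   # 1.0 = perfect separation, 0.5 = random guessing
```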