
Back to ML

Metrics 🧮

  1. Mean Absolute Error ( MAE )
  2. Mean Squared Error ( MSE )
  3. Root Mean Squared Error ( RMSE )
  4. Coefficient of Determination ( R2 )
  5. Adjusted R2 ( Adj R2 )
  1. Confusion Matrix
  2. Accuracy
  3. Precision
  4. Recall | True Positive Rate ( TPR ) | Sensitivity
  5. False Positive Rate ( FPR ) | 1 - Specificity
  6. F1 Score or F Measure
  7. ROC | Receiver Operating Characteristic Curve
  8. AUC | Area Under Curve

How do you evaluate the performance of an ML model?

  1. We start with some initial configuration of the model and predict the output based on some input.
  2. Then we compare the predicted value with the target (actual value) and measure the performance.
  3. Parameters of the model are adjusted iteratively to reach the optimal value of the performance metric.
  4. A performance metric is a measurable value used to evaluate the model's performance.
  5. Performance metrics can be used to track progress towards accuracy and identify areas for improvement.
  6. The model that generalizes best to new, unseen data is finally selected.
  7. Performance metrics also allow us to compare different models and choose the best one for a specific task.

Linear Regression

  • Predict continuous numeric dependent variables based on one or more independent variables

1. Mean Absolute Error ( MAE )

MAE

  • The average absolute difference between actual and predicted values.
  • MAE works well when errors are small, but it does not penalize large errors strongly.
  • MAE is expressed in the same units as the dependent variable.
  • MAE is less sensitive towards outliers.
  • A lower MAE indicates that the model is making more accurate predictions.

MAE Scikit Learn
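A minimal scikit-learn sketch of computing MAE; the y_true and y_pred arrays below are illustrative values, not data from these notes:

```python
from sklearn.metrics import mean_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]  # actual values (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]  # model predictions (illustrative)

# MAE = mean(|actual - predicted|), reported in the same units as the target
mae = mean_absolute_error(y_true, y_pred)
print(mae)  # 0.5
```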

2. Mean Squared Error ( MSE ) | LOSS

MSE

  • The average squared difference between actual and predicted values.
  • More sensitive towards outliers, because the errors are squared before averaging.
  • MSE penalizes large errors heavily and changes the units/scale of the error.
  • MSE is expressed in squared units instead of the natural data units.
  • Squaring the differences removes negative values, so MSE is always non-negative.
  • A lower MSE indicates that the model is making more accurate predictions.

MSE Scikit Learn
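A matching sketch for MSE with the same illustrative arrays; note that the result is in squared units:

```python
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]  # actual values (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]  # model predictions (illustrative)

# MSE = mean((actual - predicted)^2), reported in squared units
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.375
```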

3. Root Mean Square Error ( RMSE )

RMSE

  • The square root of MSE; useful when large, undesired errors need to be penalized.
  • RMSE is a more intuitive measure of error than MSE. Provides an interpretable measure.
  • It is measured in the same units as the predicted variable.
  • It gives high weight to large errors. RMSE is useful when large errors are undesirable.
  • Combines the properties of MAE (same units as the data) and MSE (penalizes larger errors).
  • A lower RMSE indicates that the model is making more accurate predictions.
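Scikit-learn's dedicated RMSE helpers differ across versions, so a safe sketch is to take the square root of MSE manually (same illustrative arrays as above):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]  # actual values (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]  # model predictions (illustrative)

# RMSE = sqrt(MSE), which brings the error back to the original units
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # ~0.612
```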
|  | MAE | MSE | RMSE |
| --- | --- | --- | --- |
| Formula | Absolute (Actual - Predicted) | Squared (Actual - Predicted) | Square Root (MSE) |
| Errors | Good for small errors | Magnifies large errors | Penalizes large errors, less harshly than MSE |
| Units | Units of the predicted value remain the same | Units get squared | Units remain the same |
| Outliers | Less sensitive towards outliers | More sensitive towards outliers | More robust than MSE, less than MAE |

4. Coefficient of Determination (R2) | Squared Correlation Coefficient

R2

  • A measure of how well the model fits the data or how well the model makes predictions on new observations.
  • Measure how close each data point fits the regression line or how well the regression line predicts actual values.
  • Explains the variance of the data captured by the model (0.7 to 0.9 is a good value for R2)
  • If R2 is 0.8 or 80% (Regression line explains 80% of the variance in data)
  • A very low R2 indicates underfitting, while an unusually high R2 can be a sign of overfitting.
  • Ideal value for R2 is between 70% and 90% (i.e. the model fits the data very well).
  • Helps us compare the created model with the baseline model (predicting the mean).
  • The best-fit line should predict better than the baseline line (the mean).
  • The value of R2 always increases as new features are added to the model, even when the newly added feature is not significant.
  • A higher R-squared indicates that the model is making more accurate predictions.
  • Indicates how much of the variance of the dependent variable can be explained by the independent variables.
  • It measures the variability in the dependent variable (Y) that is being explained by the independent variables (x)
  • R2 = 0 indicates that the independent variable does not explain any of the variance in the dependent variable.

  • R2 = 1 indicates that the independent variable perfectly explains the variance in the dependent variable.

R2 Score Scikit Learn

R2 Good or Bad
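A short R2 sketch with the same illustrative arrays as above:

```python
from sklearn.metrics import r2_score

y_true = [3.0, -0.5, 2.0, 7.0]  # actual values (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]  # model predictions (illustrative)

# R2 = 1 - SS_res / SS_tot, i.e. the model compared against predicting the mean
print(r2_score(y_true, y_pred))  # ~0.949
```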

5. Adjusted R2

  • Improvement of R2 ( Adjusted R2 is always lower than R2 )
  • Adjusted R-squared is a more reliable measure than R-squared.
  • Compare models with different numbers of independent features.
  • Adjusted R2 increases only if a new independent feature improves the model more than would be expected by chance.
  • Penalizes the model for adding independent features that do not help explain the dependent variable.
  • It is a more accurate measure of the model's fit if many independent variables exist.
| MAE / MSE / RMSE | R2 | Adjusted R2 |
| --- | --- | --- |
| Good model: value closer to 0 | Good model: value closer to 1 | Increases only if a new term improves the model |
| MAE (small errors), RMSE (large errors) | Measures explained variability | Preferred when the dataset has many independent variables |
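Scikit-learn has no built-in Adjusted R2, so the adjusted_r2 helper below is a hypothetical implementation of the standard formula (n samples, k independent features); the inputs are illustrative:

```python
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    # Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

print(adjusted_r2([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0], n_features=1))  # ~0.923
```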

Logistic Regression | Classification

  • Predict the class label of a data point based on one or more independent features.
  • Depending on the number of class labels in the target variable, it can be a Binary or Multiclass classification.
  • The data set should contain a well-balanced class distribution. (e.g. Total Students = 100 : 50 Boys + 50 Girls)
  • Good classifier: score close to 1 (100%) | Bad classifier: score near or below 0.5 (50%)

1. Confusion Matrix

  • A table that summarizes the performance of a classification model.
  • Evaluate correct and incorrect classifications on each class label.

Classification

True Positive  (TP): Predicts 1 when Actual is 1 
True Negative  (TN): Predicts 0 when Actual is 0 
False Positive (FP): Predicts 1 when Actual is 0 | Type I Error  | Incorrect positive prediction 
False Negative (FN): Predicts 0 when Actual is 1 | Type II Error | Incorrect negative prediction 
  • The metric depends on the specific problem and the relative importance of different types of errors.
  • For medical diagnosis, we might prioritize recall to minimize false negatives (FN)
  • For the spam filtering problem, we might prioritize precision to minimize false positives (FP)

Confusion Matrix
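A minimal confusion-matrix sketch; the binary labels below are illustrative:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual class labels (illustrative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # predicted class labels (illustrative)

# For binary labels {0, 1}: rows = actual, columns = predicted -> [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)              # [[3 1]
                       #  [1 3]]
print(tn, fp, fn, tp)  # 3 1 1 3
```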

2. Accuracy

  • The ratio of correct predictions to the total number of predictions.
  • Accuracy score is good if the dataset is balanced. It can be misleading in imbalanced datasets.
  • Used when all the classes (TP, TN, FP and FN) are equally important.
  • Accuracy: (TP + TN) / (TP + TN + FP + FN)

Accuracy
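An accuracy sketch using the same illustrative labels as in the confusion-matrix example:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy_score(y_true, y_pred))  # 0.75 -> (3 + 3) / 8
```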

3. Precision

  • Measures the correctly identified positive cases (TPs) from all the predicted positive cases.
  • Precision is a crucial metric when minimizing False Positives (FP) is a priority. (e.g. Antivirus, Spam Filtering)
  • TP: The number of instances that were correctly classified as positive.
  • FP: The number of instances that were incorrectly classified as positive.

Precision
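A precision sketch with the same illustrative labels:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Precision = TP / (TP + FP): how many predicted positives are truly positive
print(precision_score(y_true, y_pred))  # 0.75 -> 3 / (3 + 1)
```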

4. Recall | True Positive Rate (TPR) | Sensitivity

  • The ratio of true positive predictions (TPs) to the total actual positives.
  • Measures the correctly identified positive cases (TPs) from all the actual positive cases.
  • Recall is a crucial metric when minimizing False Negatives (FN) is a priority. (e.g. Medical Diagnosis, Corona, Fraud Detection)
  • FN: The number of instances that were incorrectly classified as negative.

Example: Medical Diagnosis

  1. Precision: an FP (incorrectly diagnosing a healthy person as sick) can lead to unnecessary treatment.
  2. Recall: an FN (incorrectly diagnosing a sick person as healthy) can lead to delayed treatment.

Example: Fraud Detection

  1. Precision: an FP (flagging a legitimate transaction as fraudulent) can damage customer relationships.
  2. Recall: an FN (failing to detect fraudulent claims) can result in significant financial losses.

Recall
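A recall sketch with the same illustrative labels:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Recall = TP / (TP + FN): how many actual positives were caught
print(recall_score(y_true, y_pred))  # 0.75 -> 3 / (3 + 1)
```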

5. False Positive Rate (FPR) | 1 - Specificity

  • Measures the incorrectly identified positive cases out of all the actual negative cases: FPR = FP / (FP + TN).
  • Specificity (True Negative Rate) is the complement: Specificity = TN / (TN + FP) = 1 - FPR.

FPR
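Scikit-learn has no single FPR function, so a common sketch derives it from the confusion matrix (same illustrative labels as above):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# FPR = FP / (FP + TN); Specificity (TNR) = TN / (TN + FP) = 1 - FPR
fpr = fp / (fp + tn)
print(fpr, 1 - fpr)  # 0.25 0.75
```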

6. F1 Score | F Measure

  • F1 Score is a harmonic mean of precision and recall (Balancing precision and recall).
  • Useful for imbalanced datasets (Uneven class distribution) and it also considers FP and FN.
  • Accuracy is used when TP and TN are more important.
  • Precision is used when an FP is crucial (e.g. an antivirus flagging a safe file as malicious, or a spam filter blocking a legitimate email).
  • Recall is used when an FN is risky (e.g. medical diagnosis: a Covid test showing negative even though the patient is infected).
  • F1 Score is used when minimizing both FP and FN is crucial.
  • F1-score is a better metric to evaluate models in real-life applications, where classes are often imbalanced.
  • Best value for F1 Score is 1 | Worst value for F1 Score is 0.
  • Precision, Recall and F1 Score are better metrics for imbalanced dataset.

F1
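An F1 sketch with the same illustrative labels:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# F1 = 2 * (Precision * Recall) / (Precision + Recall)
print(f1_score(y_true, y_pred))  # 0.75 (precision and recall are both 0.75 here)
```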

Support

  • Support refers to the number of actual occurrences of each class in the dataset.
  • It essentially counts how many times each class appears in the true labels of the data.
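Support is reported as a column of classification_report; a sketch with the same illustrative labels:

```python
from sklearn.metrics import classification_report

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# The "support" column counts how often each class appears in y_true (4 and 4 here)
print(classification_report(y_true, y_pred))
```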

7. ROC | Receiver Operating Characteristic

  • Plots the True Positive Rate (TPR) on the Y-axis against the False Positive Rate (FPR) on the X-axis at different classification thresholds.
  • The ROC curve helps to select the optimal threshold for a classifier.
  • Moving the threshold closer to 1.0 (100%) makes the classifier more conservative: it predicts positive only when very confident, so both the FPR and the TPR decrease.

ROC
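A roc_curve sketch; the labels and predicted probabilities below are illustrative:

```python
from sklearn.metrics import roc_curve

y_true   = [0, 0, 1, 1]           # actual labels (illustrative)
y_scores = [0.1, 0.4, 0.35, 0.8]  # predicted probabilities for the positive class

# One (FPR, TPR) point per threshold; plotting TPR vs FPR gives the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(fpr)  # [0.  0.  0.5 0.5 1. ]
print(tpr)  # [0.  0.5 0.5 1.  1. ]
```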

8. AUC | Area Under ROC Curve

  • Helps to understand the performance of a classification model across all the classification thresholds.
| Score | Classifier |
| --- | --- |
| AUC = 1.0 | Perfect classifier |
| AUC > 0.75 | Good classifier |
| 0.5 < AUC <= 0.75 | Bad classifier (barely better than random) |
| AUC <= 0.5 | Worst classifier (no better than random guessing) |

AUC
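An AUC sketch with the same illustrative scores as the ROC example:

```python
from sklearn.metrics import roc_auc_score

y_true   = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# Area under the ROC curve: 1.0 = perfect, 0.5 = no better than random guessing
print(roc_auc_score(y_true, y_scores))  # 0.75
```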

Back to Questions