
Back to ML

Metrics 🧮

  1. Mean Absolute Error ( MAE )
  2. Mean Squared Error ( MSE )
  3. Root Mean Squared Error ( RMSE )
  4. Coefficient of Determination ( R2 )
  5. Adjusted R2 ( Adj R2 )
  1. Confusion Matrix
  2. Accuracy
  3. Precision
  4. Recall | True Positive Rate ( TPR ) | Sensitivity
  5. False Positive Rate ( FPR ) | 1 - Specificity
  6. F1 Score or F Measure
  7. ROC | Receiver Operating Characteristic Curve
  8. AUC | Area Under Curve

How do you evaluate the performance of an ML model?

  1. We start with some initial configuration of the model and predict the output based on some input.
  2. Then we compare the predicted value with the target (actual value) and measure the performance.
  3. Parameters of the model are adjusted iteratively to reach the optimal value of the performance metric.
  4. A performance metric is a measurable value used to evaluate the model's performance.
  5. Performance metrics can be used to track progress towards accuracy and identify areas for improvement.
  6. The model that generalizes best to new, unseen data is finally selected.
  7. Performance metrics also allow us to compare different models and choose the best one for a specific task.

Linear Regression

  • Predict continuous numeric dependent variables based on one or more independent variables

1. Mean Absolute Error ( MAE )

MAE

  • The average absolute difference between actual and predicted values.
  • MAE works well when errors are small, but it does not penalize large errors strongly.
  • MAE is expressed in the same units as the dependent variable.
  • MAE is less sensitive towards outliers.
  • A lower MAE indicates that the model is making more accurate predictions.

MAE Scikit Learn
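A minimal scikit-learn sketch of computing MAE; the y_true and y_pred arrays below are illustrative values, not data from these notes:

```python
from sklearn.metrics import mean_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]  # actual values (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]  # model predictions (illustrative)

# MAE = mean(|actual - predicted|), reported in the same units as the target
mae = mean_absolute_error(y_true, y_pred)
print(mae)  # 0.5
```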

2. Mean Squared Error ( MSE ) | LOSS

MSE

  • The average squared difference between actual and predicted values.
  • More sensitive towards outliers, because the errors are squared before averaging.
  • MSE penalizes large errors heavily and changes the units/scale of the error.
  • MSE is expressed in squared units instead of the natural data units.
  • Squaring the differences removes negative values, so MSE is always non-negative.
  • A lower MSE indicates that the model is making more accurate predictions.

MSE Scikit Learn
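A matching sketch for MSE with the same illustrative arrays; note that the result is in squared units:

```python
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]  # actual values (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]  # model predictions (illustrative)

# MSE = mean((actual - predicted)^2), reported in squared units
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.375
```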

3. Root Mean Square Error ( RMSE )

RMSE

  • The square root of MSE; useful when large, undesired errors need to be penalized.
  • RMSE is a more intuitive measure of error than MSE. Provides an interpretable measure.
  • It is measured in the same units as the predicted variable.
  • It gives high weight to large errors. RMSE is useful when large errors are undesirable.
  • Combines the properties of MAE (same units as the data) and MSE (penalizes larger errors).
  • A lower RMSE indicates that the model is making more accurate predictions.
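Scikit-learn's dedicated RMSE helpers differ across versions, so a safe sketch is to take the square root of MSE manually (same illustrative arrays as above):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]  # actual values (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]  # model predictions (illustrative)

# RMSE = sqrt(MSE), which brings the error back to the original units
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # ~0.612
```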
|  | MAE | MSE | RMSE |
| --- | --- | --- | --- |
| Formula | Absolute (Actual - Predicted) | Squared (Actual - Predicted) | Square Root (MSE) |
| Errors | Good for small errors | Magnifies large errors | Penalizes large errors, less harshly than MSE |
| Units | Units of the predicted value remain the same | Units get squared | Units remain the same |
| Outliers | Less sensitive towards outliers | More sensitive towards outliers | More robust than MSE, less than MAE |

4. Coefficient of Determination (R2) | Squared Correlation Coefficient

R2

  • A measure of how well the model fits the data or how well the model makes predictions on new observations.
  • Measure how close each data point fits the regression line or how well the regression line predicts actual values.
  • Explains the variance of the data captured by the model (0.7 to 0.9 is a good value for R2)
  • If R2 is 0.8 or 80% (Regression line explains 80% of the variance in data)
  • A very low R2 indicates underfitting, while an unusually high R2 can be a sign of overfitting.
  • Ideal value for R2 is between 70% and 90% (i.e. the model fits the data very well).
  • Helps us compare the created model with the baseline model (predicting the mean).
  • The best-fit line should predict better than the baseline line (the mean).
  • The value of R2 always increases as new features are added to the model, even when the newly added feature is not significant.
  • A higher R-squared indicates that the model is making more accurate predictions.
  • Indicates how much of the variance of the dependent variable can be explained by the independent variables.
  • It measures the variability in the dependent variable (Y) that is being explained by the independent variables (x)
  • R2 = 0 indicates that the independent variable does not explain any of the variance in the dependent variable.

  • R2 = 1 indicates that the independent variable perfectly explains the variance in the dependent variable.

R2 Score Scikit Learn

R2 Good or Bad
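A short R2 sketch with the same illustrative arrays as above:

```python
from sklearn.metrics import r2_score

y_true = [3.0, -0.5, 2.0, 7.0]  # actual values (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]  # model predictions (illustrative)

# R2 = 1 - SS_res / SS_tot, i.e. the model compared against predicting the mean
print(r2_score(y_true, y_pred))  # ~0.949
```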

5. Adjusted R2

  • Improvement of R2 ( Adjusted R2 is always lower than R2 )
  • Adjusted R-squared is a more reliable measure than R-squared.
  • Compare models with different numbers of independent features.
  • Adjusted R2 increases only if a new independent feature improves the model more than would be expected by chance.
  • Penalizes the model for adding independent features that do not help explain the dependent variable.
  • It is a more accurate measure of the model's fit if many independent variables exist.
| MAE / MSE / RMSE | R2 | Adjusted R2 |
| --- | --- | --- |
| Good model: value closer to 0 | Good model: value closer to 1 | Increases only if a new term improves the model |
| MAE (small errors), RMSE (large errors) | Measures explained variability | Preferred when the dataset has many independent variables |
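Scikit-learn has no built-in Adjusted R2, so the adjusted_r2 helper below is a hypothetical implementation of the standard formula (n samples, k independent features); the inputs are illustrative:

```python
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    # Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

print(adjusted_r2([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0], n_features=1))  # ~0.923
```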

Logistic Regression | Classification

  • Predict the class label of a data point based on one or more independent features.
  • Depending on the number of class labels in the target variable, it can be a Binary or Multiclass classification.
  • The data set should contain a well-balanced class distribution. (e.g. Total Students = 100 : 50 Boys + 50 Girls)
  • Good classifier: score close to 1 (100%) | Bad classifier: score near or below 0.5 (50%)

1. Confusion Matrix

  • A table that summarizes the performance of a classification model.
  • Evaluate correct and incorrect classifications on each class label.

Classification

True Positive  (TP): Predicts 1 when Actual is 1 
True Negative  (TN): Predicts 0 when Actual is 0 
False Positive (FP): Predicts 1 when Actual is 0 | Type I Error  | Incorrect positive prediction 
False Negative (FN): Predicts 0 when Actual is 1 | Type II Error | Incorrect negative prediction 
  • The metric depends on the specific problem and the relative importance of different types of errors.
  • For medical diagnosis, we might prioritize recall to minimize false negatives (FN)
  • For the spam filtering problem, we might prioritize precision to minimize false positives (FP)

Confusion Matrix
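A minimal confusion-matrix sketch; the binary labels below are illustrative:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual class labels (illustrative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # predicted class labels (illustrative)

# For binary labels {0, 1}: rows = actual, columns = predicted -> [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)              # [[3 1]
                       #  [1 3]]
print(tn, fp, fn, tp)  # 3 1 1 3
```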

2. Accuracy

  • The ratio of correct predictions to the total number of predictions.
  • Accuracy score is good if the dataset is balanced. It can be misleading in imbalanced datasets.
  • Used when all the classes (TP, TN, FP and FN) are equally important.
  • Accuracy: (TP + TN) / (TP + TN + FP + FN)

Accuracy
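An accuracy sketch using the same illustrative labels as in the confusion-matrix example:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy_score(y_true, y_pred))  # 0.75 -> (3 + 3) / 8
```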

3. Precision

  • Measures the correctly identified positive cases (TPs) from all the predicted positive cases.
  • Precision is a crucial metric when minimizing False Positives (FP) is a priority. (e.g. Antivirus, Spam Filtering)
  • TP: The number of instances that were correctly classified as positive.
  • FP: The number of instances that were incorrectly classified as positive.

Precision
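A precision sketch with the same illustrative labels:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Precision = TP / (TP + FP): how many predicted positives are truly positive
print(precision_score(y_true, y_pred))  # 0.75 -> 3 / (3 + 1)
```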

4. Recall | True Positive Rate (TPR) | Sensitivity

  • The ratio of true positive predictions (TPs) to the total actual positives.
  • Measures the correctly identified positive cases (TPs) from all the actual positive cases.
  • Recall is a crucial metric when minimizing False Negatives (FN) is a priority. (e.g. Medical Diagnosis, Corona, Fraud Detection)
  • FN: The number of instances that were incorrectly classified as negative.

Example: Medical Diagnosis

  1. Precision: an FP (incorrectly diagnosing a healthy person as sick) can lead to unnecessary treatment.
  2. Recall: an FN (incorrectly diagnosing a sick person as healthy) can lead to delayed treatment.

Example: Fraud Detection

  1. Precision: an FP (flagging a legitimate transaction as fraudulent) can damage customer relationships.
  2. Recall: an FN (failing to detect fraudulent claims) can result in significant financial losses.

Recall
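A recall sketch with the same illustrative labels:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Recall = TP / (TP + FN): how many actual positives were caught
print(recall_score(y_true, y_pred))  # 0.75 -> 3 / (3 + 1)
```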

5. False Positive Rate (FPR) | 1 - Specificity

  • Measures the incorrectly identified positive cases out of all the actual negative cases: FPR = FP / (FP + TN).
  • Specificity (True Negative Rate) is the complement: Specificity = TN / (TN + FP) = 1 - FPR.

FPR
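Scikit-learn has no single FPR function, so a common sketch derives it from the confusion matrix (same illustrative labels as above):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# FPR = FP / (FP + TN); Specificity (TNR) = TN / (TN + FP) = 1 - FPR
fpr = fp / (fp + tn)
print(fpr, 1 - fpr)  # 0.25 0.75
```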

6. F1 Score | F Measure

  • F1 Score is a harmonic mean of precision and recall (Balancing precision and recall).
  • Useful for imbalanced datasets (Uneven class distribution) and it also considers FP and FN.
  • Accuracy is used when TP and TN are more important.
  • Precision is used when an FP is crucial (e.g. an antivirus flagging a safe file as malicious, or a spam filter blocking a legitimate email).
  • Recall is used when an FN is risky (e.g. medical diagnosis: a Covid test showing negative even though the patient is infected).
  • F1 Score is used when minimizing both FP and FN is crucial.
  • F1-score is a better metric to evaluate models in real-life applications, where classes are often imbalanced.
  • Best value for F1 Score is 1 | Worst value for F1 Score is 0.
  • Precision, Recall and F1 Score are better metrics for imbalanced dataset.

F1
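An F1 sketch with the same illustrative labels:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# F1 = 2 * (Precision * Recall) / (Precision + Recall)
print(f1_score(y_true, y_pred))  # 0.75 (precision and recall are both 0.75 here)
```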

Support

  • Support refers to the number of actual occurrences of each class in the dataset.
  • It essentially counts how many times each class appears in the true labels of the data.
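Support is reported as a column of classification_report; a sketch with the same illustrative labels:

```python
from sklearn.metrics import classification_report

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# The "support" column counts how often each class appears in y_true (4 and 4 here)
print(classification_report(y_true, y_pred))
```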

7. ROC | Receiver Operating Characteristic

  • Plots the True Positive Rate (TPR) on the Y-axis against the False Positive Rate (FPR) on the X-axis at different classification thresholds.
  • The ROC curve helps to select the optimal threshold for a classifier.
  • Moving the threshold closer to 1.0 (100%) makes the classifier more conservative: it predicts positive only when very confident, so both the FPR and the TPR decrease.

ROC
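A roc_curve sketch; the labels and predicted probabilities below are illustrative:

```python
from sklearn.metrics import roc_curve

y_true   = [0, 0, 1, 1]           # actual labels (illustrative)
y_scores = [0.1, 0.4, 0.35, 0.8]  # predicted probabilities for the positive class

# One (FPR, TPR) point per threshold; plotting TPR vs FPR gives the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(fpr)  # [0.  0.  0.5 0.5 1. ]
print(tpr)  # [0.  0.5 0.5 1.  1. ]
```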

8. AUC | Area Under ROC Curve

  • Helps to understand the performance of a classification model across all the classification thresholds.
| Score | Classifier |
| --- | --- |
| AUC = 1.0 | Perfect classifier |
| AUC > 0.75 | Good classifier |
| 0.5 < AUC <= 0.75 | Bad classifier (barely better than random) |
| AUC <= 0.5 | Worst classifier (no better than random guessing) |

AUC
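An AUC sketch with the same illustrative scores as the ROC example:

```python
from sklearn.metrics import roc_auc_score

y_true   = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# Area under the ROC curve: 1.0 = perfect, 0.5 = no better than random guessing
print(roc_auc_score(y_true, y_scores))  # 0.75
```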

Back to Questions