s1dewalker/Model_Validation

Model Management in Python. Steps involved in Model Validation and tuning. Testing Model Assumptions in Factor Analysis with OLS Regression.



Example 1: Validating the assumptions of linear regression in the Fama-French 3-factor model

1. Checking multicollinearity of the features (independent variables) w/ a correlation matrix

2. Checking linearity w/ scatter plots

3. Checking independence of residuals w/ the autocorrelation function (ACF) and the Durbin-Watson (D-W) test

4. Checking normality of residuals w/ a histogram

5. Checking homoscedasticity (equal variance) of residuals w/ a scatter plot of residuals vs. fitted values (all five checks are sketched below)
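A minimal sketch of the five checks with pandas, matplotlib, and statsmodels, assuming a DataFrame `df` with an excess-return column `ret` and factor columns `mkt_rf`, `smb`, `hml` (illustrative names, not taken from the repo):

```python
# A minimal sketch of the five assumption checks with statsmodels.
# Assumes df holds the excess return 'ret' and the three factors
# 'mkt_rf', 'smb', 'hml' (illustrative column names).
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

factors = ['mkt_rf', 'smb', 'hml']
X = sm.add_constant(df[factors])
model = sm.OLS(df['ret'], X).fit()
residuals = model.resid

# 1. Multicollinearity: pairwise correlations between the factors
print(df[factors].corr())

# 2. Linearity: scatter each factor against the response
for f in factors:
    df.plot.scatter(x=f, y='ret')

# 3. Independence of residuals: ACF plot and Durbin-Watson statistic
sm.graphics.tsa.plot_acf(residuals, lags=20)
print('Durbin-Watson:', durbin_watson(residuals))  # values near 2 suggest no autocorrelation

# 4. Normality of residuals: histogram
plt.figure()
plt.hist(residuals, bins=30)

# 5. Homoscedasticity: residuals vs. fitted values should show no pattern
plt.figure()
plt.scatter(model.fittedvalues, residuals)
plt.show()
```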




Consequences of violating these assumptions:

1. Multicollinearity = redundancy = it is difficult for the model to find which feature is actually contributing to predicting the target

2. Non-linearity = the model won't capture the relationship closely, leading to large errors in fitting

3. Autocorrelation in residuals = the model is missing something important; check for an omitted feature

4. Non-normality of residuals = the normality assumption behind inference tests on residuals won't hold; apply transformations to the features (a small sketch follows this list)

5. Heteroscedasticity (unequal variance) of residuals = less precision in the estimates
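One common remedy for item 4, sketched under the assumption of a positive, right-skewed feature named `volume` (a hypothetical column, used only for illustration):

```python
# Log-transforming a skewed feature before refitting the regression;
# np.log1p handles zero values safely.
import numpy as np

df['volume_log'] = np.log1p(df['volume'])
```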



Example 2: Model validation and tuning in random forest regression, on continuous data

1. Get the data

2. Define the target (y) and features (X)

3. Split the data into training and testing sets (plus a validation set if required)

4. Initiate a model, set parameters, and fit it on the training set | X_train, y_train

5. Predict on X_test

6. Accuracy or error metrics on y_test | e.g. R squared

7. Bias-variance trade-off check | balancing underfitting and overfitting

8. Iterate to tune the model (from step 4)

9. Cross-validation | if the model is not generalizing well

10. Selecting the best model w/ hyperparameter tuning (the workflow is sketched below)
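A sketch of the workflow with scikit-learn, assuming a DataFrame `df` with a numeric column `target` (the data source and column name are illustrative):

```python
# Steps 2-6 and 10 of the workflow; step 9 (cross-validation)
# is sketched in the Cross Validation section below.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score

# Step 2: define the target and features
y = df['target']
X = df.drop(columns='target')

# Step 3: split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Step 4: initiate a model, set parameters, and fit the training set
model = RandomForestRegressor(n_estimators=200, max_depth=5, random_state=42)
model.fit(X_train, y_train)

# Steps 5-6: predict on X_test and score on y_test
y_pred = model.predict(X_test)
print('Test R^2:', r2_score(y_test, y_pred))

# Step 10: select the best model w/ hyperparameter tuning
param_grid = {'n_estimators': [100, 200], 'max_depth': [3, 5, None]}
grid = GridSearchCV(RandomForestRegressor(random_state=42), param_grid, cv=5)
grid.fit(X_train, y_train)
print('Best params:', grid.best_params_)
```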




A few details:

Bias-Variance trade-off


Bias = failing to find the relationship b/w the data and the response = ERROR due to OVERLY SIMPLISTIC models (underfitting)

Variance = following the training data too closely = ERROR due to OVERLY COMPLEX models (overfitting) that are SENSITIVE TO FLUCTUATIONS (noise) in the training data


High Bias + Low Variance: Underfitting (simpler models)
Low Bias + High Variance: Overfitting (complex models)

Training error high = Underfitting

Testing error >> Training error = Overfitting
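These two rules can be checked directly by comparing the training and testing scores, continuing with the fitted `model` from the Example 2 sketch:

```python
# Compare training and testing R^2 to diagnose under-/overfitting.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f'Train R^2: {train_r2:.3f} | Test R^2: {test_r2:.3f}')
# A low train score suggests underfitting;
# a train score much higher than the test score suggests overfitting.
```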


Cross Validation - An efficient method to find the balance

(Figure: cross-validation diagram, by sharpsightlabs.com)

The data is split into distinct subsets; each subset is used once as the test set while the remaining subsets form the training set, and the results from all splits are averaged.


Why use?
  • Better Generalization: helps if our models are not generalizing well (generalization refers to a model's ability to perform well on new, unseen data, not just the data it was trained on)
  • Reliable Evaluation
  • Efficient use of data (if we have limited data)

Types:

  1. K-fold cross-validation w/ cross_val_score

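A short sketch using scikit-learn's cross_val_score, reusing `model`, `X`, and `y` from the Example 2 sketch:

```python
# 5-fold cross-validation: cross_val_score handles the splitting,
# fitting, and scoring for each fold.
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5, scoring='r2')
print('R^2 per fold:', scores)
print('Mean R^2:', scores.mean())
```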


  2. Leave-one-out cross-validation (LOOCV)

Use when data is limited, but note that it is computationally expensive.
Each data point is used once as the test set.

cv = X.shape[0]
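Expanding on the snippet above: setting cv = X.shape[0] gives one fold per sample, which is what scikit-learn's LeaveOneOut splitter does explicitly. R squared is undefined on a single test point, so a per-sample metric such as MSE is used in this sketch:

```python
# Leave-one-out cross-validation: each sample serves as the test set once.
from sklearn.model_selection import cross_val_score, LeaveOneOut

loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring='neg_mean_squared_error')
print('LOOCV mean MSE:', -loo_scores.mean())
```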

