Dissertation

Single Point of Access

Since the EPAD data set has various amounts of data, a single point of access with necessary information would be more appropriate for the model.

Python script data_concat.py is used to create a single file with all the necessary data for modelling.

Data Preprocessing

Data preprocessing has been carried out in data_preprocessing.py.

Feature transformations like hanlding missing values and one-hot encoding have been carried out.

Feature Selection

For determing the importance of each feature, SelectKBest is used in feature_selection.py.

Best features can be analysed with the script.

Random Forest Model

RF model has been devised in random_forest_implementation.py.

Handling Data Imbalance

Synthetic Minority Oversampling Technique (SMOTE) - a very popular oversampling method that was proposed to improve random oversampling has been used on the minority class.

Same way, random undersampling has been performed on the majority class.

A pipeline has been created to ensure the resampling of dataset such that the data becomes balanced.

Cross Validation

Cross validation has been used to ensure that the hyper-parameters selected perform the best.

A pipeline was created for a 10-fold RepeatedStratifiedKFold and RandomizedSearchCV to choose the best parameters for the model.

After the data has been analysed by this pipeline, the chosen parameters have been used for the Random Forest Model.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Testing		Testing
Training		Training
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Dissertation

Single Point of Access

Data Preprocessing

Feature Selection

Random Forest Model

Handling Data Imbalance

Cross Validation

About

Releases

Packages

Languages

MaheshkumarSundaram/Dissertation

Folders and files

Latest commit

History

Repository files navigation

Dissertation

Single Point of Access

Data Preprocessing

Feature Selection

Random Forest Model

Handling Data Imbalance

Cross Validation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages