This project applies several sampling methods and machine learning algorithms to credit card fraud detection. Five sampling methods (simple random, systematic, stratified, cluster, and bootstrap sampling) were applied to a credit card transaction dataset. Five machine learning algorithms (KMeans, Decision Tree, Random Forest, XGBoost, and LightGBM) were then trained on each sampled dataset to predict fraudulent transactions.
The following sampling methods were applied to the dataset:
- Simple Random Sampling: Selecting data points at random, so that every data point has an equal chance of being chosen.
- Systematic Sampling: Selecting every nth data point from the dataset.
- Stratified Sampling: Dividing the dataset into homogeneous groups (strata) based on a specific feature and then sampling from each stratum.
- Cluster Sampling: Dividing the dataset into clusters and then randomly selecting entire clusters for sampling.
- Bootstrap Sampling: Sampling with replacement, allowing some data points to be selected multiple times.
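
The five methods above can be sketched in a few lines each. This is a minimal illustration, not the project's actual code: it uses only the Python standard library, the toy `rows` list and its `is_fraud` field are hypothetical stand-ins for the transaction dataset, and all function names, sample sizes, and the fraud rate are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def simple_random_sample(rows, k, rng):
    """Draw k rows without replacement; every row is equally likely."""
    return rng.sample(rows, k)


def systematic_sample(rows, step, rng):
    """Take every step-th row, starting from a random offset in [0, step)."""
    start = rng.randrange(step)
    return rows[start::step]


def stratified_sample(rows, key, frac, rng):
    """Sample the same fraction from each stratum defined by key(row)."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(frac * len(members)))  # keep at least one row per stratum
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(rows, n_clusters, n_chosen, rng):
    """Split rows into contiguous clusters and keep n_chosen whole clusters."""
    size = -(-len(rows) // n_clusters)  # ceiling division
    clusters = [rows[i:i + size] for i in range(0, len(rows), size)]
    chosen = rng.sample(clusters, n_chosen)
    return [row for cluster in chosen for row in cluster]


def bootstrap_sample(rows, k, rng):
    """Draw k rows with replacement, so a row may appear more than once."""
    return rng.choices(rows, k=k)


# Hypothetical toy data: 1000 transactions, roughly 2% labelled as fraud.
rng = random.Random(0)
rows = [{"id": i, "is_fraud": int(rng.random() < 0.02)} for i in range(1000)]

srs = simple_random_sample(rows, 100, rng)
syst = systematic_sample(rows, 10, rng)
strat = stratified_sample(rows, lambda r: r["is_fraud"], 0.1, rng)
clus = cluster_sample(rows, n_clusters=20, n_chosen=4, rng=rng)
boot = bootstrap_sample(rows, len(rows), rng)
```

Stratifying on the fraud label, as in `strat`, guarantees that the rare fraudulent class appears in the sample, which the other methods do not ensure on their own.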
The following machine learning algorithms were trained on the sampled datasets:
- KMeans (a clustering algorithm, used here as a classifier)
- Decision Tree Classifier
- Random Forest Classifier
- XGBoost (Extreme Gradient Boosting) Classifier
- LightGBM (Light Gradient Boosting Machine) Classifier
Accuracy scores for each combination of sampling method and machine learning algorithm were recorded in a table. The Random Forest Classifier achieved the highest accuracy under every sampling method.
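
One way such a table could be produced is sketched below. It is an illustration under stated assumptions rather than the project's code: the data is a synthetic imbalanced set from scikit-learn's `make_classification`, only three of the sampling methods and two of the models are included for brevity, and the sample sizes, test split, and hyperparameters are arbitrary choices. XGBoost and LightGBM provide scikit-learn-compatible estimators (`xgboost.XGBClassifier`, `lightgbm.LGBMClassifier`) that could be added to the `models` dictionary in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the transaction data (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

rng = np.random.default_rng(0)
n = len(y)

# Each sampling method is represented by the row indices it selects.
samplers = {
    "simple_random": rng.choice(n, size=1000, replace=False),
    "systematic": np.arange(0, n, 2),  # every 2nd row
    "bootstrap": rng.choice(n, size=1000, replace=True),
}

# Factories, so that every sampling method gets a freshly initialised model.
models = {
    "decision_tree": lambda: DecisionTreeClassifier(random_state=0),
    "random_forest": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
}

# Rows are sampling methods, columns are models, cells are accuracy scores.
results = pd.DataFrame(index=list(samplers), columns=list(models), dtype=float)

for sampler_name, idx in samplers.items():
    X_train, X_test, y_train, y_test = train_test_split(
        X[idx], y[idx], test_size=0.3, stratify=y[idx], random_state=0
    )
    for model_name, make_model in models.items():
        model = make_model().fit(X_train, y_train)
        results.loc[sampler_name, model_name] = accuracy_score(
            y_test, model.predict(X_test)
        )

print(results)
```

Reading across a row of `results` compares the models under one sampling method, which is the comparison behind the statement about Random Forest above.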