- Nidhi
- Jeetesh
- Ramish
- Fatima
- Yukang Lin
This project focuses on predicting customer subscription to term deposits based on historical marketing campaign data. By leveraging machine learning techniques, we aim to identify potential subscribers and optimize marketing efforts.
This dataset is sourced from Kaggle: Bank Term Deposit Predictions.
The dataset contains records of a bank's marketing campaign interactions with customers.
- Training set: 45,211 instances
- Test set: 4,521 instances
- Features: Age, job, marital status, education, balance, contact type, previous campaign outcomes, etc.
- Target Variable: `y` (indicates whether the customer subscribed to a term deposit: `yes` or `no`)
Below is a description of each field in the dataset:
Field | Description | Type |
---|---|---|
`age` | The age of the customer. | Numerical |
`job` | The occupation/employment status of the customer. | Categorical |
`marital` | The marital status of the customer. | Categorical |
`education` | The education level attained by the customer. | Categorical |
`default` | Whether the customer has credit in default. | Categorical |
`balance` | The balance in the customer's account. | Numerical |
`housing` | Whether the customer has a housing loan. | Categorical |
`contact` | Type of communication used to contact customers (phone, cellular, etc.). | Categorical |
`day` | Day of the month when the customer was last contacted. | Numerical |
`duration` | Duration (in seconds) of the last contact with the customer during the previous campaign. | Numerical |
`pdays` | Number of days since the customer was last contacted in a previous campaign. | Numerical |
`poutcome` | Outcome of the previous marketing campaign. | Categorical |
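For orientation, a minimal loading sketch. The file names `train.csv` and `test.csv` and the comma separator are assumptions about the Kaggle download and may need adjusting (some copies of this dataset are semicolon-delimited).

```python
import pandas as pd

# File names are assumed; adjust to match the Kaggle download.
# Some copies of this dataset are semicolon-delimited (sep=";").
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

print(train.shape)    # expected: 45,211 rows in the training set
print(train.dtypes)   # numerical vs. categorical fields
print(train.head())   # first few records, including the target column y
```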
- What factors influence a customer's decision to invest in a term deposit?
- Can the bank's operational status be inferred from customer usage records?
- Can we predict term deposit subscription likelihood and optimize marketing efforts?
- Can we classify customers into potential vs. non-potential subscribers?
- Class imbalance: far more negative instances (`no`) than positive ones (`yes`); see the sketch after this list
- Handling missing values and categorical variables
- Feature selection and importance analysis
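A minimal sketch of quantifying the imbalance and encoding the categorical fields, using `train` from the loading sketch above. The one-hot encoding shown here is an illustrative choice, not necessarily the exact preprocessing used in this project.

```python
import pandas as pd

# train is the DataFrame from the loading sketch above.
print(train["y"].value_counts(normalize=True))   # heavily skewed toward "no"

# Binary target for modelling: 1 = subscribed, 0 = did not subscribe.
y = (train["y"] == "yes").astype(int)

# One-hot encode the categorical features (an illustrative choice).
X = pd.get_dummies(train.drop(columns=["y"]), drop_first=True)
print(X.shape)
```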
- No missing values detected
- Replaced `unknown` values in the `education` column with the most frequent value (`secondary`)
- Merged the `month` and `day` columns to create a proper date format (see the sketch after this list)
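A sketch of those two steps, assuming `train` is the DataFrame from the loading sketch. The dataset carries no year, so the year used when building the date is only a placeholder.

```python
import pandas as pd

# train is the DataFrame from the loading sketch above.

# Replace "unknown" education entries with the most frequent level, "secondary".
train["education"] = train["education"].replace("unknown", "secondary")

# Merge month (e.g. "may") and day (day of month) into a single date column.
# The dataset has no year, so 2017 below is only a placeholder.
train["date"] = pd.to_datetime(
    "2017 " + train["month"].astype(str) + " " + train["day"].astype(str),
    format="%Y %b %d",
)
```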
- Customers with higher balances and longer call durations were more likely to subscribe.
- Students and retirees showed higher subscription rates.
- Higher education levels correlated with increased subscriptions.
- Clients with no credit defaults and no housing or personal loans had a greater likelihood of subscribing.
- The month of contact significantly affected the subscription rate (see the sketch after this list).
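These patterns can be spot-checked with simple group-bys; a minimal sketch, using `train` as loaded above (with `y` still holding the original `yes`/`no` labels):

```python
# train is the DataFrame from the loading sketch; y still holds "yes"/"no".
subscribed = train["y"].eq("yes")

# Average balance and call duration for subscribers vs. non-subscribers.
print(train.groupby(subscribed)[["balance", "duration"]].mean())

# Subscription rate by job, education level, and month of contact.
for col in ["job", "education", "month"]:
    print(subscribed.groupby(train[col]).mean().sort_values(ascending=False))
```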
- Random Forest Classifier
  - Provided baseline feature importance.
  - Identified balance, duration, and contact type as key features.
- Neural Networks
  - Captured complex, non-linear interactions between features.
  - Identified education and job type as additional significant predictors.
- Logistic Regression
  - Achieved 89.88% accuracy after feature selection.
  - Used for baseline comparisons.
- XGBoost Classifier
  - Achieved 90.55% accuracy after hyperparameter tuning.
  - Provided strong feature-importance insights.
- Decision Tree
  - Tuned via GridSearchCV (optimal depth: 5; see the training sketch after this list).
  - Evaluated with a confusion matrix and ROC-AUC analysis.
- Adaptive Boosting (AdaBoost)
  - Improved classification balance.
  - Increased the true positive rate while controlling false negatives.
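A minimal training sketch for several of these models, using `X` and `y` from the encoding sketch above. The hyperparameter values and the 80/20 split are illustrative assumptions, not the tuned settings behind the accuracies reported below.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# X and y are the encoded features and binary target from the sketches above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative hyperparameters; not the project's tuned settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=300, max_depth=5, learning_rate=0.1,
                             eval_metric="logloss", random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_val, model.predict(X_val))
    print(f"{name}: {acc:.4f}")

# Decision tree tuned with GridSearchCV; the project reports an optimal depth of 5.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [3, 5, 7, 9]},
    scoring="accuracy",
    cv=5,
)
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_,
      "val accuracy:", accuracy_score(y_val, grid.best_estimator_.predict(X_val)))
```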
- Key Features (a feature-importance sketch follows this list):
  - `balance`: higher balance → higher subscription likelihood
  - `duration`: longer call duration → increased success rate
  - `month`: certain months (Aug, Sep) showed higher engagement
  - `job` & `education`: higher education levels and certain job types mattered
  - `poutcome` (previous campaign outcome): customers with previous successful interactions were more likely to subscribe
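A ranking like this can be read off a fitted tree-based model; a minimal sketch using the XGBoost model from the training sketch above (with one-hot encoding, categorical fields appear as individual dummy columns):

```python
import pandas as pd

# models["XGBoost"] is the fitted classifier from the training sketch above.
importances = pd.Series(models["XGBoost"].feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```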
Model | Accuracy |
---|---|
Logistic Regression | 89.88% |
XGBoost | 90.55% |
Decision Tree | 89.01% |
AdaBoost | Improved TPR & FPR |
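For completeness, a sketch of the confusion-matrix and ROC-AUC checks mentioned above, applied to the tuned decision tree from the training sketch (any of the fitted models could be substituted):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, roc_auc_score, RocCurveDisplay

# grid, X_val, and y_val come from the training sketch above.
best_tree = grid.best_estimator_
val_probs = best_tree.predict_proba(X_val)[:, 1]

print(confusion_matrix(y_val, best_tree.predict(X_val)))
print("ROC-AUC:", roc_auc_score(y_val, val_probs))

RocCurveDisplay.from_predictions(y_val, val_probs)
plt.show()
```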
- Targeted Marketing: Focus on high-balance customers with longer call engagement.
- Optimal Timing: Execute campaigns in August and September for maximum conversion.
- Customer Segmentation: Classify customers based on past interaction success rates.
- Campaign Optimization: Reduce excessive follow-ups to avoid customer fatigue.
- Predictive modeling improves marketing efficiency by identifying potential subscribers.
- Feature engineering plays a crucial role in enhancing model accuracy.
- Advanced techniques like XGBoost and AdaBoost offer superior classification performance.
- Further Hyperparameter Tuning for improved model performance.
- Customer Profiling & Segmentation using clustering techniques.
- A/B Testing for Campaign Strategies to refine marketing effectiveness.