Bagging is a technique for combining the outputs of several models (for example, a set of decision trees) to produce a more generalized result. Here's a question for you: will it be useful to train all the models on the same set of data and then combine them? Given the same input, there's a good chance that these models will produce the same output. So how do we resolve this? Bootstrapping is one technique.
Bootstrapping is a sampling method in which subsets of observations are drawn from the original dataset with replacement, so the same observation can appear more than once in a subset. Each subset is the same size as the original set.
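
To make sampling with replacement concrete, here is a minimal sketch using only Python's standard library. The helper name `bootstrap_sample` and the toy dataset are illustrative assumptions, not part of the notebook.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) observations from data, sampling with replacement."""
    return rng.choices(data, k=len(data))

rng = random.Random(0)      # fixed seed so the run is repeatable
data = list(range(10))      # toy "dataset" of 10 observations

sample = bootstrap_sample(data, rng)

# Because we sample with replacement, some observations usually repeat
# and others are left out, so this fraction is typically below 1.
unique_fraction = len(set(sample)) / len(data)

print("bootstrap sample:", sample)
print("fraction of distinct originals:", unique_fraction)
```

The sample has the same length as the original data, but it usually contains duplicates, which is what makes each bag slightly different from the others.
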
These subsets (bags) are used in the Bagging (or Bootstrap Aggregating) technique to get a fair picture of the distribution (the complete set). The subsets used for bagging may be smaller than the original set.
- Multiple subsets are created from the original dataset by sampling observations with replacement.
- A base model (weak model) is trained on each of these subsets.
- The models are independent of one another and run in parallel.
- The final predictions are determined by combining the predictions from all the models.
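
The bagging steps above can be sketched from scratch as follows. This is a simplified illustration under stated assumptions: the weak model is a hypothetical one-feature threshold classifier (`fit_stump`), the toy data is made up, and predictions are combined by majority vote. In practice you would more likely reach for a library implementation such as scikit-learn's `BaggingClassifier`.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier (a weak model) by brute force."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            # Count training errors if we predict `left` at or below t, else `right`.
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagging_fit(xs, ys, n_models, rng):
    """Train one stump per bootstrap sample; each fit is independent of the others."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = rng.choices(range(n), k=n)   # sample row indices with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Combine the models' predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data (illustrative): the label is mostly 1 for larger x, with some noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

rng = random.Random(42)
models = bagging_fit(xs, ys, n_models=25, rng=rng)
predictions = [bagging_predict(models, x) for x in xs]
print(predictions)
```

The loop in `bagging_fit` runs sequentially here for simplicity, but since no model depends on another, the fits could run in parallel.
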
Boosting is a sequential process in which each subsequent model attempts to correct the errors of the previous model, so every model depends on the one before it. The steps below show how boosting works.
- A subset is created from the original dataset.
- Initially, all data points are given equal weights.
- A base model is created on this subset.
- This model is used to make predictions on the whole dataset.
- Errors are calculated using the actual values and predicted values.
- The observations that are incorrectly predicted are given higher weights.
- Another model is created and predictions are made on the dataset.
- Similarly, multiple models are created, each correcting the errors of the previous model.
- The final model (strong learner) is the weighted mean of all the models (weak learners).
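
The boosting steps above can be sketched as an AdaBoost-style loop. This is a minimal illustration under stated assumptions, not the notebook's code: labels are encoded as +1/-1, the weak learner is a hypothetical one-feature threshold rule, each round trains on the full weighted dataset rather than a sampled subset, and the final prediction is the sign of the weighted sum of the weak learners' votes.

```python
import math

def stump_predict(t, sign, x):
    """Predict `sign` above threshold t and `-sign` at or below it."""
    return sign if x > t else -sign

def fit_weighted_stump(xs, ys, weights):
    """Find the threshold and orientation with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(t, sign, x) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost_fit(xs, ys, n_rounds):
    n = len(xs)
    weights = [1.0 / n] * n     # initially, all data points get equal weight
    ensemble = []               # list of (alpha, threshold, sign)
    for _ in range(n_rounds):
        err, t, sign = fit_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # lower error gives a larger vote
        # For alpha > 0, misclassified points gain weight and correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(t, sign, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]   # renormalise to sum to 1
        ensemble.append((alpha, t, sign))
    return ensemble

def adaboost_predict(ensemble, x):
    """Strong learner: sign of the alpha-weighted sum of weak predictions."""
    score = sum(alpha * stump_predict(t, sign, x) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data (illustrative): no single threshold separates the two classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(xs, ys, n_rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
print("training predictions match labels:", predictions == ys)
```

Each round concentrates on the points the previous rounds got wrong, which is how several weak threshold rules can jointly fit data that none of them fits alone.
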
- Bagging meta-estimator
- Random forest
- AdaBoost
- GBM
- XGBM (XGBoost)
- LightGBM
- CatBoost
- Just run `jupyter notebook` in a terminal and it will open in your browser. Install Jupyter first if you haven't already.
- Install xgboost with `pip install xgboost` from the command prompt or Anaconda prompt if you haven't already.
- xgboost
- Pandas
- Scikit-Learn
- seaborn
Please take a look at the notebook in the folder; the dataset is already uploaded there. A detailed explanation of each step is included in the notebook.