Python package for stacking (machine learning technique)


wangdq1989/vecstack

 
 


vecstack

Python package for stacking, featuring a lightweight functional API and a fully compatible scikit-learn API.
A convenient way to automate OOF computation, prediction and bagging using any number of models.

Get started

Installation

Note: On Linux, use pip or pip3 (or python or python3) as appropriate to install the package for the desired Python version.

  • Classic first-time installation (recommended):
    • pip install vecstack
  • Install for the current user only (if you have trouble with write permissions):
    • pip install --user vecstack
  • If pip is not found on your PATH, call it through a specific interpreter:
    • /usr/bin/python -m pip install vecstack
    • C:/Python36/python -m pip install vecstack
  • Upgrade vecstack and all dependencies:
    • pip install --upgrade vecstack
  • Upgrade vecstack WITHOUT upgrading dependencies:
    • pip install --upgrade --no-deps vecstack
  • Upgrade directly from GitHub WITHOUT upgrading dependencies:
    • pip install --upgrade --no-deps https://github.com/vecxoz/vecstack/archive/master.zip
  • Uninstall:
    • pip uninstall vecstack

Usage. Functional API

from sklearn.linear_model import LinearRegression, Ridge
from vecstack import stacking

# Get your data (X_train, y_train, X_test)

# Initialize 1st level estimators
models = [LinearRegression(),
          Ridge(random_state=0)]

# Get your stacked features in a single line
S_train, S_test = stacking(models, X_train, y_train, X_test, regression=True, verbose=2)

# Use 2nd level estimator with stacked features
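
The last step above, fitting a 2nd level estimator on the stacked features, might look like the sketch below. It is only an illustration: the arrays are synthetic stand-ins for the `S_train` and `S_test` that `stacking` returns (one column per 1st level model is assumed), and the choice of `LinearRegression` as the 2nd level model and the name `meta_model` are assumptions, not something the package prescribes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for the stacked features (assumption: 2 first-level models,
# so each array has 2 columns). In real use these come from stacking(...).
rng = np.random.default_rng(0)
y_train = rng.normal(size=100)
S_train = np.column_stack([y_train + rng.normal(scale=0.3, size=100),
                           y_train + rng.normal(scale=0.3, size=100)])
S_test = rng.normal(size=(20, 2))

# 2nd level estimator (hypothetical choice): fit on the stacked train features
meta_model = LinearRegression()
meta_model.fit(S_train, y_train)

# Final predictions for the test set come from the stacked test features
y_pred = meta_model.predict(S_test)
print(y_pred.shape)
```

Any scikit-learn compatible estimator could take the place of `meta_model` here.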

Usage. Scikit-learn API

from sklearn.linear_model import LinearRegression, Ridge
from vecstack import StackingTransformer

# Get your data (X_train, y_train, X_test)

# Initialize 1st level estimators
estimators = [('lr', LinearRegression()),
              ('ridge', Ridge(random_state=0))]
              
# Initialize StackingTransformer
stack = StackingTransformer(estimators, regression=True, verbose=2)

# Fit
stack = stack.fit(X_train, y_train)

# Get your stacked features
S_train = stack.transform(X_train)
S_test = stack.transform(X_test)

# Use 2nd level estimator with stacked features

Stacking concept

  1. We want to predict the train set and the test set with one or more 1st level models, and then use these predictions as features for one or more 2nd level models.
  2. Any model can be used as a 1st level or 2nd level model.
  3. To avoid overfitting on the train set, we use cross-validation: in each fold we fit the model on the remaining folds and predict the out-of-fold (OOF) part of the train set.
  4. Common practice is to use 3 to 10 folds.
  5. Predict the test set:
    • Variant A: In each fold we predict the test set, so after all folds are complete we take the mean (or, for class labels, the mode) of the temporary test set predictions made in each fold.
    • Variant B: We do not predict the test set during the cross-validation cycle. After all folds are complete we perform one additional step: fit the model on the full train set and predict the test set once. This approach takes more time because it requires one additional fit.
  6. As an example, we look at stacking implemented with a single 1st level model and 3-fold cross-validation.
  7. Pictures:
    • Variant A: Three pictures describe the three folds of cross-validation. After all three folds are complete we get a single train feature and a single test feature to use with the 2nd level model.
    • Variant B: The first three pictures describe the three folds of cross-validation (as in Variant A) to get a single train feature, and the fourth picture describes the additional step to get a single test feature.
  8. We can repeat this cycle with other 1st level models to get more features for the 2nd level model.
  9. You can also look at the animations of Variant A and Variant B.
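
The aggregation in Variant A of step 5 can be illustrated with a small NumPy sketch. The per-fold predictions below are made-up numbers, and the rule used here (mean for regression outputs, mode for predicted class labels) is an assumed reading of "mean (mode)", not a statement about the package internals.

```python
import numpy as np

# Hypothetical temporary test predictions: one row per fold, one column per test sample
reg_preds = np.array([[2.0, 3.1, 0.9],
                      [2.2, 2.9, 1.1],
                      [1.8, 3.0, 1.0]])
clf_preds = np.array([[0, 1, 2],
                      [0, 1, 1],
                      [1, 1, 2]])

# Regression: average the fold predictions for each test sample
reg_feature = reg_preds.mean(axis=0)

# Classification (class labels): take the most frequent label for each test sample.
# np.bincount(...).argmax() resolves ties in favour of the smallest label.
clf_feature = np.array([np.bincount(col).argmax() for col in clf_preds.T])

print(reg_feature)
print(clf_feature)
```

Each result has one value per test sample, which is the single test feature contributed by one 1st level model.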

Variant A

Fold 1 of 3


Fold 2 of 3


Fold 3 of 3
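
As a rough companion to the three folds above, here is a minimal NumPy sketch of Variant A for a single 1st level model and a regression task. The helper names (`fit_linear`, `predict_linear`, `stack_variant_a`), the least-squares model and the synthetic data are all illustrative assumptions and are not part of vecstack.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predict with coefficients produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def stack_variant_a(X_train, y_train, X_test, n_folds=3, seed=0):
    rng = np.random.default_rng(seed)
    # Shuffle the row indices and split them into n_folds disjoint parts
    folds = np.array_split(rng.permutation(len(X_train)), n_folds)
    s_train = np.zeros(len(X_train))
    test_preds = np.zeros((n_folds, len(X_test)))
    for k, oof_idx in enumerate(folds):
        # Fit on every fold except fold k
        fit_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        coef = fit_linear(X_train[fit_idx], y_train[fit_idx])
        # Predict the out-of-fold part of the train set
        s_train[oof_idx] = predict_linear(coef, X_train[oof_idx])
        # Variant A: also predict the whole test set in every fold
        test_preds[k] = predict_linear(coef, X_test)
    # Average the temporary test predictions over the folds
    return s_train, test_preds.mean(axis=0)

# Synthetic regression data (assumption, for illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -2.0]) + 0.5 + rng.normal(scale=0.1, size=30)
X_train, y_train, X_test = X[:24], y[:24], X[24:]

S_train, S_test = stack_variant_a(X_train, y_train, X_test)
print(S_train.shape, S_test.shape)
```

Every train row is predicted exactly once, by a model that never saw it, while each test row receives the average of three predictions.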

Variant A. Animation


Variant B

Step 1 of 4


Step 2 of 4


Step 3 of 4


Step 4 of 4
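
The four steps above can be sketched in NumPy in the same spirit: three cross-validation folds that produce only the train feature, followed by one extra fit on the full train set for the test feature. The helper names (`fit_linear`, `predict_linear`, `stack_variant_b`), the least-squares model and the synthetic data are illustrative assumptions and are not part of vecstack.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predict with coefficients produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def stack_variant_b(X_train, y_train, X_test, n_folds=3, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X_train)), n_folds)
    s_train = np.zeros(len(X_train))
    # Steps 1-3: cross-validation produces only the OOF train feature
    for k, oof_idx in enumerate(folds):
        fit_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        coef = fit_linear(X_train[fit_idx], y_train[fit_idx])
        s_train[oof_idx] = predict_linear(coef, X_train[oof_idx])
    # Step 4: one additional fit on the full train set, then predict the test set once
    coef_full = fit_linear(X_train, y_train)
    s_test = predict_linear(coef_full, X_test)
    return s_train, s_test

# Synthetic regression data (assumption, for illustration only)
rng = np.random.default_rng(7)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -2.0]) + 0.5 + rng.normal(scale=0.1, size=30)
X_train, y_train, X_test = X[:24], y[:24], X[24:]

S_train, S_test = stack_variant_b(X_train, y_train, X_test)
print(S_train.shape, S_test.shape)
```

Compared with Variant A, this costs one more fit per model, but the test feature comes from a model trained on all available train rows.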

Variant B. Animation
