vecstack

Python package for stacking, featuring a lightweight functional API and a fully compatible scikit-learn API.
A convenient way to automate OOF (out-of-fold) computation, prediction, and bagging using any number of models.

Functional API:
- Minimalistic. Get your stacked features in a single line
- RAM-friendly. The lowest possible memory consumption
- Kaggle-ready. Stacked features and hyperparameters from each run can be automatically saved in files. No more mess at the end of the competition.
Scikit-learn API:
- Standardized. Fully scikit-learn compatible transformer class exposing fit and transform methods
- Pipeline-certified. Implement and deploy multilevel stacking like it's no big deal using sklearn.pipeline.Pipeline
- And of course FeatureUnion and GridSearchCV are also invited to the party
Overall specs:
- Use any sklearn-like estimators
- Perform classification and regression tasks
- Predict class labels or probabilities in classification task
- Apply any user-defined metric
- Apply any user-defined transformations for target and prediction
- Python 2, Python 3
- Win, Linux, Mac
- MIT license
- Depends on numpy, scipy, scikit-learn>=0.18

Get started

Installation guide
Usage:
- Functional API
- Scikit-learn API
Tutorials:
- Stacking concept + Pictures + Stacking implementation from scratch
Examples:
- Functional API:
- Scikit-learn API:
  - Regression + Multilevel stacking using Pipeline
Documentation:
- Functional API or type >>> help(stacking)
- Scikit-learn API or type >>> help(StackingTransformer)

Installation

Note: On Linux, remember to use pip/pip3 (or python/python3) so that the package is installed for the desired Python version.

Classic 1st time installation (recommended):
- pip install vecstack
Install for the current user only (if you have trouble with write permissions):
- pip install --user vecstack
If your PATH doesn't work:
- /usr/bin/python -m pip install vecstack
- C:/Python36/python -m pip install vecstack
Upgrade vecstack and all dependencies:
- pip install --upgrade vecstack
Upgrade vecstack WITHOUT upgrading dependencies:
- pip install --upgrade --no-deps vecstack
Upgrade directly from GitHub WITHOUT upgrading dependencies:
- pip install --upgrade --no-deps https://github.com/vecxoz/vecstack/archive/master.zip
Uninstall:
- pip uninstall vecstack

Usage. Functional API

from sklearn.linear_model import LinearRegression, Ridge
from vecstack import stacking

# Get your data

# Initialize 1st level estimators
models = [LinearRegression(),
          Ridge(random_state=0)]

# Get your stacked features in a single line
S_train, S_test = stacking(models, X_train, y_train, X_test, regression=True, verbose=2)

# Use 2nd level estimator with stacked features

Usage. Scikit-learn API

from sklearn.linear_model import LinearRegression, Ridge
from vecstack import StackingTransformer

# Get your data

# Initialize 1st level estimators
estimators = [('lr', LinearRegression()),
              ('ridge', Ridge(random_state=0))]

# Initialize StackingTransformer
stack = StackingTransformer(estimators, regression=True, verbose=2)

# Fit
stack = stack.fit(X_train, y_train)

# Get your stacked features
S_train = stack.transform(X_train)
S_test = stack.transform(X_test)

# Use 2nd level estimator with stacked features

Stacking concept

We want to predict the train set and the test set with one or more 1st level models, and then use these predictions as features for one or more 2nd level models.
Any model can be used as a 1st level or 2nd level model.
To avoid overfitting on the train set, we use cross-validation: in each fold we fit the model on the other folds and predict the out-of-fold (OOF) part of the train set.
Common practice is to use 3 to 10 folds.
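As a minimal sketch of the out-of-fold idea (not vecstack's internal code), the snippet below uses scikit-learn's `KFold` with 3 folds on a toy dataset. The data, the variable names `oof_pred` and `times_predicted`, and the choice of `LinearRegression` are assumptions made for illustration. Each train row is predicted exactly once, by a model that never saw that row during fitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Toy data (illustrative only): y = 2*x + 1 without noise
X_train = np.arange(9, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel() + 1.0

kf = KFold(n_splits=3)  # 3 folds, no shuffling
oof_pred = np.zeros(len(X_train))
times_predicted = np.zeros(len(X_train), dtype=int)

for fit_idx, oof_idx in kf.split(X_train):
    # Fit on the rows outside the current fold ...
    model = LinearRegression()
    model.fit(X_train[fit_idx], y_train[fit_idx])
    # ... and predict only the held-out (out-of-fold) rows
    oof_pred[oof_idx] = model.predict(X_train[oof_idx])
    times_predicted[oof_idx] += 1
```

After the loop, `oof_pred` holds one prediction per train row; this vector is what becomes a single train feature for the 2nd level model.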
Predict test set:
- Variant A: In each fold we predict the test set, so after all folds are complete we take the mean (or the mode, for class labels) of the temporary test set predictions made in each fold.
- Variant B: We do not predict the test set during the cross-validation cycle. After all folds are complete we perform an additional step: fit the model on the full train set and predict the test set once. This approach takes more time because it requires one additional fit.
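The two variants can be sketched from scratch as follows. This is an illustrative implementation for a regression task, assuming NumPy arrays and a scikit-learn style estimator; the helper name `stack_one_model` and the synthetic data are hypothetical and are not part of vecstack.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold


def stack_one_model(model, X_train, y_train, X_test, n_folds=3, variant="A"):
    """Return (S_train, S_test) feature vectors for one 1st level regressor."""
    if variant not in ("A", "B"):
        raise ValueError("variant must be 'A' or 'B'")

    S_train = np.zeros(len(X_train))
    fold_test_preds = []

    for fit_idx, oof_idx in KFold(n_splits=n_folds).split(X_train):
        fold_model = clone(model)
        fold_model.fit(X_train[fit_idx], y_train[fit_idx])
        # OOF prediction for the held-out part of the train set
        S_train[oof_idx] = fold_model.predict(X_train[oof_idx])
        if variant == "A":
            # Variant A: temporary test set prediction in every fold
            fold_test_preds.append(fold_model.predict(X_test))

    if variant == "A":
        # Average the per-fold test predictions (mean, since this is regression)
        S_test = np.mean(fold_test_preds, axis=0)
    else:
        # Variant B: one extra fit on the full train set, predict test once
        S_test = clone(model).fit(X_train, y_train).predict(X_test)
    return S_train, S_test


# Synthetic data, for illustration only
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)
X_train, y_train, X_test = X[:45], y[:45], X[45:]

S_train_A, S_test_A = stack_one_model(Ridge(), X_train, y_train, X_test, variant="A")
S_train_B, S_test_B = stack_one_model(Ridge(), X_train, y_train, X_test, variant="B")
```

In this sketch both variants produce the same train feature; they differ only in how the test feature is obtained.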
As an example, we look at stacking implemented with a single 1st level model and 3-fold cross-validation.
Pictures:
- Variant A: Three pictures describe the three folds of cross-validation. After all three folds are complete we get a single train feature and a single test feature to use with the 2nd level model.
- Variant B: The first three pictures describe the three folds of cross-validation (as in Variant A) that produce a single train feature, and the fourth picture describes the additional step that produces a single test feature.
We can repeat this cycle using other 1st level models to get more features for the 2nd level model.
You can also look at the animations of Variant A and Variant B.
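Repeating the cycle for several 1st level models might look like the sketch below. It relies on scikit-learn's `cross_val_predict` for the OOF predictions and uses Variant B for the test set; the models, the synthetic data, and all variable names are assumptions chosen for illustration, not vecstack's implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, for illustration only
rng = np.random.RandomState(0)
X = rng.normal(size=(80, 4))
y = X @ np.array([1.0, -1.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=80)
X_train, y_train, X_test = X[:60], y[:60], X[60:]

first_level = [
    LinearRegression(),
    Ridge(alpha=1.0),
    DecisionTreeRegressor(max_depth=3, random_state=0),
]
cv = KFold(n_splits=3)

train_cols, test_cols = [], []
for model in first_level:
    # OOF predictions for the train set (the model is cloned for each fold)
    train_cols.append(cross_val_predict(model, X_train, y_train, cv=cv))
    # Variant B: fit on the full train set, then predict the test set once
    test_cols.append(model.fit(X_train, y_train).predict(X_test))

# One column per 1st level model
S_train = np.column_stack(train_cols)
S_test = np.column_stack(test_cols)

# 2nd level model is trained on the stacked features
second_level = LinearRegression().fit(S_train, y_train)
y_pred = second_level.predict(S_test)
```

Each 1st level model contributes one column, so `S_train` and `S_test` here have three columns each.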
