Python package for stacking, featuring a lightweight functional API and a fully compatible scikit-learn API.
A convenient way to automate OOF computation, prediction and bagging using any number of models.
- Functional API:
- Minimalistic. Get your stacked features in a single line
- RAM-friendly. The lowest possible memory consumption
- Kaggle-ready. Stacked features and hyperparameters from each run can be automatically saved in files. No more mess at the end of the competition. Log example
- Scikit-learn API:
- Standardized. Fully scikit-learn compatible transformer class exposing `fit` and `transform` methods
- Pipeline-certified. Implement and deploy multilevel stacking like it's no big deal using `sklearn.pipeline.Pipeline`
- And of course `FeatureUnion` and `GridSearchCV` are also invited to the party
- Overall specs:
- Use any sklearn-like estimators
- Perform classification and regression tasks
- Predict class labels or probabilities in classification task
- Apply any user-defined metric
- Apply any user-defined transformations for target and prediction
- Python 2, Python 3
- Win, Linux, Mac
- MIT license
- Depends on numpy, scipy, scikit-learn>=0.18
- Installation guide
- Usage:
- Tutorials:
- Examples:
- Functional API:
- Scikit-learn API:
- Documentation:
- Functional API, or type `>>> help(stacking)`
- Scikit-learn API, or type `>>> help(StackingTransformer)`
Note: On Linux don't forget to use `pip`/`pip3` (or `python`/`python3`) to install the package for the desired Python version.
- Classic 1st time installation (recommended): `pip install vecstack`
- Install for current user only (if you have trouble with write permissions): `pip install --user vecstack`
- If your PATH doesn't work: `/usr/bin/python -m pip install vecstack` or `C:/Python36/python -m pip install vecstack`
- Upgrade vecstack and all dependencies: `pip install --upgrade vecstack`
- Upgrade vecstack WITHOUT upgrading dependencies: `pip install --upgrade --no-deps vecstack`
- Upgrade directly from GitHub WITHOUT upgrading dependencies: `pip install --upgrade --no-deps https://github.com/vecxoz/vecstack/archive/master.zip`
- Uninstall: `pip uninstall vecstack`
Functional API:

```python
from sklearn.linear_model import LinearRegression, Ridge
from vecstack import stacking

# Get your data: X_train, y_train, X_test

# Initialize 1st level estimators
models = [LinearRegression(),
          Ridge(random_state=0)]

# Get your stacked features in a single line
S_train, S_test = stacking(models, X_train, y_train, X_test, regression=True, verbose=2)

# Use 2nd level estimator with stacked features
```
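To round off the example above, here is a minimal sketch of the 2nd level step. It assumes the `S_train`/`S_test` matrices returned by `stacking()` and the same `y_train`; using `Ridge` as the 2nd level estimator is an arbitrary illustrative choice:

```python
from sklearn.linear_model import Ridge

# Fit a 2nd level estimator on the stacked (OOF) features
model_l2 = Ridge(random_state=0)
model_l2 = model_l2.fit(S_train, y_train)

# Final prediction for the test set
y_pred = model_l2.predict(S_test)
```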
Scikit-learn API:

```python
from sklearn.linear_model import LinearRegression, Ridge
from vecstack import StackingTransformer

# Get your data: X_train, y_train, X_test

# Initialize 1st level estimators
estimators = [('lr', LinearRegression()),
              ('ridge', Ridge(random_state=0))]

# Initialize StackingTransformer
stack = StackingTransformer(estimators, regression=True, verbose=2)

# Fit
stack = stack.fit(X_train, y_train)

# Get your stacked features
S_train = stack.transform(X_train)
S_test = stack.transform(X_test)

# Use 2nd level estimator with stacked features
```
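Because `StackingTransformer` is an ordinary scikit-learn transformer, the "Pipeline-certified" point above can be illustrated with a short sketch. It assumes the `estimators` list and data from the snippet above; the final `Ridge` step is just an arbitrary 2nd level choice:

```python
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge

# Stack 1st level estimators and feed their OOF predictions
# into a 2nd level estimator, all inside one Pipeline
steps = [('stack', StackingTransformer(estimators, regression=True, verbose=0)),
         ('final', Ridge(random_state=0))]
pipe = Pipeline(steps)

pipe = pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
```

Multilevel stacking then amounts to chaining several `StackingTransformer` steps before the final estimator.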
- We want to predict the train set and the test set with some 1st level model(s), and then use these predictions as features for 2nd level model(s).
- Any model can be used as a 1st level model or a 2nd level model.
- To avoid overfitting (of the train set) we use cross-validation, and in each fold we predict the out-of-fold (OOF) part of the train set (see the code sketch after this list).
- The common practice is to use from 3 to 10 folds.
- Predicting the test set:
- Variant A: In each fold we predict the test set, so after completion of all folds we need to take the mean (or mode) of all temporary test set predictions made in each fold.
- Variant B: We do not predict the test set during the cross-validation cycle. Instead, after completion of all folds we perform an additional step: fit the model on the full train set and predict the test set once. This approach takes more time because we need to perform one additional fitting.
- As an example, let's look at stacking implemented with a single 1st level model and 3-fold cross-validation.
- Pictures:
- Variant A: Three pictures describe the three folds of cross-validation. After completion of all three folds we get a single train feature and a single test feature to use with the 2nd level model.
- Variant B: The first three pictures describe the three folds of cross-validation (just like in Variant A) and give us a single train feature; the fourth picture describes the additional step needed to get a single test feature.
- We can repeat this cycle using other 1st level models to get more features for the 2nd level model.
- You can also look at the animation of Variant A and Variant B.
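For those who prefer code over pictures, below is a hedged from-scratch sketch of Variant A with a single 1st level model and 3-fold cross-validation. It uses plain scikit-learn rather than vecstack itself, assumes `X_train`, `y_train`, `X_test` are NumPy arrays, and the choice of `Ridge` as the 1st level model is purely illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge

model_l1 = Ridge(random_state=0)                # single 1st level model
kf = KFold(n_splits=3, shuffle=False)

S_train = np.zeros(X_train.shape[0])            # OOF predictions for the train set
S_test_folds = np.zeros((X_test.shape[0], 3))   # temporary test predictions, one column per fold

for fold_id, (tr_idx, oof_idx) in enumerate(kf.split(X_train)):
    model_l1.fit(X_train[tr_idx], y_train[tr_idx])
    # Predict the out-of-fold part of the train set
    S_train[oof_idx] = model_l1.predict(X_train[oof_idx])
    # Variant A: predict the test set in each fold
    S_test_folds[:, fold_id] = model_l1.predict(X_test)

# Variant A: average the temporary test predictions over the folds
S_test = S_test_folds.mean(axis=1)
```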