Feat/probabilistic ensemble #1692

Merged
merged 32 commits into from
Jun 2, 2023


@madtoinou (Collaborator) commented Apr 4, 2023

Fixes #1682, fixes #1600.

Summary

The num_samples argument is now properly passed to regression_model.predict() in RegressionEnsembleModel.ensemble().

In order to obtain probabilistic predictions, all the models of the RegressionEnsembleModel need to be probabilistic, including its regression_model.

However, it is still possible to obtain deterministic predictions, either by using a non-probabilistic regression_model (even if the other models are probabilistic) or by setting num_samples=1.

Other Information

The definition of a probabilistic EnsembleModel should be clarified. At the moment, it requires all of its models to be probabilistic. However, for a RegressionEnsembleModel, should we also require the regression model itself to be probabilistic? Or, conversely, is a probabilistic regression_model enough to make a RegressionEnsembleModel probabilistic even if all the other models are deterministic?

codecov-commenter commented Apr 4, 2023

Codecov Report

Patch coverage: 97.36% and project coverage change: -0.12% ⚠️

Comparison is base (1f17580) 94.27% compared to head (d0dd636) 94.16%.


Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #1692      +/-   ##
==========================================
- Coverage   94.27%   94.16%   -0.12%
==========================================
  Files         125      125
  Lines       11607    11622      +15
==========================================
+ Hits        10943    10944       +1
- Misses        664      678      +14
```
| Impacted Files | Coverage Δ |
| --- | --- |
| darts/models/forecasting/ensemble_model.py | `96.58% <96.96%> (-0.01%)` ⬇️ |
| darts/models/forecasting/baselines.py | `100.00% <100.00%> (ø)` |
| ...ts/models/forecasting/regression_ensemble_model.py | `100.00% <100.00%> (ø)` |

... and 12 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@madtoinou marked this pull request as ready for review May 3, 2023 11:50
@madtoinou requested review from hrzn and dennisbader as code owners May 3, 2023 11:50
@madtoinou (Collaborator, Author)

Simplified the definition of a probabilistic RegressionEnsembleModel: its regression model (the "last/aggregation layer") must be probabilistic, regardless of the characteristics of the other models.

The forecasting models may still be probabilistic: they are trained as such, but their predictions contain only one sample when passed to the regression model at fitting and prediction time (the first sample, because of the way RegressionModel._create_lagged_data() drops the n_samples dimension).
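To make the shapes concrete, here is a minimal NumPy sketch (not darts's actual `_create_lagged_data()` implementation). It assumes a stochastic forecast is laid out as `(n_timesteps, n_components, n_samples)`, the layout returned by `TimeSeries.all_values()`, and shows that keeping only the first sample yields a 2-D `(n_timesteps, n_components)` feature matrix. `first_sample_features` is a hypothetical helper.

```python
import numpy as np


def first_sample_features(forecast: np.ndarray) -> np.ndarray:
    """Hypothetical helper: drop the sample axis of a (time, components, samples)
    array by keeping only the first sample, as described above."""
    if forecast.ndim != 3:
        raise ValueError("expected an array of shape (time, components, samples)")
    return forecast[:, :, 0]


rng = np.random.default_rng(0)
# two forecasting models -> two components, 24 time steps, 1000 samples each
stochastic_forecast = rng.normal(size=(24, 2, 1000))
features = first_sample_features(stochastic_forecast)
print(features.shape)  # (24, 2)
```

The other 999 samples are simply ignored, which is what the later `regression_train_num_samples` extension addresses.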

Attaching two code snippets that demonstrate how to obtain probabilistic forecasts with an EnsembleModel:

```python
from darts.models import LinearRegressionModel, LightGBMModel, RegressionEnsembleModel
from darts.utils.timeseries_generation import sine_timeseries, gaussian_timeseries
import matplotlib.pyplot as plt

# creating synthetic data (integer start/end -> integer-indexed series)
start = 10
end = 400
sin_series = sine_timeseries(start=start, end=end, value_amplitude=10)
gaus_series = gaussian_timeseries(mean=10, start=start, end=end)

tmp = sin_series + gaus_series
train, val = tmp.split_after(0.8)

quantiles = [0.25, 0.5, 0.75]

# probabilistic regression (aggregation) model: the forecasts of the ensembled
# models are fed to it as future covariates, hence lags_future_covariates=[0]
ensemble_lin_reg = LinearRegressionModel(
    quantiles=quantiles, lags_future_covariates=[0], likelihood="quantile"
)

# probabilistic forecasting models
lgbm_model = LightGBMModel(quantiles=quantiles, lags=4, likelihood="quantile")
linreg_model = LinearRegressionModel(quantiles=quantiles, lags=4, likelihood="quantile")

# regression_train_n_points: number of points used to train the regression model
ensemble = RegressionEnsembleModel(
    [lgbm_model, linreg_model],
    regression_train_n_points=140,
    regression_model=ensemble_lin_reg,
)

ensemble.fit(train)
# num_samples > 1 yields a stochastic forecast
pred = ensemble.predict(len(val), num_samples=1000)

val.plot(label="test")
pred.plot(label="prediction")
plt.show()
```

[Figure: all_probabilistic, forecast ("prediction") plotted against the held-out series ("test")]

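```python
# deterministic forecasting models
lgbm_model = LightGBMModel(lags=4)
linreg_model = LinearRegressionModel(lags=4)

# reuses `ensemble_lin_reg`, `train` and `val` from the previous snippet: since the
# regression model is probabilistic, the ensemble forecast is still probabilistic
ensemble = RegressionEnsembleModel(
    [lgbm_model, linreg_model],
    regression_train_n_points=140,
    regression_model=ensemble_lin_reg,
)

ensemble.fit(train)
pred = ensemble.predict(len(val), num_samples=1000)

val.plot(label="test")
pred.plot(label="prediction")
plt.show()
```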

[Figure: aggreg_probabilistic, forecast ("prediction") plotted against the held-out series ("test")]


@madtoinou (Collaborator, Author)

Extended the feature: if the forecasting_models are probabilistic, they can now be sampled to generate the training dataset of the regression model (instead of only taking the first sample). This is controlled by the regression_train_num_samples argument.

These samples are then reduced component-wise (one component per forecasting model), using either the mean or a quantile as set by the regression_train_samples_reduction argument, before being used to train the regression model.
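The reduction step can be sketched with NumPy as follows, assuming the sampled forecasts are stored as a `(n_timesteps, n_models, n_samples)` array (one component per forecasting model). `reduce_samples` is a hypothetical helper; accepting either `"mean"` or a float quantile simply mirrors the description above and is not necessarily the exact contract of `regression_train_samples_reduction`.

```python
from typing import Union

import numpy as np


def reduce_samples(forecast: np.ndarray, reduction: Union[str, float] = "mean") -> np.ndarray:
    """Hypothetical helper: reduce a (time, components, samples) array
    component-wise along the sample axis, with the mean or a quantile in [0, 1].
    The sample axis is kept with length 1, i.e. a deterministic forecast."""
    if reduction == "mean":
        return forecast.mean(axis=2, keepdims=True)
    if isinstance(reduction, float) and 0.0 <= reduction <= 1.0:
        return np.quantile(forecast, reduction, axis=2, keepdims=True)
    raise ValueError("reduction must be 'mean' or a float quantile in [0, 1]")


rng = np.random.default_rng(42)
# 24 time steps, one component per forecasting model (2 models), 100 samples
samples = rng.normal(size=(24, 2, 100))
mean_reduced = reduce_samples(samples, "mean")
median_reduced = reduce_samples(samples, 0.5)
print(mean_reduced.shape, median_reduced.shape)  # (24, 2, 1) (24, 2, 1)
```

Each forecasting model still contributes exactly one feature column to the regression model, whatever the number of samples drawn.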

Also, NaiveEnsembleModel can now generate probabilistic forecasts if all its models are probabilistic: the samples are averaged across the models/components, and the n_samples dimension remains untouched.
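A minimal NumPy sketch of this averaging, assuming each model's forecast is a `(n_timesteps, n_components, n_samples)` array with identical shapes, and that sample `i` of one model is paired with sample `i` of the others. `naive_ensemble` is a hypothetical helper, not the darts implementation.

```python
from typing import Sequence

import numpy as np


def naive_ensemble(forecasts: Sequence[np.ndarray]) -> np.ndarray:
    """Hypothetical helper: average same-shaped (time, components, samples)
    forecasts over the model axis, keeping the sample axis intact."""
    stacked = np.stack(forecasts, axis=0)  # (models, time, components, samples)
    return stacked.mean(axis=0)


rng = np.random.default_rng(1)
model_a = rng.normal(loc=0.0, size=(12, 1, 500))
model_b = rng.normal(loc=2.0, size=(12, 1, 500))
ensemble = naive_ensemble([model_a, model_b])
print(ensemble.shape)  # (12, 1, 500)
```

Because the mean is taken over the model axis only, all 500 samples survive and the result can still be treated as a probabilistic forecast.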


review-notebook-app bot commented May 19, 2023


dennisbader commented on 2023-05-19T14:46:01Z
----------------------------------------------------------------

To make the RegressionEnsembleModel probabilistic, we simply have to use a probabilistic regression model:


@dennisbader (Collaborator) left a comment


Very nice @madtoinou, thanks a lot 🚀 👍

There are one or two points where we could simplify a bit.

Regarding the non-reduction of the samples, I was rather thinking of stacking the sampled points vertically to generate more rows, instead of adding the sample dimension as extra columns.

Maybe we can discuss this again?
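The two layouts being discussed can be compared on a toy array. This NumPy sketch assumes base-model forecasts of shape `(n_timesteps, n_models, n_samples)` and one target value per time step; all variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(7)
n_time, n_models, n_samples = 10, 2, 5
# forecasts of the base models: (time, models, samples)
forecasts = rng.normal(size=(n_time, n_models, n_samples))
target = rng.normal(size=n_time)  # one target value per time step

# Option A: samples as extra columns -> (time, models * samples) features
X_cols = forecasts.reshape(n_time, n_models * n_samples)
y_cols = target

# Option B: samples stacked vertically as extra rows -> (samples * time, models)
# row k holds sample k // n_time at time step k % n_time
X_rows = forecasts.transpose(2, 0, 1).reshape(n_samples * n_time, n_models)
y_rows = np.tile(target, n_samples)  # target repeated once per sample

print(X_cols.shape, X_rows.shape, y_rows.shape)  # (10, 10) (50, 2) (50,)
```

With the vertical stacking, the regression model always sees one feature per forecasting model, so its input width does not depend on the number of samples drawn.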

@dennisbader (Collaborator) left a comment


Nice, thanks a lot! :)
We're close! I mainly added a couple of minor suggestions. I think we could allow a mix of deterministic and probabilistic forecasting_models for RegressionEnsembleModel, since internally we reduce the probabilistic ones to a deterministic forecast anyway. WDYT?

```
@@ -26,10 +27,24 @@ class EnsembleModel(GlobalForecastingModel):
    ----------
    models
        List of forecasting models whose predictions to ensemble

        .. note::
            if all the models are probabilistic, the `EnsembleModel` will also be probabilistic.
```
Collaborator


This is true for naive ensemble but not for RegressionEnsembleModel, right?

Collaborator Author


Correct, the docstring is different in RegressionEnsembleModel. This note could probably be removed since EnsembleModel cannot be instantiated anyway.

```
@@ -69,6 +92,52 @@ def __init__(
                f"{regression_model.lags}",
            )

        raise_if(
```
Collaborator


Can we move all those tests to the base class?

Collaborator Author


Sure, though there will be small discrepancies since the argument/attribute names differ slightly between the two classes (regression_ is used as a prefix in RegressionEnsembleModel).
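One possible way to share such checks in the base class while keeping the argument names in the error messages accurate is to pass the prefix as a parameter. The sketch below is hypothetical: `check_sampling_args` and its rules are assumptions based on the arguments described in this PR (a number of training samples and a mean/quantile reduction), not the code that was merged.

```python
from typing import Optional, Union


def check_sampling_args(
    train_num_samples: int,
    train_samples_reduction: Optional[Union[str, float]],
    prefix: str = "",
) -> None:
    """Hypothetical shared validation for the ensemble base class.

    `prefix` is prepended to the argument names in error messages,
    e.g. "regression_" for RegressionEnsembleModel.
    """
    if not isinstance(train_num_samples, int) or train_num_samples < 1:
        raise ValueError(f"`{prefix}train_num_samples` must be a positive integer.")
    if train_num_samples == 1:
        return  # a single sample needs no reduction
    is_quantile = (
        isinstance(train_samples_reduction, float) and 0.0 <= train_samples_reduction <= 1.0
    )
    if train_samples_reduction != "mean" and not is_quantile:
        raise ValueError(
            f"`{prefix}train_samples_reduction` must be 'mean' or a quantile in [0, 1]."
        )


check_sampling_args(100, 0.5, prefix="regression_")  # valid, returns None
try:
    check_sampling_args(100, "max", prefix="regression_")
except ValueError as err:
    print(err)
```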

@dennisbader (Collaborator) left a comment


Great work @madtoinou, thanks a lot! 🚀

One more test for the mixed probabilistic model and we're good to go!

```python
# forecasting models are a mix of probabilistic and deterministic, probabilistic regressor
ensemble_mixproba = RegressionEnsembleModel(
    forecasting_models=[
        self.get_probabilistic_global_model([-1, -3], quantiles),
```
Collaborator


Can you also check the mixed probabilistic model with train_num_samples > 1 and a reduction?

Collaborator Author


Had to move the reduction into _make_multiple_prediction() so that the predictions could be stacked properly.
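Why the reduction has to happen before stacking can be shown with NumPy, assuming each model's prediction is a `(n_timesteps, 1, n_samples)` array: a probabilistic prediction with 100 samples cannot be concatenated along the component axis with a deterministic one holding a single sample until its sample axis is reduced. This is an illustration only; the median is an arbitrary choice of reduction here.

```python
import numpy as np

rng = np.random.default_rng(3)
n_time = 8
proba_pred = rng.normal(size=(n_time, 1, 100))  # probabilistic model: 100 samples
det_pred = rng.normal(size=(n_time, 1, 1))      # deterministic model: 1 sample

# stacking along the component axis fails while the sample counts differ
try:
    np.concatenate([proba_pred, det_pred], axis=1)
except ValueError as err:
    print("cannot stack:", err)

# reducing the probabilistic prediction first (here: median) makes them compatible
reduced = np.quantile(proba_pred, 0.5, axis=2, keepdims=True)
stacked = np.concatenate([reduced, det_pred], axis=1)
print(stacked.shape)  # (8, 2, 1)
```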

@dennisbader (Collaborator) left a comment


Perfect 💯 Great work @madtoinou!

@dennisbader merged commit 80c0e5f into master Jun 2, 2023
@dennisbader deleted the feat/probabilistic_ensemble branch June 2, 2023 14:11
Labels: None yet
Projects: Archived in project
Participants: 3