[BUG] Error in running model.predict with RegressionEnsembleModel #1340

Closed
TheNumbersAI opened this issue Nov 6, 2022 · 4 comments · Fixed by #1357
Labels
bug (Something isn't working), triage (Issue waiting for triaging)

Comments

TheNumbersAI commented Nov 6, 2022

First of all, I really love the DARTS software! Magnificent! But I think there is an issue with RegressionEnsembleModel. Any help or guidance would be appreciated.

Describe the bug
When using past_covariates (and no future_covariates) with a RegressionEnsembleModel, I get an error such as this:

ERROR: ValueError: The corresponding future_covariate of the series at index 0 isn't sufficiently long. Given horizon n=1, min(lags_future_covariates)=0, max(lags_future_covariates)=0 and output_chunk_length=1
the future_covariate has to range from 2022-10-03 00:00:00 until 2022-10-03 00:00:00 (inclusive), but it ranges only from 2022-10-10 00:00:00 until 2022-10-10 00:00:00.

To Reproduce
Here is a code snippet. Uploading sample data is a challenge, so I've simplified the snippet and removed some of the training details.

series is time series data
train_transformed is time series data sampled from start = 0.5
past_covariates is also time series data
both series and past_covariates have the same range, and are transformed via a scaler per darts documentation

window = 5
lags = [-1, -2, -5]
my_model1 = LinearRegressionModel(lags=lags, output_chunk_length=window, lags_past_covariates=lags)
my_model2 = LightGBMModel(lags=lags, output_chunk_length=window, lags_past_covariates=lags)
my_model1.fit(train_transformed, past_covariates=past_covariates)
my_model2.fit(train_transformed, past_covariates=past_covariates)
# Strangely, setting regression_train_n_points to "window" raises an error during
# backtesting or prediction; doubling it at least bypasses the backtesting error.
my_ensemble_model = RegressionEnsembleModel([my_model1, my_model2], regression_train_n_points=2 * window)
backtest = my_ensemble_model.historical_forecasts(
    series, start=0.5, last_points_only=True, forecast_horizon=window,
    stride=1, verbose=True, past_covariates=past_covariates,
)
prediction = my_ensemble_model.predict(n=window, series=series, past_covariates=past_covariates)

Expected behavior
I expect to get a time series of predictions, as I do with any other DARTS model. The underlying components of the RegressionEnsembleModel (LinearRegressionModel and LightGBMModel) work on their own as I have defined them. However, I get the ValueError above. Trying different values for n when predicting produces the same error.

System (please complete the following information):

  • Python 3.10.6
  • darts==0.22.0

Additional context
I have no problem getting the NaiveEnsembleModel to work with the same parameters when I instantiate it and use it to make a prediction; only RegressionEnsembleModel fails. I'm stumped!

Detailed stack trace:

2022-11-06 13:26:54 main_logger ERROR: ValueError: The corresponding future_covariate of the series at index 0 isn't sufficiently long. Given horizon n=1, min(lags_future_covariates)=0, max(lags_future_covariates)=0 and output_chunk_length=1
the future_covariate has to range from 2022-10-03 00:00:00 until 2022-10-03 00:00:00 (inclusive), but it ranges only from 2022-10-10 00:00:00 until 2022-10-10 00:00:00.

ValueError Traceback (most recent call last)
Cell In [36], line 1
----> 1 prediction = model.predict(n=1, series=series, past_covariates=all_covariates) # still need to set n=window
2 o_prediction = scaler.inverse_transform(prediction)

File ~/Documents/beachhome/lib/python3.10/site-packages/darts/models/forecasting/ensemble_model.py:172, in EnsembleModel.predict(self, n, series, past_covariates, future_covariates, num_samples)
163 predictions = self._make_multiple_predictions(
164 n=n,
165 series=series,
(...)
168 num_samples=num_samples,
169 )
171 if self.is_single_series:
--> 172 return self.ensemble(predictions)
173 else:
174 return self.ensemble(predictions, series)

File ~/Documents/beachhome/lib/python3.10/site-packages/darts/models/forecasting/regression_ensemble_model.py:161, in RegressionEnsembleModel.ensemble(self, predictions, series)
158 predictions = [predictions]
159 series = [series]
--> 161 ensembled = [
162 self.regression_model.predict(
163 n=len(prediction), series=serie, future_covariates=prediction
164 )
165 for serie, prediction in zip(series, predictions)
166 ]
168 return ensembled[0] if self.is_single_series else ensembled

File ~/Documents/beachhome/lib/python3.10/site-packages/darts/models/forecasting/regression_ensemble_model.py:162, in <listcomp>(.0)
158 predictions = [predictions]
159 series = [series]
161 ensembled = [
--> 162 self.regression_model.predict(
163 n=len(prediction), series=serie, future_covariates=prediction
164 )
165 for serie, prediction in zip(series, predictions)
166 ]
168 return ensembled[0] if self.is_single_series else ensembled

File ~/Documents/beachhome/lib/python3.10/site-packages/darts/models/forecasting/regression_model.py:553, in RegressionModel.predict(self, n, series, past_covariates, future_covariates, num_samples, **kwargs)
550 last_req_ts = last_pred_ts + lags[-1] * ts.freq
552 # check for sufficient covariate data
--> 553 raise_if_not(
554 cov.start_time() <= first_req_ts
555 and cov.end_time() >= last_req_ts,
556 f"The corresponding {cov_type}_covariate of the series at index {idx} isn't sufficiently long. "
557 f"Given horizon n={n}, min(lags_{cov_type}_covariates)={lags[0]}, "
558 f"max(lags_{cov_type}_covariates)={lags[-1]} and "
559 f"output_chunk_length={self.output_chunk_length}\n"
560 f"the {cov_type}_covariate has to range from {first_req_ts} until {last_req_ts} (inclusive), "
561 f"but it ranges only from {cov.start_time()} until {cov.end_time()}.",
562 )
564 # Note: we use slice() rather than the [] operator because
565 # for integer-indexed series [] does not act on the time index.
566 last_req_ts = (
567 # For range indexes, we need to make the end timestamp inclusive here
568 last_req_ts + ts.freq
569 if ts.has_range_index
570 else last_req_ts
571 )

File ~/Documents/beachhome/lib/python3.10/site-packages/darts/logging.py:78, in raise_if_not(condition, message, logger)
76 if not condition:
77 logger.error("ValueError: " + message)
---> 78 raise ValueError(message)

ValueError: The corresponding future_covariate of the series at index 0 isn't sufficiently long. Given horizon n=1, min(lags_future_covariates)=0, max(lags_future_covariates)=0 and output_chunk_length=1
the future_covariate has to range from 2022-10-03 00:00:00 until 2022-10-03 00:00:00 (inclusive), but it ranges only from 2022-10-10 00:00:00 until 2022-10-10 00:00:00.

TheNumbersAI added the bug and triage labels Nov 6, 2022
Collaborator

dennisbader commented Nov 13, 2022

Hey @Jason-Merkoski, and thanks for raising this issue.
It is indeed a bug where we reused the training series in RegressionEnsembleModel.predict() if only a single target series was used at fitting time. I will work on a fix.

The future covariates mentioned in the error are specific to the ensemble models: we pass the outputs of your two sub-models to the ensembling regression model as its future covariates.
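
A minimal sketch of why the error names a future covariate that was never passed, using only the standard library. The weekly frequency and the two end dates are assumptions inferred from the timestamps in the error message, and `required_range` is a simplified, hypothetical stand-in for the range check shown in the traceback (one-step model, `output_chunk_length=1`), not darts' actual code.

```python
from datetime import datetime, timedelta

FREQ = timedelta(weeks=1)  # assumption: weekly data, inferred from the error message


def required_range(series_end, n, min_lag, max_lag):
    """Covariate range a one-step model needs to forecast n steps (simplified)."""
    first_pred = series_end + FREQ
    last_pred = first_pred + (n - 1) * FREQ
    return first_pred + min_lag * FREQ, last_pred + max_lag * FREQ


def covers(cov_start, cov_end, needed):
    """True if the covariate spans the whole required range."""
    return cov_start <= needed[0] and cov_end >= needed[1]


training_end = datetime(2022, 9, 26)  # hypothetical end of the series used in fit()
passed_end = datetime(2022, 10, 3)    # hypothetical end of the series passed to predict()

# The sub-models forecast one step beyond the series passed to predict().
# Their forecast is handed to the ensembling regression as a future covariate.
forecast_time = passed_end + FREQ

# Buggy path: the ensembling regression forecasts from the training series.
needed_buggy = required_range(training_end, n=1, min_lag=0, max_lag=0)

# Presumed fixed path: the ensembling regression forecasts from the passed series.
needed_fixed = required_range(passed_end, n=1, min_lag=0, max_lag=0)

print(needed_buggy, covers(forecast_time, forecast_time, needed_buggy))
print(needed_fixed, covers(forecast_time, forecast_time, needed_fixed))
```

Under these assumptions, reusing the training series makes the regression ask for a covariate at 2022-10-03 while the only sub-model forecast sits at 2022-10-10, which matches the ranges reported in the ValueError.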

@TheNumbersAI
Author

@dennisbader could you reopen this and take a look? I deleted my darts installation, purged the pip cache, and rebuilt from scratch to pull the changes from "main", but I still cannot use historical_forecasts or predict. Maybe I'm missing something? My code is unchanged from above.

File "/lib/python3.10/site-packages/darts/utils/utils.py", line 172, in sanitized_method
return method_to_sanitize(self, *only_args.values(), **only_kwargs)
File "/lib/python3.10/site-packages/darts/models/forecasting/forecasting_model.py", line 500, in historical_forecasts
forecast = self._predict_wrapper(
File "/lib/python3.10/site-packages/darts/models/forecasting/forecasting_model.py", line 1228, in _predict_wrapper
return self.predict(
File "/lib/python3.10/site-packages/darts/models/forecasting/ensemble_model.py", line 172, in predict
return self.ensemble(predictions)
File "/lib/python3.10/site-packages/darts/models/forecasting/regression_ensemble_model.py", line 161, in ensemble
ensembled = [
File "/lib/python3.10/site-packages/darts/models/forecasting/regression_ensemble_model.py", line 162, in <listcomp>
self.regression_model.predict(
File "/lib/python3.10/site-packages/darts/models/forecasting/regression_model.py", line 613, in predict
covariate_matrices[cov_type][
IndexError: index 1 is out of bounds for axis 1 with size 1

Collaborator

dennisbader commented Nov 14, 2022

It works on my end.
I wonder why this line is actually not crashing for you:

my_ensemble_model = RegressionEnsembleModel([my_model1, my_model2], regression_train_n_points=2 * window)

According to your example, you fit my_model1 and my_model2 before creating the ensemble model. This should raise an error like the following:

ValueError: Cannot instantiate EnsembleModel with trained/fitted models. Consider resetting all models with `my_model.untrained_model()`
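
The guard behind that message can be illustrated with a toy sketch. `ToyModel` and `ToyEnsemble` are hypothetical stand-ins written for this example, not darts classes; only the method name `untrained_model()` comes from the error message above, and the assumption here is that it returns a fresh, unfitted copy with the same hyperparameters.

```python
import copy


class ToyModel:
    """Toy stand-in for a forecasting model (hypothetical, not a darts class)."""

    def __init__(self, **params):
        self.params = params
        self.fitted = False

    def fit(self, data):
        # A real model would learn from data; the toy only records the state.
        self.fitted = True
        return self

    def untrained_model(self):
        # Fresh, unfitted instance carrying the same hyperparameters.
        return ToyModel(**copy.deepcopy(self.params))


class ToyEnsemble:
    """Toy ensemble that refuses sub-models that are already fitted."""

    def __init__(self, models):
        if any(m.fitted for m in models):
            raise ValueError(
                "Cannot instantiate ensemble with fitted models; "
                "reset them with untrained_model()."
            )
        self.models = models


models = [
    ToyModel(lags=[-1, -2, -5]).fit([1, 2, 3]),
    ToyModel(lags=[-1]).fit([1, 2, 3]),
]

try:
    ToyEnsemble(models)
    rejected = False
except ValueError:
    rejected = True

# Resetting every sub-model first lets the ensemble be created.
ensemble = ToyEnsemble([m.untrained_model() for m in models])
```

The reset copies keep their hyperparameters, and the original fitted models are left untouched, so they can still be used on their own.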

I get your code working with some dummy timeseries:

from darts.models import LinearRegressionModel, LightGBMModel, RegressionEnsembleModel
from darts.utils.timeseries_generation import linear_timeseries
from darts.dataprocessing.transformers import Scaler

scaler = Scaler()
series = linear_timeseries(length=100)

ts_train, ts_val = series[:50], series[50:]
train_transformed = scaler.fit_transform(ts_train)

past_covariates = series

window = 5
lags = [-1, -2, -5]

my_model1 = LinearRegressionModel(lags=lags, output_chunk_length=window, lags_past_covariates=lags)
my_model2 = LightGBMModel(lags=lags, output_chunk_length=window, lags_past_covariates=lags)

# do not fit the models before creating the ensemble models
my_ensemble_model = RegressionEnsembleModel([my_model1, my_model2], regression_train_n_points=2*window)
backtest = my_ensemble_model.historical_forecasts(
    series,
    start=0.5,
    last_points_only=True,
    forecast_horizon=window,
    stride=1,
    verbose=True,
    past_covariates=past_covariates
)

prediction = my_ensemble_model.predict(n=window, series=series, past_covariates=past_covariates)

Author

TheNumbersAI commented Nov 14, 2022

Thanks for the speedy response @dennisbader !

As for the fitting of each input model, my sample code for reproducing the issue missed that part; in my production code I do use a construct like models = [m.untrained_model() for m in models] as inputs.

The fix works now, much obliged, and I look forward to using your great software!
