[BUG] RegressionEnsembleModel not properly handling future_covariates #1814

Closed
chitxxx opened this issue Jun 5, 2023 · 3 comments
Labels
bug Something isn't working

chitxxx commented Jun 5, 2023

Describe the bug
.historical_forecasts() on a RegressionEnsembleModel fails when the underlying models use future_covariates. It raises ValueError: The model has been trained without `future_covariates` variable, but the `future_covariates` parameter provided to `predict()` is not None.
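
To make the failure mode in the error message concrete, here is a toy sketch of the kind of consistency check that produces it: a model fitted without future covariates rejects them at prediction time. ToyModel and failing_call are hypothetical stand-ins written for illustration and are not darts code; the sketch only mirrors what the message states and does not claim where darts drops the covariates.

```python
class ToyModel:
    """Hypothetical stand-in for a forecasting model; not darts code."""

    def __init__(self):
        self.uses_future_covariates = False

    def fit(self, series, future_covariates=None):
        # Remember whether covariates were supplied at training time.
        self.uses_future_covariates = future_covariates is not None
        return self

    def predict(self, n, future_covariates=None):
        # Covariates at predict time are only valid if they were used in fit().
        if future_covariates is not None and not self.uses_future_covariates:
            raise ValueError(
                "The model has been trained without `future_covariates` variable, "
                "but the `future_covariates` parameter provided to `predict()` "
                "is not None."
            )
        return [0.0] * n


def failing_call():
    # Covariates are missing at fit time but passed at predict time.
    model = ToyModel().fit(series=[1.0, 2.0, 3.0])
    return model.predict(n=2, future_covariates=[0.1, 0.2])
```

Calling failing_call() raises the ValueError, while passing the covariates to both fit() and predict() does not.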

To Reproduce
Steps to reproduce the behavior, preferably code snippet.

Expected behavior
I expect the ensemble to pass future_covariates to each underlying model that uses them when calling the backtest and historical_forecasts methods.

System (please complete the following information):

  • Python version: 3.9
  • darts version 0.23.1


chitxxx added the bug (Something isn't working) and triage (Issue waiting for triaging) labels on Jun 5, 2023
madtoinou (Collaborator) commented

Hi,

Can you please share a code snippet to reproduce the error? I am pretty sure this bug was fixed by #1745, which will be part of the next release.

madtoinou changed the title from "[BUG] RegressionEnsembleModel" to "[BUG] RegressionEnsembleModel not properly handling future_covariates" on Jun 14, 2023
madtoinou removed the triage (Issue waiting for triaging) label on Jun 14, 2023
philippGraf commented Jun 21, 2023

I also have an issue.
Python 3.8
darts 0.24.0

I guess the extreme_lags attribute is not correct:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

from darts import TimeSeries
from darts.dataprocessing.transformers import Scaler
from darts.models import RandomForest, RegressionModel
from darts.models.forecasting.regression_ensemble_model import RegressionEnsembleModel

# Five weeks of constant hourly data
data = pd.DataFrame(
    {"a": [1] * 24 * 7 * 5, "b": [2] * 24 * 7 * 5},
    index=pd.date_range(start="2023-01-01", freq="1H", periods=24 * 7 * 5),
)

TARGET = "a"
COVARIATES = ["b"]

data = TimeSeries.from_dataframe(data)

data = data.add_datetime_attribute("hour", one_hot=True)

COVARIATES = COVARIATES + [f"hour_{i}" for i in range(1, 25)]


def _run():
    model1 = RandomForest(
        lags_future_covariates=[0],
        n_estimators=500,
        add_encoders={
            "datetime_attribute": {"future": ["hour"]},
            "transformer": Scaler(),
            "position": {"future": "relative"},
        },
    )
    print("forest extreme lags: ", model1.extreme_lags)
    model2 = RegressionModel(
        lags_future_covariates=[0],
        add_encoders={
            # "datetime_attribute": {"future": ["hour"]},
            "transformer": Scaler(),
            "position": {"future": "relative"},
        },
    )
    print("linear extreme lags: ", model2.extreme_lags)

    # Workaround: subclass the ensemble and hard-code extreme_lags
    class M(RegressionEnsembleModel):
        """so monkey it's sexy"""

        @property
        def extreme_lags(self):
            return (None, 0, None, None, 0, 0)

    model = M(
        forecasting_models=[model1, model2],
        regression_train_n_points=50,
        regression_model=RandomForestRegressor(),
    )

    print("ensemble extreme lags: ", model.extreme_lags)
    future_covariates = data[[x for x in data.columns if x in COVARIATES]]

    results = model.historical_forecasts(
        series=data[TARGET],
        future_covariates=future_covariates,
        forecast_horizon=1 * 24,
        stride=1 * 24,
        start=24 * 7 * 4,
        train_length=24 * 7 * 4,
        last_points_only=False,
    )

    return results


d = _run()

This works with the monkey patch. Without it, the script crashes.
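
For context on the hard-coded extreme_lags override, here is a minimal sketch of how an ensemble could derive its extreme lags from its members: take the minimum of each "min" slot and the maximum of each "max" slot, skipping None. It assumes a 6-tuple ordered as (min target lag, max target lag, min past covariate lag, max past covariate lag, min future covariate lag, max future covariate lag). The helper name combine_extreme_lags and the member tuples are hypothetical, and this is not presented as the darts implementation.

```python
from typing import Optional, Sequence, Tuple

Lags = Tuple[Optional[int], ...]

# Assumed slot order:
# (min_target, max_target, min_past, max_past, min_future, max_future)
AGGREGATORS = (min, max, min, max, min, max)


def combine_extreme_lags(member_lags: Sequence[Lags]) -> Lags:
    """Hypothetical helper: merge per-model extreme lags slot by slot."""
    combined = []
    for slot, aggregate in enumerate(AGGREGATORS):
        # Ignore members that do not use this kind of lag at all.
        values = [lags[slot] for lags in member_lags if lags[slot] is not None]
        combined.append(aggregate(values) if values else None)
    return tuple(combined)


# Made-up member tuples, for illustration only
members = [(-3, 0, None, None, -1, 2), (-5, 0, -2, -1, 0, 0)]
print(combine_extreme_lags(members))
```

A slot stays None only when no member uses that kind of lag, so a single member with future covariate lags is enough for the ensemble to report them.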

madtoinou (Collaborator) commented

This indeed looks like a bug. Could you please open another issue so that we can track it more easily?
