[<Library component: Model|Core|etc...>] ValueError: xreg is rank deficient #993

Open
sumathysubramanian20 opened this issue Mar 3, 2025 · 0 comments
What happened + What you expected to happen

My input dataframe to the statsforecast model has the following columns:
'unique_id', 'ds', 'negative_arr_ccfx', 'monthly_arr_for_renewal', 'annual_arr_for_renewal', 'monthly_plan_asp', 'annual_plan_asp', 'entry_date_week_of_month_eq2', 'entry_date_week_of_month_eq3', 'entry_date_week_of_month_eq4', 'entry_date_week_of_month_eq5'

sf = StatsForecast(models=models, freq="W-WED", n_jobs=-1)
level = [95]
fcst = sf.forecast(df=train, h=h, X_df=X_fcst, level=level)

Since Friday I have been backtesting this model over the last 40 weeks. It ran without errors until Friday, but when I tried to continue the backtest today it raised the error "xreg is rank deficient".
If I remove annual_plan_asp from the inputs, the model runs. However, monthly_plan_asp and annual_plan_asp do not contain redundant values, and ideally I would like to pass both as inputs.
Please help me with this.
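
One way to narrow down which regressor triggers the error is to check, for each series, which exogenous columns add no rank once the columns before them are accounted for. The sketch below is a diagnostic under stated assumptions, not the check statsforecast itself performs: it relies on numpy.linalg.matrix_rank, it includes an intercept column on the assumption that the model may fit a constant, and the helper name dependent_columns and the toy data are hypothetical. A column can be flagged even when its values do not duplicate another column, for example when it is constant within the training window or is a linear combination of several others.

```python
import numpy as np
import pandas as pd


def dependent_columns(frame: pd.DataFrame, cols: list[str]) -> list[str]:
    """Return the columns that add no rank on top of an intercept and the columns before them."""
    # Assumption: an intercept is part of the regression, so constant columns count as redundant.
    kept = np.ones((len(frame), 1))
    rank = 1
    flagged = []
    for col in cols:
        candidate = np.column_stack([kept, frame[col].to_numpy(dtype=float)])
        new_rank = np.linalg.matrix_rank(candidate)
        if new_rank > rank:
            kept, rank = candidate, new_rank  # column carries new information
        else:
            flagged.append(col)  # column is a linear combination of what is already kept
    return flagged


# Hypothetical toy data: annual is exactly 10x monthly, and week_eq5 is all zeros.
toy = pd.DataFrame(
    {
        "unique_id": ["a"] * 6,
        "monthly_plan_asp": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
        "annual_plan_asp": [100.0, 110.0, 120.0, 130.0, 140.0, 150.0],
        "week_eq2": [0, 1, 0, 0, 1, 0],
        "week_eq5": [0, 0, 0, 0, 0, 0],
    }
)
exog_cols = ["monthly_plan_asp", "annual_plan_asp", "week_eq2", "week_eq5"]

# Check every series separately, since each one is fitted on its own rows.
for uid, grp in toy.groupby("unique_id"):
    print(uid, dependent_columns(grp, exog_cols))
```

Running the same loop on the real train frame, with the column list from this issue, would show whether annual_plan_asp is flagged only for some series or only for some training windows.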

Versions / Dependencies

"""
Nixtla's forecast function with exogenous variables

- Create train, test and forecast datasets to apply Nixtla's forecast model
"""
import numpy as np
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA


def exog_nixtla(df, y_var, n_weeks_to_test=10):
    df = df.copy()
    """
    - Convert the 'ds' column to date
    - Name the objective column 'y'. Nixtla wants this
    """
    df["ds"] = df["ds"].dt.date
    df = df.rename(columns={y_var: "y"})

    """
    - Create 2 boolean masks over df:
      1. records_with_objective_missing - True where the 'y' variable is null
      2. records_with_objective_present - True for every other row of df
    - Then create 2 sorted arrays with just the unique 'ds':
      1. dates_with_objective_missing - unique ds from the rows where 'y' is missing
      2. dates_with_objective_present - unique ds from the rows where 'y' is present
    """
    records_with_objective_missing = df["y"].isnull()
    records_with_objective_present = ~records_with_objective_missing
    dates_with_objective_missing = np.sort(
        df[records_with_objective_missing]["ds"].unique()
    )
    dates_with_objective_present = np.sort(
        df[records_with_objective_present]["ds"].unique()
    )
    """
    Next steps:
    - Add logic to make sure that there are no missing values in the past...
    - And if there are, then raise a ValueError to make this assumption clear to the user.
    """

    """
    Further create 4 arrays of 'ds' values, which are later used to build the train, test and forecast datasets:
    1. dates - all the unique ds where the 'y' variable is present (dates_with_objective_present created above)
    2. dtrain - all the unique ds in dates minus the weeks to test
    3. dtest - the unique ds in dates for only the weeks to test
    4. dfcst - all the unique ds where the 'y' variable is absent (dates_with_objective_missing created above)

    - Test the model on the final n_weeks_to_test weeks
    - Train the model on the first n weeks where n + n_weeks_to_test = total number of weeks
    """
    dates = dates_with_objective_present
    dtrain = dates[:-n_weeks_to_test]
    dtest = dates[-n_weeks_to_test:]
    dfcst = dates_with_objective_missing

    """
    - h is the forecast window
    - h_test is the forecast window for the test period
    """
    h = len(dfcst)
    h_test = len(dtest)

    Y_ts = df.copy()
    X_ts = df.copy()

    """
    - Y_ts is a copy of the original dataframe with only the unique_id, ds and y columns
    - X_ts is a copy of the original dataframe with all columns except the y column
    - Y_train and X_train are dataframes where Y_ts and X_ts are filtered for ds in dtrain
    - Y_test and X_test are dataframes where Y_ts and X_ts are filtered for ds in dtest
    - Y_fcst is a dataframe where Y_ts is filtered for ds in dates_with_objective_present
    - X_fcst is a dataframe where X_ts is filtered for ds in dates_with_objective_missing
    - train is a dataframe where Y_fcst is merged with X_ts on unique_id and ds
    """
    Y_ts = Y_ts[["unique_id", "ds", "y"]]
    X_ts = X_ts.drop(labels=["y"], axis="columns")

    # query() resolves @names against local variables, so the name must match dtrain exactly
    Y_train = Y_ts.query("ds in @dtrain")
    Y_test = Y_ts.query("ds in @dtest")

    X_train = X_ts.query("ds in @dtrain")
    X_test = X_ts.query("ds in @dtest")

    Y_fcst = Y_ts.query("ds in @dates_with_objective_present")
    # Copy the filtered frame so the dtype cast below does not write into a view of X_ts
    X_fcst = X_ts.query("ds in @dates_with_objective_missing").copy()
    X_fcst["ds"] = X_fcst.ds.astype('datetime64[ns]')

    train = Y_fcst.merge(X_ts, how="left", on=["unique_id", "ds"])
    train["ds"] = train.ds.astype('datetime64[ns]')
    train["y"] = train.y.astype(float)
    print(train.shape)

    """
    - StatsForecast is the Nixtla class that forecasts the 'y' variable with the models and frequency specified
    - The forecast method fits on the training dataset and uses the exogenous variables from the training and fcst datasets to forecast 'y' for future weeks, with a 95% prediction interval
    """
    models = [AutoARIMA()]

    # Weekly frequency anchored on Wednesday
    sf = StatsForecast(models=models, freq="W-WED", n_jobs=-1)

    level = [95]
    fcst = sf.forecast(df=train, h=h, X_df=X_fcst, level=level)
    fcst = fcst.reset_index()

    return fcst
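
The "Next steps" note inside exog_nixtla (make sure there are no missing values in the past, and raise an error otherwise) could be sketched roughly as follows. This is a minimal sketch under assumptions: the helper name check_no_gaps_in_history is hypothetical, "past" is taken to mean any date before a series' last observed 'y', and the frame is assumed to already use the unique_id, ds and y column names.

```python
import pandas as pd


def check_no_gaps_in_history(df: pd.DataFrame, target: str = "y") -> None:
    """Raise ValueError if any series has a missing target before its last observed date."""
    for uid, grp in df.sort_values("ds").groupby("unique_id"):
        observed = grp[target].notna()
        if not observed.any():
            continue  # nothing observed yet, so there is no history to validate
        last_observed = grp.loc[observed, "ds"].max()
        # Missing values after the last observation are the forecast horizon, not gaps.
        gaps = grp.loc[~observed & (grp["ds"] < last_observed), "ds"]
        if not gaps.empty:
            raise ValueError(
                f"series {uid!r} has missing '{target}' at past dates: {gaps.tolist()}"
            )


# Hypothetical toy frames on the same W-WED grid used in the issue.
ok = pd.DataFrame(
    {
        "unique_id": ["a"] * 4,
        "ds": pd.date_range("2025-01-01", periods=4, freq="W-WED"),
        "y": [1.0, 2.0, None, None],  # trailing missing values only
    }
)
bad = ok.assign(y=[1.0, None, 3.0, None])  # one gap inside the history

check_no_gaps_in_history(ok)  # passes silently
try:
    check_no_gaps_in_history(bad)
except ValueError as err:
    print(err)
```

Calling such a check right after the rename to 'y' would surface the assumption before the train and forecast frames are built.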

Reproducible example

# paste your code here

model = exog_nixtla(df = input_df, y_var = 'negative_arr_ccfx')

Issue Severity

None
