[<Library component: Model|Core|etc...>] ValueError: xreg is rank deficient #993

sumathysubramanian20 · 2025-03-03T23:45:45Z

What happened + What you expected to happen

My input dataframe to the statsforecast model has the following columns:
'unique_id', 'ds', 'negative_arr_ccfx', 'monthly_arr_for_renewal', 'annual_arr_for_renewal', 'monthly_plan_asp', 'annual_plan_asp', 'entry_date_week_of_month_eq2', 'entry_date_week_of_month_eq3', 'entry_date_week_of_month_eq4', 'entry_date_week_of_month_eq5'
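The names entry_date_week_of_month_eq2 to eq5 suggest week-of-month indicators, with week 1 left out as the baseline. The sketch below assumes that encoding. The helpers `week_of_month`, `week_of_month_dummies` and `design_rank` are hypothetical and are not part of StatsForecast. It shows two ways such dummies can make a regressor matrix rank deficient once an intercept is included. The first is keeping all five indicators, because they sum to the intercept column. The second is a training window that never contains a given week, which leaves that indicator as all zeros.

```python
import numpy as np
import pandas as pd


def week_of_month(dates: pd.Series) -> pd.Series:
    # Hypothetical rule: days 1-7 -> 1, 8-14 -> 2, 15-21 -> 3, 22-28 -> 4, 29-31 -> 5
    return (dates.dt.day - 1) // 7 + 1


def week_of_month_dummies(dates: pd.Series, drop_first: bool = True) -> pd.DataFrame:
    # Fix the categories so every week gets a column, even if it never occurs in the window
    wom = pd.Categorical(week_of_month(dates), categories=[1, 2, 3, 4, 5])
    dummies = pd.get_dummies(
        wom, prefix="entry_date_week_of_month_eq", prefix_sep="", drop_first=drop_first
    )
    return dummies.astype(float)


def design_rank(X: pd.DataFrame):
    # Prepend an intercept column and return (rank, number of columns)
    M = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    return int(np.linalg.matrix_rank(M)), M.shape[1]


dates = pd.Series(pd.date_range("2024-01-03", periods=40, freq="W-WED"))
print(design_rank(week_of_month_dummies(dates, drop_first=False)))  # (5, 6): all five dummies
print(design_rank(week_of_month_dummies(dates)))                    # (5, 5): eq2..eq5 only
print(design_rank(week_of_month_dummies(dates.iloc[:4])))           # (4, 5): no 5th week in window
```

Dropping the baseline week keeps the full 40-week window at full rank. However, a shorter window that happens to miss a week still loses one rank.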

```python
sf = StatsForecast(models=models, freq="W-WED", n_jobs=-1)
level = [95]
fcst = sf.forecast(df=train, h=h, X_df=X_fcst, level=level)
```

Since Friday I have been backtesting this model over the last 40 weeks. It ran without problems until Friday, but when I tried to continue the backtest today, it raised ValueError: xreg is rank deficient.
If I remove annual_plan_asp from the inputs, the model runs. However, monthly_plan_asp and annual_plan_asp do not contain redundant values, so ideally I would like to use both as inputs.
Please help me with this.
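"No redundant values" and "full rank" are different conditions. A regressor matrix is rank deficient whenever one column is a linear combination of the others (plus an intercept) over the rows actually used for fitting, even if no value repeats. Two common cases are:

- a column that is constant within a backtest window, for example annual_plan_asp not changing during that window;
- an exact affine relationship such as annual = 12 * monthly - c.

The sketch below is a hypothetical diagnostic. `rank_report` is not a StatsForecast function. It assumes the error is driven by linear dependence among the exogenous columns together with an intercept, and it greedily reports which columns add no new direction.

```python
import numpy as np
import pandas as pd


def rank_report(X: pd.DataFrame, tol=None) -> dict:
    """Hypothetical diagnostic: report exogenous columns that break full column rank.

    Columns are visited in order; a column is kept only if it raises the rank of
    [intercept, previously kept columns]. Rows with missing values are dropped first.
    """
    X = X.dropna().astype(float)
    constant = [c for c in X.columns if X[c].nunique() <= 1]
    kept, redundant = [], []
    basis = np.ones((len(X), 1))  # intercept column
    for col in X.columns:
        if col in constant:
            continue  # a constant column duplicates the intercept
        candidate = np.column_stack([basis, X[col].to_numpy()])
        if np.linalg.matrix_rank(candidate, tol=tol) == candidate.shape[1]:
            basis = candidate  # new direction: keep it
            kept.append(col)
        else:
            redundant.append(col)  # linear combination of intercept + kept columns
    return {"constant": constant, "redundant": redundant, "kept": kept}


monthly = [10.0, 11.0, 12.5, 13.0, 15.0]
X = pd.DataFrame({"monthly_plan_asp": monthly, "annual_plan_asp": [12 * m - 5 for m in monthly]})
print(rank_report(X))  # annual_plan_asp has no repeated values but is reported as redundant
```

You could run this on the exogenous columns of each backtest window's training data, separately for each unique_id. That would show whether the window that started failing differs from the earlier ones. Which of two dependent columns gets reported depends on column order, because the first one is kept.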

Versions / Dependencies

"""
Nixtla's forecast function with exogenous variables

Create train, test and forecast datasets to apply the nixtla's forecast model
"""
def exog_nixtla(df, y_var, n_weeks_to_test=10):
df = df.copy()
"""
- Convert the 'ds column to date
- Name the objective column 'y'. Nixtla wants this
  """
  df["ds"] = df["ds"].dt.date
  df = df.rename(columns={y_var: "y"})
"""
- create 2 new dataframes:
  1. records_with_objective_missing - where the 'y' variable is null
  2. records_with_objective_present - everything from df that is not records_with_objective_missing
- Then create 2 new dataframes with just unique 'ds':
  1. dates_with_objective_missing - unique ds from records_with_objective_missing
  2. dates_with_objective_present - unique ds from records_with_objective_present
    """
    records_with_objective_missing = df["y"].isnull()
    records_with_objective_present = ~records_with_objective_missing
    dates_with_objective_missing = np.sort(
    df[records_with_objective_missing]["ds"].unique()
    )
    dates_with_objective_present = np.sort(
    df[records_with_objective_present]["ds"].unique()
    )
    """
    Next steps:
- Add logic to make sure that there are no missing values in the past...
- And if there are, then return a value error to make this assumption clear to the user.
  """
"""
Further create 4 dataframes which just consist of ds which will later be used to create train, test amd forecast dataset
1. dates - all the unique ds where 'y' variable is present (dates_with_objective_present created above)
2. dtrain - all the unique ds in dates created above minus the weeks to test
3. dtest - all the unique ds in dates created above for only the weeks to test
4. dfcst - all the unique ds where 'y' variable is absent (dates_with_objective_missing created above)
- Test the model on the final n_weeks_to_test weeks
- Train the model on the first n weeks where n + n_weeks_to_test = total number of weeks
  """
  dates = dates_with_objective_present
  dtrain = dates[:-n_weeks_to_test]
  dtest = dates[-n_weeks_to_test:]
  dfcst = dates_with_objective_missing
"""
- h is the forecast window
- h_test is the forecast window for the test period
  """
  h = len(dfcst)
  h_test = len(dtest)
Y_ts = df.copy()
X_ts = df.copy()

"""
- Y_ts is a copy of original dataframe with only unique_id, ds and y columns
- X_ts is a copy of original dataframe with all columns except y column
- Y_train and X_train are dataframes where Y_ts and X_ts are filtered for ds in dtrain
- Y_test and X_test are dataframes where Y_ts and X_ts are filtered for ds in dtest
- Y_fcst is a dataframe where Y_ts is filtered for ds in dates_with_objective_present
- X_fcst is a dataframe where X_ts is filtered for ds in dates_with_objective_missing
- train is a dataframe where Y_fcst is merged with X_ts on unique_id and ds
  """
  Y_ts = Y_ts[["unique_id", "ds", "y"]]
  X_ts = X_ts.drop(labels=["y"], axis="columns")
Y_train = Y_ts.query("ds in @DTRAIN")
Y_test = Y_ts.query("ds in @dtest")

X_train = X_ts.query("ds in @DTRAIN")
X_test = X_ts.query("ds in @dtest")

Y_fcst = Y_ts.query("ds in @dates_with_objective_present")
X_fcst = X_ts.query("ds in @dates_with_objective_missing")
X_fcst["ds"] = X_fcst.ds.astype('datetime64[ns]')

train = Y_fcst.merge(X_ts, how="left", on=["unique_id", "ds"])
train["ds"] = train.ds.astype('datetime64[ns]')
train["y"] = train.y.astype(float)
print(train.shape)

"""
- StatsForecast is a Nixtla function that helps us forecast 'y' variable based on models and frequency specified
- Forecast method trains on the training dataset and uses the exogenous variables for the training dataset and fcst dataset to forecast 'y' variable for future weeks based on 95% confidence interval
  """
  models = [AutoARIMA()]
sf = StatsForecast(models = models, freq = 'W', n_jobs = -1)

sf = StatsForecast(models=models, freq="W-WED", n_jobs=-1)

level = [95]
fcst = sf.forecast(df=train, h=h, X_df=X_fcst, level=level)
fcst = fcst.reset_index()

return fcst
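You can check the date split inside exog_nixtla on its own, without fitting a model. The following is a minimal sketch of the same logic on a toy single-series frame. The `split_dates` name and the data are hypothetical, and this version keeps `ds` as datetimes instead of converting it to dates. One difference from the function: it slices at `len(dates) - n_weeks_to_test`. The reason is that `dates[:-0]` is an empty array, so `n_weeks_to_test=0` would otherwise produce an empty dtrain.

```python
import numpy as np
import pandas as pd


def split_dates(df: pd.DataFrame, n_weeks_to_test: int = 10):
    # Dates with an observed 'y' are history; dates where 'y' is missing are the horizon
    missing = df["y"].isnull()
    present = np.sort(df.loc[~missing, "ds"].unique())
    future = np.sort(df.loc[missing, "ds"].unique())
    # Explicit cut point: present[:-0] would be empty when n_weeks_to_test == 0
    cut = len(present) - n_weeks_to_test
    dtrain, dtest = present[:cut], present[cut:]
    return dtrain, dtest, future


# Toy single series: 15 weekly Wednesdays, the last 3 without an observed target
ds = pd.date_range("2024-01-03", periods=15, freq="W-WED")
df = pd.DataFrame({
    "unique_id": "A",
    "ds": ds,
    "y": [float(i) for i in range(12)] + [np.nan] * 3,
    "x": np.arange(15.0),
})
dtrain, dtest, dfcst = split_dates(df, n_weeks_to_test=4)
h = len(dfcst)  # forecast horizon, as in exog_nixtla
print(len(dtrain), len(dtest), h)  # 8 4 3
```

Here h is the number of future weeks (3). X_fcst then has to supply the exogenous values for those h dates.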

Reproducible example

```python
model = exog_nixtla(df=input_df, y_var="negative_arr_ccfx")
```

Issue Severity

None

sumathysubramanian20 added the bug label Mar 3, 2025
