My input dataframe to the statsforecast model has the following columns:
'unique_id', 'ds', 'negative_arr_ccfx', 'monthly_arr_for_renewal', 'annual_arr_for_renewal', 'monthly_plan_asp', 'annual_plan_asp', 'entry_date_week_of_month_eq2', 'entry_date_week_of_month_eq3', 'entry_date_week_of_month_eq4', 'entry_date_week_of_month_eq5'
I have been backtesting this model on the last 40 weeks since Friday. I was able to run it until Friday, but when I tried to continue the backtesting process today, it gave me an "xreg is rank deficient" error.
If I remove annual_plan_asp as an input, the model runs. However, monthly_plan_asp and annual_plan_asp do not have redundant values, so ideally I would like to provide both as inputs.
Please help me with this.
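One way to narrow this down is to check the exogenous columns of the exact training window for linear dependence, since a set of columns can be full rank over the whole history and still become dependent (or constant) on a particular backtest slice. The sketch below is only a diagnostic under stated assumptions: it uses NumPy's `matrix_rank`, the helper names `is_rank_deficient` and `dependent_columns` and the toy data are hypothetical, and the check that statsforecast runs internally may differ (for example, it may add a constant column or use another tolerance).

```python
import numpy as np
import pandas as pd


def is_rank_deficient(X: pd.DataFrame) -> bool:
    """True when the columns of X are (numerically) linearly dependent."""
    M = X.to_numpy(dtype=float)
    return int(np.linalg.matrix_rank(M)) < M.shape[1]


def dependent_columns(X: pd.DataFrame) -> list:
    """Columns that can be dropped without lowering the rank of X.

    A column is reported when it lies in the span of the remaining columns,
    i.e. it takes part in at least one linear dependency.
    Needs at least two columns; otherwise an empty list is returned.
    """
    M = X.to_numpy(dtype=float)
    if M.shape[1] < 2:
        return []
    full_rank = np.linalg.matrix_rank(M)
    suspects = []
    for i, col in enumerate(X.columns):
        reduced = np.delete(M, i, axis=1)  # X without column i
        if np.linalg.matrix_rank(reduced) == full_rank:
            suspects.append(col)
    return suspects


# Toy data (hypothetical): "c" is exactly "a" + "b", "d" is independent.
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 1, 0, 1, 3], "d": [1, 0, 0, 1, 0]})
toy["c"] = toy["a"] + toy["b"]
print(is_rank_deficient(toy))
print(dependent_columns(toy))
```

With the function from this issue, the same helpers could be applied to the exogenous part of each training window, e.g. `train.drop(columns=["unique_id", "ds", "y"])`, right before calling `sf.forecast`.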
Versions / Dependencies
"""
Nixtla's forecast function with exogenous variables
Create train, test and forecast datasets to apply Nixtla's forecast model
"""
def exog_nixtla(df, y_var, n_weeks_to_test=10):
df = df.copy()
"""
Convert the 'ds' column to date
Rename the objective column to 'y', as Nixtla expects
"""
df["ds"] = df["ds"].dt.date
df = df.rename(columns={y_var: "y"})
"""
create 2 new dataframes:
records_with_objective_missing - where the 'y' variable is null
records_with_objective_present - everything from df that is not records_with_objective_missing
Then create 2 new arrays with just the unique 'ds' values:
dates_with_objective_missing - unique ds from records_with_objective_missing
dates_with_objective_present - unique ds from records_with_objective_present
Next steps:
Add logic to make sure that there are no missing values in the past.
If there are, raise a ValueError to make this assumption clear to the user.
"""
"""
Further create 4 arrays that consist only of ds values, which will later be used to create the train, test and forecast datasets
1. dates - all the unique ds where 'y' variable is present (dates_with_objective_present created above)
2. dtrain - all the unique ds in dates created above minus the weeks to test
3. dtest - all the unique ds in dates created above for only the weeks to test
4. dfcst - all the unique ds where 'y' variable is absent (dates_with_objective_missing created above)
Test the model on the final n_weeks_to_test weeks
Train the model on the first n weeks where n + n_weeks_to_test = total number of weeks
"""
dates = dates_with_objective_present
dtrain = dates[:-n_weeks_to_test]
dtest = dates[-n_weeks_to_test:]
dfcst = dates_with_objective_missing
"""
h is the forecast window
h_test is the forecast window for the test period
"""
h = len(dfcst)
h_test = len(dtest)
Y_ts = df.copy()
X_ts = df.copy()
"""
Y_ts is a copy of original dataframe with only unique_id, ds and y columns
X_ts is a copy of original dataframe with all columns except y column
Y_train and X_train are dataframes where Y_ts and X_ts are filtered for ds in dtrain
Y_test and X_test are dataframes where Y_ts and X_ts are filtered for ds in dtest
Y_fcst is a dataframe where Y_ts is filtered for ds in dates_with_objective_present
X_fcst is a dataframe where X_ts is filtered for ds in dates_with_objective_missing
train is a dataframe where Y_fcst is merged with X_ts on unique_id and ds
"""
Y_ts = Y_ts[["unique_id", "ds", "y"]]
X_ts = X_ts.drop(labels=["y"], axis="columns")
Y_train = Y_ts.query("ds in @dtrain")
Y_test = Y_ts.query("ds in @dtest")
X_train = X_ts.query("ds in @dtrain")
X_test = X_ts.query("ds in @dtest")
Y_fcst = Y_ts.query("ds in @dates_with_objective_present")
X_fcst = X_ts.query("ds in @dates_with_objective_missing").copy()
X_fcst["ds"] = X_fcst["ds"].astype("datetime64[ns]")
"""
StatsForecast is a Nixtla class that forecasts the 'y' variable based on the models and frequency specified
The forecast method trains on the training dataset and uses the exogenous variables from the training and fcst datasets to forecast the 'y' variable for future weeks with a 95% confidence interval
"""
models = [AutoARIMA()]
What happened + What you expected to happen
sf = StatsForecast(models=models, freq="W-WED", n_jobs=-1)
level = [95]
fcst = sf.forecast(df=train, h=h, X_df=X_fcst, level=level)
"""
Nixtla's forecast function with exogenous variables.
Create train, test and forecast datasets to apply Nixtla's forecast model.
"""
import numpy as np
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA


def exog_nixtla(df, y_var, n_weeks_to_test=10):
    df = df.copy()

    # Convert the 'ds' column to date and rename the objective column to 'y',
    # as Nixtla expects.
    df["ds"] = df["ds"].dt.date
    df = df.rename(columns={y_var: "y"})

    # Boolean masks for the rows where 'y' is missing / present, and the
    # sorted unique 'ds' values of each group.
    records_with_objective_missing = df["y"].isnull()
    records_with_objective_present = ~records_with_objective_missing
    dates_with_objective_missing = np.sort(
        df[records_with_objective_missing]["ds"].unique()
    )
    dates_with_objective_present = np.sort(
        df[records_with_objective_present]["ds"].unique()
    )

    # Next steps: add logic to make sure there are no missing values in the
    # past, and raise a ValueError if there are.

    # Date arrays used to build the train, test and forecast datasets:
    #   dates  - all unique ds where 'y' is present
    #   dtrain - dates minus the final n_weeks_to_test weeks
    #   dtest  - only the final n_weeks_to_test weeks of dates
    #   dfcst  - all unique ds where 'y' is absent
    dates = dates_with_objective_present
    dtrain = dates[:-n_weeks_to_test]
    dtest = dates[-n_weeks_to_test:]
    dfcst = dates_with_objective_missing

    # h is the forecast window; h_test is the forecast window for the test period.
    h = len(dfcst)
    h_test = len(dtest)

    # Y_ts keeps only unique_id, ds and y; X_ts keeps every column except y.
    Y_ts = df.copy()
    X_ts = df.copy()
    Y_ts = Y_ts[["unique_id", "ds", "y"]]
    X_ts = X_ts.drop(labels=["y"], axis="columns")

    # Filter by date; @name refers to the local variable with that exact
    # (case-sensitive) name.
    Y_train = Y_ts.query("ds in @dtrain")
    Y_test = Y_ts.query("ds in @dtest")
    X_train = X_ts.query("ds in @dtrain")
    X_test = X_ts.query("ds in @dtest")
    Y_fcst = Y_ts.query("ds in @dates_with_objective_present")
    X_fcst = X_ts.query("ds in @dates_with_objective_missing").copy()
    X_fcst["ds"] = X_fcst["ds"].astype("datetime64[ns]")

    # train is Y_fcst merged with X_ts on unique_id and ds.
    train = Y_fcst.merge(X_ts, how="left", on=["unique_id", "ds"])
    train["ds"] = train["ds"].astype("datetime64[ns]")
    train["y"] = train["y"].astype(float)
    print(train.shape)

    # StatsForecast trains on train and uses the exogenous variables in train
    # and X_fcst to forecast 'y' for the future weeks with a 95% interval.
    models = [AutoARIMA()]
    sf = StatsForecast(models=models, freq="W-WED", n_jobs=-1)
    level = [95]
    fcst = sf.forecast(df=train, h=h, X_df=X_fcst, level=level)
    fcst = fcst.reset_index()
    return fcst
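The date-based split used inside exog_nixtla can be exercised on its own to confirm which weeks end up in each group. The following is a minimal sketch on hypothetical toy data; `split_dates` is an illustrative helper, not part of statsforecast, and the guard against a zero-length test window is an added assumption (with `n_weeks_to_test=0`, `dates[:-0]` would be empty).

```python
import numpy as np
import pandas as pd


def split_dates(df: pd.DataFrame, n_weeks_to_test: int):
    """Return (dtrain, dtest, dfcst) arrays of unique 'ds' values."""
    if n_weeks_to_test < 1:
        # dates[:-0] would be empty, so a zero-length test window is rejected.
        raise ValueError("n_weeks_to_test must be at least 1")
    has_y = df["y"].notna()
    dates = np.sort(df.loc[has_y, "ds"].unique())   # weeks with an observed y
    dfcst = np.sort(df.loc[~has_y, "ds"].unique())  # weeks still to forecast
    return dates[:-n_weeks_to_test], dates[-n_weeks_to_test:], dfcst


# Toy series (hypothetical): six Wednesdays, the last two without a target.
toy = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2023-01-04", periods=6, freq="W-WED"),
    "y": [1.0, 2.0, 3.0, 4.0, np.nan, np.nan],
})
dtrain, dtest, dfcst = split_dates(toy, n_weeks_to_test=1)
print(len(dtrain), len(dtest), len(dfcst))
```

Here the forecast horizon `h` would be `len(dfcst)`, matching how exog_nixtla derives it.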
Reproducible example
# paste your code here
model = exog_nixtla(df = input_df, y_var = 'negative_arr_ccfx')
Issue Severity
None