-
Notifications
You must be signed in to change notification settings - Fork 918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/four theta #123
Feature/four theta #123
Changes from 12 commits
01e1837
f6831be
53ceeab
7ae0803
4001009
264ddf3
e5bc790
e951554
f596476
b712c7d
dee5b0d
ba64351
36d99fe
1dc5bb3
4d24f75
240cea4
7d5dffa
b19cf97
bbd58cc
71aa8d3
8286062
af69e63
97f6956
97cff2a
f6df487
2bad4f7
49fafda
fe1b320
fc04e68
cae49f6
27b9143
2fa7f4f
5ed92bd
a2b64e4
98d1b7a
f405e8b
9cad7cf
1e6be8e
2f9bc5b
a405e47
25f3de2
e15f52b
61c0f17
e63e215
be1427d
34a8c41
d8e890f
e02d65d
8e92eba
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -80,7 +80,7 @@ def backtest_forecasting(series: TimeSeries, | |
|
||
# build the prediction times in advance (to be able to use tqdm) | ||
pred_times = [start] | ||
while pred_times[-1] <= last_pred_time: | ||
while pred_times[-1] < last_pred_time: | ||
pred_times.append(pred_times[-1] + series.freq()) | ||
|
||
# what we'll return | ||
|
@@ -313,15 +313,16 @@ def backtest_gridsearch(model_class: type, | |
|
||
Parameters | ||
---------- | ||
model | ||
model_class | ||
The ForecastingModel subclass to be tuned for 'series'. | ||
parameters | ||
A dictionary containing as keys hyperparameter names, and as values lists of values for the | ||
respective hyperparameter. | ||
train_series | ||
The univariate TimeSeries instance used for training (and also validation in split mode). | ||
test_series | ||
val_series | ||
The univariate TimeSeries instance used for validation in split mode. | ||
Use `train` to compare with model.fitted_values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
fcast_horizon_n | ||
The integer value of the forecasting horizon used in expanding window mode. | ||
num_predictions: | ||
|
@@ -360,6 +361,10 @@ def backtest_gridsearch(model_class: type, | |
if val_series is None: # expanding window mode | ||
backtest_forecast = backtest_forecasting(train_series, model, backtest_start_time, fcast_horizon_n) | ||
error = metric(backtest_forecast, train_series) | ||
elif val_series == 'train': | ||
model.fit(train_series) | ||
# Use ndarray because casting to TimeSeries takes too much time | ||
error = metric(model.fitted_values, train_series.univariate_values()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As far as I can tell, FourTheta is the only model that possesses the attribute There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least all ExponentialSmoothing have a fitted attribute that can be retrieved (So the current theta model could have one too, I think). If you approve this functionality, it might be interesting to add this attribute in other models. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That doesn't respect the types. I think you should convert to |
||
else: # split mode | ||
model.fit(train_series) | ||
error = metric(model.predict(len(val_series)), val_series) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,7 +4,7 @@ | |
""" | ||
|
||
import math | ||
from typing import Optional | ||
from typing import Optional, List | ||
|
||
import numpy as np | ||
import statsmodels.tsa.holtwinters as hw | ||
|
@@ -78,7 +78,8 @@ def fit(self, series: TimeSeries, component_index: Optional[int] = None): | |
self.is_seasonal, self.season_period = check_seasonality(ts, self.season_period, max_lag=max_lag) | ||
logger.info('Theta model inferred seasonality of training series: {}'.format(self.season_period)) | ||
else: | ||
self.is_seasonal = True # force the user-defined seasonality to be considered as a true seasonal period. | ||
# force the user-defined seasonality to be considered as a true seasonal period. | ||
self.is_seasonal = self.season_period > 1 | ||
|
||
new_ts = ts | ||
|
||
|
@@ -88,13 +89,13 @@ def fit(self, series: TimeSeries, component_index: Optional[int] = None): | |
new_ts = remove_seasonality(ts, self.season_period, model=self.mode) | ||
|
||
# SES part of the decomposition. | ||
self.model = hw.SimpleExpSmoothing(new_ts.values()).fit() | ||
self.model = hw.SimpleExpSmoothing(new_ts.values()).fit(initial_level=0.2) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the effect of this change? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem with SES in statsmodels is that alpha is not constrained between [0.1, 0.99] as it should, thus giving NaN values when alpha is 0. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like it leads to different results. I will add it only in the case alpha is 0, and it will recompute it |
||
|
||
# Linear Regression part of the decomposition. We select the degree one coefficient. | ||
b_theta = np.polyfit(np.array([i for i in range(0, self.length)]), (1.0 - self.theta) * new_ts.values(), 1)[0] | ||
|
||
# Normalization of the coefficient b_theta. | ||
self.coef = b_theta / (2.0 - self.theta) | ||
self.coef = b_theta / (2.0 - self.theta) # change to b_theta / (-self.theta) if classical theta | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did you manage to figure out why is the formula written like this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Mathematically, I'm still not quite sure about the rest, but I am sure it is correct for theta = 0 and 1. But comparing with the classical theta (4theta with default parameter), we have the exact same results if we change the normalization by -1/theta. (because there is a symmetry: 2-theta_1 = theta_2) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we unify the meaning of theta? E.g. by changing to |
||
|
||
self.alpha = self.model.params["smoothing_level"] | ||
|
||
|
@@ -126,3 +127,217 @@ def predict(self, n: int) -> 'TimeSeries': | |
|
||
def __str__(self): | ||
return 'Theta({})'.format(self.theta) | ||
|
||
|
||
class FourTheta(UnivariateForecastingModel): | ||
def __init__(self, | ||
theta: int = 0, | ||
seasonality_period: Optional[int] = None, | ||
model_mode: str = 'multiplicative', | ||
season_mode: str = 'multiplicative', | ||
trend_mode: str = 'linear'): | ||
""" | ||
An implementation of the 4Theta method with configurable `theta` parameter. | ||
|
||
See M4 competition `solution <https://github.com/Mcompetitions/M4-methods/blob/master/4Theta%20method.R>`_. | ||
|
||
The training time series is de-seasonalized according to `seasonality_period`, | ||
or an inferred seasonality period. | ||
|
||
This model is similar to Theta, with `theta` = 2-theta, `model_mode` = additive and `trend_mode` = linear. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This comment is important, but doesn't make it clear what is what. For instance, when reading only this sentence it's not clear whether FourTheta() is similar to Theta(trend_model="linear") or FourTheta(trend_mode="linear") = Theta(). You could reformulate, e.g. something like
|
||
|
||
Parameters | ||
---------- | ||
theta | ||
Value of the theta parameter. Defaults to 2. | ||
If theta = 1, then the theta method restricts to a simple exponential smoothing (SES) | ||
seasonality_period | ||
User-defined seasonality period. If not set, will be tentatively inferred from the training series upon | ||
calling `fit()`. | ||
model_mode | ||
Type of model combining the Theta lines. Either "additive" or "multiplicative". | ||
season_mode | ||
Type of seasonality. Either "additive" or "multiplicative". | ||
trend_mode | ||
Type of trend to fit. Either "linear" or "exponential". | ||
""" | ||
|
||
super().__init__() | ||
|
||
self.model = None | ||
self.drift = None | ||
self.coef = 1 | ||
self.mean = 1 | ||
self.length = 0 | ||
self.theta = theta | ||
self.is_seasonal = False | ||
self.seasonality = None | ||
self.season_period = seasonality_period | ||
self.model_mode = model_mode | ||
self.season_mode = season_mode | ||
self.trend_mode = trend_mode | ||
self.wses = 0 if self.theta == 0 else (1 / theta) | ||
self.wdrift = 1 - self.wses | ||
self.fitted_values = None | ||
|
||
# Remark on the values of the theta parameter: | ||
# - if theta = 1, then the theta method restricts to a simple exponential smoothing (SES) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is the same comment in the |
||
# - if theta = 0, then the theta method restricts to a simple `trend_mode` regression. | ||
|
||
def fit(self, ts, component_index: Optional[int] = None): | ||
super().fit(ts, component_index) | ||
|
||
self.length = len(ts) | ||
# normalization of data | ||
self.mean = ts.mean() | ||
new_ts = ts / self.mean | ||
|
||
# Check for statistical significance of user-defined season period | ||
# or infers season_period from the TimeSeries itself. | ||
if self.season_period is None: | ||
max_lag = len(ts) // 2 | ||
self.is_seasonal, self.season_period = check_seasonality(ts, self.season_period, max_lag=max_lag) | ||
logger.info('Theta model inferred seasonality of training series: {}'.format(self.season_period)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
else: | ||
# force the user-defined seasonality to be considered as a true seasonal period. | ||
self.is_seasonal = self.season_period > 1 | ||
|
||
# Store and remove seasonality effect if there is any. | ||
if self.is_seasonal: | ||
_, self.seasonality = extract_trend_and_seasonality(new_ts, self.season_period, model=self.season_mode) | ||
new_ts = remove_seasonality(new_ts, self.season_period, model=self.season_mode) | ||
|
||
if (new_ts <= 0).values.any(): | ||
self.model_mode = 'additive' | ||
self.trend_mode = 'linear' | ||
logger.warn("Time series has negative values. Fallback to additive and linear model") | ||
|
||
# Drift part of the decomposition | ||
if self.trend_mode == 'linear': | ||
linreg = new_ts.values() | ||
elif self.trend_mode == 'exponential': | ||
linreg = np.log(new_ts.values()) | ||
else: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will be less code and more coherent with other implementations if we check for either linear or exponential with assert and throw an exception if none of the values is selected. |
||
self.trend_mode = 'linear' | ||
linreg = new_ts.values() | ||
logger.warn("Unknown value for trend. Fallback to linear.") | ||
self.drift = np.poly1d(np.polyfit(np.arange(self.length), linreg, 1)) | ||
theta0_in = self.drift(np.arange(self.length)) | ||
if self.trend_mode == 'exponential': | ||
theta0_in = np.exp(theta0_in) | ||
|
||
if self.model_mode == 'additive': | ||
theta_t = self.theta * new_ts.values() + (1 - self.theta) * theta0_in | ||
elif self.model_mode == 'multiplicative' and (theta0_in > 0).all(): | ||
theta_t = (new_ts.values() ** self.theta) * (theta0_in ** (1 - self.theta)) | ||
else: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar comment as before, we can assert at the beginning and throw an exception There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For a first draft, I blindly replicated the step of the original alorithm. So I will change it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There should be a more optimized implementation, like our theta. But we need to do the math to find it |
||
self.model_mode = 'additive' | ||
theta_t = self.theta * new_ts.values() + (1 - self.theta) * theta0_in | ||
logger.warn("Negative Theta line. Fallback to additive model") | ||
|
||
# SES part of the decomposition. | ||
self.model = hw.SimpleExpSmoothing(theta_t).fit() | ||
theta2_in = self.model.fittedvalues | ||
|
||
if self.model_mode == 'additive': | ||
self.fitted_values = self.wses * theta2_in + self.wdrift * theta0_in | ||
elif self.model_mode == 'multiplicative' and (theta2_in > 0).all(): | ||
self.fitted_values = theta2_in**self.wses * theta0_in**self.wdrift | ||
else: | ||
# Fallback to additive model | ||
self.model_mode = 'additive' | ||
theta_t = self.theta * new_ts.values() + (1 - self.theta) * theta0_in | ||
self.model = hw.SimpleExpSmoothing(theta_t).fit() | ||
theta2_in = self.model.fittedvalues | ||
self.fitted_values = self.wses * theta2_in + self.wdrift * theta0_in | ||
logger.warn("Negative Theta line. Fallback to additive model") | ||
|
||
if self.is_seasonal: | ||
if self.season_mode == 'additive': | ||
self.fitted_values += self.seasonality.values() | ||
elif self.season_mode == 'multiplicative': | ||
self.fitted_values *= self.seasonality.values() | ||
self.fitted_values *= self.mean | ||
# takes too much time to create a time series for fitted_values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what's the overhead? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, could you add a comment explaining what |
||
# self.fitted_values = TimeSeries.from_times_and_values(ts.time_index(), self.fitted_values) | ||
|
||
def predict(self, n: int) -> 'TimeSeries': | ||
super().predict(n) | ||
|
||
# Forecast of the SES part. | ||
forecast = self.model.forecast(n) | ||
|
||
# Forecast of the Linear Regression part. | ||
drift = self.drift(np.arange(self.length, self.length + n)) | ||
if self.trend_mode == 'exponential': | ||
drift = np.exp(drift) | ||
|
||
if self.model_mode == 'additive': | ||
forecast = self.wses * forecast + self.wdrift * drift | ||
elif self.model_mode == 'multiplicative': | ||
forecast = forecast**self.wses * drift**self.wdrift | ||
else: | ||
raise_log(ValueError("model_mode cannot be {}".format(self.model_mode))) | ||
|
||
# Re-apply the seasonal trend of the TimeSeries | ||
if self.is_seasonal: | ||
|
||
replicated_seasonality = np.tile(self.seasonality.pd_series()[-self.season_period:], | ||
math.ceil(n / self.season_period))[:n] | ||
if self.season_mode in ['multiplicative', 'mul']: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for simplicity I would also pick one of them, and also an ENUM that we pass here might be helpful to avoid typos https://docs.python.org/3/library/enum.html There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be sure I understand correctly, the different model arguments will accept an Enum value? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes this comment actually applies to both theta methods - instead of retyping "multiplicative"/"additive" string we can have an enum that would have 2 values - MULTIPLICATIVE or ADDITIVE and if you make a typo the IDE will tell you something is not matching (not the case for raw strings) |
||
forecast *= replicated_seasonality | ||
elif self.season_mode in ['additive', 'add']: | ||
forecast += replicated_seasonality | ||
else: | ||
raise_log(ValueError("season_mode cannot be {}".format(self.season_mode))) | ||
|
||
forecast *= self.mean | ||
|
||
return self._build_forecast_series(forecast) | ||
|
||
@staticmethod | ||
def select_best_model(ts: TimeSeries, thetas: List[int] = None, m: Optional[int] = None): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you add type annotation for the return type? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, since you plan for the case There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I forgot about that. I will change it There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We will have to think where to put this method. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
""" | ||
Performs a grid search over all hyper parameters to select the best model. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps mention that it is using the fitted values on the training series here. |
||
|
||
Parameters | ||
---------- | ||
ts | ||
The TimeSeries on which the model will be tested. | ||
thetas | ||
A list of thetas to loop over. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps mention the default values here. |
||
m | ||
Optionally, the season used to decompose the time series. | ||
Returns | ||
------- | ||
theta | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here you should have a type |
||
The best performing model on the time series. | ||
""" | ||
# Only import if needed | ||
from ..backtesting.backtesting import backtest_gridsearch | ||
from sklearn.metrics import mean_absolute_error as mae | ||
if thetas is None: | ||
thetas = [1, 2, 3] | ||
elif isinstance(thetas, int): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would remove this case, that's abusing the types, better let it crash IMO. |
||
thetas = [thetas] | ||
season_mode = ["additive", "multiplicative"] | ||
model_mode = ["additive", "multiplicative"] | ||
drift_mode = ["linear", "exponential"] | ||
if (ts.values() <= 0).any(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps add a log message if this happens? |
||
drift_mode = ["linear"] | ||
model_mode = ["additive"] | ||
season_mode = ["additive"] | ||
|
||
theta = backtest_gridsearch(FourTheta, | ||
{"theta": thetas, | ||
"model_mode": model_mode, | ||
"season_mode": season_mode, | ||
"trend_mode": drift_mode, | ||
"seasonality_period": [m] | ||
}, | ||
ts, val_series='train', metric=mae) | ||
return theta | ||
|
||
def __str__(self): | ||
return '4Theta(theta:{}, curve:{}, model:{}, seasonality:{})'.format(self.theta, self.trend_mode, | ||
self.model_mode, self.season_mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I let this slip through, thanks!