Refactorised tabularisation + Jupyter notebook w/ experiments. #1399
Changes from 33 commits
@@ -36,7 +36,7 @@
 from darts.logging import get_logger, raise_if, raise_if_not, raise_log
 from darts.models.forecasting.forecasting_model import GlobalForecastingModel
 from darts.timeseries import TimeSeries
-from darts.utils.data.tabularization import _add_static_covariates, _create_lagged_data
+from darts.utils.data.tabularization import create_lagged_training_data
 from darts.utils.multioutput import MultiOutputRegressor
 from darts.utils.utils import _check_quantiles, seq2series, series2seq

@@ -324,7 +324,7 @@ def _create_lagged_data(
         lags_past_covariates = self.lags.get("past")
         lags_future_covariates = self.lags.get("future")

-        training_samples, training_labels, _ = _create_lagged_data(
+        features, labels, _ = create_lagged_training_data(
             target_series=target_series,
             output_chunk_length=self.output_chunk_length,
             past_covariates=past_covariates,

@@ -334,20 +334,111 @@ def _create_lagged_data(
             lags_future_covariates=lags_future_covariates,
             max_samples_per_ts=max_samples_per_ts,
             multi_models=self.multi_models,
+            check_inputs=False,
+            concatenate=False,
         )

-        training_samples = _add_static_covariates(
-            self,
-            training_samples,
+        for i, (X_i, y_i) in enumerate(zip(features, labels)):
+            features[i] = X_i[:, :, 0]
+            labels[i] = y_i[:, :, 0]
+
+        features = self._add_static_covariates(
+            features,
             target_series,
-            *self.extreme_lags,
-            past_covariates=past_covariates,
-            future_covariates=future_covariates,
-            max_samples_per_ts=max_samples_per_ts,
         )
+
+        training_samples = np.concatenate(features, axis=0)
+        training_labels = np.concatenate(labels, axis=0)

         return training_samples, training_labels
+    def _add_static_covariates(

[Review comment] I'm still wondering whether this shouldn't belong to …

[Reply] Hmmm - definitely seeing where you're coming from. I suppose my main hesitation at the moment is that the process of computing the static covariates seems to depend a lot on the attributes of …. Is there somehow a way to 'decouple' computing the actual static covariates values from the …?

+        self,
+        features: Union[np.array, Sequence[np.array]],
+        target_series: Union[TimeSeries, Sequence[TimeSeries]],
+    ) -> Union[np.array, Sequence[np.array]]:
""" | ||
Add static covariates to the features' table for RegressionModels. | ||
Accounts for series with potentially different static covariates by padding with 0 to accomodate for the maximum | ||
number of available static_covariates in any of the given series in the sequence. | ||
|
||
If no static covariates are provided for a given series, its corresponding features are padded with 0. | ||
Accounts for the case where the model is trained with series with static covariates and then used to predict | ||
on series without static covariates by padding with 0 the corresponding features of the series without | ||
static covariates. | ||
|
||
Parameters | ||
---------- | ||
features | ||
The features' numpy array(s) to which the static covariates will be added. Can either be a lone feature | ||
matrix or a `Sequence` of feature matrices; in the latter case, static covariates will be appended to | ||
each feature matrix in this `Sequence`. | ||
target_series | ||
The target series from which to read the static covariates. | ||
|
||
Returns | ||
------- | ||
features | ||
The features' array(s) with appended static covariates columns. If the `features` input was passed as a | ||
`Sequence` of `np.array`s, then a `Sequence` is also returned; if `features` was passed as an `np.array`, | ||
a `np.array` is returned. | ||
""" | ||
+        input_not_list = not isinstance(features, Sequence)
+        if input_not_list:
+            features = [features]
+        target_series = series2seq(target_series)
+        # collect static covariates info
+        scovs_map = {
+            "covs_exist": False,
+            "vals": [],  # Stores values of static cov arrays in each timeseries
+            "sizes": {},  # Stores sizes of static cov arrays in each timeseries
+        }
+        for ts in target_series:
+            if ts.has_static_covariates:
+                scovs_map["covs_exist"] = True
+                # Each static covariate adds either 1 extra column or
+                # `n_component` extra columns:
+                vals_i = {}
+                for name, row in ts.static_covariates.items():
+                    vals_i[name] = row
+                    scovs_map["sizes"][name] = row.size
+                scovs_map["vals"].append(vals_i)
+            else:
+                scovs_map["vals"].append(None)
+
+        if (
+            not scovs_map["covs_exist"]
+            and hasattr(self.model, "n_features_in_")
+            and (self.model.n_features_in_ is not None)
+            and (self.model.n_features_in_ > features[0].shape[1])
+        ):
+            # for when series in prediction do not have static covariates but some of the training series did
+            num_static_components = self.model.n_features_in_ - features[0].shape[1]
+            for i, features_i in enumerate(features):
+                padding = np.zeros((features_i.shape[0], num_static_components))
+                features[i] = np.hstack([features_i, padding])
+        elif scovs_map["covs_exist"]:
+            scov_width = sum(scovs_map["sizes"].values())
+            for i, features_i in enumerate(features):
+                vals = scovs_map["vals"][i]
+                if vals:
+                    scov_arrays = []
+                    for name, size in scovs_map["sizes"].items():
+                        scov_arrays.append(
+                            vals[name] if name in vals else np.zeros((size,))
+                        )
+                    scov_array = np.concatenate(scov_arrays)
+                    scovs = np.broadcast_to(
+                        scov_array, (features_i.shape[0], scov_width)
+                    )
+                else:
+                    scovs = np.zeros((features_i.shape[0], scov_width))
+                features[i] = np.hstack([features_i, scovs])
+        if input_not_list:
+            features = features[0]
+        return features
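The padding logic above can be sketched in isolation. The helper below is hypothetical (it is not the darts API): it takes per-series feature blocks plus the collected static-covariate values and sizes, broadcasts each series' covariate values across its samples, and zero-pads any series that lacks a given covariate so every block ends up with the same number of columns.

```python
import numpy as np


def pad_static_covariates(feature_blocks, scov_values, scov_sizes):
    """Append static-covariate columns to each per-series feature block.

    `feature_blocks`: list of 2-D arrays (n_samples_i, n_features).
    `scov_values`: per series, a dict mapping covariate name -> 1-D values,
        or None when that series has no static covariates.
    `scov_sizes`: width of every covariate seen across *any* series.
    Missing covariates are zero-padded (hypothetical helper, for illustration).
    """
    total_width = sum(scov_sizes.values())
    padded = []
    for block, vals in zip(feature_blocks, scov_values):
        if vals:
            # use the series' own value, or zeros for covariates it lacks
            parts = [vals.get(name, np.zeros(size)) for name, size in scov_sizes.items()]
            row = np.concatenate(parts)
            scovs = np.broadcast_to(row, (block.shape[0], total_width))
        else:
            # series without static covariates: all-zero padding
            scovs = np.zeros((block.shape[0], total_width))
        padded.append(np.hstack([block, scovs]))
    return padded


blocks = [np.ones((3, 2)), np.ones((2, 2))]
vals = [{"country": np.array([1.0])}, None]  # second series lacks the covariate
sizes = {"country": 1}
out = pad_static_covariates(blocks, vals, sizes)
# out[0] has a column of 1.0s appended; out[1] a column of zeros
```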
     def _fit_model(
         self,
         target_series,

@@ -362,7 +453,10 @@ def _fit_model(
         """

         training_samples, training_labels = self._create_lagged_data(
-            target_series, past_covariates, future_covariates, max_samples_per_ts
+            target_series,
+            past_covariates,
+            future_covariates,
+            max_samples_per_ts,
         )

         # if training_labels is of shape (n_samples, 1) flatten it to shape (n_samples,)
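The flattening mentioned in the last comment of the hunk above can be sketched as follows (shapes assumed for illustration): sklearn-style regressors expect a 1-D target for single-output problems, so a single-column label matrix is squeezed before fitting.

```python
import numpy as np

training_labels = np.zeros((5, 1))  # assumed (n_samples, 1) label column

# flatten (n_samples, 1) labels to (n_samples,) for single-output regressors
if training_labels.ndim == 2 and training_labels.shape[1] == 1:
    training_labels = training_labels.ravel()
```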
@@ -681,15 +775,13 @@ def predict(

         # concatenate retrieved lags
         X = np.concatenate(np_X, axis=1)
-        X = _add_static_covariates(
-            self,
-            X,
-            series,
-            *self.extreme_lags,
-            past_covariates=past_covariates,
-            future_covariates=future_covariates,
-            max_samples_per_ts=1,
-        )
+        # Need to split up `X` into sub-blocks of equal size, one
+        # corresponding to each timeseries in `series`, so that
+        # static covariates can be added to each block; valid since
+        # each block contains the same number of observations:
+        X_blocks = np.split(X, len(series), axis=0)
+        X_blocks = self._add_static_covariates(X_blocks, series)
+        X = np.concatenate(X_blocks, axis=0)

         # X has shape (n_series * n_samples, n_regression_features)
         prediction = self._predict_and_sample(X, num_samples, **kwargs)
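The split-then-reassemble step above relies on `np.split` requiring equally sized pieces, which holds because every series contributes the same number of prediction rows. A minimal sketch with toy shapes and a made-up one-column "static covariate" per block:

```python
import numpy as np

# Feature table covering 2 series with 3 prediction samples each:
X = np.arange(12, dtype=float).reshape(6, 2)

# Split into one equally-sized block per series:
X_blocks = np.split(X, 2, axis=0)  # two (3, 2) blocks

# Append a per-series static-covariate column (toy values 0.0 and 1.0):
X_blocks = [
    np.hstack([block, np.full((block.shape[0], 1), float(i))])
    for i, block in enumerate(X_blocks)
]

# Reassemble the full feature table:
X = np.concatenate(X_blocks, axis=0)  # shape (6, 3)
```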
[Review comment] Wouldn't calling `_add_static_covariates()` be better done inside the preceding for loop? In my opinion, this would help simplify the internal logic of `_add_static_covariates()`: if it only takes a specific target (from the input sequence) and its specific features, then one can avoid looping twice over the series inside `_add_static_covariates()`, if I understood the implementation correctly. WDYT?
In the current `_add_static_covariates()` my assumption was that the function would receive all the features and compute back everything it needs in terms of the length and width of the features, since I was expecting a change in the `_create_lagged_data()` outputs, but this might not be relevant anymore with the changes you made.

[Reply] It would still be necessary, though, to go through all the series in the input sequence once beforehand, to collect the static covariates information from all of them.

[Reply] Definitely - in my opinion, the static covariates would ideally be added inside of `create_lagged_data` after each 'block' has been formed, but that would probably be a bit clumsy to implement at the moment, since the process of computing the static covariates requires the `n_features_in_` attribute of the `RegressionModel` object. Perhaps something to think about for a future PR?
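The `n_features_in_` dependence discussed above works because fitted sklearn-style estimators record how many feature columns they were trained on. A sketch of the prediction-time padding, using a stand-in model object rather than a real estimator (all names here are illustrative):

```python
import numpy as np


class FittedModel:
    """Stand-in for a fitted sklearn-style estimator exposing `n_features_in_`."""

    # assume training saw 2 lag features + 1 static-covariate column
    n_features_in_ = 3


model = FittedModel()

# Prediction features built from a series *without* static covariates:
X_pred = np.random.rand(4, 2)

# Pad with zero columns so the width matches what the model saw in training:
missing = model.n_features_in_ - X_pred.shape[1]
if missing > 0:
    X_pred = np.hstack([X_pred, np.zeros((X_pred.shape[0], missing))])
```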