Improvements for Scalers applied on multiple series #1288

Closed
maximilianreimer opened this issue Oct 13, 2022 · 9 comments
Labels
good first issue (Good for newcomers) · improvement (New feature or improvement)

Comments

@maximilianreimer

Describe the bug
If a Scaler is fitted on n_fitted sequences at once, it will always return only n_fitted sequences, not the number of sequences passed in.

To Reproduce

    import pandas as pd

    from darts import TimeSeries
    from darts.dataprocessing.transformers import Scaler

    scaler = Scaler()
    fitted_n = 2
    predicted_n = 3
    s = TimeSeries.from_times_and_values(
        pd.date_range("2022-01-01", "2022-01-10"), range(10)
    )

    scaler.fit_transform([s] * fitted_n)

    ss_scaled = scaler.transform([s] * predicted_n)
    ss_scaled_inverted = scaler.inverse_transform(ss_scaled)
    ss_inverted = scaler.inverse_transform([s] * predicted_n)

    assert len(ss_scaled) == predicted_n  # fails: length is fitted_n
    assert len(ss_scaled_inverted) == predicted_n  # fails: length is fitted_n
    assert len(ss_inverted) == predicted_n  # fails: length is fitted_n

Expected behavior
The Scaler should scale all series independently and return the same number of series as were passed in.

System (please complete the following information):

  • Python version: 3.7
  • darts version: 0.21.
@maximilianreimer maximilianreimer added bug Something isn't working triage Issue waiting for triaging labels Oct 13, 2022
@maximilianreimer
Author

maximilianreimer commented Oct 13, 2022

Ok, after some reading I think this might be intended behavior.

If so, I have a question / suggestion:

As I understand it now, if multiple TimeSeries are passed to fit a FittableDataTransformer (the Scaler is one), it effectively creates a separate "sub-FittableDataTransformer" for each position in the sequence. On transform, these sub-transformers are applied based on the position of each TimeSeries in the sequence.

This makes it impossible to run the FittableDataTransformer if only one TimeSeries is available (e.g. when a multi-series model like TFT is used in production). I would suggest selecting the "sub-FittableDataTransformer" based on the static_covariates, or switching from a Sequence to a Mappable as input.

And please, as a hot fix, add a warning if the length of the sequence of series passed to transform differs from the one seen during fitting. It took me ages to figure that out. The current behavior is to silently drop the extra series if the sequence is longer (a minimal sketch of such a check is shown below).
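
A minimal sketch of the kind of check meant here, assuming a hypothetical helper that is told how many series the transformer was fitted on (n_fitted); this is illustrative only, not the actual darts implementation:

    import warnings
    from typing import Sequence

    from darts import TimeSeries


    def warn_on_length_mismatch(n_fitted: int, series: Sequence[TimeSeries]) -> None:
        # Warn when transform() receives a different number of series than the
        # transformer was fitted on, instead of silently dropping the extras.
        if len(series) != n_fitted:
            warnings.warn(
                f"Transformer was fitted on {n_fitted} series but received "
                f"{len(series)}; only the first {min(n_fitted, len(series))} "
                "will be transformed."
            )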

@dennisbader
Collaborator

Hi @maximilianreimer, thanks for writing.

You are totally right. Our data transformers expect to receive the same input dimensions (and the same order of the list of time series, including their components) for fitting and transformation.

We should definitely raise a warning (or even an exception?) if there is a mismatch in dimensions.

I don't quite follow what the issue is with using transformers in production. Can't you fit/transform a new transformer on only the series that are available?

Regarding "sub-FittableDataTransformer":

  • we can't rely on static covariates because not all time series have static covariates
  • the mappable could be interesting, what do you think @hrzn ?

@maximilianreimer
Author

Regarding the production issue: let's say I have n areas I want to forecast electricity prices for. At training time I have a sequence of price time series that I pass through my target pipeline and model, but in production I might get a request to predict just one specific series. Model-wise that's not a problem, but how do I use the Scaler in this case?

from darts import TimeSeries
from darts.dataprocessing.transformers import Scaler
from darts.models import TFTModel

# Training time
# each series has static covariates so that the TFTModel can learn to predict
# differently for different areas
train_series = [
    series_area_1,
    series_area_2,
    ...
    series_area_n,
]

target_pipeline = Scaler()
model = TFTModel(input_chunk_length=..., output_chunk_length=...)

train_transformed = target_pipeline.fit_transform(train_series)
model.fit(train_transformed)

# In production
# Request: predict n time steps for area 5 for the next week
historical_data_area_5: TimeSeries = ...

# I would like to run
pred_transformed = target_pipeline.transform(historical_data_area_5)  # won't work with just one series
predicted = model.predict(7, pred_transformed)
predicted_rescaled = target_pipeline.inverse_transform(predicted)  # won't work with just one series

@dennisbader
Collaborator

Do you have all historical data for the specific series at prediction time? If so, then you can fit/transform with a new scaler just on this single series as you did before training.

@maximilianreimer
Author

maximilianreimer commented Oct 14, 2022

So you are suggesting to train a different Scaler for each series? Or one joint Scaler for training and individual ones for prediction time afterwards?

@dennisbader
Collaborator

One joint Scaler for training (which should come with a performance boost compared to multiple single Scalers), and afterwards an individual one.
If you have the historical data of the series of interest at prediction time:

  • you split the series at the same time step that you used for training
  • you fit a new Scaler on the left side of the split -> like this you get the same transform() output as with the joint Scaler
  • you can transform any part of the series with this Scaler and use it for prediction (a sketch of this workflow follows below)
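
A minimal sketch of this workflow, reusing names from the snippet above (train_series, historical_data_area_5, model); training_cutoff is a hypothetical placeholder for the last timestamp of the training period:

    from darts.dataprocessing.transformers import Scaler

    # Training: one joint Scaler fitted on all series at once
    joint_scaler = Scaler()
    train_transformed = joint_scaler.fit_transform(train_series)

    # Prediction: a new Scaler for the single series of interest, fitted only
    # on the part of that series that was seen during training
    train_part, _ = historical_data_area_5.split_after(training_cutoff)
    single_scaler = Scaler()
    single_scaler.fit(train_part)

    # This Scaler scales area 5 the same way the joint Scaler did, so it can be
    # used both to transform the model input and to inverse-transform the forecast
    pred_input = single_scaler.transform(historical_data_area_5)
    predicted = model.predict(7, series=pred_input)
    predicted_rescaled = single_scaler.inverse_transform(predicted)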

@cristof-r
Contributor

What approach would you recommend if we don't have the complete historical data (e.g., only the necessary data for input_chunk_length) at prediction time?

@hrzn
Contributor

hrzn commented Oct 30, 2022

> Hi @maximilianreimer, thanks for writing.
>
> You are totally right. Our data transformers expect to receive the same input dimensions (and the same order of the list of time series, including their components) for fitting and transformation.
>
> We should definitely raise a warning (or even an exception?) if there is a mismatch in dimensions.
>
> I don't quite follow what the issue is with using transformers in production. Can't you fit/transform a new transformer on only the series that are available?
>
> Regarding "sub-FittableDataTransformer":
>
>   • we can't rely on static covariates because not all time series have static covariates
>   • the mappable could be interesting, what do you think @hrzn ?

+1 for raising an exception if the number doesn't match, that's a good point.
Supporting a mappable could be a good idea too. We should still support sequences as well though, so it would come on top. Would you be interested in contributing, @maximilianreimer? Even just raising an exception would be a first step; we would be happy to receive a PR.

@hrzn hrzn changed the title [BUG] Scalar always only return number of Sequence used during fitting Improvements for Scalers applied on multiple series Oct 30, 2022
@hrzn hrzn added good first issue Good for newcomers improvement New feature or improvement and removed bug Something isn't working triage Issue waiting for triaging labels Oct 30, 2022
@madtoinou
Collaborator

madtoinou commented Mar 22, 2023

This is solved by #1409, which implements a mapping between the fit() and the transform() series and raises an error in case of a mismatch (code snippet).
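
A minimal illustration of that behaviour, reusing the reproduction snippet from the top of this issue (the exact exception type is an assumption):

    scaler = Scaler()
    scaler.fit([s] * 2)

    try:
        scaler.transform([s] * 3)  # more series than the Scaler was fitted on
    except Exception as e:
        # Instead of silently transforming only the first two series, the
        # mismatch in the number of series now raises an error.
        print(e)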
