Refactor/data_transformers #1409

Merged
merged 36 commits into unit8co:master on Mar 7, 2023

Conversation

@mabilton (Contributor) commented on Dec 4, 2022

Fixes #1407.

Summary

This PR makes two major changes to how data transformers work:

  1. Fixed parameters and fitted parameters (if any) are now 'automatically' passed to ts_transform, ts_inverse_transform, and ts_fit by default, without the user having to re-implement _transform_iterator, _inverse_transform_iterator, or _fit_iterator. To accommodate this, each of these ts_* methods now accepts a params (dictionary) argument, where params['fixed'] stores the fixed parameters of the transformation (defined to be all those attributes defined in the child-most class before calling super().__init__) and params['fitted'] stores the fitted parameters (i.e. what ts_fit returned).
  2. component_mask keyword arguments will be automatically applied to timeseries inputs given to ts_* and automatically 'unapplied' to timeseries outputs returned by these methods, which means that users don't have to worry about 'manually' dealing with these arguments inside of their implemented ts_* methods. If the user does not wish for component_masks to be automatically applied, they may specify mask_components=False when calling super().__init__; this will cause any component_mask keyword argument to be passed via kwargs to the called method (i.e. current behaviour).

To see how these changes can help simplify the work involved in implementing a new transformation, compare the current implementation of BoxCox with the BoxCox implementation in this PR; a minimal sketch of the new interface is also included below.
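As a rough sketch of what the new interface enables (illustrative only: the ScaleByFactor class is hypothetical, the @staticmethod usage mirrors the refactored BoxCox, and exact signatures should be checked against the refactored base classes):

    import numpy as np
    from darts import TimeSeries
    from darts.dataprocessing.transformers import BaseDataTransformer

    class ScaleByFactor(BaseDataTransformer):
        # Hypothetical transformer: multiplies all (unmasked) values by a factor.
        def __init__(self, factor: float = 2.0):
            # Attributes defined *before* `super().__init__()` are collected
            # into `params['fixed']` and handed to `ts_transform` automatically.
            self._factor = factor
            # With the default `mask_components=True`, any `component_mask`
            # passed to `transform` is applied/unapplied for us.
            super().__init__()

        @staticmethod
        def ts_transform(series: TimeSeries, params: dict) -> TimeSeries:
            return series * params["fixed"]["_factor"]

    series = TimeSeries.from_values(np.arange(6, dtype=float).reshape(3, 2))
    doubled = ScaleByFactor(factor=2.0).transform(series)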

Other Information

Some other minor changes that come with this PR:

  1. I split up the _reshape_in method into two new methods (apply_component_mask and stack_samples), as well as _reshape_out into two new methods (unapply_component_mask and unstack_samples). There are three reasons for this change:
    • _reshape_in and _reshape_out were responsible for applying two distinct changes to the data: masking component columns and stacking the samples of each component along a single axis. From a user interaction and maintainability perspective, I think it's much cleaner to have these two pieces of functionality separated from one another. The names _reshape_in and _reshape_out are, in my opinion, also a bit vague.
    • In the original implementation of _reshape_in/_reshape_out, the 'stacking step' was performed using a for loop; I've changed this so that only np.swapaxes and np.reshape operations are used, which should theoretically speed things up a bit (see the sketch after this list).
    • I've made these new methods public, so that it's explicitly clear to new users looking to write their own transformations that it's "okay" to call these functions.
  2. I've refactored all of the currently implemented transformations so that they conform with these changes, with the exception of StaticCovariatesTransformer, which directly overrides the fit, inverse_transform and transform methods anyway. Similarly, I also had to make some minor adjustments to existing tests.
  3. I've implemented new tests for the refactored BaseDataTransformer, FittableDataTransformer, and InvertibleDataTransformer classes.
  4. Some transformations, such as BoxCox, allow for different fixed parameter values to be distributed over different parallel jobs. To facilitate this, I've added a parallel_params argument to the *Transformer classes, which allows the user to specify which parameters should take different values for different parallel jobs.
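To illustrate the vectorised stacking from point 1, here is a NumPy-only sketch of the idea (not darts' verbatim implementation): samples are moved adjacent to the time axis with np.swapaxes, then collapsed with np.reshape; the inverse simply undoes both steps.

    import numpy as np

    def stack_samples(vals: np.ndarray) -> np.ndarray:
        # (n_timesteps, n_components, n_samples) -> (n_timesteps * n_samples, n_components)
        n_t, n_c, n_s = vals.shape
        return np.swapaxes(vals, 1, 2).reshape(n_t * n_s, n_c)

    def unstack_samples(vals: np.ndarray, n_t: int, n_c: int, n_s: int) -> np.ndarray:
        # Inverse of `stack_samples`: recovers (n_timesteps, n_components, n_samples).
        return np.swapaxes(vals.reshape(n_t, n_s, n_c), 1, 2)

    vals = np.random.rand(10, 2, 5)
    assert np.array_equal(unstack_samples(stack_samples(vals), 10, 2, 5), vals)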

There are two drawbacks to what I've done here:

  1. Some of these changes are slightly breaking; that being said, I doubt many users currently maintain private code bases with custom-implemented darts transformations (although I could be wrong).
  2. One 'gotcha' with this proposed interface is that the user must call super().__init__ only after initialising the fixed parameters of their transformation. For example, the following will allow the user to access '_my_param' in params['fixed']:
    class MyTransform(BaseDataTransformer):
        def __init__(self):
            # Correct: define fixed parameter *before* calling `super().__init__`
            self._my_param = 1
            super().__init__()

    whereas the following will not:

    class MyTransform(BaseDataTransformer):
        def __init__(self):
            # Incorrect: define fixed parameter *after* calling `super().__init__`
            super().__init__()
            self._my_param = 1

Any thoughts/comments on these changes are more than welcome.

Cheers,
Matt.

@codecov-commenter commented on Dec 4, 2022

Codecov Report

Patch coverage: 98.04% and no project coverage change

Comparison is base (d2ba591) 94.05% compared to head (93dc878) 94.06%.


Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1409   +/-   ##
=======================================
  Coverage   94.05%   94.06%           
=======================================
  Files         125      125           
  Lines       11185    11248   +63     
=======================================
+ Hits        10520    10580   +60     
- Misses        665      668    +3     
Impacted Files Coverage Δ
darts/dataprocessing/transformers/boxcox.py 95.91% <90.90%> (-4.09%) ⬇️
...taprocessing/transformers/base_data_transformer.py 96.55% <96.20%> (-0.23%) ⬇️
...sing/transformers/static_covariates_transformer.py 99.30% <98.95%> (+0.85%) ⬆️
darts/dataprocessing/transformers/diff.py 100.00% <100.00%> (ø)
...ocessing/transformers/fittable_data_transformer.py 98.36% <100.00%> (+1.69%) ⬆️
...essing/transformers/invertible_data_transformer.py 97.14% <100.00%> (+0.98%) ⬆️
darts/dataprocessing/transformers/mappers.py 100.00% <100.00%> (ø)
...taprocessing/transformers/missing_values_filler.py 100.00% <100.00%> (ø)
...arts/dataprocessing/transformers/reconciliation.py 99.04% <100.00%> (-0.04%) ⬇️
darts/dataprocessing/transformers/scaler.py 97.29% <100.00%> (-0.27%) ⬇️
... and 8 more


☔ View full report at Codecov.

@mabilton (Contributor, Author) commented on Dec 5, 2022

So I see that the following line in cell [40] within the 00-quickstart.ipynb notebook is throwing an error:

# scale back:
pred_air = scaler.inverse_transform(pred_air)

From what I can see, scaler is fitted to two different timeseries in cell [31]:

scaler = Scaler()
train_air_scaled, train_milk_scaled = scaler.fit_transform([train_air, train_milk])

What's the intended behaviour for a data transformation which is trained on two series but is given only a single series to inverse transform? I was under the impression that an error should be thrown in such cases.

@mabilton (Contributor, Author) commented on Dec 14, 2022

Hi there.

This is just a quick update from me: I've fixed the *Transformer classes so that they correctly handle being passed fewer timeseries than they were trained on; all checks are now passing as a result. I've also added explicit tests for this behaviour.
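Concretely, the previously failing quickstart call should now behave as expected (a sketch, reusing the objects quoted in my earlier comment):

    # `scaler` was fitted on [train_air, train_milk]; passing back only the
    # first series now inverse-transforms it using the parameters fitted
    # to `train_air`.
    pred_air = scaler.inverse_transform(pred_air)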

Once again, any comments or suggestions on what I've done here are more than welcome :)

Cheers,
Matt.

@hrzn (Contributor) commented on Dec 16, 2022

Thanks a lot @mabilton! We just need a bit more time to get to it and review :)

@madtoinou (Collaborator) left a comment

LGTM; I left some minor comments on details.

Thank you for this PR, which considerably simplifies the API of the data transformers and will certainly resolve many questions/issues related to them!

@mabilton (Contributor, Author) commented

Hey @madtoinou - thanks for all the useful feedback. Just letting you know that I'm pretty busy in my personal life at the moment, so it'll probably take a day or two to implement your suggestions. Apologies for the delay.

Cheers,
Matt.

@hrzn (Contributor) left a comment

That looks really good to me. We can almost merge it as such IMO :) Only got a few pretty minor comments. Nice job @mabilton! And sorry for the super late review...

@mabilton (Contributor, Author) commented on Mar 4, 2023

Hey @hrzn , @madtoinou.

I've (finally) found some time to implement your suggestions in addition to the following changes:

  1. I've added more detail to the docstrings of the transformer classes/methods.
  2. I've added a 'global_fit' option to the FittableDataTransformer. When global_fit = True, the entire Sequence of TimeSeries passed to the fit method will be passed to ts_fit, thereby allowing the user to fit a single set of parameters over all of the provided (disjoint) TimeSeries, as opposed to independently fitting a different set of parameters to each TimeSeries (a brief usage sketch follows this list). At this point in time, only the BoxCox, Scaler, and StaticCovariatesTransformer (can) use this option. Indeed, my motivation for introducing this global_fit option was so I could refactor the StaticCovariatesTransformer, which I'll talk about now.
  3. In the last iteration of this PR, the StaticCovariatesTransformer was basically left untouched and was implemented in a way that did not utilise the refactored fit, transform, and inverse_transform methods. Since fitting the StaticCovariatesTransformer requires first collating the static covariates of all of the provided TimeSeries, I first had to introduce a 'global fitting' option before refactoring StaticCovariatesTransformer (as mentioned in point 2). With that done, I could then refactor StaticCovariatesTransformer so that only a ts_transform, ts_inverse_transform, and a ts_fit method needed to be implemented (as opposed to directly overriding the transform, inverse_transform, and fit methods).
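A brief usage sketch of the new option (assuming series_a and series_b are existing TimeSeries objects):

    from darts.dataprocessing.transformers import Scaler

    # global_fit=True: fit one set of scaling parameters across *all* series,
    # rather than one set per series (the default, global_fit=False).
    scaler = Scaler(global_fit=True)
    scaled_a, scaled_b = scaler.fit_transform([series_a, series_b])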

As an aside, it appears that statsforecast==1.5.0 (which was only released a few days ago) introduces some dependency conflicts, which caused some tests to fail. It looks like this is a problem on the statsforecast side, so to get around it for the time being, I've specified statsforecast>=1.4,<1.5 in requirements/core.txt.

Hopefully that all makes sense, and please let me know what you guys think. Thanks in advance for any help.

Cheers,
Matt.

@hrzn (Contributor) left a comment

LGTM, nice job @mabilton!

@madtoinou madtoinou merged commit 8972d2e into unit8co:master Mar 7, 2023
@madtoinou madtoinou mentioned this pull request Mar 7, 2023
alexcolpitts96 pushed a commit to alexcolpitts96/darts that referenced this pull request May 31, 2023
* Refactored data transformers classes.

* Fixed failing data transformer tests.

* Fixed minor bug in `test_diff.py` - `~ bool_var` should be `not bool_var`.

* Added missing `params` arg to pipeline test mock method.

* Added automatic `component_mask`ing of inputs/outputs.

* Added tests for data transformer classes.

* Fixed bug when fewer timeseries specified than training timeseries.

* Updated tests to check for 'fewer inputs than training series' behaviour.

* Added `global_fit` option to `FittableDataTransformer`.

* Refactored `StaticCovariatesTransformer`.

* Added `global_fit` option to `BoxCox` and `Scaler` transforms.

* Removed `test_window_transformer_iterator` test, since `_transformer_iterator` method unused.

* Removed redundant `_*_iterator` methods of data transformers.

* Added more data transformer documentation + made `component_mask` argument explicit.

* `copy=False` in `apply_component_mask`.

* Removed documentation references to `_*_iterators`.

* Specified `statsforecast>=1.4,<1.5` to avoid dependency conflict.

---------

Co-authored-by: Julien Herzen <[email protected]>
Co-authored-by: madtoinou <[email protected]>
Successfully merging this pull request may close these issues:

Automatic _*_iterators and component_masking for Data Transformers (#1407)

4 participants