Automatic `_*_iterator`s and `component_mask`ing for Data Transformers #1407

mabilton · 2022-12-02T23:24:31Z

Is your feature request related to a current problem? Please describe.

To implement a new data transformer, users currently need to override the _*_iterator method (where * is transform, fit, and/or inverse_transform) if they wish to pass class attributes and/or fitted parameters to their implemented ts_transform/ts_fit/ts_inverse_transform methods. Moreover, if users want their transform to be able to 'mask out' particular components using the component_mask keyword argument, they need to manually call _reshape_in and _reshape_out inside their implemented ts_* method.

For a 'solid' example of what I mean, check out the BoxCox transformer code - notice how _fit_iterator, _transform_iterator, and _inverse_transform_iterator all need to be overridden to pass fitted parameters to their respective methods. Similarly, each method needs to pop the component_mask from kwargs, and then call _reshape_in to apply these masks.

In my view, these two steps should be made 'automatic' by default (I'll explain precisely what I mean by this shortly), for two reasons:

1. For most transformations, a lot of boilerplate code is required to achieve basic functionality (e.g. passing fixed or fitted parameters to ts_transform).
2. For new users wishing to implement their own transform, it's quite unintuitive to have to override a private method (i.e. _*_iterator) to achieve something as basic as passing fitted parameters to their ts_transform method.

Describe proposed solution

By default, all fixed parameters (i.e. attributes defined in the child-most class) and fitted parameters (if the transformer is fittable) should be passed to a user's implemented ts_* methods. In my view, the easiest way to achieve this is to have ts_transform, ts_fit, and ts_inverse_transform all accept an additional params argument, where params is a dict that contains up to two keys:

- fixed, which is another dictionary that stores the child-most class's attributes (i.e. the fixed parameters of the transformer).
- fitted, which stores the fitted parameters returned by ts_fit; if the transformer doesn't inherit from FittableDataTransformer, or ts_fit has not yet been called, this key shouldn't be present.

As a simple illustration of what I mean by this:

from darts.dataprocessing.transformers.fittable_data_transformer import FittableDataTransformer
from darts.dataprocessing.transformers.invertible_data_transformer import InvertibleDataTransformer

class MyTransformer(FittableDataTransformer, InvertibleDataTransformer):
    def __init__(self, my_fixed_param):
        self._my_fixed_param = my_fixed_param

    # The ts_* methods take no self, so they are declared as static methods.
    @staticmethod
    def ts_fit(series, params):
        fixed_param = params['fixed']['_my_fixed_param']
        # Fit code here
        return (fitted_param_1, fitted_param_2)

    @staticmethod
    def ts_transform(series, params):
        fixed_param = params['fixed']['_my_fixed_param']
        fitted_param_1, fitted_param_2 = params['fitted']
        # Transform code here
        return transformed_ts

    @staticmethod
    def ts_inverse_transform(series, params):
        fixed_param = params['fixed']['_my_fixed_param']
        fitted_param_1, fitted_param_2 = params['fitted']
        # Inverse Transform code here
        return inverse_transformed_ts
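
How the params dictionary gets assembled is left open here. As a rough sketch only, assuming every instance attribute counts as a fixed parameter, it could look something like the following. The build_params helper and the Scaler class are hypothetical names made up for this illustration; they are not part of Darts.

```python
from typing import Any, Dict

# Sentinel that distinguishes "never fitted" from a fitted value of None.
_NOT_FITTED = object()


def build_params(transformer: Any, fitted: Any = _NOT_FITTED) -> Dict[str, Any]:
    """Assemble the proposed params dict (hypothetical helper, not a Darts API)."""
    # Assumption: every instance attribute is treated as a fixed parameter.
    params: Dict[str, Any] = {"fixed": dict(vars(transformer))}
    # The 'fitted' key is only present once ts_fit has returned something.
    if fitted is not _NOT_FITTED:
        params["fitted"] = fitted
    return params


class Scaler:
    """Toy transformer used only for this illustration."""

    def __init__(self, my_fixed_param: float):
        self._my_fixed_param = my_fixed_param


scaler = Scaler(my_fixed_param=2.0)
before_fit = build_params(scaler)
after_fit = build_params(scaler, fitted=(0.5, 1.5))
```

Before fitting, the dictionary carries only the fixed key; after fitting, the tuple returned by ts_fit is stored under fitted and can be unpacked exactly as in the ts_transform example above.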

Moreover, if component_mask is supplied to any of these methods, then the series provided to that method will already have the masked-out components removed, so the user doesn't need to worry about masking the series themselves. Similarly, after ts_transform/ts_inverse_transform has returned the transformed components, these should automatically be 'added back' to the original timeseries alongside the components that were left untouched. If the user doesn't want this 'automatic masking' behaviour applied and would instead prefer to receive component_mask as a kwarg (i.e. the current behaviour), there should be an option allowing that.
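
To make the intended masking behaviour concrete, here is a minimal pure-Python sketch. It assumes a series is represented as a list of rows (one row per time step, one value per component), which is a simplification of a real TimeSeries. The apply_with_mask and double names are hypothetical and are not the Darts implementation.

```python
from typing import Callable, List, Optional, Sequence

Rows = List[List[float]]


def apply_with_mask(
    rows: Rows,
    transform: Callable[[Rows], Rows],
    component_mask: Optional[Sequence[bool]] = None,
) -> Rows:
    """Apply transform only to the components selected by component_mask."""
    if component_mask is None:
        return transform(rows)
    # Column indices of the components the transform is allowed to see.
    keep = [i for i, selected in enumerate(component_mask) if selected]
    masked = [[row[i] for i in keep] for row in rows]
    transformed = transform(masked)
    # Copy the original rows, then write the transformed columns back in place.
    out = [list(row) for row in rows]
    for out_row, new_row in zip(out, transformed):
        for i, value in zip(keep, new_row):
            out_row[i] = value
    return out


def double(rows: Rows) -> Rows:
    """Toy transform: multiply every value by two."""
    return [[2.0 * value for value in row] for row in rows]


series = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0]]
result = apply_with_mask(series, double, component_mask=[True, False, True])
```

Here double only ever sees the first and third components; the second is carried through unchanged, and the input rows are not modified.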

Additional context
I have been working on this change, so I'll post a PR soon.

Any comments and/or suggestions are more than welcome.

Cheers,
Matt.


hrzn · 2023-01-04T12:44:37Z

Nice proposal, that makes a lot of sense to me.

mabilton added the triage (Issue waiting for triaging) label Dec 2, 2022

mabilton mentioned this issue Dec 4, 2022 in Refactor/data_transformers #1409 (Merged)

hrzn added the improvement (New feature or improvement) label and removed the triage (Issue waiting for triaging) label Jan 4, 2023

madtoinou closed this as completed in #1409 Mar 7, 2023
