Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic _*_iterators and component_masking for Data Transformers #1407

Closed
mabilton opened this issue Dec 2, 2022 · 1 comment · Fixed by #1409
Closed

Automatic _*_iterators and component_masking for Data Transformers #1407

mabilton opened this issue Dec 2, 2022 · 1 comment · Fixed by #1409
Labels
improvement New feature or improvement

Comments

@mabilton
Copy link
Contributor

mabilton commented Dec 2, 2022

Is your feature request related to a current problem? Please describe.

In order to implement a new data transformer, users currently need to override the _*_iterator method (where * is transform, fit, and/or inverse_transform) if they wish to pass class attributes and/or fitted parameters to their implemented
ts_transform/ts_fit/ts_inverse_transform methods. Moreover, if users want their transform to be able to 'mask out' particular components using the component_mask keyword argument, they need to manually call _reshape_in and _reshape_out inside of their implemented ts_* method.

For a 'solid' example of what I mean, check out the BoxCox transformer code - notice how _fit_iterator, _transform_iterator, and _inverse_transform_iterator all need to be overridden to pass fitted parameters to their respective methods. Similarly, each method needs to pop the component_mask from kwargs, and then call _reshape_in to apply these masks.

In my view, these two steps should be made 'automatic' by default (I'll explain precisely what I mean by this shortly), for two reasons:

  1. For most transformations, a lot of boilerplate code is required to achieve basic functionality (e.g. passing fixed or fitted parameters to ts_transform)
  2. For new users wishing to implement their own transform, it's quite unintuitive having to override a private method (i.e. _*_iterator) to achieve something as basic as passing fitted parameters to their ts_transform method.

Describe proposed solution

By default, all fixed parameters (i.e. attributes defined in the child-most class) and fitted parameters (if the transformer is fittable) should be passed to a user's implemented ts_* methods. In my view, the easiest way to achieve this is to have ts_transform, ts_fit, and ts_inverse_transform all accept an additional params argument, where params is a dict that contains up to two keys:

  1. fixed, which is another dictionary that stores the child-most class's attributes (i.e. the fixed parameters of the transformer).
  2. fitted, which stores the fitted parameters returned by ts_fit; if the transformer doesn't inherit from FittableDataTransformer or ts_fit has been called, this key shouldn't be present.

As a simple illustration of what I mean by this:

from darts.dataprocessing.transformers.fittable_data_transformer import FittableDataTransformer
from darts.dataprocessing.transformers.invertible_data_transformer import InvertibleDataTransformer

class MyTransformer(FittableDataTransformer, InvertibleDataTransformer):
    def __init__(self, my_fixed_param):
            self._my_fixed_param = my_fixed_param
   
   def ts_fit(series, params):
         fixed_param =  params['fixed']['_my_fixed_param']
         return (fitted_param_1, fitted_param_2)

    def ts_transform(series, params):
           fixed_param = params['fixed']['_my_fixed_param']
           fitted_param_1, fitted_param_2 = params['fitted']
           # Transform code here
          return transformed_ts

    def ts_inverse_transform(series, params):
           fixed_param = params['fixed']['_my_fixed_param']
           fitted_param_1, fitted_param_2 = params['fitted']
           # Inverse Transform code here
          return inverse_transformed_ts

Moreover, if component_mask is supplied to any of these methods, then the series provided to that method will already have the relevant components removed, so the user doesn't need to worry about masking the series themselves. Similarly, after ts_transform/ts_inverse_transform has returned the transformed components, these should automatically be 'added back' to the original timeseries with the unmasked components. If the user doesn't want this 'automatic masking' behaviour to be applied and, instead, have component_masks given to them as a kwarg (i.e. current behaviour), there should obviously be an option allowing that.

Additional context
I have been working on this change, so I'll post a PR soon.

Any comments and/or suggestions are more than welcome.

Cheers,
Matt.

@mabilton mabilton added the triage Issue waiting for triaging label Dec 2, 2022
@hrzn
Copy link
Contributor

hrzn commented Jan 4, 2023

Nice proposal, that makes a lot of sense to me.

@hrzn hrzn added improvement New feature or improvement and removed triage Issue waiting for triaging labels Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement New feature or improvement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants