
Refactor forecasting models #1590

Merged: 12 commits merged into unit8co:master on Mar 26, 2023

Conversation

@JanFidor (Contributor) commented Feb 22, 2023

Fixes #1586

Summary

For now, I have only pushed changes to the baseline models.

@hrzn Potential changes I wanted to get your feedback on before pushing:

  1. Refactor __str__ for all models to return the class name with meaningful params captured in parentheses,
    example: "Auto-ARIMA-Statsforecasts" -> "StatsForecastAutoARIMA"
  2. Add a logger to every raise_if and raise_if_not call that doesn't yet have one
  3. Use f-string formatting wherever possible
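Proposal (1) could look roughly like the sketch below. This is only an illustration of the idea, assuming a shared base class; `ForecastingModel` is darts' real base-class name, but the bodies here are hypothetical stand-ins, not the actual implementation.

```python
# Hypothetical sketch: derive the string representation from the class
# name, so subclasses no longer need hard-coded labels.
class ForecastingModel:
    def __str__(self):
        # type(self).__name__ resolves to the subclass name automatically,
        # so child classes don't need their own __str__ overrides.
        return f"{type(self).__name__}()"


class StatsForecastAutoARIMA(ForecastingModel):
    pass


print(str(StatsForecastAutoARIMA()))  # -> StatsForecastAutoARIMA()
```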

@codecov-commenter commented Feb 25, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.06%. Comparing base (13b84c8) to head (15063a7).
Report is 292 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1590      +/-   ##
==========================================
- Coverage   94.13%   94.06%   -0.08%     
==========================================
  Files         125      125              
  Lines       11358    11300      -58     
==========================================
- Hits        10692    10629      -63     
- Misses        666      671       +5     


@dennisbader (Collaborator) commented Feb 28, 2023

Thanks for this!

Actually, it could be super nice if we improved the representation of our models in general (in the __repr__ method). Maybe to something similar to how sklearn does it?

They represent models like this:

  • default model without changing hparams:
    from sklearn.linear_model import LinearRegression
    model = LinearRegression()
    print(model)
    
    >>> LinearRegression()
    
  • model with changed hyperparameters:
    from sklearn.linear_model import LinearRegression
    model = LinearRegression(fit_intercept=False)
    print(model)
    
    >>> LinearRegression(fit_intercept=False)
    

We already save the model parameters that the user manually sets at model creation. Accessible after model creation with:

model.model_params

What do you think? Would you be willing to give this a go?
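A minimal sketch of this idea: build the repr string from the dict of parameters the user explicitly set (mirroring the `model_params` attribute mentioned above). `ToyModel` and its keyword handling are hypothetical, not darts' actual base class.

```python
# Hypothetical sketch: an sklearn-style __repr__ built from a dict of
# explicitly passed parameters.
class ToyModel:
    def __init__(self, **kwargs):
        # Only parameters the user actually passed end up here,
        # mimicking the saved model_params mentioned above.
        self.model_params = kwargs

    def __repr__(self):
        args = ", ".join(f"{k}={v!r}" for k, v in self.model_params.items())
        return f"{type(self).__name__}({args})"


print(repr(ToyModel()))                     # -> ToyModel()
print(repr(ToyModel(fit_intercept=False)))  # -> ToyModel(fit_intercept=False)
```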

@JanFidor (Contributor Author)

Sure thing @dennisbader! By the way, while working on the other PR I noticed this line, which looks like a raise_if() in disguise. Should I convert it (and other similar asserts, if I find them) to raise_if() / raise_if_not()?

@JanFidor (Contributor Author) commented Mar 1, 2023

@dennisbader what do you think about this implementation? If it's okay, I'll start deleting the __str__ overrides in the child models.

@dennisbader (Collaborator)

> Sure thing @dennisbader! By the way, while working on the other PR I noticed this line, which looks like a raise_if() in disguise. Should I convert it (and other similar asserts, if I find them) to raise_if() / raise_if_not()?

This is okay; raise_log is sometimes better to use, especially in for loops.
When we use formatted strings (f"something {val}") in for loops with raise_if or raise_if_not, it can slow things down, as the string needs to be computed on every iteration.
With raise_log, the string is only evaluated if the condition is met.
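The eager-vs-lazy difference can be illustrated with simplified stand-ins for these helpers. The signatures below are illustrative only (the real darts utilities also take a logger argument), and `check_all_positive` is a made-up example function.

```python
# Simplified stand-ins for darts' raise_if / raise_log utilities,
# illustrating eager vs. lazy message construction.
def raise_if(condition: bool, message: str) -> None:
    # The caller builds `message` before this is even called, so an
    # f-string argument is formatted on every invocation, raise or not.
    if condition:
        raise ValueError(message)


def raise_log(exception: Exception) -> None:
    # The caller constructs the exception (and its message) only on the
    # branch where it is actually raised.
    raise exception


def check_all_positive(values) -> None:
    for val in values:
        # Lazy pattern: the f-string is only formatted if the check fails.
        if val <= 0:
            raise_log(ValueError(f"non-positive value: {val}"))


check_all_positive([1, 2, 3])  # passes silently, no strings formatted
```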

@dennisbader dennisbader left a comment


Very neat, and looks great! Thanks

Had only a minor comment about renaming the method and moving it to the correct place.

You can go ahead and remove the model specific str methods!

@@ -2220,3 +2220,40 @@ def _supress_generate_predict_encoding(self) -> bool:
@property
def extreme_lags(self):
return (-1, 1, None, 0, 0)

def __str__(self):
Collaborator

👍 Very neat, thanks!

Could you rename the method into __repr__ and move it to the ForecastingModel base class?

Contributor

+1 for __repr__; however, it could also be beneficial to have __str__ set up correctly as well, I guess (the two are called in different circumstances). WDYT?

@JanFidor (Contributor Author) commented Mar 6, 2023

I never really used __repr__, but if I understood its point correctly (pretty much just a more informative __str__), my intuition would be to make __str__ provide only the changed params, while __repr__ provides them all, including the defaults. That way, users can get simple model labels for graph legends when comparing hyperparameters, and can also get the model's entire parameter set without having to delve into the documentation to check default values. Would that make sense?
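One possible sketch of that split, under stated assumptions: `ToyForecaster` and its parameters are hypothetical, and using `inspect` to recover the constructor defaults is just one way to do it, not how darts is implemented.

```python
# Hypothetical sketch: __str__ shows only user-changed parameters,
# __repr__ shows all parameters, defaults included.
import inspect


class ToyForecaster:
    def __init__(self, lags=12, output_chunk_length=1):
        # Recover the constructor's default values via its signature.
        self._defaults = {
            name: p.default
            for name, p in inspect.signature(type(self).__init__).parameters.items()
            if p.default is not inspect.Parameter.empty
        }
        self._params = {"lags": lags, "output_chunk_length": output_chunk_length}

    def __str__(self):
        # Short label: only parameters that differ from their defaults.
        changed = {k: v for k, v in self._params.items() if v != self._defaults[k]}
        args = ", ".join(f"{k}={v!r}" for k, v in changed.items())
        return f"{type(self).__name__}({args})"

    def __repr__(self):
        # Full form: every parameter, including defaults.
        args = ", ".join(f"{k}={v!r}" for k, v in self._params.items())
        return f"{type(self).__name__}({args})"


print(str(ToyForecaster(lags=24)))   # -> ToyForecaster(lags=24)
print(repr(ToyForecaster(lags=24)))  # -> ToyForecaster(lags=24, output_chunk_length=1)
```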

Collaborator

From what I understand the intention behind the magic methods is:

"""
The __str__() method is intended to return information in a readable format for humans to easily understand it. It is perfect for logging or displaying object information. While the __repr__() method's goal is to deliver a string representation of the object that may be used to replicate it, it is also designed to do the opposite.
(source)
"""

So in our case, the __str__ implementation in this PR can actually be used to create a new model object (it's doing what the __repr__ is intended to be used for).

From another source:
"""
If you don’t define a __str__() method for a class, then the built-in object implementation calls the __repr__() method instead.
"""

So I guess we would only have to define __repr__ and not __str__. It is perfectly fine though like you did it, I was just curious myself what the best practices were for these two magic methods :)
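The fallback described in that quote is easy to verify; `OnlyRepr` below is just a toy class for illustration.

```python
# With only __repr__ defined, both repr() and str() use it, because the
# built-in object.__str__ falls back to __repr__.
class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr(x=1)"


m = OnlyRepr()
print(repr(m))  # -> OnlyRepr(x=1)
print(str(m))   # -> OnlyRepr(x=1), via the __repr__ fallback
```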

@hrzn (Contributor) left a comment

Looks good!


@dennisbader (Collaborator) left a comment

Looks great, thanks a lot for this PR and the patience @JanFidor! 🚀

@JanFidor (Contributor Author)
Hi @dennisbader, thanks again for the review and the help. I'm just happy to make a little impact when it comes to ML :)

@dennisbader dennisbader merged commit ba07326 into unit8co:master Mar 26, 2023
alexcolpitts96 pushed a commit to alexcolpitts96/darts that referenced this pull request May 31, 2023
Successfully merging this pull request may close these issues.

Refactor forecasting model files (__str__() methods, and assertions)
4 participants