New alternatives to layer norm #1114
Conversation
These could be added to N-Beats and N-Hits whenever the LayerNorm issue gets worked out.
@hrzn Can we retry these tests? All the tests pass locally.
Hi @gdevos010 and thanks for another PR. Could you have a look at the FeedForward parts that I changed for our TransformerModel? I believe you implemented those, so I'd like to have your opinion on it.
Thanks, nice initiative! I'm pointing out a few things to fix before it can be merged.
yea, so the only two alternatives to layernorm i could recommend are (1) layernorm, but without the bias (removing biases from transformers has reportedly increased stability), and (2) rmsnorm. for scalenorm i've heard mixed results; i wouldn't risk using that yet.
I removed ScaleNorm as suggested by lucidrains.
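For reference, here is a minimal PyTorch sketch of the two variants recommended above; the class names `RMSNorm` and `LayerNormNoBias` are illustrative, not necessarily the ones used in this PR:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS over the last
    dimension, with a learnable gain but no mean-centering and no bias."""

    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).clamp_min(self.eps).sqrt()
        return self.scale * x / rms


class LayerNormNoBias(nn.Module):
    """Standard LayerNorm with the additive bias removed (gain only)."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        return self.weight * (x - mean) / torch.sqrt(var + self.eps)
```

Both keep a learnable gain; RMSNorm additionally skips mean-centering, which is part of why it is cheaper and reportedly more stable.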
Codecov Report — Base: 93.68% // Head: 93.72% // Increases project coverage by +0.04%.
Additional details and impacted files:
@@ Coverage Diff @@
## master #1114 +/- ##
==========================================
+ Coverage 93.68% 93.72% +0.04%
==========================================
Files 82 83 +1
Lines 8363 8385 +22
==========================================
+ Hits 7835 7859 +24
+ Misses 528 526 -2
☔ View full report at Codecov.
Thanks!
@gdevos010 any update on this PR? I think it's not far from being mergeable. |
@hrzn The three variants are now
Great, thanks! |
Thanks @gdevos010 !
There are only a few relatively minor things remaining:
- Could we optionally support `nn.Module` (instead of `str` only) to specify the type of layer norms? (A rough sketch follows this list.)
- Could you perhaps also extend this to the `TransformerModel`? If that's too much effort we can leave it out of this PR.
- A few minor other comments
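Regarding the first point, a rough sketch of what accepting either a string or an `nn.Module` subclass could look like; the helper `resolve_norm` and its registry are hypothetical, not Darts' actual API:

```python
from typing import Type, Union

import torch.nn as nn

# Hypothetical registry of supported norm layers; the PR would also
# register its custom classes (e.g. an RMSNorm implementation) here.
_NORMS = {"LayerNorm": nn.LayerNorm}


def resolve_norm(norm_type: Union[str, Type[nn.Module]], dim: int) -> nn.Module:
    """Instantiate a norm layer from either a registered name or an nn.Module subclass."""
    if isinstance(norm_type, str):
        if norm_type not in _NORMS:
            raise ValueError(
                f"Unknown norm_type '{norm_type}'; expected one of {list(_NORMS)}"
            )
        return _NORMS[norm_type](dim)
    if isinstance(norm_type, type) and issubclass(norm_type, nn.Module):
        return norm_type(dim)
    raise TypeError("norm_type must be a string or an nn.Module subclass")
```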
@hrzn I added it to the
Thanks for this iteration. It starts looking good. I would feel a bit better if before merging we could have another couple of unit tests that build & run TransformerModel and TFTModel with at least one non-default norm layer...
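For illustration, one such test could look roughly like the sketch below. The `norm_type` parameter name matches the test diff further down, but the accepted value `"RMSNorm"` and the training settings are assumptions here:

```python
import numpy as np
import pandas as pd

from darts import TimeSeries
from darts.models import TransformerModel


def test_transformer_model_with_non_default_norm():
    # Small synthetic series so the test stays fast.
    times = pd.date_range("2020-01-01", periods=100, freq="D")
    values = np.sin(np.linspace(0, 10, 100)).astype(np.float32)
    series = TimeSeries.from_times_and_values(times, values)

    model = TransformerModel(
        input_chunk_length=12,
        output_chunk_length=4,
        norm_type="RMSNorm",  # assumed name of a non-default norm layer
        n_epochs=1,
    )
    model.fit(series)
    forecast = model.predict(n=4)
    assert len(forecast) == 4
```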
@@ -49,6 +49,7 @@ class TransformerModelTestCase(DartsBaseTestClass):
        dim_feedforward=2048,
        dropout=0.1,
        activation="relu",
+       norm_type=None,
Was this needed? Isn't this the default value?
Co-authored-by: Julien Herzen <[email protected]>
Thanks @gdevos010 !
LGTM. Are we good to merge?
@hrzn We are good to merge! Sorry I took so long adding the tests. We are still trying to move :(
Looks great! Merging now. Thanks!
Fixes #1113.
Minor typos.
Summary
Other Information