
New alternatives to layer norm #1114

Merged: 29 commits merged from the altLayerNorm branch into unit8co:master on Aug 31, 2022
Conversation

@gdevos010 (Contributor) commented Aug 2, 2022:

Fixes #1113. Also fixes some minor typos.

@gdevos010 (Contributor, Author):

These could be added to N-BEATS and N-HiTS whenever the LayerNorm issue gets worked out.

@gdevos010 changed the title from "two new Alternatives to layer norm" to "Two new alternatives to layer norm" on Aug 2, 2022
@gdevos010 (Contributor, Author) commented Aug 2, 2022:

Not exactly the biggest improvement, but here is RMSNorm outperforming LayerNorm on the ice cream example:
[Two forecast plots on the ice cream example: LayerNorm vs. RMSNorm]
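(For reference, a minimal RMSNorm sketch in the spirit of what this PR adds; illustrative only, the actual implementation lives in darts/models/components/layer_norm_variants.py and may differ in details.)

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by the RMS of the features with a
    learned gain, but without mean-centering and without a bias term."""

    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the feature dimension.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return self.weight * x / rms
```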

@gdevos010 (Contributor, Author):

@hrzn Can we re-run these tests? They all pass locally.

@dennisbader (Collaborator):

Hi @gdevos010, and thanks for another PR.
The tests fail when PyTorch Lightning and PyTorch are upgraded to their newest versions. I'm working on a fix in #1124.

Could you have a look at the FeedForward parts that I changed for our TransformerModel? I believe you implemented those, so I'd like your opinion on them.
The issue was that all our custom FeedForward layers were ignored by PyTorch's torch.nn.Transformer in previous PyTorch versions.
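(For context, torch.nn.Transformer lets you bypass its stock feed-forward by passing a custom encoder stack via the custom_encoder argument. The sketch below only shows that general mechanism; the class and argument names are illustrative and not necessarily what darts uses internally.)

```python
import torch.nn as nn


class CustomFFNEncoderLayer(nn.Module):
    """Illustrative encoder layer: standard self-attention followed by a
    user-supplied feed-forward module (e.g. a GLU variant), so the custom
    FFN is actually applied instead of being silently ignored."""

    def __init__(self, d_model: int, nhead: int, ffn: nn.Module, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.ffn = ffn
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src, src_mask=None, src_key_padding_mask=None, **kwargs):
        # **kwargs absorbs extra arguments (e.g. is_causal) that newer
        # TransformerEncoder versions forward to their layers.
        attn_out, _ = self.self_attn(
            src, src, src, attn_mask=src_mask, key_padding_mask=src_key_padding_mask
        )
        src = self.norm1(src + self.dropout(attn_out))
        return self.norm2(src + self.dropout(self.ffn(src)))


# Stacked and handed to nn.Transformer so it replaces the default encoder:
# encoder = nn.TransformerEncoder(
#     CustomFFNEncoderLayer(d_model=16, nhead=2, ffn=my_ffn), num_layers=3
# )
# transformer = nn.Transformer(d_model=16, nhead=2, custom_encoder=encoder)
```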

@gdevos010 changed the title from "Two new alternatives to layer norm" to "Three new alternatives to layer norm" on Aug 6, 2022
@gdevos010 (Contributor, Author):

I also found PowerNorm in my search.
[Forecast plots on Air Passengers with TFT (QR): LayerNorm vs. PowerNorm]

@hrzn (Contributor) left a review:

Thanks, nice initiative! I'm pointing out a few things to fix before it can be merged.

@gdevos010 (Contributor, Author):

@hrzn I asked @lucidrains about it, and here is his response. Let me know what you think and how we want to add it.

@lucidrains commented Aug 7, 2022:

Yeah, so the only two alternatives to LayerNorm I could recommend are (1) LayerNorm without the bias (removing biases from transformers has reportedly increased stability) and (2) RMSNorm.

For ScaleNorm I've heard mixed results; I wouldn't risk using it yet.

@gdevos010 (Contributor, Author):

I removed ScaleNorm, as suggested by @lucidrains.

@codecov-commenter commented Aug 8, 2022:

Codecov Report

Base: 93.68% // Head: 93.72% // Increases project coverage by +0.04% 🎉

Coverage data is based on head (2692eac) compared to base (4a522a0).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1114      +/-   ##
==========================================
+ Coverage   93.68%   93.72%   +0.04%     
==========================================
  Files          82       83       +1     
  Lines        8363     8385      +22     
==========================================
+ Hits         7835     7859      +24     
+ Misses        528      526       -2     
Impacted Files Coverage Δ
darts/models/forecasting/nhits.py 98.55% <ø> (-0.02%) ⬇️
darts/models/components/layer_norm_variants.py 100.00% <100.00%> (ø)
darts/models/forecasting/tft_model.py 97.53% <100.00%> (+0.07%) ⬆️
darts/models/forecasting/tft_submodels.py 91.03% <100.00%> (ø)
darts/models/forecasting/transformer_model.py 100.00% <100.00%> (ø)
darts/timeseries.py 92.23% <0.00%> (-0.07%) ⬇️
...arts/models/forecasting/torch_forecasting_model.py 87.45% <0.00%> (-0.05%) ⬇️
darts/models/forecasting/block_rnn_model.py 98.24% <0.00%> (-0.04%) ⬇️
darts/datasets/__init__.py 100.00% <0.00%> (ø)
... and 1 more


@hrzn (Contributor) commented Aug 10, 2022:

> Yeah, so the only two alternatives to LayerNorm I could recommend are (1) LayerNorm without the bias (removing biases from transformers has reportedly increased stability) and (2) RMSNorm.
>
> For ScaleNorm I've heard mixed results; I wouldn't risk using it yet.

Thanks!
@gdevos010 could we also add the possibility to remove the LayerNorm bias? That'd be nice.
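(For illustration, a bias-free LayerNorm can be as simple as the sketch below: keep the learned gain, drop the learned shift. This is only a sketch, not necessarily the code that ended up in layer_norm_variants.py.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerNormNoBias(nn.Module):
    """Illustrative LayerNorm variant with a learned gain but no bias term."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F.layer_norm accepts bias=None, which drops the additive term.
        return F.layer_norm(x, (x.shape[-1],), weight=self.weight, bias=None, eps=self.eps)
```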

@hrzn (Contributor) commented Aug 22, 2022:

@gdevos010 any update on this PR? I think it's not far from being mergeable.

@gdevos010 (Contributor, Author):

@hrzn The three variants are now LayerNorm, LayerNormNoBias, and RMSNorm. I will post a comparison on the sunspot dataset shortly.

@hrzn (Contributor) commented Aug 22, 2022:

> @hrzn The three variants are now LayerNorm, LayerNormNoBias, and RMSNorm. I will post a comparison on the sunspot dataset shortly.

Great, thanks!

@gdevos010 (Contributor, Author) commented Aug 24, 2022:

I definitely went down a rabbit hole trying to make these graphs. Here are 9 examples using relatively small models (~32k params).

[Nine backtest plots: batch_size ∈ {128, 256, 512} × norm_type ∈ {LayerNorm, LayerNormNoBias, RMSNorm}]
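(For anyone who wants to reproduce a comparison of this kind, a rough sketch follows. The exact model, dataset, and hyperparameters behind the plots above are not shown in this thread, so the values below are placeholders, and the norm_type strings are assumed to match the class names listed earlier.)

```python
from darts.datasets import AirPassengersDataset
from darts.models import TransformerModel

series = AirPassengersDataset().load()
train, _ = series.split_after(0.7)

for norm_type in ["LayerNorm", "LayerNormNoBias", "RMSNorm"]:
    model = TransformerModel(
        input_chunk_length=24,
        output_chunk_length=12,
        norm_type=norm_type,  # parameter added in this PR
        batch_size=128,
        n_epochs=20,
    )
    model.fit(train)
    # Historical-forecast backtest over the last 30% of the series.
    err = model.backtest(series, start=0.7, forecast_horizon=12, retrain=False)
    print(norm_type, err)
```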

@gdevos010 gdevos010 requested review from hrzn and removed request for tomasvanpottelbergh August 24, 2022 15:01
@hrzn (Contributor) left a review:

Thanks @gdevos010 !
There are only a few relatively minor things remaining:

  • Could we optionally support nn.Module (instead of str only) to specify the type of layer norm? (See the resolver sketch after this list.)
  • Could you perhaps also extend this to the TransformerModel? If that's too much effort, we can leave it out of this PR.
  • A few other minor comments.
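(For the first point, one hedged way to accept either a string or an nn.Module class is a small resolver like the sketch below; the helper name is made up, and only the layer_norm_variants module path is taken from this PR's diff.)

```python
import torch.nn as nn

from darts.models.components import layer_norm_variants


def resolve_norm(norm_type, d_model: int) -> nn.Module:
    """Turn a norm_type given as a string or an nn.Module subclass into an instance."""
    if isinstance(norm_type, str):
        # Look the variant up by name, e.g. "RMSNorm" or "LayerNormNoBias".
        norm_cls = getattr(layer_norm_variants, norm_type)
    elif isinstance(norm_type, type) and issubclass(norm_type, nn.Module):
        # A custom normalization class supplied by the user.
        norm_cls = norm_type
    else:
        raise ValueError(f"Unsupported norm_type: {norm_type!r}")
    return norm_cls(d_model)
```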

@gdevos010 (Contributor, Author):

@hrzn I added it to the TransformerModel. The implementation is a bit clunky, but supporting both the GLU variants and the layer norm individually while maintaining the default behavior was a bit tricky. If you have suggestions for simplifying it, let me know.

@hrzn (Contributor) left a review:

Thanks for this iteration; it's starting to look good. I would feel a bit better if, before merging, we could add another couple of unit tests that build and run TransformerModel and TFTModel with at least one non-default norm layer.
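(A test of roughly this shape would do; the constructor arguments below are placeholders, not the exact values used in the darts test suite.)

```python
import numpy as np

from darts import TimeSeries
from darts.models import TransformerModel


def test_transformer_model_with_custom_norm():
    # Tiny synthetic series so the test builds and runs quickly.
    series = TimeSeries.from_values(np.sin(0.1 * np.arange(100, dtype=np.float32)))
    model = TransformerModel(
        input_chunk_length=12,
        output_chunk_length=6,
        norm_type="RMSNorm",  # non-default norm layer
        n_epochs=1,
    )
    model.fit(series)
    pred = model.predict(n=6)
    assert len(pred) == 6
```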

@@ -49,6 +49,7 @@ class TransformerModelTestCase(DartsBaseTestClass):
             dim_feedforward=2048,
             dropout=0.1,
             activation="relu",
+            norm_type=None,
Review comment (Contributor) on the added line:

Was this needed? Isn't this the default value?

@gdevos010 changed the title from "Three new alternatives to layer norm" to "New alternatives to layer norm" on Aug 27, 2022
@hrzn (Contributor) left a review:

Thanks @gdevos010 !
LGTM. Are we good to merge?

@gdevos010 (Contributor, Author) commented Aug 31, 2022:

@hrzn We are good to merge! Sorry it took me so long to add the tests; we are still in the middle of moving :(

@hrzn (Contributor) commented Aug 31, 2022:

> @hrzn We are good to merge! Sorry it took me so long to add the tests; we are still in the middle of moving :(

Looks great! Merging now. Thanks!

@hrzn hrzn merged commit 2fc99b9 into unit8co:master Aug 31, 2022
@gdevos010 gdevos010 deleted the altLayerNorm branch September 26, 2022 01:47
Linked issue closed by this pull request: A couple LayerNorm variants (#1113)