
New alternatives to layer norm #1114

Merged: 29 commits merged from the altLayerNorm branch into unit8co:master on Aug 31, 2022
Conversation

@gdevos010 (Contributor) commented Aug 2, 2022:

Fixes #1113. Also fixes some minor typos.

@gdevos010 (Contributor, Author):

These could be added to N-BEATS and N-HiTS whenever the LayerNorm issue gets worked out.

@gdevos010 changed the title from "two new Alternatives to layer norm" to "Two new alternatives to layer norm" on Aug 2, 2022
@gdevos010 (Contributor, Author) commented Aug 2, 2022:

Not exactly the biggest improvement, but here is RMSNorm outperforming LayerNorm on the ice cream example:
[Two forecast plots on the ice cream example: LayerNorm vs. RMSNorm]
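(For reference, a minimal RMSNorm sketch in the spirit of what this PR adds; illustrative only, the actual implementation lives in darts/models/components/layer_norm_variants.py and may differ in details.)

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by the RMS of the features with a
    learned gain, but without mean-centering and without a bias term."""

    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the feature dimension.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return self.weight * x / rms
```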

@gdevos010 (Contributor, Author):

@hrzn Can we re-run these tests? They all pass locally.

@dennisbader (Collaborator):

Hi @gdevos010, and thanks for another PR.
The tests fail when PyTorch Lightning and PyTorch are upgraded to their newest versions. I'm working on a fix in #1124.

Could you have a look at the FeedForward parts that I changed for our TransformerModel? I believe you implemented those, so I'd like your opinion on them.
The issue was that all our custom FeedForward layers were ignored by PyTorch's torch.nn.Transformer in previous PyTorch versions.
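(For context, torch.nn.Transformer lets you bypass its stock feed-forward by passing a custom encoder stack via the custom_encoder argument. The sketch below only shows that general mechanism; the class and argument names are illustrative and not necessarily what darts uses internally.)

```python
import torch.nn as nn


class CustomFFNEncoderLayer(nn.Module):
    """Illustrative encoder layer: standard self-attention followed by a
    user-supplied feed-forward module (e.g. a GLU variant), so the custom
    FFN is actually applied instead of being silently ignored."""

    def __init__(self, d_model: int, nhead: int, ffn: nn.Module, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.ffn = ffn
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src, src_mask=None, src_key_padding_mask=None, **kwargs):
        # **kwargs absorbs extra arguments (e.g. is_causal) that newer
        # TransformerEncoder versions forward to their layers.
        attn_out, _ = self.self_attn(
            src, src, src, attn_mask=src_mask, key_padding_mask=src_key_padding_mask
        )
        src = self.norm1(src + self.dropout(attn_out))
        return self.norm2(src + self.dropout(self.ffn(src)))


# Stacked and handed to nn.Transformer so it replaces the default encoder:
# encoder = nn.TransformerEncoder(
#     CustomFFNEncoderLayer(d_model=16, nhead=2, ffn=my_ffn), num_layers=3
# )
# transformer = nn.Transformer(d_model=16, nhead=2, custom_encoder=encoder)
```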

@gdevos010 changed the title from "Two new alternatives to layer norm" to "Three new alternatives to layer norm" on Aug 6, 2022
@gdevos010 (Contributor, Author):

I also found PowerNorm in my search.
[Forecast plots on Air Passengers with TFT (QR): LayerNorm vs. PowerNorm]

@hrzn (Contributor) left a review:

Thanks, nice initiative! I'm pointing out a few things to fix before it can be merged.

@gdevos010 (Contributor, Author):

@hrzn I asked @lucidrains about it, and here is his response. Let me know what you think and how we want to add it.

@lucidrains commented Aug 7, 2022:

Yeah, so the only two alternatives to LayerNorm I could recommend are (1) LayerNorm without the bias (removing biases from transformers has reportedly increased stability) and (2) RMSNorm.

For ScaleNorm I've heard mixed results; I wouldn't risk using it yet.

@gdevos010 (Contributor, Author):

I removed ScaleNorm, as suggested by @lucidrains.

@codecov-commenter commented Aug 8, 2022:

Codecov Report

Base: 93.68% // Head: 93.72% // Increases project coverage by +0.04% 🎉

Coverage data is based on head (2692eac) compared to base (4a522a0).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1114      +/-   ##
==========================================
+ Coverage   93.68%   93.72%   +0.04%     
==========================================
  Files          82       83       +1     
  Lines        8363     8385      +22     
==========================================
+ Hits         7835     7859      +24     
+ Misses        528      526       -2     
Impacted Files Coverage Δ
darts/models/forecasting/nhits.py 98.55% <ø> (-0.02%) ⬇️
darts/models/components/layer_norm_variants.py 100.00% <100.00%> (ø)
darts/models/forecasting/tft_model.py 97.53% <100.00%> (+0.07%) ⬆️
darts/models/forecasting/tft_submodels.py 91.03% <100.00%> (ø)
darts/models/forecasting/transformer_model.py 100.00% <100.00%> (ø)
darts/timeseries.py 92.23% <0.00%> (-0.07%) ⬇️
...arts/models/forecasting/torch_forecasting_model.py 87.45% <0.00%> (-0.05%) ⬇️
darts/models/forecasting/block_rnn_model.py 98.24% <0.00%> (-0.04%) ⬇️
darts/datasets/__init__.py 100.00% <0.00%> (ø)
... and 1 more


@hrzn (Contributor) commented Aug 10, 2022:

> Yeah, so the only two alternatives to LayerNorm I could recommend are (1) LayerNorm without the bias (removing biases from transformers has reportedly increased stability) and (2) RMSNorm.
>
> For ScaleNorm I've heard mixed results; I wouldn't risk using it yet.

Thanks!
@gdevos010 could we also add the possibility to remove the LayerNorm bias? That'd be nice.
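(For illustration, a bias-free LayerNorm can be as simple as the sketch below: keep the learned gain, drop the learned shift. This is only a sketch, not necessarily the code that ended up in layer_norm_variants.py.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerNormNoBias(nn.Module):
    """Illustrative LayerNorm variant with a learned gain but no bias term."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F.layer_norm accepts bias=None, which drops the additive term.
        return F.layer_norm(x, (x.shape[-1],), weight=self.weight, bias=None, eps=self.eps)
```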

@hrzn (Contributor) commented Aug 22, 2022:

@gdevos010 any update on this PR? I think it's not far from being mergeable.

@gdevos010 (Contributor, Author):

@hrzn The three variants are now LayerNorm, LayerNormNoBias, and RMSNorm. I will post a comparison on the sunspot dataset shortly.

@hrzn (Contributor) commented Aug 22, 2022:

> @hrzn The three variants are now LayerNorm, LayerNormNoBias, and RMSNorm. I will post a comparison on the sunspot dataset shortly.

Great, thanks!

@gdevos010 (Contributor, Author) commented Aug 24, 2022:

I definitely went down a rabbit hole trying to make these graphs. Here are 9 examples using relatively small models (~32k params).

[Nine backtest plots: batch_size ∈ {128, 256, 512} × norm_type ∈ {LayerNorm, LayerNormNoBias, RMSNorm}]
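(For anyone who wants to reproduce a comparison of this kind, a rough sketch follows. The exact model, dataset, and hyperparameters behind the plots above are not shown in this thread, so the values below are placeholders, and the norm_type strings are assumed to match the class names listed earlier.)

```python
from darts.datasets import AirPassengersDataset
from darts.models import TransformerModel

series = AirPassengersDataset().load()
train, _ = series.split_after(0.7)

for norm_type in ["LayerNorm", "LayerNormNoBias", "RMSNorm"]:
    model = TransformerModel(
        input_chunk_length=24,
        output_chunk_length=12,
        norm_type=norm_type,  # parameter added in this PR
        batch_size=128,
        n_epochs=20,
    )
    model.fit(train)
    # Historical-forecast backtest over the last 30% of the series.
    err = model.backtest(series, start=0.7, forecast_horizon=12, retrain=False)
    print(norm_type, err)
```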

@gdevos010 gdevos010 requested review from hrzn and removed request for tomasvanpottelbergh August 24, 2022 15:01
@hrzn (Contributor) left a review:

Thanks @gdevos010 !
There are only a few relatively minor things remaining:

  • Could we optionally support nn.Module (instead of str only) to specify the type of layer norm? (See the resolver sketch after this list.)
  • Could you perhaps also extend this to the TransformerModel? If that's too much effort, we can leave it out of this PR.
  • A few other minor comments.
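(For the first point, one hedged way to accept either a string or an nn.Module class is a small resolver like the sketch below; the helper name is made up, and only the layer_norm_variants module path is taken from this PR's diff.)

```python
import torch.nn as nn

from darts.models.components import layer_norm_variants


def resolve_norm(norm_type, d_model: int) -> nn.Module:
    """Turn a norm_type given as a string or an nn.Module subclass into an instance."""
    if isinstance(norm_type, str):
        # Look the variant up by name, e.g. "RMSNorm" or "LayerNormNoBias".
        norm_cls = getattr(layer_norm_variants, norm_type)
    elif isinstance(norm_type, type) and issubclass(norm_type, nn.Module):
        # A custom normalization class supplied by the user.
        norm_cls = norm_type
    else:
        raise ValueError(f"Unsupported norm_type: {norm_type!r}")
    return norm_cls(d_model)
```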

@gdevos010 (Contributor, Author):

@hrzn I added it to the TransformerModel. The implementation is a bit clunky, but supporting both the GLU variants and the layer norm individually while maintaining the default behavior was a bit tricky. If you have suggestions for simplifying it, let me know.

@hrzn (Contributor) left a review:

Thanks for this iteration; it's starting to look good. I would feel a bit better if, before merging, we could add another couple of unit tests that build and run TransformerModel and TFTModel with at least one non-default norm layer.
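(A test of roughly this shape would do; the constructor arguments below are placeholders, not the exact values used in the darts test suite.)

```python
import numpy as np

from darts import TimeSeries
from darts.models import TransformerModel


def test_transformer_model_with_custom_norm():
    # Tiny synthetic series so the test builds and runs quickly.
    series = TimeSeries.from_values(np.sin(0.1 * np.arange(100, dtype=np.float32)))
    model = TransformerModel(
        input_chunk_length=12,
        output_chunk_length=6,
        norm_type="RMSNorm",  # non-default norm layer
        n_epochs=1,
    )
    model.fit(series)
    pred = model.predict(n=6)
    assert len(pred) == 6
```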

@@ -49,6 +49,7 @@ class TransformerModelTestCase(DartsBaseTestClass):
             dim_feedforward=2048,
             dropout=0.1,
             activation="relu",
+            norm_type=None,
Review comment (Contributor) on the added line:

Was this needed? Isn't this the default value?

@gdevos010 changed the title from "Three new alternatives to layer norm" to "New alternatives to layer norm" on Aug 27, 2022
@hrzn (Contributor) left a review:

Thanks @gdevos010 !
LGTM. Are we good to merge?

@gdevos010 (Contributor, Author) commented Aug 31, 2022:

@hrzn We are good to merge! Sorry it took me so long to add the tests; we are still in the middle of moving :(

@hrzn (Contributor) commented Aug 31, 2022:

> @hrzn We are good to merge! Sorry it took me so long to add the tests; we are still in the middle of moving :(

Looks great! Merging now. Thanks!

@hrzn hrzn merged commit 2fc99b9 into unit8co:master Aug 31, 2022
@gdevos010 gdevos010 deleted the altLayerNorm branch September 26, 2022 01:47
Linked issue closed by this pull request: A couple LayerNorm variants (#1113)