RMSNorm Implementation #101
Comments
@gdevos010 Hi Greg, it is a bit subtle, but the only difference is that ScaleNorm has one shared gamma multiplier across the entire feature dimension, while RMSNorm has a gamma in the same dimension as the model dimension: https://github.com/lucidrains/x-transformers/blob/main/x_transformers/x_transformers.py#L352 vs https://github.com/lucidrains/x-transformers/blob/main/x_transformers/x_transformers.py#L363
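To make the distinction concrete, here is a minimal NumPy sketch (not the repository's exact code, which is PyTorch) of the two norms; the function names and the `g` parameter shapes are illustrative. Both normalize by the root mean square of the features, and the only difference is that ScaleNorm's learned gain is a single scalar while RMSNorm's is a vector with one entry per feature:

```python
import numpy as np

def scale_norm(x, g, eps=1e-8):
    """ScaleNorm: normalize by the RMS of the features, then scale by
    ONE learned scalar g shared across the entire feature dimension."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True))
    return x / np.maximum(rms, eps) * g  # g is a scalar

def rms_norm(x, g, eps=1e-8):
    """RMSNorm: identical normalization, but g is a learned VECTOR
    with one entry per feature (the model dimension)."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True))
    return x / np.maximum(rms, eps) * g  # g has shape (dim,)

x = np.array([[3.0, 4.0]])
# With g = 1 (scalar) and g = ones(dim) the two coincide;
# RMSNorm's extra capacity is the per-feature scale.
print(np.allclose(scale_norm(x, 1.0), rms_norm(x, np.ones(2))))
```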
@gdevos010 I would recommend RMSNorm, as it has been proven in a number of large language models out of DeepMind
Thanks, that makes sense. However, looking at the ScaleNorm paper, I'm wondering whether this scaling is needed; it seems to be 1 in the paper (referring to Eq. (5) there), but I might be missing something, of course.
@hrzn ohh, actually yes, that appears to be an error on my part! Thank you for catching it!
@hrzn @gdevos010 here is a paper that does some head-to-head runs of the different types of normalization: https://arxiv.org/abs/2102.11972. It may be informative for you two.
Oh nice, thanks. That's a very welcome paper!
Hi lucidrains,
I was looking at adding ScaleNorm and RMSNorm to another repo, and the implementations look almost identical. I have linked to the official implementation below. Am I missing something about the implementation? Thanks for all the great work.
https://github.com/bzhangGo/rmsnorm