[embed norm] switch to apex MixedFusedLayerNorm #262
Merged
As noticed by @thomasw21, this switches the embedding layernorm to `MixedFusedLayerNorm`, for consistency with the other layer norms.
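The change itself is a one-class swap in the embedding module. The diff isn't reproduced in this excerpt, so the class and attribute names below are placeholders, not the actual ones from the repo; a minimal sketch:

```python
import torch
from megatron.model import LayerNorm  # alias for apex's MixedFusedLayerNorm

class Embedding(torch.nn.Module):
    """Placeholder embedding module illustrating the swap."""

    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.word_embeddings = torch.nn.Embedding(vocab_size, hidden_size)
        # before: self.embed_layernorm = torch.nn.LayerNorm(hidden_size)
        # after: the same fused implementation as every other norm in the model
        self.embed_layernorm = LayerNorm(hidden_size)

    def forward(self, input_ids):
        return self.embed_layernorm(self.word_embeddings(input_ids))
```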
Incidentally, this also fixes a bug in how `torch.nn.LayerNorm` was used until now: the framework was putting the `LayerNorm` parameters into the wrong param group here:

`Megatron-DeepSpeed/megatron/optimizer/__init__.py`, lines 31 to 45 at `dd06ea3`
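For reference, a rough sketch of what those lines do (a paraphrase of the upstream helper, not a verbatim copy; `LayerNorm` is the `MixedFusedLayerNorm` alias imported in that module):

```python
from megatron.model import LayerNorm  # alias for apex's MixedFusedLayerNorm

def _get_params_for_weight_decay_optimization(modules):
    """Split params into two groups: layernorm weights and all biases
    get no weight decay, everything else does."""
    weight_decay_params = {'params': []}
    no_weight_decay_params = {'params': [], 'weight_decay': 0.0}
    for module in modules:
        for module_ in module.modules():
            if isinstance(module_, LayerNorm):
                # a torch.nn.LayerNorm instance fails this check, so its
                # weight falls through to weight_decay_params below (the bug)
                no_weight_decay_params['params'].extend(
                    p for p in module_._parameters.values() if p is not None)
            else:
                weight_decay_params['params'].extend(
                    p for n, p in module_._parameters.items()
                    if p is not None and n != 'bias')
                no_weight_decay_params['params'].extend(
                    p for n, p in module_._parameters.items()
                    if p is not None and n == 'bias')
    return weight_decay_params, no_weight_decay_params
```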
They should have landed in `no_weight_decay_params` but ended up in `weight_decay_params`, because in this module `LayerNorm` is an alias for `MixedFusedLayerNorm`, so `isinstance(module_, LayerNorm)` was `False` for a `torch.nn.LayerNorm` instance.

So if we want to use `torch.nn.LayerNorm`, we have to extend the check above with `or isinstance(module_, torch.nn.LayerNorm)`.
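A minimal sketch of that extra check, wrapped in a hypothetical helper (`_is_layernorm` is an illustrative name, not one from the repo):

```python
import torch
from megatron.model import LayerNorm  # alias for apex's MixedFusedLayerNorm

def _is_layernorm(module_):
    # Match both the fused alias and the vanilla PyTorch class, so that
    # torch.nn.LayerNorm parameters also land in no_weight_decay_params.
    return isinstance(module_, LayerNorm) or isinstance(module_, torch.nn.LayerNorm)
```

Equivalently, the existing condition can take a tuple of types: `isinstance(module_, (LayerNorm, torch.nn.LayerNorm))`.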