Fix `moe_normalize_expert_weights` when `top_k=1` #87

152334H · 2024-01-08T06:22:17Z

The `_top_k` function in router.py,

    def _top_k(self, scores):
        if self.args.moe_top_k == 1:
            return scores.max(dim=-1) # <-- causes weight shape to become [S]
        return torch.topk(scores, self.args.moe_top_k, dim=-1) # <-- shape is normally [S,K]

caused the expert weight normalization to be computed incorrectly:

        expert_weights, expert_indices = self._top_k(scores)
        if self.args.moe_normalize_expert_weights:
            # this function expects dim=-1 to only contain a single token's weights
            expert_weights = expert_weights / torch.norm(
                expert_weights, p=self.args.moe_normalize_expert_weights,dim=-1, keepdim=True)

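After this PR, top-1 models with `moe_normalize_expert_weights=1` should always end up with final weights of 1. Previously, the `[S]`-shaped weights were normalized along the token dimension, so each token's weight was divided by a norm taken over all tokens instead of over its own experts.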
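
For illustration, the shape problem can be reproduced outside the router with a small standalone sketch. The helper names `top_k_buggy`, `top_k_keepdim`, and `normalize` are hypothetical and are not megablocks APIs; the `keepdim=True` variant is just one way to keep the `[S, K]` layout and is not necessarily the exact change merged here.

```python
import torch


def top_k_buggy(scores, k):
    # Mirrors the snippet quoted above: for k == 1 the values have shape [S].
    if k == 1:
        return scores.max(dim=-1)
    return torch.topk(scores, k, dim=-1)


def top_k_keepdim(scores, k):
    # Hypothetical variant: keep the K dimension so the values stay [S, 1].
    if k == 1:
        return scores.max(dim=-1, keepdim=True)
    return torch.topk(scores, k, dim=-1)


def normalize(weights, p):
    # Same normalization as in the router: p-norm over the last dimension.
    return weights / torch.norm(weights, p=p, dim=-1, keepdim=True)


torch.manual_seed(0)
scores = torch.rand(4, 8).softmax(dim=-1)  # [S=4 tokens, E=8 experts]

buggy_w, _ = top_k_buggy(scores, 1)    # shape [4]
fixed_w, _ = top_k_keepdim(scores, 1)  # shape [4, 1]

buggy_norm = normalize(buggy_w, p=1)   # normalized across tokens
fixed_norm = normalize(fixed_w, p=1)   # normalized per token

print(buggy_norm.shape, fixed_norm.shape)
print(buggy_norm)
print(fixed_norm.squeeze(-1))
```

With the `[S]`-shaped input the weights of all tokens together sum to 1, while with the `[S, 1]` input every token's weight is exactly 1, which is the behaviour described above.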

megablocks/layers/router.py

tgale96 · 2024-01-08T15:37:38Z

Thanks for the PR! And great catch on this bug!

tgale96 · 2024-01-10T15:19:48Z

Thanks for the update! One last small comment and then I think we're ok to merge!

tgale96 · 2024-01-10T18:08:01Z

Thanks for the contribution!

normalize router weights *before* squeezing dim on top-k=1

12cd3de

tgale96 reviewed Jan 8, 2024

keep top-1 optimisation

102d9e7

tgale96 reviewed Jan 10, 2024

Update router.py

17fe5a2

tgale96 merged commit 04e4f1f into databricks:main Jan 10, 2024
