Enable mixtral 8x7b autotp #5257
Conversation
Hi @mrwyattii @delock. Please kindly review. Thanks!
@delock - do we want this merged in after your CPU autoTP PR?
Hi @loadams, this can be merged before the CPU autoTP workflow PR. I'll keep working on that PR.
Hi @loadams. From the failure log it looks like an environment issue. Could you rerun the CI to check whether it is an environment issue?
Hi @Yejing-Lai - yes, we have a known environment issue that we are working to resolve, and we will merge this PR once it is fixed.
This PR aims to enable Mixtral 8x7B (MoE model) AutoTP. Co-authored-by: Logan Adams <[email protected]>
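For context, a minimal sketch of how AutoTP inference could be invoked for Mixtral 8x7B once this lands. The checkpoint name, `tp_size`, and dtype below are illustrative assumptions, not part of this PR; AutoTP is the path taken when kernel injection is disabled.

```python
# Hedged sketch: model name, tp_size, and dtype are assumptions for illustration.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-v0.1"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# AutoTP path: no kernel injection, shard the model across tensor-parallel ranks.
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 8},    # assumed world size
    dtype=torch.bfloat16,
    replace_with_kernel_inject=False,  # AutoTP is used when kernel injection is off
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(model.module.device)
print(tokenizer.decode(model.module.generate(**inputs, max_new_tokens=20)[0]))
```

Such a script would typically be launched with `deepspeed --num_gpus 8 run.py` so each rank holds a shard of the expert weights.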
As the title says. The default behavior of the Arctic model produces shape issues with AutoTP because the MLP layer performs `w2 * act(w1*w3)`. However, the method provided to fix Mixtral-8x7b in #5257 does not work, since the MLP for Arctic is also used within a ModuleList for the MoE. This leaves the MLP weights hidden behind individual experts as layers `#.w#`, which is not caught by the fix in #5257. This adds the check directly within replace, where it can inspect the actual layer names for the `w2` key in the model to patch with `all_reduce`. --------- Signed-off-by: Daniel Huang <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Logan Adams <[email protected]>
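To illustrate the idea behind the `w2` check: under row-parallel sharding the second projection of each expert produces partial sums, so its output must be all-reduced across tensor-parallel ranks. The sketch below is only an illustration of matching `w2` by layer name; the helper name, the hook-based patching, and the `mp_group` argument are assumptions, and the PR itself does this inside AutoTP's replace logic rather than via hooks.

```python
# Hedged sketch of the layer-name check; names and hook-based patching are
# illustrative, not DeepSpeed's actual replace code.
import torch
import torch.distributed as dist

def patch_w2_allreduce(model, mp_group=None):
    """All-reduce the output of any '*.w2' linear layer so that row-parallel
    partial sums are combined across tensor-parallel ranks."""
    def make_hook():
        def hook(module, inputs, output):
            if dist.is_initialized():
                dist.all_reduce(output, group=mp_group)
            return output
        return hook

    for name, module in model.named_modules():
        # Matches both mlp.w2 and block_sparse_moe.experts.<idx>.w2 style names,
        # i.e. the experts hidden inside a ModuleList that the earlier fix missed.
        if name.split(".")[-1] == "w2" and isinstance(module, torch.nn.Linear):
            module.register_forward_hook(make_hook())
```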