
fix uneven issue & add balance autotp #4697

Merged
merged 8 commits into deepspeedai:master on Jan 13, 2024

Conversation

Yejing-Lai
Contributor

This PR aims to balance the shard size of each worker as evenly as possible.

  1. We refactor the tp_shard logic so that AutoTP works when split_shape % num_kv_heads != 0.
  2. When num_kv_heads is defined, the attention module relies on it for sharding, but the mlp and lm_head modules can use near-even division to get more balanced shards, which gives better performance (see the sketch below).
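
A minimal sketch of the two strategies, with illustrative names and shapes (not the actual tp_shard code):

```python
# Illustrative sketch only; function names and shapes are not the actual tp_shard code.

def attn_shard_sizes(num_kv_heads, head_dim, num_ranks):
    # Attention: each rank must own whole KV heads, so distribute heads first;
    # the first (num_kv_heads % num_ranks) ranks receive one extra head.
    base, rem = divmod(num_kv_heads, num_ranks)
    return [(base + (1 if r < rem else 0)) * head_dim for r in range(num_ranks)]

def near_even_shard_sizes(dim, num_ranks):
    # mlp / lm_head: no head constraint, so split the dimension as evenly as possible.
    base, rem = divmod(dim, num_ranks)
    return [base + (1 if r < rem else 0) for r in range(num_ranks)]

print(attn_shard_sizes(8, 128, 3))      # [384, 384, 256]  (3, 3, 2 heads)
print(near_even_shard_sizes(11008, 3))  # [3670, 3669, 3669]
```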

@Yejing-Lai
Contributor Author

Hi @RezaYazdaniAminabadi @delock. Could you please help review this PR? Thanks~

@delock
Collaborator

delock commented Nov 17, 2023

Hi @Yejing-Lai, can you give some explanation on the need to have a granularity of 64 elements?
https://github.com/microsoft/DeepSpeed/pull/4697/files#diff-214e32993d5440123080193836e988f024771aa4f6931c614ef9ad42a493f398R31

@Yejing-Lai
Contributor Author

Hi @Yejing-Lai, can you give some explanation on the need to have a granularity of 64 elements? https://github.com/microsoft/DeepSpeed/pull/4697/files#diff-214e32993d5440123080193836e988f024771aa4f6931c614ef9ad42a493f398R31

DNN libraries favor tensor sizes with power-of-two granularity, so we pick 64 as a common granularity size.
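
A hedged sketch of how such a granularity could be applied to a near-even split; only the constant 64 comes from the comment above, the rest is illustrative:

```python
# Sketch of a 64-element granularity; only the constant comes from the comment above.
GRANULARITY = 64

def granular_shard_sizes(dim, num_ranks, granularity=GRANULARITY):
    # Split `dim` into near-even chunks measured in whole 64-element blocks,
    # appending any remainder to the last rank so the total stays unchanged.
    blocks, tail = divmod(dim, granularity)
    base, extra = divmod(blocks, num_ranks)
    sizes = [(base + (1 if r < extra else 0)) * granularity for r in range(num_ranks)]
    sizes[-1] += tail
    return sizes

print(granular_shard_sizes(11008, 3))  # [3712, 3648, 3648], each a multiple of 64
```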

@delock
Collaborator

delock commented Nov 21, 2023

Hi @RezaYazdaniAminabadi, FYI: this PR improves AutoTP sharding when the number of heads cannot be evenly divided by the number of ranks. MLP layers will get better load balance when running AutoTP on 3 devices or 3 CPU sub-NUMA clusters (see the worked example below).
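
A worked example for the 3-rank case, assuming LLaMA-7B-like shapes (32 heads, mlp intermediate size 11008) and assuming, purely for illustration, that the mlp shards would otherwise follow the attention head ratio:

```python
# Assumed shapes for illustration only; the pre-PR behavior is not taken from this thread.
num_ranks, num_heads, mlp_dim = 3, 32, 11008

# Heads can only be split as whole units: 11 / 11 / 10 per rank.
heads = [num_heads // num_ranks + (1 if r < num_heads % num_ranks else 0)
         for r in range(num_ranks)]

# If the mlp dimension followed the same 11:11:10 ratio, the largest shard would be
# 10% bigger than the smallest; a near-even split keeps the gap to a single element.
ratio_split = [mlp_dim * h // num_heads for h in heads]
near_even = [mlp_dim // num_ranks + (1 if r < mlp_dim % num_ranks else 0)
             for r in range(num_ranks)]

print(heads)        # [11, 11, 10]
print(ratio_split)  # [3784, 3784, 3440]
print(near_even)    # [3670, 3669, 3669]
```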

@tjruwase
Contributor

@Yejing-Lai, please help resolve the conflict.

@Yejing-Lai
Contributor Author

@Yejing-Lai, please help resolve the conflict.

Hi @tjruwase. I resolved the conflict. Can you approve the workflows? Thanks~

@Yejing-Lai
Contributor Author

Hi @tjruwase. The conflict has been resolved. Could you please help review this PR? Thanks~

@delock
Collaborator

delock commented Jan 12, 2024

Hi @tjruwase, is this PR still under review, or is it ready to merge? We are working on an Intel Extension for PyTorch release and want to know whether this PR will be included in the next DeepSpeed release. Thanks!

@tjruwase
Contributor

Hi @tjruwase, is this PR still under review, or is it ready to merge? We are working on an Intel Extension for PyTorch release and want to know whether this PR will be included in the next DeepSpeed release. Thanks!

@delock, sorry for the delay. This should be reviewed soon and will be included in the next release.

@awan-10 awan-10 added this pull request to the merge queue Jan 12, 2024
Merged via the queue into deepspeedai:master with commit 29417ab Jan 13, 2024
12 checks passed
mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Lev Kurilenko <[email protected]>