fix uneven issue & add balance autotp #4697
Conversation
Hi @RezaYazdaniAminabadi @delock. Could you please help review this PR? Thanks~
Hi @Yejing-Lai, can you give some explanation of the need for a granularity of 64 elements?
DNN libraries favor tensor sizes with power-of-2 granularity; we pick 64 as a common granularity size.
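As a rough illustration of that point (the function and constant names below are made up for this sketch and are not part of DeepSpeed's AutoTP code), shard widths can be computed in 64-element blocks so that every rank receives a multiple of the chosen granularity:

```python
GRANULARITY = 64  # assumed granularity, per the discussion above

def granularity_aligned_split(total_size, num_ranks, granularity=GRANULARITY):
    """Return per-rank shard widths that sum to total_size, each a multiple of `granularity`."""
    assert total_size % granularity == 0, "total size must itself be 64-aligned"
    blocks = total_size // granularity          # count of 64-element blocks
    base, rem = divmod(blocks, num_ranks)       # spread whole blocks as evenly as possible
    return [(base + (1 if r < rem else 0)) * granularity for r in range(num_ranks)]

# 4096 columns over 3 ranks -> [1408, 1344, 1344]; every shard is a multiple of 64.
print(granularity_aligned_split(4096, 3))
```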
Hi @RezaYazdaniAminabadi, FYI: this PR improves AutoTP sharding when the number of heads cannot be evenly divided by the number of ranks. MLP layers will have better load balance when running AutoTP on 3 devices or 3 CPU sub-NUMA clusters.
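For example (a toy sketch with an assumed helper name, not the actual implementation), whole attention heads can be distributed so that ranks differ by at most one head even when the head count is not divisible by the rank count:

```python
def heads_per_rank(num_heads, num_ranks):
    """Assign whole heads to ranks; the first `rem` ranks receive one extra head."""
    base, rem = divmod(num_heads, num_ranks)
    return [base + (1 if r < rem else 0) for r in range(num_ranks)]

# 32 heads on 3 ranks -> [11, 11, 10]: no rank holds a partial head and the
# worst-case imbalance is a single head.
print(heads_per_rank(32, 3))
```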
@Yejing-Lai, please help resolve the conflict.
Hi @tjruwase. I resolved the conflict. Can you approve the workflows? Thanks~
Hi @tjruwase. The conflict has been resolved. Could you please help review this PR? Thanks~
Hi @tjruwase, is this PR still under review, or is it ready to merge? We are working on the Intel Extension for PyTorch release and want to know whether this PR will be included in the next DeepSpeed release. Thanks!
This PR aims to balance the shard size of each worker as evenly as possible. 1. We refactor the tp_shard logic so that AutoTP works when split_shape % num_kv_heads != 0. 2. When num_kv_heads is defined, the attention module relies on it for sharding, but the mlp and lm_head modules can use near-even division to get more balanced shards, which gives better performance. --------- Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Lev Kurilenko <[email protected]>
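A minimal sketch of the two sharding modes described above, assuming hypothetical helper names rather than the real tp_shard functions:

```python
def attention_shard_sizes(num_kv_heads, head_dim, num_ranks):
    """Attention weights must be cut on head boundaries, so shard widths can differ by head_dim."""
    base, rem = divmod(num_kv_heads, num_ranks)
    return [(base + (1 if r < rem else 0)) * head_dim for r in range(num_ranks)]

def near_even_shard_sizes(total_size, num_ranks):
    """MLP / lm_head weights have no head constraint, so widths differ by at most one element."""
    base, rem = divmod(total_size, num_ranks)
    return [base + (1 if r < rem else 0) for r in range(num_ranks)]

# 32 KV heads of dim 128 over 3 ranks -> attention shards [1408, 1408, 1280];
# an 11008-wide MLP projection -> [3670, 3669, 3669], a far smaller imbalance.
print(attention_shard_sizes(32, 128, 3))
print(near_even_shard_sizes(11008, 3))
```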