coll: Add multileader allreduce composition #5921
Conversation
test:mpich/ch4/most
Force-pushed from 9e18a2d to d8acaac.
test:mpich/ch4/most
@yfguo Testing passed on this PR, and it is ready for review.
Force-pushed from d8acaac to 8432cc1.
test:mpich/ch4/most
The failed test is a timeout.
test:mpich/ch4/most
Co-authored-by: Surabhi Jain <[email protected]>
Force-pushed from 8432cc1 to 8e9504a.
test:mpich/ch4/most
test:mpich/ch4/ofi
Pull Request Description
Multi-leader based composition: each node has `num_leaders` leaders, which reduce the data within their sub-node communicator (sub-node_comm). This is followed by an intra-node reduce and an inter-node allreduce on the piece of the data that each leader is responsible for. A shared memory buffer is allocated per leader; if the message is larger than this buffer, it is processed in chunks.

Constraints: for a given communicator, all nodes must have the same number of ranks per node, and the op must be commutative.
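To make the data flow concrete, here is a single-process C sketch that simulates the steps on plain arrays instead of MPI communicators. It is not MPICH's implementation. The sizes (`NNODES`, `PPN`, `NUM_LEADERS`, `COUNT`, `SHM_ELEMS`), the helpers `fill_inputs`, `multileader_allreduce` and `check_result`, and the partitioning scheme (each round of at most `NUM_LEADERS * SHM_ELEMS` elements is split so that leader `l` owns a contiguous slice of at most `SHM_ELEMS` elements) are all hypothetical. For brevity it folds the sub-node and intra-node reductions into one per-node reduction and uses integer addition as the commutative op.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes, chosen small so that chunking is exercised. */
#define NNODES      2   /* number of nodes */
#define PPN         4   /* ranks per node; assumed equal on every node */
#define NUM_LEADERS 2   /* leaders per node */
#define COUNT       11  /* elements in each rank's buffer */
#define SHM_ELEMS   2   /* capacity of each leader's shm buffer, in elements */

int sendbuf[NNODES][PPN][COUNT];
int recvbuf[NNODES][PPN][COUNT];
static int shm[NNODES][NUM_LEADERS][SHM_ELEMS]; /* one shm buffer per leader per node */

/* Integer addition: commutative, so the combination order does not matter. */
static int op(int a, int b) { return a + b; }

void multileader_allreduce(void)
{
    const int chunk = NUM_LEADERS * SHM_ELEMS; /* elements processed per round */
    assert(chunk > 0);
    for (int off = 0; off < COUNT; off += chunk) {
        int end = (off + chunk < COUNT) ? off + chunk : COUNT;
        for (int l = 0; l < NUM_LEADERS; l++) {
            /* Leader l is responsible for elements [lo, hi) of this round. */
            int lo = off + l * SHM_ELEMS;
            int hi = (lo + SHM_ELEMS < end) ? lo + SHM_ELEMS : end;
            if (lo >= hi)
                continue; /* nothing left for this leader in the final round */
            int n = hi - lo;
            /* Per-node reduce: combine all local ranks' pieces in the leader's shm buffer. */
            for (int node = 0; node < NNODES; node++) {
                memcpy(shm[node][l], &sendbuf[node][0][lo], n * sizeof(int));
                for (int r = 1; r < PPN; r++)
                    for (int i = 0; i < n; i++)
                        shm[node][l][i] = op(shm[node][l][i], sendbuf[node][r][lo + i]);
            }
            /* Inter-node allreduce among the leaders with the same index l. */
            int tmp[SHM_ELEMS];
            memcpy(tmp, shm[0][l], n * sizeof(int));
            for (int node = 1; node < NNODES; node++)
                for (int i = 0; i < n; i++)
                    tmp[i] = op(tmp[i], shm[node][l][i]);
            /* Each rank copies the reduced piece out of its node's shm buffer. */
            for (int node = 0; node < NNODES; node++) {
                memcpy(shm[node][l], tmp, n * sizeof(int));
                for (int r = 0; r < PPN; r++)
                    memcpy(&recvbuf[node][r][lo], shm[node][l], n * sizeof(int));
            }
        }
    }
}

/* Global rank g = node * PPN + r contributes g + i at element i. */
void fill_inputs(void)
{
    for (int node = 0; node < NNODES; node++)
        for (int r = 0; r < PPN; r++)
            for (int i = 0; i < COUNT; i++)
                sendbuf[node][r][i] = node * PPN + r + i;
}

/* Returns 1 if every rank holds the elementwise sum over all ranks, else 0. */
int check_result(void)
{
    const int nranks = NNODES * PPN;
    for (int node = 0; node < NNODES; node++)
        for (int r = 0; r < PPN; r++)
            for (int i = 0; i < COUNT; i++)
                if (recvbuf[node][r][i] != nranks * (nranks - 1) / 2 + nranks * i)
                    return 0;
    return 1;
}
```

Calling `fill_inputs()` and then `multileader_allreduce()` should leave every simulated rank with the elementwise sum over all 8 ranks, which `check_result()` verifies. With `COUNT` = 11 and `SHM_ELEMS` = 2, the message is processed in three rounds, so the chunking path is exercised. The fixed `PPN` dimension mirrors the equal-ranks-per-node constraint: every node can use the same partitioning of the buffer among its leaders.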
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form: `module: short description`
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.