mpi/coll: add TSP-based scatter/ring-based-allgather algorithm for Ibcast #5834

Merged (1 commit) on Feb 9, 2022

Conversation

zhenggb72 (Collaborator) commented Feb 7, 2022

This PR adds an Ibcast algorithm that performs a scatter followed by a ring-based allgather. It is similar to the existing Ibcast algorithm that performs a scatter followed by a recursive-exchange-based allgather.
The ring-based phase follows the same idea as Baidu's ring allreduce (https://www.tomshardware.com/news/baidu-svail-ring-allreduce-library,33691.html).
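
For readers unfamiliar with the pattern, here is a minimal single-process sketch of a broadcast built from a scatter followed by a ring allgather. It is a simulation written for illustration only, not the code in this PR: the function names `chunk_bounds` and `scatter_ring_allgather_bcast` are hypothetical, the chunking rule is an assumption, and no MPI calls are made. In the ring phase each rank talks only to its right neighbor, and all chunks arrive after `nranks - 1` steps.

```python
def chunk_bounds(nbytes, nranks):
    """Split nbytes into nranks contiguous chunks (assumed rule: the first
    nbytes % nranks chunks are one byte larger than the rest)."""
    base, extra = divmod(nbytes, nranks)
    bounds, start = [], 0
    for i in range(nranks):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def scatter_ring_allgather_bcast(data, nranks, root=0):
    """Simulate broadcasting `data` (bytes) from `root` to all ranks.

    Returns a list with the buffer each rank ends up holding.
    """
    bounds = chunk_bounds(len(data), nranks)
    # bufs[rank][i] is chunk i as seen by that rank, or None if not yet received.
    bufs = [[None] * nranks for _ in range(nranks)]

    # Scatter phase: the rank at relative position rel (distance from root
    # around the ring) receives chunk rel. A real root already holds all the
    # data; here it starts with only chunk 0 to keep the ring logic uniform.
    for rank in range(nranks):
        rel = (rank - root) % nranks
        lo, hi = bounds[rel]
        bufs[rank][rel] = data[lo:hi]

    # Ring allgather phase: in step s, each rank forwards the chunk it
    # obtained in the previous step (its own chunk when s == 0) to its
    # right neighbor.
    for step in range(nranks - 1):
        sends = []
        for rank in range(nranks):
            rel = (rank - root) % nranks
            idx = (rel - step) % nranks
            sends.append(((rank + 1) % nranks, idx, bufs[rank][idx]))
        # Deliver after all sends are collected, so a step behaves like
        # simultaneous exchanges.
        for dst, idx, chunk in sends:
            bufs[dst][idx] = chunk

    return [b"".join(chunks) for chunks in bufs]


if __name__ == "__main__":
    payload = b"broadcast payload"
    results = scatter_ring_allgather_bcast(payload, nranks=4, root=2)
    assert all(buf == payload for buf in results)
```

Compared with a recursive-exchange allgather, the ring variant takes more steps (linear rather than logarithmic in the number of ranks) but only ever sends one chunk per step to a fixed neighbor.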

Pull Request Description

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

zhenggb72 requested a review from yfguo on February 7, 2022, 16:14
@zhenggb72
Collaborator Author

test:mpich/ch4/ofi

@yfguo
Contributor

yfguo commented Feb 7, 2022

@zhenggb72 Please take a look at the scatterv and bcast timeouts.

zhenggb72 force-pushed the ibcast_scatterv_ring_allgatherv branch 2 times, most recently from 3cad084 to e1bfd3c, on February 8, 2022, 19:07
@zhenggb72
Collaborator Author

test:mpich/ch4/ofi

@zhenggb72
Collaborator Author

> @zhenggb72 Please take a look at the scatterv and bcast timeouts.

Fixed. Looks like all tests are passing now. I think it is ready for review.

zhenggb72 force-pushed the ibcast_scatterv_ring_allgatherv branch from e1bfd3c to a0f1e29 on February 9, 2022, 07:10
zhenggb72 merged commit b6599fb into pmodels:main on Feb 9, 2022