Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coll: Add CVAR to select composition for a few blocking collectives #5823

Merged
merged 7 commits into from
Feb 7, 2022

Conversation

zhenggb72
Copy link
Collaborator

@zhenggb72 zhenggb72 commented Feb 3, 2022

Adds CVARs to select composition for a few supported collectives including Barrier, Bcast, Allreduce, Reduce, Allgather and Alltoall.

Pull Request Description

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your companies PR approval manager.

@zhenggb72
Copy link
Collaborator Author

test:mpich/ch4/ofi

@zhenggb72
Copy link
Collaborator Author

rebased

@zhenggb72 zhenggb72 merged commit aa48d69 into pmodels:main Feb 7, 2022
Select composition (inter_node + intra_node) for Barrier
0 Auto selection
1 NM + SHM
2 NM only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that makes sense. Why don't we change _alpha and _beta from the algorithm names to reflect this semantic directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, instead of int, why don't we use an enum so we can use the semantic meaningful algorithm name, e.g. "auto, nm_only, nm_and_shm"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I am not sure which way is better. I think the original author borrowed a scheme from MVAPICH. the little complexity here is that even nm_and_shm may have two variants in bcast. It is clumsy to describe in the name string. And hopefully the user never have to worry about using this CVAR (json file should take care of it). This is probably only for an experienced user or for testing/autotuning purpose.
I agree we can discuss for a better scheme later, and we can re-visit this issue after we have all the pieces merged and the clean up would be more straightforward at that time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to be perfect, but some partial semantic names will be better than alpha, beta, ..., or 0, 1, 2. It is not just for users. Mostly is for the sanity of maintainers, who need to understand the code without navigating the cryptic chains.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the concern. But to some extent the comment block above already explain the meaning of 0, 1, and 2. So it is not totally confusing. But a better scheme would certainly be desirable. If this is not urgent, we can revisit this CVAR naming scheme once we are ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants