-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coll: Add CVAR to select composition for a few blocking collectives #5823
Conversation
a21a07e
to
4ad0f79
Compare
test:mpich/ch4/ofi |
4ad0f79
to
f047662
Compare
rebased |
Select composition (inter_node + intra_node) for Barrier | ||
0 Auto selection | ||
1 NM + SHM | ||
2 NM only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, that makes sense. Why don't we change _alpha
and _beta
from the algorithm names to reflect this semantic directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, instead of int, why don't we use an enum so we can use the semantic meaningful algorithm name, e.g. "auto, nm_only, nm_and_shm"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I am not sure which way is better. I think the original author borrowed a scheme from MVAPICH. the little complexity here is that even nm_and_shm may have two variants in bcast. It is clumsy to describe in the name string. And hopefully the user never have to worry about using this CVAR (json file should take care of it). This is probably only for an experienced user or for testing/autotuning purpose.
I agree we can discuss for a better scheme later, and we can re-visit this issue after we have all the pieces merged and the clean up would be more straightforward at that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't need to be perfect, but some partial semantic names will be better than alpha, beta, ...
, or 0, 1, 2
. It is not just for users. Mostly is for the sanity of maintainers, who need to understand the code without navigating the cryptic chains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the concern. But to some extent the comment block above already explain the meaning of 0, 1, and 2. So it is not totally confusing. But a better scheme would certainly be desirable. If this is not urgent, we can revisit this CVAR naming scheme once we are ready.
Adds CVARs to select composition for a few supported collectives including Barrier, Bcast, Allreduce, Reduce, Allgather and Alltoall.
Pull Request Description
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your companies PR approval manager.