ch4: return node_id -1 for remote processes #5440

Merged 1 commit on Jul 20, 2021

@hzhou hzhou commented Jul 16, 2021

Pull Request Description

Remote processes, i.e. those coming from a different MPI_COMM_WORLD, can't use shared
memory because the necessary initialization has not been done for them. Let's return
node_id -1 from MPIDU_get_node_id as an indication that the process is
remote. This will prevent MPIR_Find_local from including remote processes
among the local processes.
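
A minimal sketch of the idea, not the actual MPICH code: the struct, the `sketch_` function names, and the `world_id` field are hypothetical and only illustrate how a -1 sentinel from a node-id lookup keeps remote processes out of the local-process list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process record; MPICH's real data structures differ. */
typedef struct {
    int world_id;   /* which MPI_COMM_WORLD the process came from */
    int node_id;    /* node index within that world */
} sketch_proc_t;

/* Sentinel meaning "remote process, shared memory is not usable". */
#define SKETCH_REMOTE_NODE_ID (-1)

/* Return the node id of a process, or -1 if it belongs to another world. */
static int sketch_get_node_id(const sketch_proc_t *p, int my_world_id)
{
    assert(p != NULL);
    return (p->world_id == my_world_id) ? p->node_id : SKETCH_REMOTE_NODE_ID;
}

/* Write the ranks that share a node with my_rank into local_ranks
 * (which must hold at least `size` entries) and return how many were found.
 * Processes whose node id is the remote sentinel are never counted, even if
 * their raw node index happens to equal ours. */
static int sketch_find_local(const sketch_proc_t *procs, int size,
                             int my_rank, int my_world_id, int *local_ranks)
{
    assert(procs != NULL && local_ranks != NULL);
    assert(my_rank >= 0 && my_rank < size);

    int my_node = sketch_get_node_id(&procs[my_rank], my_world_id);
    int num_local = 0;

    for (int r = 0; r < size; r++) {
        int node = sketch_get_node_id(&procs[r], my_world_id);
        if (node != SKETCH_REMOTE_NODE_ID && node == my_node) {
            local_ranks[num_local++] = r;
        }
    }
    return num_local;
}
```

In this sketch, a process from another world that reports the same raw node index as the caller is still excluded, because the lookup replaces its node id with the sentinel before the comparison.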

Fixes #5372

[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

hzhou commented Jul 16, 2021

test:mpich/ch4/ofi

hzhou commented Jul 16, 2021

test:mpich/ch4/ucx

@hzhou hzhou merged commit bd09628 into pmodels:main Jul 20, 2021
@hzhou hzhou deleted the 2107_nodemap branch July 20, 2021 19:02
Development

Successfully merging this pull request may close these issues.

MPI_Allreduce Segmentation fault Docker