Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address Github issue #11532 by translating legacy parameters for direct launches #11854

Merged
merged 1 commit into from
Sep 5, 2023

Conversation

qkoziol
Copy link
Contributor

@qkoziol qkoziol commented Aug 14, 2023

Borrow code from the OMPI schizo module in PRRTE that translates legacy MCA parameters when an application is direct launched (PRRTE will translate legacy parameters when natively launched).

Addresses issue #11532

@qkoziol
Copy link
Contributor Author

qkoziol commented Aug 14, 2023

Requires PRRTE PR: openpmix/prrte#1777 and OpenPMIX PR: openpmix/openpmix#3133

@gpaulsen
Copy link
Member

bot:ibm:retest

@rhc54
Copy link
Contributor

rhc54 commented Aug 15, 2023

Hmmmm....why would an MPI process care about PRRTE params? It has no involvement with PRRTE, so the params will just be ignored.

You certainly would want the PMIx values as the MPI proc uses that library - but I cannot see any reason to handle PRRTE params.

@qkoziol
Copy link
Contributor Author

qkoziol commented Aug 15, 2023

Hmmmm....why would an MPI process care about PRRTE params? It has no involvement with PRRTE, so the params will just be ignored.

You certainly would want the PMIx values as the MPI proc uses that library - but I cannot see any reason to handle PRRTE params.

OK - I had a feeling that might be the case. I'll update this PR to just look for PMIX params and remove the PRRTE part. I'll also update the PR for PRRTE to just have the refactor of the schizo ompi file and remove the other work that creates the PRRTE framework header.

@qkoziol qkoziol force-pushed the issue_11532 branch 2 times, most recently from b587628 to f552b2c Compare August 15, 2023 17:48
@qkoziol
Copy link
Contributor Author

qkoziol commented Aug 15, 2023

Revised to just check PMIX params, and to use the existing PMIX framework header.

@qkoziol
Copy link
Contributor Author

qkoziol commented Aug 15, 2023

bot:nvidia:retest

1 similar comment
@lrbison
Copy link
Contributor

lrbison commented Aug 17, 2023

bot:nvidia:retest

@lrbison
Copy link
Contributor

lrbison commented Aug 17, 2023

CI cannot find the file:

base/pmix_base_fns.c:43:10: fatal error: pmix/src/include/pmix_frameworks.h: No such file or directory
   43 | #include "pmix/src/include/pmix_frameworks.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

I saw in openpmix/openpmix#3133 we expected it to be in ${PREFIX}/include/pmix/src/include. I'm not sure what the disconnect is here, do we search PREFIX?

@qkoziol
Copy link
Contributor Author

qkoziol commented Aug 17, 2023

CI cannot find the file:

base/pmix_base_fns.c:43:10: fatal error: pmix/src/include/pmix_frameworks.h: No such file or directory
   43 | #include "pmix/src/include/pmix_frameworks.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

I saw in openpmix/openpmix#3133 we expected it to be in ${PREFIX}/include/pmix/src/include. I'm not sure what the disconnect is here, do we search PREFIX?

🤔 It's installed when I compile against the latest OpenPMIX. I'll try against the internal PMIX submodule and see what I find. We might need to bump the submodule pointer?

@lrbison
Copy link
Contributor

lrbison commented Aug 17, 2023

It looks like pmix_frameworks.h should be generated by autogen in openpmix.

https://github.com/openpmix/openpmix/blob/9b09a2d3ee67d2dd1009dd732a629247a7a23087/autogen.pl#L292

That link should correspond to the pinned version on ompi main, in fact we are currently on the commit that introduced that change (openpmix/openpmix@22fe51c)

@rhc54
Copy link
Contributor

rhc54 commented Aug 18, 2023

43 | #include "pmix/src/include/pmix_frameworks.h"

You are probably hitting an issue between when you build OMPI against an internal vs external version of PMIx. The external reference should be just src/include/pmix_frameworks.h. The internal reference would be pmix/src/include/pmix_frameworks.h unless you are adding a -I option that takes care of the pmix/ prefix for you.

@rhc54
Copy link
Contributor

rhc54 commented Aug 18, 2023

Yeah, I confirmed that the problem is in the include statement - it should be:

#include "src/include/pmix_frameworks.h"

Your configure code already adds the necessary -I flag to point you at the correct place. It was working for @qkoziol solely because there is another -I that points to the OMPI include location that allowed the pmix/src/include path to be resolved. Doesn't work if you use an external PMIx installation.

…for direct launches

Borrow code from the OMPI schizo module in PRRTE that translates legacy
MCA parameters when an application is direct launched (PRRTE will translate
legacy parameters when natively launched).

Signed-off-by: Quincey Koziol <[email protected]>
@awlauria
Copy link
Contributor

@rhc54 could you review this ?

@qkoziol qkoziol merged commit 3519822 into open-mpi:main Sep 5, 2023
@qkoziol qkoziol deleted the issue_11532 branch September 5, 2023 14:49
@janjust
Copy link
Contributor

janjust commented Sep 7, 2023

@qkoziol can you open up the v5.0 cherrypick

@qkoziol
Copy link
Contributor Author

qkoziol commented Sep 9, 2023

@qkoziol can you open up the v5.0 cherrypick

#11921

hppritcha added a commit to hppritcha/ompi that referenced this pull request Jan 18, 2024
PR open-mpi#11854 introduced a big that causes singleton runs to
segfault at startup in some cases

this bug was rooted out by the github action in
PR open-mpi#12217

Signed-off-by: Howard Pritchard <[email protected]>
hppritcha added a commit to hppritcha/ompi that referenced this pull request Jan 18, 2024
PR open-mpi#11854 introduced a big that causes singleton runs to
segfault at startup in some cases

this bug was rooted out by the github action in
PR open-mpi#12217

Signed-off-by: Howard Pritchard <[email protected]>
hppritcha added a commit to hppritcha/ompi that referenced this pull request Jan 19, 2024
PR open-mpi#11854 introduced a big that causes singleton runs to
segfault at startup in some cases

this bug was rooted out by the github action in
PR open-mpi#12217

Signed-off-by: Howard Pritchard <[email protected]>
hppritcha added a commit to hppritcha/ompi that referenced this pull request Jan 19, 2024
PR open-mpi#11854 introduced a big that causes singleton runs to
segfault at startup in some cases

this bug was rooted out by the github action in
PR open-mpi#12217

Signed-off-by: Howard Pritchard <[email protected]>
(cherry picked from commit 86a05c1)
hppritcha added a commit to hppritcha/ompi that referenced this pull request Jan 23, 2024
PR open-mpi#11854 introduced a big that causes singleton runs to
segfault at startup in some cases

this bug was rooted out by the github action in
PR open-mpi#12217

Signed-off-by: Howard Pritchard <[email protected]>
(cherry picked from commit 86a05c1)

bot:notacherrypick
hppritcha added a commit to hppritcha/ompi that referenced this pull request Jan 23, 2024
PR open-mpi#11854 introduced a big that causes singleton runs to
segfault at startup in some cases

this bug was rooted out by the github action in
PR open-mpi#12217

Signed-off-by: Howard Pritchard <[email protected]>
(cherry picked from commit 86a05c1)

bot:notacherrypick
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants