Cleanup handling of OPAL_PREFIX vs PMIX_INSTALL_PREFIX #4052

Merged
merged 5 commits into from
Aug 15, 2017

rhc54
Contributor

@rhc54 rhc54 commented Aug 8, 2017

Do a better job of detecting mismatches between location directives for OPAL and PMIx. Provide a more helpful error message and error out if we find a mismatch. If any OPAL values are set and the PMIx equivalent is not, then transfer it.

Update to track PMIx v2.0.1rc

Fixes #3980
Closes #4007
Closes #3985

Signed-off-by: Ralph Castain [email protected]
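
The reconciliation rule described above can be sketched roughly as follows. This is an illustrative sketch only, not the code merged in this PR: `reconcile_prefix` and the `PREFIX_*` result codes are hypothetical names, the check is shown for a single pair of environment variables, and a POSIX `setenv` is assumed.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not Open MPI's own). */
enum {
    PREFIX_OK          = 0,   /* nothing to do, or both values agree */
    PREFIX_TRANSFERRED = 1,   /* OPAL value copied to the PMIx name  */
    PREFIX_MISMATCH    = -1,  /* both set, but to different values   */
    PREFIX_ERROR       = -2   /* setenv() itself failed              */
};

/*
 * Reconcile one OPAL location variable with its PMIx counterpart:
 *   - both set and different -> print a diagnostic, return PREFIX_MISMATCH
 *   - only the OPAL one set  -> copy its value to the PMIx name
 *   - anything else          -> leave the environment untouched
 */
static int reconcile_prefix(const char *opal_name, const char *pmix_name)
{
    assert(opal_name != NULL && pmix_name != NULL);

    const char *opal_val = getenv(opal_name);
    const char *pmix_val = getenv(pmix_name);

    if (opal_val != NULL && pmix_val != NULL) {
        if (strcmp(opal_val, pmix_val) != 0) {
            fprintf(stderr,
                    "Conflicting install locations:\n"
                    "  %s=%s\n  %s=%s\n"
                    "Set both to the same value or unset one of them.\n",
                    opal_name, opal_val, pmix_name, pmix_val);
            return PREFIX_MISMATCH;
        }
        return PREFIX_OK;
    }

    if (opal_val != NULL) {
        /* Only the OPAL value is set: transfer it to the PMIx name. */
        if (setenv(pmix_name, opal_val, 1) != 0) {
            return PREFIX_ERROR;
        }
        return PREFIX_TRANSFERRED;
    }

    /* Neither is set, or only the PMIx value is set. */
    return PREFIX_OK;
}
```

A caller might invoke `reconcile_prefix("OPAL_PREFIX", "PMIX_INSTALL_PREFIX")` before initializing the embedded PMIx and abort on a negative result. Treating a lone PMIx value as acceptable is an assumption of this sketch.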

@rhc54 rhc54 added the bug label Aug 8, 2017
@rhc54 rhc54 added this to the v3.0.0 milestone Aug 8, 2017
@rhc54 rhc54 self-assigned this Aug 8, 2017
@rhc54 rhc54 requested review from jjhursey and artpol84 August 8, 2017 14:15
@rhc54
Contributor Author

rhc54 commented Aug 8, 2017

@bwbarrett This replaces the prior PR on the matter, and represents a rollup of all the discussion and proposed changes. Sorry for the confusion.

@ibm-ompi

ibm-ompi commented Aug 8, 2017

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/40e76ee2ddf9bc2b93c53c656874df75

@jjhursey
Member

jjhursey commented Aug 8, 2017

I think the pgi failure might have been a file system issue. Let's retry.
bot:ibm:pgi:retest

@ibm-ompi

ibm-ompi commented Aug 8, 2017

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/548f9d030235992401b4dbc9551a3cc3

@jjhursey
Member

jjhursey commented Aug 8, 2017

Hmm, the PGI compiler thing might be real. It looks like the pmix2x component's configure is failing in its asm test. I'm wondering if we missed something from upstream when we ported it back to PMIx. I'll get some more information after the call.

@rhc54
Contributor Author

rhc54 commented Aug 8, 2017

Hmmm...let me check. I updated asm in PMIx master, but I bet the changes weren't ported to the branch prior to making this update.

@jjhursey
Member

jjhursey commented Aug 8, 2017

Here is a capture of the configure output from opal/mca/pmix/pmix2x/pmix:

*** Assembler
checking dependency style of pgcc... pgcc
checking for perl... /usr/bin/perl
checking for __atomic builtin atomics... no
checking for processor support of __atomic builtin atomic compare-and-swap on 128-bit values... no
checking for __atomic builtin atomic compare-and-swap on 128-bit values with -mcx16 flag... no
checking for __sync builtin atomics... no
checking for 64-bit __sync builtin atomics... no
checking for processor support of __sync builtin atomic compare-and-swap on 128-bit values... no
checking for __sync builtin atomic compare-and-swap on 128-bit values with -mcx16 flag... no
configure: error: __sync builtin atomics requested but not found.
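
For readers unfamiliar with this part of configure: a check such as "checking for __sync builtin atomics" typically comes down to whether a tiny program using the builtin compiles and links. The following is a minimal sketch of such a probe, assuming a compiler that provides the GCC-style `__sync_bool_compare_and_swap` builtin. It is not the actual test from `opal_config_asm.m4`, and `sync_cas32_works` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Returns 1 if a 32-bit compare-and-swap via the __sync builtin
 * succeeded and stored the new value, 0 otherwise. On a compiler
 * without the builtin, this fails at compile or link time, which
 * is what a configure probe of this kind detects.
 */
static int sync_cas32_works(void)
{
    volatile int32_t value = 0;

    /* Swap 0 -> 1; the builtin returns true when the old value matched. */
    int swapped = __sync_bool_compare_and_swap(&value, 0, 1);

    return swapped && value == 1;
}
```

A 64-bit variant would use `int64_t` in place of `int32_t`, mirroring the separate "64-bit __sync builtin atomics" line in the output above.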

@rhc54
Contributor Author

rhc54 commented Aug 8, 2017

I just checked and the atomics on the PMIx v2.0 branch match the PMIx master, so it looks like something from OPAL didn't come over?

@rhc54
Contributor Author

rhc54 commented Aug 8, 2017

I didn't find any substantive change in OPAL atomics that was missing from PMIx master. However, the opal_config_asm.m4 did change - notably, it removed that error line! So I'm guessing that is the "issue" here. I'm updating PMIx master and will bring it across to the PMIx 2.0 branch so it can land here.

@rhc54
Contributor Author

rhc54 commented Aug 8, 2017

Wait a minute - we have PR #3757 waiting, which adds that error so we abort, and folks indicate they want that behavior. So I'm guessing that PMIx is actually up-to-date, minus the recent renaming of the opal_atomic_init stuff (which we'll pick up in the next round when something substantive changes in the atomics). So whatever it is, it looks like there actually is something broken in the PGI-based atomics.

Member

@jjhursey jjhursey left a comment

Until we resolve the PMIx+PGI issue I'll put a hold on this. It's otherwise fine.

Member

@jjhursey jjhursey left a comment

Now that we sorted out the PGI CI issue, I think this is good to go.

@artpol84
Contributor

@rhc54 should 71da0fc go here then?

@artpol84
Contributor

Or is it not needed anymore?

@rhc54
Contributor Author

rhc54 commented Aug 14, 2017

We should probably revert 71da0fc from master to be consistent - we want to handle the OPAL_PREFIX case for the internal PMIx code, but leave anything else alone.

artpol84 added a commit to artpol84/ompi that referenced this pull request Aug 14, 2017
This reverts commit 71da0fc.
(per open-mpi#4052).
Refs: open-mpi#3980

Signed-off-by: Artem Polyakov <[email protected]>
Ralph Castain added 5 commits August 14, 2017 16:47
Signed-off-by: Ralph Castain <[email protected]>
…hes between location directives for OPAL and PMIx. Provide a more helpful error message and error out if we find a mismatch. If any OPAL values are set and the PMIx equivalent is not, then transfer it.

Do not clear PMIX_INSTALL_PREFIX from the daemon's launch environment

Fixes #3980
Closes #4007
Refs #3985

Signed-off-by: Ralph Castain <[email protected]>
(cherry picked from commit a239b4c)
Signed-off-by: Ralph Castain <[email protected]>
(cherry picked from commit 53c9270)
Signed-off-by: Ralph Castain <[email protected]>
@open-mpi open-mpi deleted a comment from ibm-ompi Aug 15, 2017
@open-mpi open-mpi deleted a comment from ibm-ompi Aug 15, 2017
@open-mpi open-mpi deleted a comment from ibm-ompi Aug 15, 2017
@rhc54
Contributor Author

rhc54 commented Aug 15, 2017

MTT results:

+-------------+-----------------+-------------+----------+------+------+----------+------+---------------------------------------------------------------------------+
| Phase       | Section         | MPI Version | Duration | Pass | Fail | Time out | Skip | Detailed report                                                           |
+-------------+-----------------+-------------+----------+------+------+----------+------+---------------------------------------------------------------------------+
| MPI Install | my installation | 3.0.0rc3    | 00:00    | 1    |      |          |      | MPI_Install-my_installation-my_installation-3.0.0rc3-my_installation.html |
| Test Build  | trivial         | 3.0.0rc3    | 00:02    | 1    |      |          |      | Test_Build-trivial-my_installation-3.0.0rc3-my_installation.html          |
| Test Build  | ibm             | 3.0.0rc3    | 00:48    | 1    |      |          |      | Test_Build-ibm-my_installation-3.0.0rc3-my_installation.html              |
| Test Build  | intel           | 3.0.0rc3    | 01:19    | 1    |      |          |      | Test_Build-intel-my_installation-3.0.0rc3-my_installation.html            |
| Test Build  | java            | 3.0.0rc3    | 00:02    | 1    |      |          |      | Test_Build-java-my_installation-3.0.0rc3-my_installation.html             |
| Test Build  | orte            | 3.0.0rc3    | 00:01    | 1    |      |          |      | Test_Build-orte-my_installation-3.0.0rc3-my_installation.html             |
| Test Run    | trivial         | 3.0.0rc3    | 00:06    | 8    |      |          |      | Test_Run-trivial-my_installation-3.0.0rc3-my_installation.html            |
| Test Run    | ibm             | 3.0.0rc3    | 10:05    | 508  |      |          |      | Test_Run-ibm-my_installation-3.0.0rc3-my_installation.html                |
| Test Run    | spawn           | 3.0.0rc3    | 00:09    | 7    |      |          | 1    | Test_Run-spawn-my_installation-3.0.0rc3-my_installation.html              |
| Test Run    | loopspawn       | 3.0.0rc3    | 10:06    | 1    |      |          |      | Test_Run-loopspawn-my_installation-3.0.0rc3-my_installation.html          |
| Test Run    | intel           | 3.0.0rc3    | 16:42    | 474  |      |          | 4    | Test_Run-intel-my_installation-3.0.0rc3-my_installation.html              |
| Test Run    | intel_skip      | 3.0.0rc3    | 12:59    | 431  |      |          | 47   | Test_Run-intel_skip-my_installation-3.0.0rc3-my_installation.html         |
| Test Run    | java            | 3.0.0rc3    | 00:01    | 1    |      |          |      | Test_Run-java-my_installation-3.0.0rc3-my_installation.html               |
| Test Run    | orte            | 3.0.0rc3    | 00:43    | 19   |      |          |      | Test_Run-orte-my_installation-3.0.0rc3-my_installation.html               |
+-------------+-----------------+-------------+----------+------+------+----------+------+---------------------------------------------------------------------------+


    Total Tests:    1455
    Total Failures: 0
    Total Passed:   1455
    Total Duration: 3183 secs. (53:03)

@bwbarrett bwbarrett merged commit 0ca7c77 into open-mpi:v3.0.x Aug 15, 2017
6 participants