OMPI v3.0: pmix install dirs info is not propagated to orteds #3980

Closed
artpol84 opened this issue Jul 30, 2017 · 32 comments

Comments

@artpol84
Contributor

artpol84 commented Jul 30, 2017

Open MPI version

git clone of ompi v3.0.x (2f13cce)

Details of the problem

If an OMPI installation is moved to a different location, the OPAL_PREFIX env var is used to identify the new location, and installdirs/env handles this correctly.
PMIx also has MCA infrastructure and a similar installdirs component. In PMIx, PMIX_INSTALL_PREFIX plays the role of OPAL_PREFIX. The important difference is that OPAL_PREFIX is an OMPI variable and it gets propagated to the orte daemons, e.g.:

[cn01:17570] [[10714,0],0] plm:rsh: final template argv:
/usr/bin/ssh <template> \
	OPAL_PREFIX=<ompi-path>/ompi-v3.0.x ; export OPAL_PREFIX; \
	PATH=<ompi-path>/ompi-v3.0.x/bin:$PATH ; export PATH ; \
	LD_LIBRARY_PATH=<ompi-path>/ompi-v3.0.x/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; \
	DYLD_LIBRARY_PATH=<ompi-path>/ompi-v3.0.x/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   \
	<ompi-path>/ompi-v3.0.x/bin/orted -mca orte_debug_daemons "1" -mca ess "env" -mca ess_base_jobid "702152704" \
	-mca ess_base_vpid "<template>" -mca ess_base_num_procs "2" -mca orte_node_regex "cn01,cn02@0(2)" \
	-mca orte_hnp_uri "702152704.0;tcp://<IP1>,<IP2>;ud://<UD>" -mca coll_hcoll_enable "1" -mca pml "yalla" \
	--mca plm_base_verbose "100" -mca plm "rsh" -mca rmaps_base_mapping_policy "node" \
	-mca hwloc_base_binding_policy "core" -mca rmaps_base_display_map "1"

As can be seen, PMIX_INSTALL_PREFIX is not propagated, so the orteds fail with:

--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    listener-thread-start
But I couldn't open the help file:
    <OLD-INSTALLATION_PATH>/ompi-v3.0.x/share/pmix/help-pmix-server.txt: No such file or directory.  Sorry!

This happens because they can't find the PMIx MCA ptl components.
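
To make the gap concrete, here is a minimal sketch (in Python, not the C that plm/rsh actually uses) of how a launcher could assemble the `VAR=value ; export VAR ;` prefix seen in the template above. The function name `remote_env_prefix` and the `forward` tuple are made up for illustration and are not OMPI identifiers. The point is only that a variable missing from the forwarded set, such as PMIX_INSTALL_PREFIX, never reaches the remote orted.

```python
import shlex


def remote_env_prefix(env, forward=("OPAL_PREFIX",)):
    """Build the 'VAR=value ; export VAR ;' prefix for the remote command.

    `env` is the launcher's environment (a dict) and `forward` names the
    variables to copy to the remote side. Both names are hypothetical.
    """
    parts = []
    for name in forward:
        value = env.get(name)
        if value is None:
            continue  # unset locally, so there is nothing to forward
        parts.append(f"{name}={shlex.quote(value)} ; export {name} ;")
    return " ".join(parts)


env = {"OPAL_PREFIX": "/new/ompi", "PMIX_INSTALL_PREFIX": "/new/ompi"}
# Default behaviour: only OPAL_PREFIX is exported on the remote node.
print(remote_env_prefix(env))
# Forwarding PMIX_INSTALL_PREFIX requires naming it explicitly.
print(remote_env_prefix(env, forward=("OPAL_PREFIX", "PMIX_INSTALL_PREFIX")))
```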

@artpol84
Contributor Author

artpol84 commented Jul 30, 2017

@rhc54 I know that at the dev meeting we had a discussion about variable propagation. Do we have any news on that?

artpol84 changed the title from "OMPI v3.0+ pmix install dirs info is not propagated to orteds" to "OMPI v3.0: pmix install dirs info is not propagated to orteds" on Jul 30, 2017
@artpol84
Contributor Author

One obvious solution is to check whether OPAL_PREFIX is set in pmix2x_server_init and, if PMIX_INSTALL_PREFIX is not set, export the necessary PMIX variables.
Something like this may need to be done for the client as well.
However, I wonder if there is a more generic approach to this.

@rhc54
Contributor

rhc54 commented Jul 31, 2017

We said at the meeting that we would create some kind of registration mechanism to cover these things, but that won't be ready for a while and certainly wouldn't go into 3.0. For now, the only real solution is to manually do these things in the schizo/ompi component. I can add some code to cover it.

@artpol84
Contributor Author

Ok, thank you.

@rhc54
Contributor

rhc54 commented Jul 31, 2017

Actually, one correction: we only forward OPAL_PREFIX for the plm/rsh component. So the addition will occur there.

@artpol84
Contributor Author

I only checked with plm/rsh, but it would be good to make sure that the others work fine as well.

@rhc54
Contributor

rhc54 commented Jul 31, 2017

There actually isn't any way to forward something in the other methods - we have to rely solely on the configuration of the resource manager. Usually that config will forward nearly everything - i.e., the config generally doesn't scan for particular envar patterns like OPAL_. However, you are correct that users know to set the OPAL envar and expect it to cover everything. Probably the easiest solution is to just special case it and see if the daemon spots it, and then add the corresponding envar.

Ugly, but likely the only current solution.

@artpol84
Contributor Author

artpol84 commented Aug 1, 2017

To clarify: in the issue description PMIX_INSTALL_PREFIX variable was manually set by me.

@gpaulsen
Member

gpaulsen commented Aug 1, 2017

Discussed at the call today. Adding Blocker label to this for v3.0

Expect there is a similar issue with hwloc components directory as well, except that hwloc doesn't build mca components by default (?)

@artpol84
Contributor Author

artpol84 commented Aug 1, 2017

@bwbarrett we have some discussion of the current solution in #3985.

@gpaulsen
Member

gpaulsen commented Aug 1, 2017

What about a convention that the PMIX mca components and the HWLOC mca components are always in a known relative subdirectory of the OPAL_PREFIX IF the specific PMIX_PREFIX or HWLOC_PREFIX (if that's a thing) isn't set?
For example if there is ONLY OPAL_PREFIX set, and no other relevant env variables set, then look for PMIX mca components in OPAL_PREFIX/lib//.
It is a bit weird to put OPAL_PREFIX into the hwloc and pmix component loading code like this, but it would only be a last resort.
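
A rough sketch of that fallback order, in Python for brevity. The function `component_dir` and its parameters are hypothetical. The subdirectory is passed in by the caller because the exact location under `lib` is not spelled out in this comment.

```python
import posixpath


def component_dir(env, own_prefix_var, subdir):
    """Pick the directory to search for a library's MCA components.

    Prefer the library's own prefix variable and fall back to OPAL_PREFIX
    only as a last resort. `own_prefix_var` and `subdir` are supplied by
    the caller; neither value is defined by this thread.
    """
    prefix = env.get(own_prefix_var) or env.get("OPAL_PREFIX")
    if prefix is None:
        return None  # nothing set: keep the compiled-in default
    return posixpath.join(prefix, "lib", subdir)


# Only OPAL_PREFIX is set, so it is used as the last-resort base.
print(component_dir({"OPAL_PREFIX": "/opt/ompi"}, "PMIX_INSTALL_PREFIX", "pmix"))
```

As noted in the next comment, this only makes sense for the components bundled with OMPI. An external library would not live under OPAL_PREFIX.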

@artpol84
Contributor Author

artpol84 commented Aug 1, 2017

This is fine unless we are using external components.

@artpol84
Contributor Author

artpol84 commented Aug 1, 2017

Here is an update to the problem description, as it seems there might be a misunderstanding:

We have 2 problems now:
a) Not all PMIX_ variables are propagated.
b) Some PMIX_ variables should be set automatically (derived from OPAL_) if we are using the internal PMIx, for a better user experience.

This issue is mostly about a), but it is fine to fix b) as part of it.

With respect to a) here is what I observe:

export PMIX_INSTALL_PREFIX
mpirun -np 2 -H cn01 hostname

Will work fine as PMIX_INSTALL_PREFIX helps pmix to find its components.

export PMIX_INSTALL_PREFIX
mpirun -np 2 -H cn01,cn02 hostname

is not working because OMPI does not propagate PMIX_INSTALL_PREFIX to the orted on cn02. As per original issue description:

[cn01:17570] [[10714,0],0] plm:rsh: final template argv:
/usr/bin/ssh <template> \
	OPAL_PREFIX=<ompi-path>/ompi-v3.0.x ; export OPAL_PREFIX; \
	PATH=<ompi-path>/ompi-v3.0.x/bin:$PATH ; export PATH ; \
	LD_LIBRARY_PATH=<ompi-path>/ompi-v3.0.x/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; \
	DYLD_LIBRARY_PATH=<ompi-path>/ompi-v3.0.x/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   \
	<ompi-path>/ompi-v3.0.x/bin/orted -mca orte_debug_daemons "1" -mca ess "env" -mca ess_base_jobid "702152704" \
	-mca ess_base_vpid "<template>" -mca ess_base_num_procs "2" -mca orte_node_regex "cn01,cn02@0(2)" \
	-mca orte_hnp_uri "702152704.0;tcp://<IP1>,<IP2>;ud://<UD>" -mca coll_hcoll_enable "1" -mca pml "yalla" \
	--mca plm_base_verbose "100" -mca plm "rsh" -mca rmaps_base_mapping_policy "node" \
	-mca hwloc_base_binding_policy "core" -mca rmaps_base_display_map "1"

@rhc54
Contributor

rhc54 commented Aug 2, 2017

Whoa there, partners - you are blowing this way out of proportion. First off, it is the user's responsibility to set the paths for both OMPI and any secondary libraries on the backend nodes. We cannot take responsibility for forwarding paths for everything - the command line has stringent length limits.

There was a lot of argument about excepting OPAL_PREFIX back when we first did it, and it wasn't clear that we should be doing so as it became an exception to the rule. However, it was felt that it was convenient enough - and a special enough use-case - to warrant making an exception. The most compelling rationale was that it dealt specifically with OMPI internal libraries, and so we were only supporting what we ship.

HWLOC doesn't have any plugins, and so it doesn't need "prefix" support. Ditto for libevent. So please leave them out of this discussion.

It isn't clear to me at all that we should be forwarding PMIX_INSTALL_DIR for someone that is using an external PMIx library. This isn't an OMPI library issue - it is the user's responsibility to ensure the backend is correctly set up.

The only issue we have is that the user community expects OPAL_PREFIX to result in a fully functional OMPI installation. This therefore must include pointing to the location of the internal PMIx plugin directory, which is based on OPAL_PREFIX. Since the PMIx library cannot see OPAL_PREFIX, we must manually make the translation to ensure the user gets what they expect.

So again, to be clear: we bear no responsibility for forwarding things on behalf of external libraries, else we would need to do so for UCX, PSM, and every other library we link against. We only bear responsibility to ensure that the OMPI-included libraries work correctly together when someone sets OPAL_PREFIX. My commit does that - I see no reason for us to be doing anything different.

@artpol84
Contributor Author

artpol84 commented Aug 2, 2017

@rhc54 one comment

It isn't clear to me at all that we should be forwarding PMIX_INSTALL_DIR for someone that is 
using an external PMIx library. This isn't an OMPI library issue - it is the users responsibility to 
ensure the backend is correctly setup.

I'm not sure I understand. If we don't forward the PMIx environment, how can the user communicate this to the server-side part of the external PMIx sitting in the orteds?

@rhc54
Contributor

rhc54 commented Aug 2, 2017

We always require that users take responsibility for ensuring that their environment (path and library path) is properly set up on the backend to reach any libraries they linked against OMPI. This includes external copies of hwloc, libevent, and pmix. If they installed their external pmix in a location that requires PMIX_INSTALL_DIR, then they need to set that in their backend environment - if they are using rsh, the normal method is to put it in their bashrc or equivalent.

We only take care of our own internal code. In this case, our concern has to be that setting OPAL_PREFIX should ensure that all our internal code is pointed to the correct locations. So it makes sense that we set PMIX_INSTALL_DIR under the covers when OPAL_PREFIX has been set.

The only caveat to this hinges on what happens if the user has PMIX_INSTALL_DIR set in their environment for some other reason (e.g., when working with something that used an external PMIx library). There are two cases of concern here:

  1. if set when we build OMPI, will this cause our internal PMIx plugins to be relocated? If so, that could pollute their external PMIx library, which would be really bad. I don't know if this will happen - might be worth checking, though it may be a moot point.

  2. assuming our internal PMIx plugins went to the right place, will this cause us to look in the wrong place for them when OMPI is executed? In other words, say PMIX_INSTALL_DIR was not set when we configured/built OMPI, and so our plugins go where they should. Now suppose the user sets that envar and runs their application - will we look in the wrong place? Again, that would be bad, though at least we didn't overwrite anything. I am pretty sure this will happen, though someone is welcome to check.

Given those situations, I'm beginning to think that we should do the following when we execute:

  • if OPAL_PREFIX is set and we built the internal PMIx library, then we should automatically set PMIX_INSTALL_DIR to the OPAL_PREFIX location and override anything set in the user's environment. This should be propagated.

  • if OPAL_PREFIX is set and we built the external PMIx library, then do nothing - it is their responsibility

  • if OPAL_PREFIX is not set and PMIX_INSTALL_DIR is not set, then do nothing

  • if OPAL_PREFIX is not set and PMIX_INSTALL_DIR is set, and we built the internal PMIx library, then override their PMIX_INSTALL_DIR envar with the OPAL_PREFIX value to maintain consistency. We probably should generate a warning (with a param to silence it) in case their application was based on some other PMIx installation, just so they know we are redirecting them. If we didn't build the internal PMIx library, then do nothing.
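
The four cases above can be sketched as a small decision function. This is an illustration in Python, not the OMPI implementation. The variable name PMIX_INSTALL_DIR is taken verbatim from the list. The `default_prefix` argument is an assumption: it stands for the prefix OMPI was configured with, which is one reading of "the OPAL_PREFIX value" when the variable itself is unset.

```python
def resolve_pmix_install_dir(env, internal_pmix, default_prefix):
    """Return (new_env, warning) for the four cases listed above.

    `env` is a dict of environment variables. `internal_pmix` says whether
    OMPI was built with its bundled PMIx. `default_prefix` is hypothetical:
    the configure-time install prefix used when OPAL_PREFIX is unset.
    """
    env = dict(env)  # work on a copy, leave the caller's dict alone
    warning = None
    if not internal_pmix:
        return env, warning  # external PMIx: the user's responsibility
    opal = env.get("OPAL_PREFIX")
    user_value = env.get("PMIX_INSTALL_DIR")
    if opal is not None:
        # OPAL_PREFIX wins over anything in the user's environment.
        env["PMIX_INSTALL_DIR"] = opal
    elif user_value is not None:
        # Redirect to our own location and tell the user about it.
        env["PMIX_INSTALL_DIR"] = default_prefix
        warning = f"PMIX_INSTALL_DIR={user_value} overridden with {default_prefix}"
    # Neither variable set: nothing to do.
    return env, warning


new_env, warning = resolve_pmix_install_dir(
    {"PMIX_INSTALL_DIR": "/other/pmix"}, internal_pmix=True, default_prefix="/opt/ompi"
)
print(new_env["PMIX_INSTALL_DIR"], "|", warning)
```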

I suppose one could argue about direct launch vs mpirun for that last case, but the problem is that the app will still be linked against libopen-pal, which points to our internal PMIx library. So letting the direct launched app grab plugins from some other library is going to lead to trouble.

At configure time, we should check to see if PMIX_INSTALL_DIR is set. If we are building the internal PMIx library, then we should error out with a message indicating that this will cause problems.

HTH

@rhc54
Contributor

rhc54 commented Aug 2, 2017

BTW: that last point should only be done if having PMIX_INSTALL_DIR set actually causes us to relocate the PMIx plugins when we build the internal PMIx library. If not, then we can just ignore that envar.

@jjhursey
Member

jjhursey commented Aug 2, 2017

@rhc54 I don't see a PMIX_INSTALL_DIR in either repo. I think you mean PMIX_INSTALL_PREFIX. Just for clarity in the conversation. It doesn't change your argument.

@artpol84
Contributor Author

artpol84 commented Aug 2, 2017

if set when we build OMPI, will this cause our internal PMIx plugins to be relocated? 
If so, that could pollute their external PMIx library, which would be really bad. I don't
know if this will happen - might be worth checking, though it may be a moot point.

I believe that PMIX_INSTALL_PREFIX makes sense only to the installdirs plugin so it shouldn't affect the build/installation process.

@artpol84
Contributor Author

artpol84 commented Aug 2, 2017

if OPAL_PREFIX is set and we built the internal PMIx library, then we should automatically set 
PMIX_INSTALL_DIR to the OPAL_PREFIX location and override anything set in the users environment. 
This should be propagated

I agree with that, but the recent commit does it differently:
https://github.com/open-mpi/ompi/blob/master/opal/mca/pmix/pmix2x/pmix2x_server_south.c#L113
I will open another PR to fix that.

@jjhursey
Member

jjhursey commented Aug 2, 2017

Our (IBM) original thinking on this topic was:

If OPAL_PREFIX is set and PMIX_INSTALL_PREFIX is not set then set PMIX_INSTALL_PREFIX = OPAL_PREFIX. Which assumes that they installed PMIX in the same prefix as Open MPI.

The argument against this would be that if there is a PMIx installed in the default search path, then we don't want to force the sysadmin to set PMIX_INSTALL_PREFIX just to keep us from doing so. So I'm ok with pushing that responsibility to the end user, and we can work around this.

In any case we should be propagating PMIX_INSTALL_PREFIX if it is set in the mpirun environment. And if the user wants PMIX_INSTALL_PREFIX = OPAL_PREFIX then they can set it before launching mpirun.

Do we also need to propagate PMIX_LIBDIR if it is set?
This is where my recall is rusty - if we set PMIX_INSTALL_PREFIX will PMIx search for components and libraries under ${PMIX_INSTALL_PREFIX}/lib/pmix or does PMIX_LIBDIR also need to be set to ${PMIX_INSTALL_PREFIX}/lib/ to have it search correctly?

@artpol84
Contributor Author

artpol84 commented Aug 2, 2017

@jjhursey

This is where my recall is rusty - if we set PMIX_INSTALL_PREFIX will PMIx search for components 
and libraries under ${PMIX_INSTALL_PREFIX}/lib/pmix or does PMIX_LIBDIR also need to be set 
to ${PMIX_INSTALL_PREFIX}/lib/ to have it search correctly?

install prefix is enough - I verified that at runtime.

However, I'm not sure I understand your point about the handling of an existing/nonexistent install prefix.
I was going to do the following: #4007. Does that sound OK to everybody?

@artpol84
Contributor Author

artpol84 commented Aug 2, 2017

@jjhursey re-reading your comment, it seems like you are suggesting to revert all the changes in the pmix2 component (those that auto-set the PMIx install prefix) and keep only the rsh part that propagates the install prefix envar.
Is that correct?

@rhc54
Contributor

rhc54 commented Aug 2, 2017 via email

@jjhursey
Member

jjhursey commented Aug 4, 2017

Some notes from our teleconf.

Case A: Internal PMIx component

  • If any PMIX_ installdir value is set (regardless of OPAL_PREFIX)
    • Error out with a descriptive message
      • Make sure to include values of all PMIX_ installdir values (since, for example, PMIX_SHAREDSTATEDIR might be set in an odd way as well). This way a user can unset these values if, for example, the sysadmin on the system accidentally set one.
    • Doing anything more involved here will make the pmix internal component code pretty complex, and we will start adding lots of MCA parameters for what is really a corner installation case.
    • The end user has workarounds:
      • Rebuild with the external PMIx component pointing to the right thing.
      • Unset those environment variables in their startup environment scripts on that machine.
      • Buy the sysadmin coffee and ask them to adjust the environment variables.
  • If OPAL_PREFIX is not set and PMIX_INSTALL_PREFIX is not set
    • Do nothing
  • If OPAL_PREFIX is set and PMIX_INSTALL_PREFIX is not set
    • Set PMIX_INSTALL_PREFIX = OPAL_PREFIX
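
The Case A rules can be sketched as follows. This is a Python illustration under stated assumptions, not the code from the PR. `PMIX_INSTALLDIR_VARS` lists only the three variable names mentioned in this thread, and the real set of PMIx installdir variables is longer. The exception class and function name are invented for the example.

```python
# Illustrative subset only: these three names appear in this thread.
PMIX_INSTALLDIR_VARS = ("PMIX_INSTALL_PREFIX", "PMIX_LIBDIR", "PMIX_SHAREDSTATEDIR")


class PmixEnvConflict(Exception):
    """Raised when PMIx installdir variables clash with the internal PMIx."""


def internal_pmix_env(env):
    """Apply the Case A rules to `env` (a dict) and return the new dict."""
    offending = {name: env[name] for name in PMIX_INSTALLDIR_VARS if name in env}
    if offending:
        # Report every offending value so the user knows what to unset.
        listing = ", ".join(f"{name}={value}" for name, value in offending.items())
        raise PmixEnvConflict(
            f"internal PMIx is in use but these variables are set: {listing}"
        )
    env = dict(env)
    if "OPAL_PREFIX" in env:
        env["PMIX_INSTALL_PREFIX"] = env["OPAL_PREFIX"]
    return env  # neither variable set: unchanged


print(internal_pmix_env({"OPAL_PREFIX": "/new/ompi"}))
```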

Action Item: Ralph is working on a PR for these cases. Will be an extension of the work started in PR #4012.

Case B: External PMIx component

  • User is responsible for setting PMIX_INSTALL_PREFIX and friends in their environment.
  • Add an MCA parameter to the external component that allows the user to:
    • Set PMIX_INSTALL_PREFIX = OPAL_PREFIX
    • Set PMIX_INSTALL_PREFIX to another path.
  • MCA parameter will only handle PMIX_INSTALL_PREFIX, if the user needs more fine grained control over the other installdir variables then they need to adjust their environment directly.
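
One possible shape for that parameter, sketched in Python. Everything here is hypothetical: the notes do not name the MCA parameter or define its syntax, so `param` and the `"OPAL_PREFIX"` sentinel value are assumptions made for the example.

```python
def external_pmix_env(env, param=None):
    """Apply the proposed external-component parameter to `env` (a dict).

    `param` stands in for the proposed MCA parameter (hypothetical):
      None           -> leave the environment alone (the default)
      "OPAL_PREFIX"  -> copy OPAL_PREFIX into PMIX_INSTALL_PREFIX
      any other str  -> treat it as an explicit path
    """
    env = dict(env)
    if param is None:
        return env  # the user manages PMIX_INSTALL_PREFIX themselves
    if param == "OPAL_PREFIX":
        if "OPAL_PREFIX" in env:
            env["PMIX_INSTALL_PREFIX"] = env["OPAL_PREFIX"]
        return env
    env["PMIX_INSTALL_PREFIX"] = param
    return env


base = {"OPAL_PREFIX": "/new/ompi"}
print(external_pmix_env(base, "OPAL_PREFIX"))
print(external_pmix_env(base, "/opt/pmix"))
```

Only PMIX_INSTALL_PREFIX is touched, matching the note that the other installdir variables stay under the user's direct control.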

Action Item: Josh is working on a PR for the external component MCA parameter to set PMIX_INSTALL_PREFIX per above.

Slightly better error checking for bad installations

  • This all started with PMIx issuing a cryptic error message about not being able to start the listener thread. This was because it could not find any PTL components. The framework should have issued an error message during selection.
  • Other (All?) PMIx frameworks should have logic in their component selection such that they error out with an informative message if they have no eligible components.
  • Additionally some deeper checks for both OMPI and PMIx were suggested:
    • If OPAL_PREFIX is pointing to a directory that does not exist then print an error message.
    • If none of the component search paths exist then print an error message
      • If any one of the paths exist, then we are ok (frameworks responsible to tell if they have enough components).
      • This check only happens in component_open, so not exercised for static builds.
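
The two deeper checks could look roughly like this. It is a Python sketch with invented names (`check_install_paths`, `search_paths`); the real checks would live in the C component-open path.

```python
import os


def check_install_paths(env, search_paths):
    """Return a list of error strings; an empty list means both checks passed.

    `env` is a dict of environment variables and `search_paths` is the list
    of component search directories. Both names are hypothetical.
    """
    errors = []
    prefix = env.get("OPAL_PREFIX")
    if prefix is not None and not os.path.isdir(prefix):
        errors.append(f"OPAL_PREFIX points to a missing directory: {prefix}")
    # One existing search path is enough; each framework decides later
    # whether it found enough components there.
    if search_paths and not any(os.path.isdir(path) for path in search_paths):
        errors.append(
            "none of the component search paths exist: " + ", ".join(search_paths)
        )
    return errors


for message in check_install_paths(
    {"OPAL_PREFIX": "/no/such/dir-3980"}, ["/no/such/dir-3980/lib"]
):
    print(message)
```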

Action item: Ralph will work on PRs for the error handling bullets above.

@rhc54 @artpol84 Did I miss anything?

@jjhursey
Member

jjhursey commented Aug 5, 2017

Note: I did get started on the external component MCA parameter. There are a couple snags that I'll need to come back to. Plan to have more on Monday.

@artpol84
Contributor Author

The issue was closed incorrectly.
@rhc54 as I understand it, we close an issue when all targets are addressed.
Not all of the master changes have been PR'd - 71da0fc does not seem to have been cherry-picked into #3985.

artpol84 reopened this Aug 13, 2017
@artpol84
Contributor Author

artpol84 commented Aug 13, 2017

@jjhursey have your external component changes gone in? If not, I guess this issue is not fully solved in master either.

@rhc54
Contributor

rhc54 commented Aug 13, 2017

There is nothing to do in the external components. OPAL_PREFIX doesn't impact them, and it is the user's responsibility to propagate any PMIx prefix requirements.

@artpol84
Contributor Author

artpol84 commented Aug 13, 2017

I have a question on that, by the way.
Making the user responsible for setting up the environment on compute nodes is OK when there is a single environment.
The usual way to manage multiple environments is modules. In that case you typically load the module on the head or launch node, and that is all you can do. I don't see a way to recreate this environment on all the other nodes other than propagating it. OMPI provides the -x option to propagate arbitrary env vars to application procs, but it seems that we now need a way to propagate them to the orteds as well.
Slurm will do this, but rsh won't.

@rhc54
Contributor

rhc54 commented Aug 14, 2017

There are many things that one environment provides and others don't, so this isn't the only difference users encounter. All managed systems generally pass a fair range of envars, depending on configuration, so most of those will likely be okay. Users running on unmanaged systems are most likely to build against the internal PMIx, or against one installed by the system that is going to be in a standard location - and thus they should also be okay.

The corner-case scenarios will undoubtedly appear, but we can't solve everything - the cmd line limitations won't let us.

artpol84 added a commit to artpol84/ompi that referenced this issue Aug 14, 2017
This reverts commit 71da0fc.
(per open-mpi#4052).
Refs: open-mpi#3980

Signed-off-by: Artem Polyakov <[email protected]>
@bwbarrett
Member

Merged #4076, so closing this ticket.

5 participants