Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 175 chapter 5 8 rebase #387

Closed
wants to merge 22 commits into from

Conversation

dsolt
Copy link
Contributor

@dsolt dsolt commented Jan 19, 2022

This very large change set covers a thorough review of chapters 5-8 of the PMIx standard. The resulting text actually creates new chapters, renames chapters and moves content between chapters. Our intention is to divide this PR up into at least 3 separate PR's that will be easier to digest.


Host environments may also opt to define their own custom keys. However, \ac{PMIx} implementations are unlikely to recognize such host-defined keys and will therefore treat them according to the \emph{non-reserved} rules described in Chapter \ref{chap:nrkeys}. Users
Host environments may opt to define non-standardized reserved keys.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying a host environment is allowed to define a key that violates the "pmix" prefix rule - i.e., it starts with "pmix"?? This seems like a risky decision as there is no way to guarantee it won't conflict with something in an implementation. I would strongly advise against it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We thought that we were simplifying the existing text. It was our understanding that "custom keys" was referring to "non-standardized reserved keys". We took it to mean that because this whole selection is talking about reserved keys. I guess you are saying that the old text assumes that jsm_foo or srun_bar are custom key which are not reserved (the definition of reserved is clear... starts with "pmix_") but they are made available at the start of execution and cannot be set or changed by the client? We assumed that custom keys would be something like "pmix_jsm_foo" or "pmix_srun_bar", though I can see now that the text about "for a list of any custom prefixes" gave a hint to what was being said that we missed.

Assuming my above explanation is accurate, then we have a few choices... 1) We can revert back to the old text 2) We can explain the old text better to avoid the confusion we experienced in thinking a custom-key was a non-standardized reserved key 3) We can move in the direction of the simplified explanation we gave possibly adding some a warning that using non-standarized reserved keys is dangerous because it may conflict with future standardization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it probably needs clarification. "Custom keys" cannot start with "pmix". The point I was trying to make is that PMIx itself won't understand them, so you cannot use them to direct PMIx behavior. Pretty much like any non-reserved key, in hindsight, so maybe just saying that "custom keys must be non-reserved" (in better wording) would suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just add that, the new text has the problem that non-standardized keys beginning with pmix_ can be used by implementations and 1) that could be confusing for user's who would expect them to be able to look them up in the standard 2) they might get defined by the standard later to behave differently than the implementation originally used them.
The old text has the problem that 1) It attempts to define this class of keys that are neither reserved nor non-reserved. They don't follow reserved retrieval rules, but they shouldn't be set by the user either. It's really a not well defined set of keys 2) User's have to constantly be on the look for conflicts with implementations. Their old code may not run on a new version of the host environment because the host environment added a custom key that conflicted with the client's use of a non-reserved key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the best way to describe this is:

  • host environments, just like applications, may define their own non-reserved keys
  • such keys must obey the non-reserved key rules regarding prefixing, and will be retrieved based on the rules for non-reserved keys
  • host environments and applications are strongly advised to use appropriate prefixing on any non-reserved keys they define. You can even provide some guidance - e.g., it wouldn't be very smart for LSF to define keys starting with "ompi". It would, however, make sense for them to define keys starting with "lsf".

Users don't have to worry about conflicts with implementations as any PMIx implementation should be sticking with the reserved keys. You do indeed have to worry about conflicts with the host environment - but that is why users should be sensibly prefixing their keys. For example, you might expect an app named "foo" to prefix any key they define with "foo". Or to do like MPICH does and stick process IDs at the beginning.

As for an app not defining a value for a host environment key - I'm not sure we can/should say anything about it. Some environments might well define a key for apps to use when, for example, communicating a request to the RM. I don't see why PMIx cares either way. If the environment prohibits it, then they (a) should say so and (b) protect themselves against the inevitable violation of that prohibition.

@@ -118,7 +118,8 @@ \subsection{Session realm attributes}
}
%
\declareAttribute{PMIX_TDIR_RMCLEAN}{"pmix.tdir.rmclean"}{bool}{
Resource Manager will cleanup assigned temporary directory trees.
The Resource Manager will remove any directories or files it creates in
\refattr{PMIX_TMPDIR}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite correct, I believe. It means the RM will cleanup all the dirs and/or files that are placed in the PMIx session directories. In other words, the RM is going to cleanup the PMIx temp directory tree (i.e., the files PMIx creates, not the RM), so PMIx itself doesn't need to worry about it.

Copy link
Contributor Author

@dsolt dsolt Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were going for something more definitive than "cleanup", but admit we were not sure what exactly happens. Would this be better (or can you suggest what would be most accurate?):

The Resource Manager will remove any directories or files in \refattr{PMIX_TMPDIR}.

The RM doesn't know which files PMIx placed there, it only knows which files it didn't create. So maybe this is more accurate:

The Resource Manager will remove any directories or files in \refattr{PMIX_TMPDIR} that are not managed by the Resource Manager.

It might be nice to say when this happens too, but I'm not sure when it does, so I don't want to guess at something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens is this: the host environment passes down the PMIX_TMPDIR which gives the PMIx server a root directory that is known "safe". The server then puts things like its shared memory segment and rendezvous files under that location. When the server finalizes, it normally will delete everything it put under PMIX_TMPDIR.

What this attribute declares is that the RM (i.e., the host) will delete this for us, usually by deleting the entire PMIX_TMPDIR directory tree. Thus, the PMIx server doesn't bother to delete anything - it just finalizes, leaving whatever files were created there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should we say, "The Resource Manager will remove the directory \refattr{PMIX_TMPDIR} at server finalization time"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well...not exactly. The RM will remove the directory whenever it decides to do it. It probably would do it once it finalizes the PMIx server library - but that's totally at their discretion. I would just leave it as "the RM will remove the directory".

Implementations are encouraged to standardize any non-standardized, reserved keys that they provide to avoid conflicting with other implementations or efforts to standardize the same key.

Host environments are permitted to provide non-reserved keys and non-standardized reserved keys.
However there are important caveats to providing either of these categories of keys
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err - I don't think this is a good idea. High potential for conflicts between the host and implementation on reserved keys. Hosts can certainly provide non-reserved keys if they choose - advise them to prefix accordingly (e.g., LSF_FOO and lsf.foo).

\refapi{PMIx_Get} and \refapi{PMIx_Get_nb} API.

The publish/lookup data realm is accessed through a separate set of API's.
\refapi{PMIx_Put} cannot add or modify key-values within the publish/lookup realm and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PMIx_Put has nothing to do with publish/lookup - I find this confusing mainly because of reuse of the "realm" term. You might want to reserve "realm" for the session/job/app/proc discussion to avoid confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have talked about this. I tried to extend the concept of data realms to all keys. At the time, its seemed clearer to say that all keys belong to a realm and realms do not overlap. This seems easier to present rather than saying that "keys that are accessed via Get or Query are in realms, but keys accessed by publish/lookup are not in a realm". However, it does have the undesirable side effect that you point out that you are sometimes forced to talk about publish/lookup at times when it would be more convenient to not address those functions. I can go either way, it's a pretty minor change to just make realms about non-published data.

\refapi{PMIx_Put} cannot add or modify key-values within the publish/lookup realm and
\refapi{PMIx_Lookup} cannot access key-values outside the publish/lookup realm.
This data realm is described in detail is chapter [Reference to publish/lookup chapter].
Although PMIx_Publish and PMIx_Lookup are analogous to PMIx_Put and PMIx_Get in that both pairs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just leave publish/lookup to the chapter on that subject? It gets confusing here as it starts mixing pub/lookup with put/get, and those are fundamentally very different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you. See my previous comment. I'm ok with what your saying. Anyone else have thoughts?

where one \ac{PMIx} client shares its connectivity information, then other \ac{PMIx} clients access that information to establish a connection with that client.
In some environments that support ``instant-on'' all connectivity information for \ac{PMIx} clients is stored in the job-level information at process creation time and is accessible to the clients without the need to perform any additional key-value exchange.

Keys can exist in multiple data realms, possibly with different values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels redundant to prior sections. Perhaps I'm missing the point of this chapter - how is what is presented different from what is in the reserved and non-reserved sections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some cleanup to do and we are going to divide this up into 3 PRs that will make this clearer. The file API_nonreserved_keys.txt is going away (or being renamed to Chap_API_Sharing_Basics.tex. So this is only presented once. When I repost this I will use the correct git mv to move it and make it clearer.

@jjhursey
Copy link
Member

Please use emoji reactions ON THIS COMMENT to indicate your position on this proposal.

  • You do not need to vote on every proposal
  • If you have no opinion, don't vote - that is also useful data
  • If you've already commented on this issue, please still vote so
    we know your current thoughts
  • Not all proposals solve exactly the same problem, so we may end
    up accepting proposals that appear to have some overlap
    This is not a binding majority-rule vote, but it will be a very
    significant input into the corresponding ASC decision.

Here are the meanings for the emojis:

  • Hooray or Rocket: I support this so strongly that I
    want to be an advocate for it
  • Heart: I think this is an ideal solution
  • Thumbs up: I'd be happy with this solution
  • Confused: I'd rather we not do this, but I can tolerate it
  • Thumbs down: I'd be actively unhappy, and may even consider
    other technologies instead
    If you want to explain in more detail, feel free to add another
    comment, but please also vote on this comment.

@dsolt
Copy link
Contributor Author

dsolt commented Jan 26, 2022

I am working on dividing this PR up into several smaller PR's, but will leave this open until that is done and until the comments here are all resolved. For now I marked this as "Withdrawn", though it is not really, it is just being reposted as separate changes.

@dsolt dsolt added the Withdrawn Withdrawn from consideration by authors label Jan 26, 2022
@jjhursey
Copy link
Member

This PR was broken into 4 other PRs to improve our ability to review. See tickets listed below:

@jjhursey jjhursey closed this Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Withdrawn Withdrawn from consideration by authors
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants