ch4/ofi: Improved noncontiguous put/get #4475
fd93b66 to 04cfb52
test:jenkins/ch4/ofi |
test:mpich/ch4/ofi |
test:jenkins/ch4/ofi |
@raffenet While reviewing the Should this PR be expanded to improve the CH4 AM code too? Or perhaps that should be a separate PR? |
I would like to keep any AM code changes separate from this PR. I believe @hzhou might already have a PR to address some of the AM code issues. |
I agree that we should keep the PR separate. A likely arrangement is to hold this PR until the AM PR works out and gets merged first. That assumes we have a solution for ch4 AM (and I don't have a PR for AM RMA yet). Let's discuss it -- For ch4 AM RMA, what we needed is to design a new protocol. The current protocol is to send an "IOV" packet in cases of non-contig |
Maybe of potential interest, I just created this draft PR #4487, essentially enable EDIT: actually, we don't need string, we just need a serialization of the datatype or dataloop for replacing IOVs. EDIT2: if yaksa provide a serialization api, then we can use that too. |
test:mpich/ch4/ofi |
test:mpich/ch4/ofi |
test:mpich/ch4/ofi |
Yaksa and dataloop (and typerep) provide serialization APIs. This is a core part of the datatype management code. There's no reason to pass around strings. |
05286a0 to cf797f3
test:jenkins/ch4/ofi |
test:jenkins/ch4/ofi |
test:mpich/ch4/ofi |
test:jenkins/ch4/ofi |
test:mpich/ch4/ofi |
test:mpich/ch4/ofi |
Indexing logic was made a little simpler using the % operator. I also added an iov load helper in this version. |
test:mpich/ch4/ofi |
LGTM. I'd have personally preferred the if branch be in the calling function (i.e., MPIDI_OFI_nopack_putget), rather than inside the MPIDI_OFI_load_iov function. But I'll not hold up this PR for it.
Would you mind also testing it with a hack patch that changes |
test:mpich/ch4/most |
test:mpich/ch4/ofi |
Assuming tests are successful, will pop off the HACK patches and merge. |
There's a bunch of failures in the latest test run that I believe are related to yaksa. https://jenkins-pmrs.cels.anl.gov/job/mpich-review-ch4-ofi/1230/ If I revert the yaksa patch and run some tests by hand, they pass. |
@pavanbalaji are the yaksa failures as expected? Just double checking before merge. |
This implementation complicates datatype optimization, and also has correctness issues. See pmodels#4468.
These were accidentally copied over in [e6ef83b].
Use this macro to determine the density of a datatype. The intent is to use this information in communication protocol selection.
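As a rough illustration of what such a density macro could compute: one plausible definition is the average number of bytes per contiguous block. The struct, the macro name `DT_GET_DENSITY`, and the threshold `DT_DENSITY_MIN` below are hypothetical stand-ins, not names taken from the MPICH source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype summary; not MPICH's actual datatype structure. */
struct dt_summary {
    size_t size;              /* bytes of data in one element of the type */
    size_t num_contig_blocks; /* contiguous blocks those bytes are spread over */
};

/* Assumed definition: density = average bytes per contiguous block. */
#define DT_GET_DENSITY(dt_, density_)                        \
    do {                                                     \
        assert((dt_)->num_contig_blocks > 0);                \
        (density_) = (dt_)->size / (dt_)->num_contig_blocks; \
    } while (0)

/* Hypothetical cutoff; a real threshold would be tuned per provider. */
#define DT_DENSITY_MIN 16

/* Protocol selection hook: nonzero if the type is dense enough that
 * describing it block by block is likely worthwhile. */
static int dt_is_high_density(const struct dt_summary *dt)
{
    size_t density;
    DT_GET_DENSITY(dt, density);
    return density >= DT_DENSITY_MIN;
}
```

Under these assumptions, a type with 4096 bytes in 4 blocks (density 1024) counts as high density, while 64 bytes scattered over 16 blocks (density 4) does not.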
The current function provides a byte-based offset for IOV conversion routines. This is very expensive for yaksa. The new routine matches what yaksa provides more closely and is also a generally cleaner interface.
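To show why a segment-indexed offset can be cheaper than a byte offset, here is a minimal sketch, assuming the new routine takes an offset counted in IOV segments. The `strided_type` layout and `strided_to_iov` function are hypothetical and only stand in for a real datatype engine.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical strided layout: `count` blocks of `blocklen` bytes,
 * placed `stride` bytes apart. */
struct strided_type {
    size_t count;
    size_t blocklen;
    size_t stride;
};

/* Fill at most max_iov_len entries, starting at segment index iov_offset.
 * Because the offset is a segment index, the starting block is found
 * directly, with no search to translate a byte offset into a block.
 * Returns the number of entries written. */
static size_t strided_to_iov(char *base, const struct strided_type *t,
                             size_t iov_offset, struct iovec *iov,
                             size_t max_iov_len)
{
    size_t n = 0;
    assert(t->stride >= t->blocklen);
    for (size_t i = iov_offset; i < t->count && n < max_iov_len; i++, n++) {
        iov[n].iov_base = base + i * t->stride;
        iov[n].iov_len = t->blocklen;
    }
    return n;
}
```

A caller can then walk a large type in fixed-size batches by advancing `iov_offset` by the returned count after each call.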
Limit the number of iovecs that can be allocated for data description. In case both datatypes are noncontiguous, the amount of memory allocated for iovecs is capped at roughly 1MB.
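A minimal sketch of how such a cap might be computed, assuming a fixed 1MB budget that is split evenly when both origin and target need an iovec array. The names `IOV_MEM_BUDGET`, `max_iovs_per_side`, and `clamp_iov_count` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Assumed total budget for iovec arrays: roughly 1MB. */
#define IOV_MEM_BUDGET (1024 * 1024)

/* Entries allowed per side when 1 or 2 sides are noncontiguous. */
static size_t max_iovs_per_side(int noncontig_sides)
{
    assert(noncontig_sides == 1 || noncontig_sides == 2);
    return IOV_MEM_BUDGET / ((size_t) noncontig_sides * sizeof(struct iovec));
}

/* Clamp the number of entries a datatype would need to the cap.
 * The caller processes the type in batches if it needs more. */
static size_t clamp_iov_count(size_t needed, int noncontig_sides)
{
    size_t cap = max_iovs_per_side(noncontig_sides);
    return needed < cap ? needed : cap;
}
```

With this split, the two arrays together never exceed the budget, regardless of how fragmented either datatype is.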
Utilize native path RMA when origin and target datatypes are high density. Create iovecs representing each type, and issue a series of contiguous operations.
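The pairing of two iovec lists into contiguous operations can be sketched as follows. This is an illustration of the general technique, not the code in this PR: `issue_contig_ops` and the callback type are hypothetical, and `copy_op` uses a local `memcpy` as a stand-in for a network put or get.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Stand-in for one contiguous put/get of `len` bytes. */
typedef void (*contig_op_fn)(void *origin, void *target, size_t len, void *ctx);

/* Example callback: local copy, tallying bytes moved into *ctx. */
static void copy_op(void *origin, void *target, size_t len, void *ctx)
{
    memcpy(target, origin, len);
    *(size_t *) ctx += len;
}

/* Walk both lists in lockstep. Each operation covers the largest range
 * that is contiguous on BOTH sides, i.e. the smaller of the bytes left
 * in the current origin segment and the current target segment.
 * Returns the number of operations issued. */
static size_t issue_contig_ops(const struct iovec *o_iov, size_t o_len,
                               const struct iovec *t_iov, size_t t_len,
                               contig_op_fn op, void *ctx)
{
    size_t oi = 0, ti = 0;       /* current segment on each side */
    size_t o_off = 0, t_off = 0; /* bytes consumed in the current segment */
    size_t nops = 0;

    while (oi < o_len && ti < t_len) {
        size_t o_rem = o_iov[oi].iov_len - o_off;
        size_t t_rem = t_iov[ti].iov_len - t_off;
        size_t len = o_rem < t_rem ? o_rem : t_rem;

        if (len > 0) {
            op((char *) o_iov[oi].iov_base + o_off,
               (char *) t_iov[ti].iov_base + t_off, len, ctx);
            nops++;
        }
        o_off += len;
        t_off += len;
        /* Advance whichever side (possibly both) is exhausted. */
        if (o_off == o_iov[oi].iov_len) { oi++; o_off = 0; }
        if (t_off == t_iov[ti].iov_len) { ti++; t_off = 0; }
    }
    return nops;
}
```

The number of operations is at most the sum of the two segment counts, which is why this approach pays off only when both types are high density.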
@raffenet Not expected, but I'll work on them separately from this PR. No need to hold up this PR for them. |
Main merge ww42
Pull Request Description
Noncontiguous RMA has several issues regarding datatype optimization and correctness. This PR removes the old, complex native implementation entirely. A new native implementation is added for when origin and target datatypes are both high density. See #4468.
Expected Impact
Simpler code. Improved performance for many RMA use-cases.
Author Checklist
module: short description
and follows good practice