Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ch4/ofi: Improved noncontiguous put/get #4475

Merged
merged 6 commits into from
May 18, 2020
Merged

Conversation

raffenet
Copy link
Contributor

@raffenet raffenet commented Apr 28, 2020

Pull Request Description

Noncontiguous RMA has several issues regarding datatype optimization and correctness. This PR removes the old, complex native implementation entirely. A new native implementation is added for when origin and target datatypes are both high density. See #4468.

Expected Impact

Simpler code. Improved performance for many RMA use-cases.

Author Checklist

  • Reference appropriate issues (with "Fixes" or "See" as appropriate)
  • Remove xfail from the test suite when fixing a test
  • Commits are self-contained and do not do two things at once
  • Commit message is of the form: module: short description and follows good practice
  • Passes whitespace checkers
  • Passes warning tests
  • Passes all tests
  • Add comments such that someone without knowledge of the code could understand
  • You or your company has a signed contributor's agreement on file with Argonne
  • For non-Argonne authors, request an explicit comment from your companies PR approval manager

@raffenet raffenet added the WIP label Apr 28, 2020
@raffenet raffenet force-pushed the iovec-util branch 2 times, most recently from fd93b66 to 04cfb52 Compare April 30, 2020 15:51
@raffenet
Copy link
Contributor Author

test:jenkins/ch4/ofi

@raffenet
Copy link
Contributor Author

test:mpich/ch4/ofi

@raffenet
Copy link
Contributor Author

raffenet commented May 1, 2020

test:jenkins/ch4/ofi

@pavanbalaji pavanbalaji mentioned this pull request May 4, 2020
10 tasks
@pavanbalaji
Copy link
Contributor

@raffenet While reviewing the ch4 code, I noticed that the AM layer doesn't perform packing either. It constructs IOVs in all cases too. It's a bit better than the ch4:ofi code because it constructs the full IOV instead of one element at a time, but that's not optimal either.

Should this PR be expanded to improve the CH4 AM code too? Or perhaps that should be a separate PR?

@raffenet
Copy link
Contributor Author

raffenet commented May 4, 2020

@raffenet While reviewing the ch4 code, I noticed that the AM layer doesn't perform packing either. It constructs IOVs in all cases too. It's a bit better than the ch4:ofi code because it constructs the full IOV instead of one element at a time, but that's not optimal either.

Should this PR be expanded to improve the CH4 AM code too? Or perhaps that should be a separate PR?

I would like to keep any AM code changes separate from this PR. I believe @hzhou might already have a PR to address some of the AM code issues.

@hzhou
Copy link
Contributor

hzhou commented May 4, 2020

I would like to keep any AM code changes separate from this PR. I believe @hzhou might already have a PR to address some of the AM code issues.

I agree that we should keep the PR separate. A likely arrangement is to hold this PR until the AM PR works out and gets merged first.

That assumes we have a solution for ch4 AM (and I don't have a PR for AM RMA yet). Let's discuss it --

For ch4 AM RMA, what we needed is to design a new protocol. The current protocol is to send an "IOV" packet in cases of non-contig put/get. What is the alternative solution? More specifically, how do we let the remote end know the datatype of another process?

@hzhou
Copy link
Contributor

hzhou commented May 4, 2020

Maybe of potential interest, I just created this draft PR #4487, essentially enable DTPools to recreate the datatype from a dtpools-description-string. It may be feasible in mpich to have this "datatype-string" conversion routines then we can communicate with remote end using datatype description string in stead of IOV arrays.

EDIT: actually, we don't need string, we just need a serialization of the datatype or dataloop for replacing IOVs.

EDIT2: if yaksa provide a serialization api, then we can use that too.

@raffenet
Copy link
Contributor Author

raffenet commented May 4, 2020

test:mpich/ch4/ofi

@raffenet
Copy link
Contributor Author

raffenet commented May 4, 2020

test:mpich/ch4/ofi

@raffenet
Copy link
Contributor Author

raffenet commented May 4, 2020

test:mpich/ch4/ofi

@pavanbalaji
Copy link
Contributor

@raffenet and @hzhou If you are following the discussion with Intel, this PR has become high priority now because of yaksa's dependency on it. So whatever we decide for this PR and the AM PR would need to be soon.

@pavanbalaji
Copy link
Contributor

EDIT2: if yaksa provide a serialization api, then we can use that too.

Yaksa and dataloop (and typerep) provide serialization APIs. This a core part of the datatype management code. There's no reason to pass around strings.

@raffenet raffenet force-pushed the iovec-util branch 2 times, most recently from 05286a0 to cf797f3 Compare May 7, 2020 20:42
@raffenet
Copy link
Contributor Author

raffenet commented May 7, 2020

test:jenkins/ch4/ofi

@raffenet
Copy link
Contributor Author

raffenet commented May 7, 2020

test:jenkins/ch4/ofi

@raffenet
Copy link
Contributor Author

raffenet commented May 7, 2020

test:mpich/ch4/ofi

@raffenet
Copy link
Contributor Author

raffenet commented May 8, 2020

test:jenkins/ch4/ofi

@raffenet
Copy link
Contributor Author

raffenet commented May 8, 2020

test:mpich/ch4/ofi

@raffenet raffenet changed the title ch4/ofi: Rewrite handling of noncontiguous RMA ch4/ofi: Improved noncontiguous put/get May 17, 2020
@raffenet
Copy link
Contributor Author

test:mpich/ch4/ofi

@raffenet
Copy link
Contributor Author

Indexing logic was made a little simpler using % operator. I also added a iov load helper in this version.

@raffenet
Copy link
Contributor Author

test:mpich/ch4/ofi

Copy link
Contributor

@pavanbalaji pavanbalaji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'd have personally preferred the if branch be in the calling function (i.e., MPIDI_OFI_nopack_putget), rather than inside the MPIDI_OFI_load_iov function. But I'll not hold up this PR for it.

@pavanbalaji
Copy link
Contributor

Would you mind also testing it with a hack patch that changes --with-datatype-engine to auto, so the yaksa code gets tested too?

@raffenet
Copy link
Contributor Author

test:mpich/ch4/most

@raffenet
Copy link
Contributor Author

test:mpich/ch4/ofi

@raffenet
Copy link
Contributor Author

Assuming tests are successful, will pop off the HACK patches and merge.

@raffenet
Copy link
Contributor Author

There's a bunch of failures in the latest test run that I believe are related to yaksa. https://jenkins-pmrs.cels.anl.gov/job/mpich-review-ch4-ofi/1230/

If I revert the yaksa patch and run some tests by hand, they pass.

@raffenet
Copy link
Contributor Author

@pavanbalaji are the yaksa failures as expected? Just double checking before merge.

raffenet and others added 6 commits May 18, 2020 12:00
This implementation complicates datatype optimization, and also has
correctness issues. See pmodels#4468.
These were accidentally copied over in [e6ef83b].
Use this macro to determine the density of a datatype. The intent is
to use this information in communication protocol selection.
The current function provides a byte-based offset for IOV conversion
routines.  This is very expensive for yaksa.  The new routine matches
what yaksa provides more closely and is also a generally cleaner
interface.
Limit the number of iovecs that can be allocated for data
description. In case both datatypes are noncontiguous, the amount of
memory allocated for iovecs is capped at roughly 1MB.
Utilize native path RMA when origin and targtet datatype are high
density. Create iovecs representing each type, and issue a series of
contiguous operations.
@pavanbalaji
Copy link
Contributor

@raffenet Not expected, but I'll work on them separately from this PR. No need to hold up this PR for them.

@raffenet raffenet merged commit 92fcbcb into pmodels:master May 18, 2020
@raffenet raffenet deleted the iovec-util branch May 18, 2020 17:23
rithwiktom pushed a commit to rithwiktom/mpich that referenced this pull request Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants