-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Tpetra: persistent MPI requests #4782
Conversation
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Using Repos:
Pull Request Author: cgcgcg |
@@ -210,6 +210,11 @@ class MpiCommRequestBase : public CommRequest<OrdinalType> { | |||
return tmp_rawMpiRequest; | |||
} | |||
|
|||
MPI_Request getRawMpiRequest() | |||
{ | |||
return rawMpiRequest_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add the right abstraction to CommRequest, instead of exposing this implementation detail. It looks like start
(which you added below) doesn't need this method.
*out_ << os.str (); | ||
} | ||
|
||
int count = persistentRequests_.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Use of raw MPI needs to be protected by the
HAVE_TPETRACORE_MPI
macro. - Just call
Teuchos::waitAll
; it does exactly what lines 578-586 do. - We should talk about better encapsulations (e.g., a "CommRequests" interface with an
MpiCommRequests
base class that has an array of rawMPI_Request
in it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, 'waitAll' actually frees the requests. I can add an optional parameter to take care of that. And add a 'startAll' function.
@cgcgcg Thanks for working on this! :-D I think it's a great idea. See also the following Tpetra issues: |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.8.4 # 3082 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 2912 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.9.3_SERIAL # 1361 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0 # 1049 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_9.2 # 779 (click to expand)
|
e62faf1
to
d1dcc48
Compare
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Using Repos:
Pull Request Author: cgcgcg |
/// does nothing and returns null) or an RCP of a valid CommRequest | ||
/// instance representing an communication request. | ||
virtual void | ||
free (const Ptr<RCP<CommRequest<Ordinal> > >& request) const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you actually need an explicit free
method? CommRequest objects know how to free themselves already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MpiCommRequestBase
has a destructor that calls MPI_Cancel
, not MPI_Request_free
. I think that is is reasonable in the case of persistent requests to ask the user to explicitly free them..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cgcgcg Then we should fix the destructor :-)
@trilinos/tpetra @cgcgcg |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.8.4 # 3096 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 2926 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.9.3_SERIAL # 1375 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0 # 1063 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_9.2 # 792 (click to expand)
|
@kddevin I am running MiniEM on Cori, but I am getting a lot of noise. Some runs look very good, other runs show no improvement over standard Send-Ireceive. I'll keep investigating. We should definitely not make this the default before it has been thoroughly tested! |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Using Repos:
Pull Request Author: cgcgcg |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
1 similar comment
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
@cgcgcg This actually passed -- how do you feel about this? Do you want to keep working on it? |
@mhoemmen I'm not surprised, we did not change the default behavior, so it should pass. I want to get some timings before pushing it in.. |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
17 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
This Pull Request has been marked for closure due to inactivity. |
All of the four issues mentioned above were "closed to due lack of activity." None of them were actually implemented. |
This Pull Request has been marked for closure due to inactivity. |
This Pull Request has been automatically closed due to 395 days of inactivity. |
@trilinos/tpetra
Description
This PR adds the option to use persistent MPI communication in Tpetra. The defaults communication option is not changed. I tested this using MueLu, where this can be switched on for the solve phase only. There are probably quite a lot of cases where this will not work correctly, so I appreciate feedback!