-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Teuchos: SyncTimeMonitor #6452
Teuchos: SyncTimeMonitor #6452
Conversation
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Master Merge AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Using Repos:
Pull Request Author: cgcgcg |
Status Flag 'Master Merge AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 15 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.8.4 # 5185 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 5010 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.9.3_SERIAL # 3439 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0 # 3287 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_9.2 # 2841 (click to expand)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: cgcgcg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonble to me. The only request that I have is to update a test in Teuchos that uses this new timer and checks it in some meaningful way.
@bartlettroscoe Sounds good! Will do next year. |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.8.4 # 5186 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 5011 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.9.3_SERIAL # 3440 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0 # 3288 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_9.2 # 2842 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_python_2 # 1181 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_python_3 # 1181 (click to expand)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Are you really sure you want to have a collective in a destructor? That would make the destructor blocking. |
@mhoemmen The destructor of the parent class stops the timer. Since we want to have the barrier before the timer stops, this seems like the way to go. Is there any issue with having a blocking destructor? |
If an exception is thrown on one process and the stack unwound, the collective in the dtor will cause the code to hang. |
Don't know if a possible hang in a destructor is very different than regular C++ code. For example, consider the function:
If the function Do we really have rational approach for addressing hangs in C++ programs where only a subset of processes throw exceptions? |
@bartlettroscoe wrote:
No ;-P (Seriously, the only way to do that is to maintain separate local and global error states, and "go through the motions" locally until you get to a collective that permits exchange of error state. It's hard to write and hard to test.) |
Actually wait -- MueLu could piggyback error handling on top of this, no? Just maintain the local error state until the timing region closes, then all-reduce instead of doing a barrier. If you're not doing a barrier, you could just call |
If an exception can unroll all the way up it can at least terminate the other mpi processes as it exits. We just hit this bug last week with hdf5 calling a collective in the dtor that hung large runs in empire. |
A TimeMonitor that waits at a MPI barrier before destruction.
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: cgcgcg |
Status Flag 'Pull Request AutoTester' - Error: Jenkins Jobs - Timed out waiting for ALL builds to complete: Total Wait = 25260; Max Timeout = 25200 |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.8.4 # 5323 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 5148 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.9.3_SERIAL # 3577 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0 # 3425 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_9.2 # 2979 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_python_2 # 1285 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_python_3 # 1285 (click to expand)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: cgcgcg |
Status Flag 'Pull Request AutoTester' - Error: Jenkins Jobs - Error: [Jenkins] Cannot retrieve build running status on build (FATAL: ERROR : [jwrap: build.is_running():756:./support/jenkins_support.py] - General Exception: (HTTPSConnectionPool(host='ascic-jenkins.sandia.gov', port=443): Max retries exceeded with url: /job/trilinos-folder/job/Trilinos_pullrequest_gcc_4.8.4/5328/api/python?depth=1&tree=building (Caused by ReadTimeoutError("HTTPSConnectionPool(host='ascic-jenkins.sandia.gov', port=443): Read timed out. (read timeout=60)",)))) |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.8.4 # 5328 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 5153 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_4.9.3_SERIAL # 3582 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0 # 3430 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_9.2 # 2984 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_python_2 # 1289 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_python_3 # 1289 (click to expand)
|
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: cgcgcg |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test lgtm.
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ jhux2 ]! |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the unit test! It shows the value of this class.
|
||
private: | ||
// \brief Communicator on which barrier will be called. | ||
Ptr<const Comm<int> > comm_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use a Ptr
and not const Comm<int>&
, then it should be possible for comm_
to be null. However, it looks like you always assume that comm_
is nonnull.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cgcgcg !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a test!
@trilinos/teuchos @trilinos/muelu
Motivation