Zoltan2: Fixes for #6440 #6514

Merged 3 commits on Jan 7, 2020

MicheldeMessieres
Contributor

This will resolve indeterminate behavior in the Multijagged (MJ) calculation.
It is expected to resolve most of the cases where #6440 failed. However, in one case CDash showed a segfault, and it is not yet clear why that occurred or whether this fix affects it. This will at least get the test running consistently.

@trilinos/zoltan2

Motivation

Eliminates indeterminate behavior. Will fix most but perhaps not all intermittent failures reported in #6440.

Stakeholder Feedback

Testing

This will resolve an indeterminacy in the MJ calculation.
MJ tests can now detect indeterminate results and
return an error. CDash showed a segfault in one case and
it's not clear if that is related. But this will at least
get the test running in a consistent manner and should clear
most of the random failures taking down auto PR testing.
@MicheldeMessieres MicheldeMessieres requested review from kddevin and removed request for kddevin January 2, 2020 19:37
@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
WARNING: NO REVIEWERS HAVE BEEN REQUESTED FOR THIS PULL REQUEST!

@MicheldeMessieres MicheldeMessieres added the AT: WIP Causes the PR autotester to not test the PR. (Remove to allow testing to occur.) label Jan 2, 2020
// The last member is a utility used for atomically inserting the values.
// Sorting here avoids potential indeterminacy in the partitioning results.
auto track_on_cuts_sort = Kokkos::subview(track_on_cuts,
  std::pair<mj_lno_t, mj_lno_t>(0, track_on_cuts.size() - 1));

Kokkos::subview takes a half-open range [begin, end), so the second value must be one past the last index to include. Also, what if track_on_cuts.size() is zero?

@MicheldeMessieres
Contributor Author

@mhoemmen Thanks. This array always has size 1 or greater, and the last element is used for index tracking, so it is not sorted. I could add a check to skip the sort when size = 1, since no sorting is necessary in that case. However, I've temporarily closed the PR because the bug just stopped reproducing for me in develop. I need to investigate this further.
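
For readers following the range question, here is a minimal standard-library sketch of the same idea, not the Kokkos code from the PR. It assumes a hypothetical buffer whose last slot is a counter that must stay in place; `sort_all_but_last` is an illustrative name. The half-open range [begin, end - 1) plays the role of the `(0, size() - 1)` pair passed to `Kokkos::subview`, and the early return covers the size-zero and size-one cases raised above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the buffer discussed above: every entry except
// the last holds a tracked value, and the last entry is a counter used
// while inserting, so it must not move.
void sort_all_but_last(std::vector<int>& buf) {
  // size() is unsigned, so check it before computing "size() - 1":
  // with an empty buffer the subtraction would wrap around.
  if (buf.size() <= 1) {
    return;  // empty, or only the counter slot: nothing to sort
  }
  // Half-open range [begin, last): the element at `last` is excluded.
  const auto last = buf.end() - 1;
  std::sort(buf.begin(), last);
  assert(std::is_sorted(buf.begin(), last));
}
```

Calling `sort_all_but_last` on `{5, 3, 9, 1, 4}` sorts the first four entries and leaves the trailing `4` where it was.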

The optimizations don't work properly with OpenMP because of
the statics, which caused the randomly generated coords to change
from run to run. Since this code is only used by the tests, it is
not performance critical.
Leave view at size 0 if not used at all.
Skip sort call when view is not used.
@MicheldeMessieres
Contributor Author

@kddevin This PR fixes an indeterminate result that caused the MultiVector pass and the BasicVectorAdapter pass to give different results on the cuts and fail the comparison part of the test.

I've also fixed another issue in the GeometricGenerator random number generation. The statics were not handled in a thread-safe way, so the original coordinate distribution changed from run to run with OpenMP. Performance of GeometricGenerator is not very important because it is used for tests only, so I've simplified the code and removed the statics.
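
As an illustration of the general approach (not the actual GeometricGenerator change), the sketch below keeps all generator state in a local object instead of function-level statics. `make_coords` and its parameters are hypothetical names. The engine output is scaled by hand because `std::mt19937` produces a fully specified sequence for a given seed, whereas the standard distributions may differ between library implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: no static state is shared between calls or threads,
// so the same seed always reproduces the same coordinates.
std::vector<double> make_coords(std::uint32_t seed, std::size_t count) {
  std::mt19937 engine(seed);  // local state, re-created on every call
  std::vector<double> coords;
  coords.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    // Scale the 32-bit engine output into [0, 1).
    const double x = static_cast<double>(engine()) / 4294967296.0;
    assert(x >= 0.0 && x < 1.0);
    coords.push_back(x);
  }
  return coords;
}
```

Two calls with the same seed and count return identical vectors, which is the property the tests need. In a threaded setting, each thread could be given its own seed so that no generator state is shared.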

@kddevin
Contributor

kddevin commented Jan 6, 2020

Thanks, @MicheldeMessieres.
You are correct that performance of the GeometricGenerator does not matter. If its use of OpenMP is problematic, it is OK to serialize it all.

@kddevin kddevin removed the AT: WIP Causes the PR autotester to not test the PR. (Remove to allow testing to occur.) label Jan 6, 2020
@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ kddevin ]!

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 5315
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 5140
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL

  • Build Num: 3569
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0

  • Build Num: 3417
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_cuda_9.2

  • Build Num: 2971
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
JENKINS_JOB_TYPE Experimental
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_python_2

  • Build Num: 1278
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 1278
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Using Repos:

Repo: TRILINOS (Tech-XCorp/Trilinos)
  • Branch: fix6440
  • SHA: 62d4cab
  • Mode: TEST_REPO

Pull Request Author: MicheldeMessieres

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 5315
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 5140
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL

  • Build Num: 3569
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0

  • Build Num: 3417
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_cuda_9.2

  • Build Num: 2971
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
JENKINS_JOB_TYPE Experimental
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_python_2

  • Build Num: 1278
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 1278
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6514
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH fix6440
TRILINOS_SOURCE_REPO https://github.com/Tech-XCorp/Trilinos
TRILINOS_SOURCE_SHA 62d4cab
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 4b0fef0


CDash Test Results for PR# 6514.

@trilinos-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ kddevin ]!

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR...

4 participants