-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tpetra: use created subviews of communication buffers #6598
Conversation
a MultiVector, we created subviews for comm buffers, but did not store them. This commit stores them. It also offsets the buffers by the vector j requested from the MultiVector.
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: kddevin |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be happy to approve if you want to move on, but this is not a complete fix. We also need to take subviews in the constructors that view a contiguous subset of columns. In theory also, we might want to fix this for the constructors that view a noncontiguous subset of columns, but that's harder -- we would need indirect indexing into imports_
and exports_
in that case.
@@ -3395,15 +3395,23 @@ namespace Tpetra { | |||
// be exactly what we need, so we won't have to resize them later. | |||
{ | |||
const size_t newSize = X.imports_.extent (0) / numCols; | |||
const size_t offset = j*newSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
j
isn't necessarily the correct offset. We should actually use jj
(see above) here, in case *this*
views a noncontiguous set of columns of another MultiVector.
Actually, if Y views a contiguous proper subset of columns of X, and Z views a noncontiguous proper subset of columns of Y, then jj
is not even the right entry of imports_
. We can fix that in the "view contiguous subset of columns" constructors and methods by adjusting the offset of imports_
and exports_
accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I should switch j to jj? Note that previously, the subviews took the first newSize entries of the buffer always. If the buffers are strictly used for communication, that might be fine, but I thought indexing into them would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kddevin wrote:
If the buffers are strictly used for communication, ....
That's correct. They are used for nothing other than communication of packed data.
So I should switch
j
tojj
?
Yes. I actually studied the example in my above comments, and I think this is a complete fix, given how MultiVector::subView*
work.
If we switch j
to jj
, that will fix the case where different single Vectors view distinct columns of one MultiVector. In this case, your change will mean that different Vectors will view different parts of the buffer. Thus, you'll be able to do concurrent asynchronous doExport or doImport correctly with the Vectors.
Consider the following, more complicated example:
using MV = Tpetra::MultiVector<>;
MV X (map, 10);
Teuchos::Range1D Y_colRange (1, 5); // inclusive: 1, 2, 3, 4, 5
RCP<MV> Y = X.subViewNonConst (Y_colRange);
Teuchos::Array<size_t> whichVecsZ {{0, 2, 4}};
Teuchos::Array<size_t> whichVecsW {{0, 2, 4}};
RCP<MV> Z = X.subViewNonConst (whichVecsZ ());
RCP<MV> W = Y.subViewNonConst (whichVecsW ());
Y views columns 1, 2, 3, 4, and 5 of X. Z views columns 0, 2, and 4 of X. W views columns 0, 2, and 4 of Y -- thus, columns 1, 3, and 5 of X. Users would expect it to be safe to do concurrent communication with Z and W.
In this case, they are right, because neither overload of subViewNonConst
reuses their parent MultiVector's buffers. Thus, Z and W each would start with zero-length imports_
and exports_
buffers, and would each reallocate them on demand.
Thus, this case is already correct, and your change doesn't break it. If we decide to make subView*
reuse communication buffers in the future, we only need to use the correct offset. That will make the above example continue to be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way we could write a reliable test for this use case, though, would be if we had asynchronous doExport / doImport. We don't have that right now.
An alternate approach would be to remove the subview construction and let the code operate as it does currently. Would that be preferable to the changes in this PR? |
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've convinced myself that this is correct and is a complete fix for the issue. Thanks!
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ mhoemmen ]! |
Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged |
Merge on Pull Request# 6598: IS A SUCCESS - Pull Request successfully merged |
Automatically Merged using Trilinos Pull Request AutoTester PR Title: Tpetra: index change from #6598 PR Author: kddevin
…s:develop' (d17489d). * trilinos-develop: tpetra: In trilinos#6598, @mhoemmen recommended this change of offset SEACAS: go back to lib:fmt 6.0.0 until fix issue on vortex xl/cuda build Disable Teko_testdriver_tpetra_MPI_4 in all atdm 'waterman' builds (trilinos#6463) zoltan2: add missing include file for non-ETI builds Tpetra: Missed ifdef guard zoltan2: name change to prevent shadow warnings zoltan2: Change logic for determining gno types to use in tests Simplified now that Trilinos builds only one gno_t Tpetra: More stacked timer fixes Tpetra: Fix overflow for TpetraCore_MatrixMatrix_UnitTests tpetra: removed unused field from FixedHashTable This removes some warnings about calling host functions from host device functions trilinos#5698; E.g., warning: calling a __host__ function("std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string") from a __host__ __device__ function("Tpetra::Details::LocalMap<int, long long, ::Kokkos::Device< ::Kokkos::Serial, ::Kokkos::HostSpace> > ::~LocalMap [subobject]") is not allowed tpetra: when looking at trilinos#6158, I saw that, when creating a Vector from a MultiVector, we created subviews for comm buffers, but did not store them. This commit stores them. It also offsets the buffers by the vector j requested from the MultiVector.
…s:develop' (d17489d). * trilinos-develop: tpetra: In trilinos#6598, @mhoemmen recommended this change of offset Tpetra::CrsMatrix: Add Kokkos kernel labels; expose debug code Tpetra::CrsMatrix: Remove values2D_ Tpetra::CrsGraph: Remove gblInds2D_ Tpetra::CrsGraph: Remove lclInds2D_ Tpetra::CrsMatrix: Remove unused method allocateValues2D Tpetra: Use verbosePrintCountThreshold in copyOffsets Tpetra::Details::Behavior: Add longRowMinNumEntries Tpetra::Details::Behavior: Factor out size_t reading SEACAS: go back to lib:fmt 6.0.0 until fix issue on vortex xl/cuda build Disable Teko_testdriver_tpetra_MPI_4 in all atdm 'waterman' builds (trilinos#6463) zoltan2: add missing include file for non-ETI builds Tpetra: Missed ifdef guard zoltan2: name change to prevent shadow warnings zoltan2: Change logic for determining gno types to use in tests Simplified now that Trilinos builds only one gno_t Tpetra: More stacked timer fixes Tpetra: Fix overflow for TpetraCore_MatrixMatrix_UnitTests tpetra: removed unused field from FixedHashTable This removes some warnings about calling host functions from host device functions trilinos#5698; E.g., warning: calling a __host__ function("std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string") from a __host__ __device__ function("Tpetra::Details::LocalMap<int, long long, ::Kokkos::Device< ::Kokkos::Serial, ::Kokkos::HostSpace> > ::~LocalMap [subobject]") is not allowed Framework: updating the autotester env to remove dependency on atdm-env tpetra: when looking at trilinos#6158, I saw that, when creating a Vector from a MultiVector, we created subviews for comm buffers, but did not store them. This commit stores them. It also offsets the buffers by the vector j requested from the MultiVector.
…s:develop' (d17489d). * trilinos-develop: (29 commits) Tpetra: Correcting unecessary extraction of remotes for A matrix in Jacobi Tpetra: MMM Modifications to avoid remotemap construction in serial Ifpack2: pass correct "symmetric" flag to MTSGS setup MueLu: Adding timer granularity to SaP Fix const_cast in Tpetra, Amesos and Ifpack2 SEACAS: Remove locale setting Teuchos utils: fix another stacked timer plotting bug tpetra: In trilinos#6598, @mhoemmen recommended this change of offset Tpetra::CrsMatrix: Add Kokkos kernel labels; expose debug code Tpetra::CrsMatrix: Remove values2D_ Tpetra::CrsGraph: Remove gblInds2D_ Tpetra::CrsGraph: Remove lclInds2D_ Tpetra::CrsMatrix: Remove unused method allocateValues2D Tpetra: Use verbosePrintCountThreshold in copyOffsets Tpetra::Details::Behavior: Add longRowMinNumEntries Tpetra::Details::Behavior: Factor out size_t reading SEACAS: go back to lib:fmt 6.0.0 until fix issue on vortex xl/cuda build Disable Teko_testdriver_tpetra_MPI_4 in all atdm 'waterman' builds (trilinos#6463) zoltan2: add missing include file for non-ETI builds Tpetra: Missed ifdef guard ...
…s:develop' (d17489d). * trilinos-develop: (29 commits) Tpetra: Correcting unecessary extraction of remotes for A matrix in Jacobi Tpetra: MMM Modifications to avoid remotemap construction in serial Ifpack2: pass correct "symmetric" flag to MTSGS setup MueLu: Adding timer granularity to SaP Fix const_cast in Tpetra, Amesos and Ifpack2 SEACAS: Remove locale setting Teuchos utils: fix another stacked timer plotting bug tpetra: In trilinos#6598, @mhoemmen recommended this change of offset Tpetra::CrsMatrix: Add Kokkos kernel labels; expose debug code Tpetra::CrsMatrix: Remove values2D_ Tpetra::CrsGraph: Remove gblInds2D_ Tpetra::CrsGraph: Remove lclInds2D_ Tpetra::CrsMatrix: Remove unused method allocateValues2D Tpetra: Use verbosePrintCountThreshold in copyOffsets Tpetra::Details::Behavior: Add longRowMinNumEntries Tpetra::Details::Behavior: Factor out size_t reading SEACAS: go back to lib:fmt 6.0.0 until fix issue on vortex xl/cuda build Disable Teko_testdriver_tpetra_MPI_4 in all atdm 'waterman' builds (trilinos#6463) zoltan2: add missing include file for non-ETI builds Tpetra: Missed ifdef guard ...
@trilinos/tpetra
Motivation
When looking at #6158 , I noticed that the constructor of a vector from a multivector created subviews of the multivector's communication buffer, but never kept/used the subviews. Communication routines allocate memory if needed, so the vectors worked fine. But using the subviews should reduce the number of memory allocations.
I discussed these changes with @mhoemmen before implementing the changes.
Stakeholder Feedback
Testing
Tested on linux with Tpetra tests and downstream packages Belos, Ifpack2, MueLu, Zoltan2, and Teko.
Results: