Tpetra MultiVector and BlockMultiVector refactor to remove UVM requirement (#8821)

* Tpetra: add new user-friendly MV view access

Also add new "owningView_" DualView member that refers to
the actual original DV (not a subview of anything else). This
is the DualView to sync in order to maintain consistency regardless
of how MultiVectors alias each other.

4 new view accessor functions: getLocalView[Host|Device][Non]Const()

- Respect constness
- Manage syncs and modifies for the user
- Prevent taking out a view in one space while any view in the other
space is live.
- Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just
have the reference count checking added (no sync/modify). This has no
effect for HostSpace or CudaUVMSpace since those host mirrors match the
device views.
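
The accessor rules listed above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyDualVector` and its methods are hypothetical names, both "spaces" are plain host memory, and `std::shared_ptr::use_count()` stands in for the reference count checking. It is not the Tpetra or Kokkos implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a host/device buffer pair. Nothing here is Tpetra or
// Kokkos API, and both "spaces" are ordinary host memory.
class ToyDualVector {
public:
  using buffer = std::vector<double>;

  explicit ToyDualVector(std::size_t n)
    : host_(std::make_shared<buffer>(n, 0.0)),
      device_(std::make_shared<buffer>(n, 0.0)) {}

  // Const accessors sync the requested side but do not mark it modified.
  std::shared_ptr<const buffer> getHostConst() {
    requireNoLiveView(device_, "device");
    syncToHost();
    return host_;
  }
  std::shared_ptr<const buffer> getDeviceConst() {
    requireNoLiveView(host_, "host");
    syncToDevice();
    return device_;
  }

  // Non-const accessors sync first, then record that this side is newer.
  std::shared_ptr<buffer> getHostNonConst() {
    requireNoLiveView(device_, "device");
    syncToHost();
    hostNewer_ = true;
    return host_;
  }
  std::shared_ptr<buffer> getDeviceNonConst() {
    requireNoLiveView(host_, "host");
    syncToDevice();
    deviceNewer_ = true;
    return device_;
  }

private:
  // A use_count above 1 means some caller still holds a handle to that side.
  static void requireNoLiveView(const std::shared_ptr<buffer>& other,
                                const char* space) {
    if (other.use_count() > 1) {
      throw std::runtime_error(std::string("a ") + space +
                               " view is still live");
    }
  }
  void syncToHost() {
    if (deviceNewer_) { *host_ = *device_; deviceNewer_ = false; }
  }
  void syncToDevice() {
    if (hostNewer_) { *device_ = *host_; hostNewer_ = false; }
  }

  std::shared_ptr<buffer> host_;
  std::shared_ptr<buffer> device_;
  bool hostNewer_ = false;
  bool deviceNewer_ = false;
};
```

In this toy, holding the result of `getHostNonConst()` and then asking for a device view throws; once the host handle goes out of scope, the device accessor copies the host data over before returning.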

* Tpetra - fix MV test 14.

* Tpetra - fix item 17

* Tpetra - fix item 20

* Tpetra - fix item 23

* Tpetra - fix item 28

* Tpetra - fix item 29

* Tpetra - fix item 35

* Tpetra - workaround for item 30

* Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd)

* Tpetra: fix MV [un]pack to respect host/device refcounts

* fix nonconst in Bug7745

* Tpetra: stashing

* Tpetra - issue 354 fix

* Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views

* Tpetra: remove static_assert, fix getLocalView() ret type

Remove bad static_assert that tripped for Cuda/CudaUVMSpace build.
Correct MultiVector::getLocalView() return type to be exactly consistent with
DualView::view().

* tpetra:  fixed error in MultiVector pack that caused failures with UVM=ON

* tpetra:  Fix for FEMultivector -- rather than take the subview of a
DualView and create a new vector with it, use the MultiVector
constructor that gets "offset" views of a vector (in which
@brian-kelley has the owningView_ working correctly).
While I was at it, I added a swap of the owningView_ to the MultiVector
swap() function.

* Tpetra: Fixing ImportExport/Issue3968:  The test uses sync_to* without changing the modify flags, which mucks up our internal tracking

* tpetra:  fix to work without UVM

* tpetra:  changed getLocalViewHost/Device to new Const/NonConst versions
as appropriate. #8591
Did not change getLocalView as the Const/NonConst versions of
getLocalView do not exist yet
Did not change MV_reduce_strided to avoid creating conflicts for
@brian-kelley

* tpetra: change getLocalViewHost to appropriate Const/NonConst version #8591

* Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions

* Tpetra: More getLocalView mods

* Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions

* Tpetra: Fixing scaleBlockDiagonal signature as per Brian

* Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF

* Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging

* Tpetra - missing getlocal view "device"

* Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly

Make WithLocalAccess use these tags instead of internal Details:: ones.
These will also be used for the new MultiVector view access interface.
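
A rough sketch of how tag-based access selection could look, using overloads on empty tag types. Everything in namespace `toy` is hypothetical, and the bookkeeping each tag triggers (sync only, sync plus modify, modify only) is an assumption made for illustration rather than a statement of what Tpetra does.

```cpp
#include <cassert>
#include <string>

namespace toy {
namespace Access {
  // Empty tag types: the overload is chosen at compile time.
  struct ReadOnlyStruct {};
  struct ReadWriteStruct {};
  struct WriteOnlyStruct {};
  // Constant instances so a call site can pass Access::ReadOnly directly.
  constexpr ReadOnlyStruct ReadOnly{};
  constexpr ReadWriteStruct ReadWrite{};
  constexpr WriteOnlyStruct WriteOnly{};
}

// Records which bookkeeping steps each overload would perform (assumed).
struct ToyVector {
  std::string log;
  void view(Access::ReadOnlyStruct)  { log += "sync;"; }         // read current data
  void view(Access::ReadWriteStruct) { log += "sync;modify;"; }  // read, then write
  void view(Access::WriteOnlyStruct) { log += "modify;"; }       // old data not needed
};
}  // namespace toy
```

Because the tag is a type rather than a runtime value, a wrong or missing access mode shows up as a compile error instead of a silent stale read.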

* moving from getLocalView... to getLocalView...(Tpetra::Accesspattern)

* Tpetra - get1dview logic change

* Tpetra, WIP: using new tagged view access

* Tpetra: use new interface for all MV getLocalView

* tpetra:  removed unneeded include file

* Tpetra: Tags!

* Tpetra: Tags!

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* tpetra: copied implementation of getLocalViewHost and getLocalViewDevice
from templated getLocalView, as the getLocalView version does not work.
This commit may be temporary, but it allows us to make progress on other
bugs while someone figures out the template-fu.
Sorry for the debugging statements; we'll get rid of those eventually.

* adding localview tests

* tpetra:  getLocalView<template> now works.
cleaned up my obnoxious print statements
kept Host and Device implementations that do NOT use getLocalView.

* tpetra:  added Tpetra::Access to many getLocalView<> instances
Tests still pass with UVM=ON.

* Tpetra: Removing the dreaded parentheses from the Access tags

* Manually intercept UVM allocations, throw exception

Effectively makes it impossible for any UVM allocations to
exist (except for Stokhos, which calls cudaMallocManaged directly)

* Tpetra: Deprecate old getLocalView functions

* Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON

* tpetra:  changed getLocalView to use access tags and getLocalViewDevice

* tpetra:  added access tags to getLocalView(); fixed scope of some pointers

* xpetra:  fixes to allow compilation

* WIP: deprecate getLocalBlock and start adding tagged overloads

* Tpetra: rewrite allReduceView to work with non-UVM

allReduceView had one bug and one sub-optimal thing:
- Tried to make a view copy with both layout and device different -
  Kokkos can't do that in a single deep_copy
- If a LayoutStride -> contiguous copy needed to be made, it always used
  LayoutLeft. If one of the input/output views was LayoutStride and the
  other was LayoutRight, they would both be copied to LayoutLeft. Now, use
  LayoutRight in this case.

Some utilities to help manage layouts and MPI + Kokkos views in general
are in the new file temporaryViewUtils.hpp: layout unification,
making a contiguous view, and making an MPI-safe view.
In the future these can be used to clean up idot and
iallreduce without losing efficiency.
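
The layout choice described above might be sketched as a small function over a toy enum. The names `Layout` and `unifiedContiguousLayout` are hypothetical here (the real utilities in temporaryViewUtils.hpp operate on Kokkos view types), and the fallbacks for two strided inputs or for mismatched contiguous inputs are assumptions.

```cpp
#include <cassert>

// Toy stand-in for the Kokkos layout tags; not Kokkos API.
enum class Layout { Left, Right, Stride };

// Pick one contiguous layout that both views can be copied to.
// If exactly one input is strided, adopt the other input's layout, so a
// (Stride, Right) pair ends up as Right rather than always Left.
constexpr Layout unifiedContiguousLayout(Layout a, Layout b) {
  return (a != Layout::Stride) ? a              // assumption: first input wins
       : (b != Layout::Stride) ? b              // only b is contiguous
       : Layout::Left;                          // assumption: both strided
}

static_assert(unifiedContiguousLayout(Layout::Stride, Layout::Right) == Layout::Right,
              "a strided view paired with LayoutRight should use LayoutRight");
```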

* Tpetra:  Block MultiVector correctly uses getLocalView; removed stored pointer

* fix host device type for const_little_host_vec_type

* tpetra:  clean up of BlockMultiVector fixes

* Tpetra:  deprecated held pointer mvData_

* tpetra:  removed modifies without syncs; fixed MueLu tests

* Tpetra - removing sync in ScaleAndAssign test

* Tpetra - unit test is okay without modify and sync flags

* Tpetra - test passes without modify and sync operations

* Tpetra - remove unnecessary sync modify clear state flags

* Tpetra - remove multi vector sync/modify/ things

* Tpetra - remove sync modify things in other places

* Tpetra: remove withLocalAccess, for_each, transform

The new MV::getLocalView interface is a simpler substitute for these.

* Issue 8391. Switched to C++17 standard for GCC 8.3 build.

* FROSch: Convert enum NullSpaceType to scoped enum

By converting the enum to an enum class NullSpaceType, callers are forced to
use the named enumerators and can no longer substitute integers. This
guarantees that the expressive enum class is used in implementations
rather than implicitly encoded integers.
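
A minimal illustration of the difference, assuming hypothetical enumerator names (`Laplace`, `LinearElasticity`) and a hypothetical unscoped `LegacyKind` for comparison; the actual FROSch enumerators may differ.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Unscoped enum: values convert to int implicitly.
enum LegacyKind { LegacyLaplace, LegacyElasticity };

// Scoped enum: no implicit conversion to or from int.
enum class NullSpaceType { Laplace, LinearElasticity };

static_assert(std::is_convertible<LegacyKind, int>::value,
              "an unscoped enum silently decays to int");
static_assert(!std::is_convertible<NullSpaceType, int>::value,
              "a scoped enum does not decay to int");
static_assert(!std::is_convertible<int, NullSpaceType>::value,
              "an int cannot be passed where NullSpaceType is expected");

// Code that consumes the enum has to name the enumerators explicitly.
inline const char* toString(NullSpaceType t) {
  switch (t) {
    case NullSpaceType::Laplace:          return "Laplace";
    case NullSpaceType::LinearElasticity: return "LinearElasticity";
  }
  return "unknown";
}
```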

* Patch in KokkosKernels #872

(fix #8727, TeamPolicy team size too large in sort_crs_*)
Adds the KokkosKernels unit test that replicated this issue.

* MueLu: Adding Aggregate size percentiles to AggregateQuality

* Moved Tpetra CRS GS into Ifpack2 Relaxation

* Moved BlockCrs GS functionality into Relaxation

* Enabled new local GS code for CRS

* Reduce redundant code in CRS (GS/SGS use same fn)

* Using refactored block CRS local apply, unify GS/SGS

* More refactoring to get rid of redundant functions

* Added required syncs/modifies for vectors

* Removed unneeded !constantStride paths

* Use cached MV to replace getColumnMapMV from CrsMatrix

* Ifpack2: remove unneeded includes

* Ifpack2: undo some find-and-replace in comments

Undoing some "Node" -> "node_type"

* MueLu: undo CMake change, should be its own PR

* MueLu: in configure, print out missing ETI setting

During configure, MueLu prints out the type combinations to ETI.
Add <complex, int, long long> to this, since it was missing.

* tpetra:  treat WriteOnly of subviews as ReadOnly.

* Ifpack2: in RBILUK, use tagged BMV::getLocalBlock

* Tpetra: add comment with caveat

on BMV::getLocalBlock(i, j, WriteOnly)

* tpetra: separated BugTests.cpp into separate test files so that we can
disable them separately (since they exercise different classes).

* Ifpack2: update BMV getLocalBlock calls

to use tagged access, and not use manual sync/modify (which has been
removed). With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass.

* more test changes

* mv localview tests

* wrapped up 6 tests for new behaviors

* tpetra:  scoping fix for Bug7234.cpp;
more output from getLocalView* when error occurs, as in parallel runs,
throw messages weren't always printed (e.g., from doExport when only
3/4 processors failed)

* Tpetra: add MV::aliases(const MV& other)

This allows a user to see if two MVs overlap, without actually getting
the local views and possibly hitting the reference count checker.
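
One way to picture an overlap test that never takes out a view is an address-range comparison. This is a sketch with hypothetical names (`ToyRange`, `rangesOverlap`); how MV::aliases actually decides overlap is not shown in this commit message, so the range logic here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Half-open range [begin, begin + count) of doubles; a toy stand-in for the
// memory covered by a vector's data.
struct ToyRange {
  const double* begin;
  std::size_t count;
};

// Two non-empty ranges overlap when each one starts before the other ends.
inline bool rangesOverlap(const ToyRange& a, const ToyRange& b) {
  if (a.count == 0 || b.count == 0) return false;
  std::less<const double*> before;  // well-defined order even across allocations
  return before(a.begin, b.begin + b.count) &&
         before(b.begin, a.begin + a.count);
}
```

Only the pointers and extents are inspected, so no reference counts change and no sync is triggered.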

* Ifpack2: const correctness, use new getLocalView

- Throughout Ifpack2, remove manual sync/modify and calls to deprecated
  getLocalView. Use tagged getLocalView instead.
- In BlockRelaxation and the Containers, change interfaces to use const
  on views and multivectors that aren't actually modified

* Tpetra: fix one MV LocalView test, comment out another

We will make sure fix is OK, then uncomment and fix the other

* tpetra:  enable some Tpetra tests without UVM

* tpetra:  fix test for non-Cuda builds

* Ifpack2: fix more constness of apply vectors

* Kokkos: allow CudaUVMSpace::allocate again

Roll back change that made CudaUVMSpace::allocate throw
when UVM was not the default memory space for Cuda.

* tpetra:  changes needed to build with DEPRECATED_CODE=OFF #8821

* fix remaining test

* Tpetra - fix for nox failure

* Thyra: added missing fences to euclidean apply operations used
in MvTimesMatAddMv; the fences resolve test failures with
CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo

Tpetra: the fences above provide a more surgical fix to the test
errors seen in #8821; this commit removes fences from
getLocalView*(ReadOnly).  @kyungjoo-kim

Belos: preventive fence added with @hkthorn's blessing
to mimic those in Thyra.

* tpetra: added fence between device kernels and retrieving blocks on host #8821

* Ifpack2: Minor fix

* DualView: make fencing behavior in sync consistent

sync<Device>() does extra exec space fences if the dev/host memory
spaces are the same. This was missing in sync_host/sync_device, so
this adds it there. Makes all Ifpack2 tests pass for UVM without launch
blocking.

* tpetra:  exercise the Teuchos-based interfaces, too

* changed access control from WriteOnly to OverwriteAll because semantics mean things

* WIP: fixing idot for MV dualview refactor

And some updates to ifpack2 and amesos2 about that.
Working around Kokkos issue #3850 where the templated getLocalView was
used.

* WIP: idot/iallreduce cleanup

* Tpetra: finish idot/iallreduce refactor

* Fixed iallreduce test for non-uvm device

* Belos: use new Tpetra MV view interface

* Cleanup

* Remove extra dualview sync fences

* Ifpack2 passes without launch blocking

except RBILUK.

* Ifpack2: add temporary fence in RBILUK for BlockCrs

Later it should be possible to replace this fence with a refactored
DualView interface to BlockCrs.

* Tpetra: add a global reduce to a test so it will fail when only one proc is failing

* Tpetra: fix some typos in a Map unit test

* Tpetra: remove deprecated sync/modify calls from a unit test

* Ifpack2: fix impl_scalar/scalar mismatch

* Tpetra: remove/update remaining mentions of Gauss-Seidel

* Tpetra: fix iallreduce for builds without MPI

* Ifpack2: revert commenting out try/catch

Was causing unused var warning

* Ifpack2: Fixing vector mode mistake

* tpetra, ifpack2:  fixing several access mode errors

* Tpetra: use new MV view interface in Bug8794 test

* Amesos2: revert using tagged Tpetra MV getLocalView

for some reason, using ReadOnly tag to access MV view in
TpetraMultivecAdapter caused solve solution to not get copied back to
the Tpetra multivector. This is surprising because the views were just
used as the source for a Kokkos deep copy, and this caused
BlockRelaxation in Ifpack2 to fail for serial node (in which DualViews
are trivial, and all kernels are synchronous)

* Ifpack2: add back tag clobbered by merge

* kokkos:  patch from kokkos/kokkos#3857

* comment out all the instances of TPETRA_DEPRECATED (#9023)

* MueLu: add fence for recent intrepid2 changes

Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking.

* Tpetra: restore MV_reduce_strided test.

Key: use the MV (map, dualview, orig_dualview) constructor instead of the
(map, dualview) constructor. If $dualview is noncontiguous, the first one
lets you pass orig_dualview as the contiguous super-view containing
dualview, and orig_dualview can be sync'd without problems.

Also modify TempView::toLayout() to test span_is_contiguous, rather than
assuming that (Layout != LayoutStride) implies contiguous.
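
Why a non-strided layout type does not imply contiguous storage can be seen in a toy column-major descriptor. `ToyLeftView` and `rowRange` are hypothetical and only mimic the idea of a subview that keeps its parent's column stride; they are not Kokkos types.

```cpp
#include <cassert>
#include <cstddef>

// Toy column-major ("LayoutLeft"-like) 2D view descriptor; not a Kokkos type.
struct ToyLeftView {
  std::size_t rows;
  std::size_t cols;
  std::size_t colStride;  // distance in elements between consecutive columns

  // Number of elements from the first entry to one past the last entry.
  std::size_t span() const {
    if (rows == 0 || cols == 0) return 0;
    return (cols - 1) * colStride + rows;
  }
  // Contiguous only if the span holds exactly rows * cols elements.
  bool spanIsContiguous() const { return span() == rows * cols; }
};

// Restrict to rows [begin, end); the parent's column stride is kept, so the
// result still "looks" column-major but has gaps between columns.
inline ToyLeftView rowRange(const ToyLeftView& v, std::size_t begin,
                            std::size_t end) {
  assert(begin <= end && end <= v.rows);
  return ToyLeftView{end - begin, v.cols, v.colStride};
}
```

A 4x3 parent is contiguous, but rows 1 to 2 of it span 10 elements while holding only 6, which is the case a check on the layout type alone would miss.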

* tpetra:  Removed deprecated sync_device calls

* Tpetra: Remove some MultiVector tests that were checking modification state (#9032)

* Tpetra: Deprecate need_sync* in MultiVector

* Tpetra: for now, we won't deprecate need_sync_host/device

* tpetra:  removed instantiations of removed tests

* Tpetra: don't use CudaSpace in nonblocking collectives

OpenMPI does not support Cuda device buffers for nonblocking collectives
like MPI_Iallreduce, even with a Cuda-aware installation.

* Fix old typo in Ifpack2_UnitTestBlockRelaxation

* Fix access tag: OverwriteAll -> ReadWrite

Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later.

* Undo bad DualView merge

Co-authored-by: Brian Kelley
Co-authored-by: Kyungjoo Kim
Co-authored-by: Chris Siefert
Co-authored-by: Geoff Danielson
Co-authored-by: Timothy A. Smith
Co-authored-by: James M. Willenbring
Co-authored-by: Matthias Mayr
Co-authored-by: Timothy Smith
9 people authored Apr 27, 2021
1 parent 59dd831 commit 34a0760
Showing 131 changed files with 3,424 additions and 8,698 deletions.
122 changes: 120 additions & 2 deletions cmake/std/PullRequestLinuxCuda10.1.105uvmTestingSettings.cmake
@@ -156,10 +156,128 @@ set (SEACAS_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
set (ShyLU_DD_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
set (STK_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
set (Teko_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
set (Tpetra_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
set (Xpetra_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
set (Zoltan2_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")

# Tpetra UVM = OFF tests
set (TpetraCore_BlockCrsMatrix_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_Bug5072_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_BlankRowBugTest_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_iallreduce_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_idot_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_UnitTests0_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_UnitTests1_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_UnitTests_Swap_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_ReindexColumns_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_Issue601_MPI_8_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_insertGlobalIndicesFiltered_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_PackUnpack_MPI_1_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_getNumDiags_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_UnpackIntoStaticGraph_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_StaticImportExport_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsGraph_UnpackMerge_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_UnitTests_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_UnitTests2_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_UnitTests3_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_UnitTests4_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_UnitTests_Swap_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_NonlocalAfterResume_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_LeftRightScale_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_2DRandomDist_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_WithGraph_Cuda_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_ReplaceDomainMapAndImporter_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_NonlocalSumInto_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_NonlocalSumInto_Ignore_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_Bug5978_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_Bug6069_1_MPI_3_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_Bug6069_2_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_Bug6171_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_ReplaceLocalValues_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_ReplaceDiagonal_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_MultipleFillCompletes_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_ReindexColumns_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_TransformValues_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_GetRowCopy_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_PackUnpack_MPI_1_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_Equilibration_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_StaticImportExport_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_sumIntoStaticProfileExtraSpace_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_createDeepCopy_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_UnpackMerge_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_Bug7745_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_RemoveEmptyProcesses_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_Albany182_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_Distributor_CreateFromSendsAndRecvs_MPI_8_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_Issue1752_MPI_2_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_FECrsGraph_UnitTests_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_FECrsMatrix_UnitTests_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_FEMultiVector_UnitTests_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_FixedHashTableTest_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_computeOffsetsFromCounts_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_ImportExport_ImportConstructExpert_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_UnpackLongRows_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_ExportToStaticGraphCrsMatrix_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_ImportExport2_UnitTests_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_InOutTest_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_simple_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_simple_MPI_3_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_simple_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_simple_MPI_6_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_simple_MPI_10_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_MPI_3_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_MPI_6_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_MPI_10_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_nodiag_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_nodiag_MPI_3_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_nodiag_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_nodiag_MPI_6_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_rmat_nodiag_MPI_10_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_simple_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_simple_MPI_3_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_simple_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_simple_MPI_6_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_simple_MPI_10_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_rmat_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_rmat_MPI_3_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_rmat_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_rmat_MPI_6_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_rmat_MPI_10_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_simple_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_simple_MPI_3_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_simple_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_simple_MPI_6_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_simple_MPI_10_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_rmat_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_rmat_MPI_3_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_rmat_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_rmat_MPI_6_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_BinaryPerProcess_rmat_MPI_10_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Tpetra_CrsGraph_InOutTest_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMarket_Operator_Test_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_MatrixMatrix_UnitTests_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_FECrs_MatrixMatrix_UnitTests_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_copyConvert_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_StaticView_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_RowMatrixTransposer_test_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_RowMatrixTransposer_UnitTests_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_CrsMatrix_transpose_sortedRows_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_lesson03_power_method_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_lesson05_redistribution_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_FEMAssembly_InsertGlobalIndicesFESP_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_FEMAssembly_InsertGlobalIndicesFESPKokkos_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_FEMAssembly_TotalElementLoopSP_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_FEMAssembly_TotalElementLoopSPKokkos_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_AdditiveSchwarzHalo_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_BlockCrsPerfTest_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_NewReaderExample_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_NewReaderExample_rmat_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_guide_power_method_1_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_guide_matrix_fill_1_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (TpetraCore_guide_data_redist_1_MPI_4_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")


# ShyLU_DD UVM = OFF tests
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_one_rank_TLP_IPOU_DIM3_TPETRA_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_one_rank_TLP_GDSW_DIM2_TPETRA_MPI_1_DISABLE ON CACHE BOOL "Turn off tests for non-UVM build")
16 changes: 8 additions & 8 deletions packages/amesos2/src/Amesos2_TpetraMultiVecAdapter_def.hpp
@@ -91,7 +91,7 @@ namespace Amesos2 {
typedef typename multivec_t::dual_view_type dual_view_type;
typedef typename dual_view_type::host_mirror_space host_execution_space;
mv_->template sync<host_execution_space> ();
-auto contig_local_view_2d = mv_->template getLocalView<host_execution_space>();
+auto contig_local_view_2d = mv_->getLocalViewHost();
auto contig_local_view_1d = Kokkos::subview (contig_local_view_2d, Kokkos::ALL (), 0);
return contig_local_view_1d.data();
}
@@ -191,7 +191,7 @@ namespace Amesos2 {
typedef typename dual_view_type::host_mirror_space host_execution_space;
redist_mv.template sync < host_execution_space > ();

-auto contig_local_view_2d = redist_mv.template getLocalView<host_execution_space>();
+auto contig_local_view_2d = redist_mv.getLocalViewHost();
if ( redist_mv.isConstantStride() ) {
for ( size_t j = 0; j < num_vecs; ++j) {
auto av_j = av(lda*j, lda);
@@ -208,7 +208,7 @@ namespace Amesos2 {
const size_t lclNumRows = redist_mv.getLocalLength();
for (size_t j = 0; j < redist_mv.getNumVectors(); ++j) {
auto av_j = av(lda*j, lclNumRows);
-auto X_lcl_j_2d = redist_mv.template getLocalView<host_execution_space> ();
+auto X_lcl_j_2d = redist_mv.getLocalViewHost();
auto X_lcl_j_1d = Kokkos::subview (X_lcl_j_2d, Kokkos::ALL (), j);

using val_type = typename decltype( X_lcl_j_1d )::value_type;
@@ -425,7 +425,7 @@ namespace Amesos2 {
if ( num_vecs == 1 && this->getComm()->getRank() == 0 && this->getComm()->getSize() == 1 ) {
typedef typename multivec_t::dual_view_type::host_mirror_space host_execution_space;
// num_vecs = 1; stride does not matter
-auto mv_view_to_modify_2d = mv_->template getLocalView<host_execution_space>();
+auto mv_view_to_modify_2d = mv_->getLocalViewHost();
for ( size_t i = 0; i < lda; ++i ) {
mv_view_to_modify_2d(i,0) = new_data[i]; // Only one vector
}
@@ -464,7 +464,7 @@ namespace Amesos2 {
redist_mv.template modify< host_execution_space > ();

if ( redist_mv.isConstantStride() ) {
-auto contig_local_view_2d = redist_mv.template getLocalView<host_execution_space>();
+auto contig_local_view_2d = redist_mv.getLocalViewHost();
for ( size_t j = 0; j < num_vecs; ++j) {
auto av_j = new_data(lda*j, lda);
for ( size_t i = 0; i < lda; ++i ) {
@@ -480,7 +480,7 @@ namespace Amesos2 {
const size_t lclNumRows = redist_mv.getLocalLength();
for (size_t j = 0; j < redist_mv.getNumVectors(); ++j) {
auto av_j = new_data(lda*j, lclNumRows);
-auto X_lcl_j_2d = redist_mv.template getLocalView<host_execution_space> ();
+auto X_lcl_j_2d = redist_mv.getLocalViewHost();
auto X_lcl_j_1d = Kokkos::subview (X_lcl_j_2d, Kokkos::ALL (), j);

using val_type = typename decltype( X_lcl_j_1d )::value_type;
@@ -535,7 +535,7 @@ namespace Amesos2 {
// num_vecs = 1; stride does not matter

// If this is the optimized path then kokkos_new_data will be the dst
-auto mv_view_to_modify_2d = mv_->getLocalViewDevice();
+auto mv_view_to_modify_2d = mv_->getLocalViewDevice(Tpetra::Access::OverwriteAll);
deep_copy_or_assign_view(mv_view_to_modify_2d, kokkos_new_data);
}
else {
@@ -592,7 +592,7 @@ namespace Amesos2 {
auto host_kokkos_new_data = Kokkos::create_mirror_view(kokkos_new_data);
Kokkos::deep_copy(host_kokkos_new_data, kokkos_new_data);
if ( redist_mv.isConstantStride() ) {
-auto contig_local_view_2d = redist_mv.template getLocalView<host_execution_space>();
+auto contig_local_view_2d = redist_mv.getLocalViewHost();
for ( size_t j = 0; j < num_vecs; ++j) {
auto av_j = Kokkos::subview(host_kokkos_new_data, Kokkos::ALL, j);
for ( size_t i = 0; i < lda; ++i ) {
6 changes: 6 additions & 0 deletions packages/belos/tpetra/src/BelosMultiVecTraits_Tpetra.hpp
@@ -409,6 +409,12 @@ namespace Belos {
mv.multiply (Teuchos::NO_TRANS, Teuchos::NO_TRANS,
alpha, A, B_mv, beta);
}
+Kokkos::fence(); // Belos with Thyra's MvTimesMatAddMv allowed failures
+// when fence was not applied after mv.multiply;
+// adding the fence fixed the tests in Thyra.
+// Out of an abundance of caution (and with blessing
+// from @hkthorn), we add the fence here as well.
+// #8821 KDD
}

/// \brief <tt>mv := alpha*A + beta*B</tt>
13 changes: 3 additions & 10 deletions packages/belos/tpetra/src/solvers/Belos_Tpetra_GmresSstep.hpp
@@ -76,13 +76,8 @@ class CholQR {
// A^T * A = R^T * R, where R is ncols by ncols upper
// triangular.
int info = 0;
-if (R_mv.need_sync_host()) {
-// sync R to host before modifying it in place on host
-R_mv.sync_host();
-}
-R_mv.modify_host ();
{
-auto R_h = R_mv.getLocalViewHost ();
+auto R_h = R_mv.getLocalViewHost (Tpetra::Access::ReadWrite);
int ldr = int (R_h.extent (0));
SC *Rdata = reinterpret_cast<SC*> (R_h.data ());
lapack.POTRF ('U', ncols, Rdata, ldr, &info);
@@ -114,11 +109,9 @@ class CholQR {
// triangle of R.

// Compute A_cur / R (Matlab notation for A_cur * R^{-1}) in place.
-A.sync_device ();
-A.modify_device ();
{
-auto A_d = A.getLocalViewDevice ();
-auto R_d = R_mv.getLocalViewDevice ();
+auto A_d = A.getLocalViewDevice (Tpetra::Access::ReadWrite);
+auto R_d = R_mv.getLocalViewDevice (Tpetra::Access::ReadOnly);
KokkosBlas::trsm ("R", "U", "N", "N",
one, R_d, A_d);
}