Commit
Tpetra MultiVector and BlockMultiVector refactor to remove UVM requirement (trilinos#8821)

* Tpetra: add new user-friendly MV view access
  Also add new "owningView_" DualView member that refers to the actual original DV (not a subview of anything else). This is the DualView to sync in order to maintain consistency regardless of how MultiVectors alias each other.
  4 new view accessor functions: getLocalView[Host|Device][Non]Const()
  - Respect constness
  - Manage syncs and modifies for the user
  - Prevent taking out a view in one space while any view in the other space is live.
  - Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just have the reference count checking added (no sync/modify). This has no effect for HostSpace or CudaUVMSpace since those host mirrors match the device views.
* Tpetra - fix MV test 14.
* Tpetra - fix item 17
* Tpetra - fix item 20
* Tpetra - fix item 23
* Tpetra - fix item 28
* Tpetra - fix item 29
* Tpetra - fix item 35
* Tpetra - workaround for item 30
* Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd)
* Tpetra: fix MV [un]pack to respect host/device refcounts
* fix nonconst in Bug7745
* Tpetra: stashing
* Tpetra - issue 354 fix
* Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views
* Tpetra: remove static_assert, fix getLocalView() ret type
  Remove bad static_assert that tripped for Cuda/CudaUVMSpace build. Correct MultiVector::getLocalView() return type to be exactly consistent with DualView::view().
* tpetra: fixed error in MultiVector pack that caused failures with UVM=ON
* tpetra: Fix for FEMultivector -- rather than take the subview of a DualView and create a new vector with it, use the MultiVector constructor that gets "offset" views of a vector (in which @brian-kelley has the owningView_ working correctly). While I was at it, I added a swap of the owningView_ to the MultiVector swap() function.
* Tpetra: Fixing ImportExport/Issue3968: The test uses sync_to* without changing the modify flags, which mucks up our internal tracking
* tpetra: fix to work without UVM
* tpetra: changed getLocalViewHost/Device to new Const/NonConst versions as appropriate. trilinos#8591
  Did not change getLocalView as the Const/NonConst versions of getLocalView do not exist yet
  Did not change MV_reduce_strided to avoid creating conflicts for @brian-kelley
* tpetra: change getLocalViewHost to appropriate Const/NonConst version trilinos#8591
* Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions
* Tpetra: More getLocalView mods
* Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions
* Tpetra: Fixing scaleBlockDiagonal signature as per Brian
* Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF
* Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging
* Tpetra - missing getlocal view "device"
* Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly
  Make WithLocalAccess use these tags instead of internal Details:: ones. These will also be used for the new MultiVector view access interface.
* moving from getLocalView... to getLocalView...(Tpetra::Accesspattern)
* Tpetra - get1dview logic change
* Tpetra, WIP: using new tagged view access
* Tpetra: use new interface for all MV getLocalView
* tpetra: removed unneeded include file
* Tpetra: Tags!
* Tpetra: Tags!
* Tpetra: Fixing more tests
* Tpetra: Fixing more tests
* Tpetra: Fixing more tests
* Tpetra: Fixing more tests
* Tpetra: Fixing more tests
* Tpetra: Fixing tests
* Tpetra: Fixing tests
* Tpetra: Fixing tests
* tpetra: copied implementation of getLocalViewHost and getLocalViewDevice from templated getLocalView, as the getLocalView version does not work. This commit may be temporary, but it allows us to make progress on other bugs while someone figures out the template-fu.
  Sorry for the debugging statements; we'll get rid of those eventually.
* adding localview tests
* tpetra: getLocalView<template> now works.
  cleaned up my obnoxious print statements
  kept Host and Device implementations that do NOT use getLocalView.
* tpetra: added Tpetra::Access to many getLocalView<> instances
  Tests still pass with UVM=ON.
* Tpetra: Removing the dreaded parentheses from the Access tags
* Manually intercept UVM allocations, throw exception
  Effectively makes it impossible for any UVM allocations to exist (except for Stokhos, which calls cudaMallocManaged directly)
* Tpetra: Deprecate old getLocalView functions
* Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON
* tpetra: changed getLocalView to use access tags and getLocalViewDevice
* tpetra: added access tags to getLocalView(); fixed scope of some pointers
* xpetra: fixes to allow compilation
* WIP: deprecate getLocalBlock and start adding tagged overloads
* Tpetra: rewrite allReduceView to work with non-UVM
  allReduceView had one bug and one sub-optimal thing:
  - Tried to make a view copy with both layout and device different
    - Kokkos can't do that in a single deep_copy
  - If a LayoutStride -> contiguous copy needed to be made, it always used LayoutLeft. If one of the input/output views was LayoutStride and the other was LayoutRight, they would both be copied to LayoutLeft. Now, use LayoutRight in this case.
  Some utilities to help manage layouts and MPI + Kokkos views in general are in the new file temporaryViewUtils.hpp: layout unification, making a contiguous view, and making an MPI-safe view. In the future these can be used to clean up idot and iallreduce without losing efficiency.
* Tpetra: Block MultiVector correctly uses getLocalView; removed stored pointer
* fix host device type for const_little_host_vec_type
* tpetra: clean up of BlockMultiVector fixes
* Tpetra: deprecated held pointer mvData_
* tpetra: removed modifies without syncs; fixed MueLu tests
* Tpetra - removing sync in ScaleAndAssign test
* Tpetra - unit test is okay without modify and sync flags
* Tpetra - test passes without modify and sync operations
* Tpetra - remove unnecessary sync modify clear state flags
* Tpetra - remove multi vector sync/modify/ things
* Tpetra - remove sync modify things in other places
* Tpetra: remove withLocalAccess, for_each, transform
  The new MV::getLocalView interface is a simpler substitute for these.
* Issue 8391. Switched to C++17 standard for GCC 8.3 build.
* FROSch: Convert enum NullSpaceType to scoped enum
  By converting the enum to an enum class NullSpaceType, one is forced to use the enum class and cannot replace it with integers anymore. This guarantees, that the expressive enum class is used in implementations rather than the implicitly encoded integers.
* Patch in KokkosKernels trilinos#872 (fix trilinos#8727, TeamPolicy team size too large in sort_crs_*)
  Adds the KokkosKernels unit test that replicated this issue.
* MueLu: Adding Aggregate size percentiles to AggregateQuality
* Moved Tpetra CRS GS into Ifpack2 Relaxation
* Moved BlockCrs GS functionality into Relaxation
* Enabled new local GS code for CRS
* Reduce redundant code in CRS (GS/SGS use same fn)
* Using refactored block CRS local apply, unify GS/SGS
* More refactoring to get rid of redundant functions
* Added required syncs/modifies for vectors
* Removed unneeded !constantStride paths
* Use cached MV to replace getColumnMapMV from CrsMatrix
* Ifpack2: remove unneeded includes
* Ifpack2: undo some find-and-replace in comments
  Undoing some "Node" -> "node_type"
* MueLu: undo CMake change, should be its own PR
* MueLu: in configure, print out missing ETI setting
  During configure, MueLu prints out the type combinations to ETI. Add <complex, int, long long> to this, since it was missing.
* tpetra: treat WriteOnly of subviews as ReadOnly.
* Ifpack2: in RBILUK, use tagged BMV::getLocalBlock
* Tpetra: add comment with caveat on BMV::getLocalBlock(i, j, WriteOnly)
* tpetra: separated BugTests.cpp into separate test files so that we can disable them separately (since they exercise different classes).
* Ifpack2: update BMV getLocalBlock calls to use tagged access, and not use manual sync/modify (which has been removed).
  With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass.
* more test changes
* mv localview tests
* wrapped up 6 tests for new behaviors
* tpetra: scoping fix for Bug7234.cpp; more output from getLocalView* when error occurs, as in parallel runs, throw messages weren't always printed (e.g., from doExport when only 3/4 processors failed)
* Tpetra: add MV::aliases(const MV& other)
  This allows a user to see if two MVs overlap, without actually getting the local views and possibly hitting the reference count checker.
* Ifpack2: const correctness, use new getLocalView
  - Throughout Ifpack2, remove manual sync/modify and calls to deprecated getLocalView. Use tagged getLocalView instead.
  - In BlockRelaxation and the Containers, change interfaces to use const on views and multivectors that aren't actually modified
* Tpetra: fix one MV LocalView test, comment out another
  We will make sure fix is OK, then uncomment and fix the other
* tpetra: enable some Tpetra tests without UVM
* tpetra: fix test for non-Cuda builds
* Ifpack2: fix more constness of apply vectors
* Kokkos: allow CudaUVMSpace::allocate again
  Roll back change that made CudaUVMSpace::allocate throw when UVM was not the default memory space for Cuda.
* tpetra: changes needed to build with DEPRECATED_CODE=OFF trilinos#8821
* fix remaining test
* Tpetra - fix for nox failure
* Thyra: added missing fences to euclidean apply operations used in MvTimesMatAddMv; the fences resolve test failures with CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo
  Tpetra: the fences above provide a more surgical fix to the test errors seen in trilinos#8821; this commit removes fences from getLocalView*(ReadOnly). @kyungjoo-kim
  Belos: preventive fence added with @hkthorn's blessing to mimic those in Thyra.
* tpetra: added fence between device kernels and retrieving blocks on host trilinos#8821
* Ifpack2: Minor fix
* DualView: make fencing behavior in sync consistent
  sync<Device>() does extra exec space fences if the dev/host memory spaces are the same. This was missing in sync_host/sync_device, so this adds it there. Makes all Ifpack2 tests pass for UVM without launch blocking.
* tpetra: exercise the Teuchos-based interfaces, too
* changed access control from WriteOnly to OverwriteAll because semantics mean things
* WIP: fixing idot for MV dualview refactor
  And some updates to ifpack2 and amesos2 about that. Working around Kokkos issue trilinos#3850 where the templated getLocalView was used.
* WIP: idot/iallreduce cleanup
* Tpetra: finish idot/iallreduce refactor
* Fixed iallreduce test for non-uvm device
* Belos: use new Tpetra MV view interface
* Cleanup
* Remove extra dualview sync fences
* Ifpack2 passes without launch blocking except RBILUK.
* Ifpack2: add temporary fence in RBILUK for BlockCrs
  Later it should be possible to replace this fence with a refactored DualView interface to BlockCrs.
* Tpetra: add a global reduce to a test so it will fail when only one proc is failing
* Tpetra: fix some typos in a Map unit test
* Tpetra: remove deprecated sync/modify calls from a unit test
* Ifpack2: fix impl_scalar/scalar mismatch
* Tpetra: remove/update remaining mentions of Gauss-Seidel
* Tpetra: fix iallreduce for builds without MPI
* Ifpack2: revert commenting out try/catch
  Was causing unused var warning
* Ifpack2: Fixing vector mode mistake
* tpetra, ifpack2: fixing several access mode errors
* Tpetra: use new MV view interface in Bug8794 test
* Amesos2: revert using tagged Tpetra MV getLocalView
  For some reason, using ReadOnly tag to access MV view in TpetraMultivecAdapter caused solve solution to not get copied back to the Tpetra multivector. This is surprising because the views were just used as the source for a Kokkos deep copy, and this caused BlockRelaxation in Ifpack2 to fail for serial node (in which DualViews are trivial, and all kernels are synchronous)
* Ifpack2: add back tag clobbered by merge
* kokkos: patch from kokkos/kokkos#3857
* comment out all the instances of TPETRA_DEPRECATED (trilinos#9023)
* MueLu: add fence for recent intrepid2 changes
  Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking.
* Tpetra: restore MV_reduce_strided test.
  Key: use the MV (map, dualview, orig_dualview) constructor instead of the (map, dualview) constructor. If $dualview is noncontiguous, the first one lets you pass orig_dualview as the contiguous super-view containing dualview, and orig_dualview can be sync'd without problems.
  Also modify TempView::toLayout() to test span_is_contiguous, rather than assuming that (Layout != LayoutStride) implies contiguous.
* tpetra: Removed deprecated sync_device calls
* Tpetra: Remove some MultiVector that were checking modification state (trilinos#9032)
* Tpetra: Deprecate need_sync* in MultiVector
* Tpetra: for now, we won't deprecate need_sync_host/device
* tpetra: removed instantiations of removed tests
* Tpetra: don't use CudaSpace in nonblocking collectives
  OpenMPI does not support Cuda device buffers for nonblocking collectives like MPI_Iallreduce, even with a Cuda-aware installation.
* Fix old typo in Ifpack2_UnitTestBlockRelaxation
* Fix access tag: OverwriteAll -> ReadWrite
  Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later.
* Undo bad DualView merge

Co-authored-by: Brian Kelley <[email protected]>
Co-authored-by: Kyungjoo Kim <[email protected]>
Co-authored-by: Chris Siefert <[email protected]>
Co-authored-by: Geoff Danielson <[email protected]>
Co-authored-by: Timothy A. Smith <[email protected]>
Co-authored-by: James M. Willenbring <[email protected]>
Co-authored-by: Matthias Mayr <[email protected]>
Co-authored-by: Timothy Smith <[email protected]>
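
The tagged view access that this message describes (ReadOnly, ReadWrite, and OverwriteAll, with syncs and modifies managed for the user) can be pictured with a small standalone model. The sketch below is not Tpetra or Kokkos code. `toy::DualBuffer`, its `host`/`device` methods, and `demoCopies` are hypothetical names, two `std::vector`s stand in for the host and device allocations, and the check against live views in the other space is left out. It only illustrates, under those assumptions, how an access tag can decide whether a sync copy happens and which side gets marked stale.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: every name here is hypothetical, not Tpetra's or Kokkos's API.
namespace toy {

// Access tags as empty structs plus constexpr instances, so callers can
// write host(ReadOnly) rather than host(ReadOnly()).
struct ReadOnlyTag {};
struct ReadWriteTag {};
struct OverwriteAllTag {};
constexpr ReadOnlyTag ReadOnly{};
constexpr ReadWriteTag ReadWrite{};
constexpr OverwriteAllTag OverwriteAll{};

class DualBuffer {
public:
  explicit DualBuffer(std::size_t n) : host_(n, 0.0), device_(n, 0.0) {}

  // Read-only: bring the host copy up to date and hand out a const reference.
  const std::vector<double>& host(ReadOnlyTag) { syncHost(); return host_; }
  // Read-write: sync first, then record that the device copy is now stale.
  std::vector<double>& host(ReadWriteTag) {
    syncHost(); deviceStale_ = true; return host_;
  }
  // Overwrite-all: the caller promises to write every entry, so skip the copy.
  std::vector<double>& host(OverwriteAllTag) {
    hostStale_ = false; deviceStale_ = true; return host_;
  }

  // The "device" side mirrors the host side.
  const std::vector<double>& device(ReadOnlyTag) { syncDevice(); return device_; }
  std::vector<double>& device(ReadWriteTag) {
    syncDevice(); hostStale_ = true; return device_;
  }
  std::vector<double>& device(OverwriteAllTag) {
    deviceStale_ = false; hostStale_ = true; return device_;
  }

  // Number of host<->device copies performed so far.
  int copies() const { return copies_; }

private:
  void syncHost() {
    if (hostStale_) { host_ = device_; hostStale_ = false; ++copies_; }
  }
  void syncDevice() {
    if (deviceStale_) { device_ = host_; deviceStale_ = false; ++copies_; }
  }

  std::vector<double> host_, device_;
  bool hostStale_ = false, deviceStale_ = false;
  int copies_ = 0;
};

}  // namespace toy

// Walks through the three access modes and returns how many copies were made.
inline int demoCopies() {
  toy::DualBuffer buf(4);
  buf.device(toy::OverwriteAll).assign(4, 1.0);  // no copy; host becomes stale
  double first = buf.host(toy::ReadOnly)[0];     // copy: device -> host
  buf.host(toy::ReadWrite)[0] = first + 1.0;     // already synced; device becomes stale
  buf.device(toy::ReadOnly);                     // copy: host -> device
  buf.host(toy::OverwriteAll).assign(4, 0.0);    // no copy
  return buf.copies();
}
```

In this model, `demoCopies()` performs two copies: one when the host side is read after the device side was overwritten, and one when the device side is read after the host side was modified. Neither OverwriteAll request copies anything, which is the distinction the WriteOnly to OverwriteAll rename in the message points at.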