Dist1 coloring PPS improvements #578

brian-kelley · 2020-01-23T22:39:29Z

Distance-1 coloring with a parallel prefix sum (PPS) worklist used to require two passes: a scan to compute a view of new worklist indices, and a parallel_for to populate the worklist. The original paper on this approach even notes that the two-pass requirement is a downside of PPS compared to atomics.

This PR makes PPS use a single scan pass that scatters directly to the new worklist, without an index view. That saves some time, one kernel launch per iteration, and a significant amount of memory: (num_edges / 2) * sizeof(nnz_lno_t) for edge-based (EB) coloring, and num_rows * sizeof(nnz_lno_t) for vertex-based (VB) coloring.
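To illustrate the idea outside of Kokkos, here is a minimal serial C++ sketch, not the code from this PR. The names `two_pass_worklist`, `single_pass_worklist`, and `lno_t` are hypothetical, and the conflict flags are assumed to be available as a 0/1 vector. In the real kernels the single loop would correspond to the final pass of a parallel scan, where each thread already knows its exclusive prefix and can write straight to the new worklist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for nnz_lno_t.
using lno_t = int;

// Two-pass version: an exclusive scan fills a temporary index array,
// then a second loop scatters the conflicted entries into the new worklist.
std::vector<lno_t> two_pass_worklist(const std::vector<lno_t>& worklist,
                                     const std::vector<char>& conflicted) {
  assert(worklist.size() == conflicted.size());
  std::vector<lno_t> index(worklist.size());  // the extra "index view"
  lno_t running = 0;
  for (std::size_t i = 0; i < worklist.size(); ++i) {
    index[i] = running;  // exclusive prefix sum of the conflict flags
    running += conflicted[i] ? 1 : 0;
  }
  std::vector<lno_t> next(running);
  for (std::size_t i = 0; i < worklist.size(); ++i) {
    if (conflicted[i]) next[index[i]] = worklist[i];
  }
  return next;
}

// Single-pass version: the running prefix is used immediately as the
// destination index, so no index array is ever stored.
std::vector<lno_t> single_pass_worklist(const std::vector<lno_t>& worklist,
                                        const std::vector<char>& conflicted) {
  assert(worklist.size() == conflicted.size());
  // Over-allocate and shrink for simplicity; this sketch does not assume
  // the output size is known ahead of time.
  std::vector<lno_t> next(worklist.size());
  std::size_t running = 0;
  for (std::size_t i = 0; i < worklist.size(); ++i) {
    if (conflicted[i]) next[running++] = worklist[i];
  }
  next.resize(running);
  return next;
}
```

Both functions produce the same worklist; the difference is that the second never materializes `index`. As a rough illustration of the memory figure quoted above, assuming a 4-byte nnz_lno_t and 1,000,000 rows, the VB index array alone would have occupied 4,000,000 bytes.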

Also, renamed the _conflictlist member to _conflictlist_scheme for clarity: it's not a list, it's an enum representing how the worklist is built. The code now uses the actual enum constants (COLORING_NOCONFLICT, COLORING_ATOMIC, COLORING_PPS) instead of the raw integer values (0, 1, 2) throughout.
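The readability gain from named constants can be sketched as follows. This is an illustrative re-declaration, not the library's header: the enum name `ConflictScheme` and the helper `describe_scheme` are hypothetical, and the explicit values 0, 1, 2 are assumed to map to the constants in the order listed above.

```cpp
#include <string>

// Hypothetical re-declaration for illustration only; the real enum is
// defined by the coloring handle in the library.
enum ConflictScheme {
  COLORING_NOCONFLICT = 0,  // assumed value
  COLORING_ATOMIC = 1,      // assumed value
  COLORING_PPS = 2          // assumed value
};

// Hypothetical helper: branching on named constants documents itself,
// whereas `if (scheme == 2)` forces the reader to look up what 2 means.
inline std::string describe_scheme(ConflictScheme scheme) {
  switch (scheme) {
    case COLORING_NOCONFLICT: return "no conflict worklist";
    case COLORING_ATOMIC:     return "atomics";
    case COLORING_PPS:        return "parallel prefix sum";
  }
  return "unknown";
}
```

A call such as `describe_scheme(COLORING_PPS)` reads the same way the member's new name does: it describes a scheme, not a list.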

Bowman checks:
#######################################################
PASSED TESTS
#######################################################
intel-16.4.258-Pthread-release build_time=724 run_time=1027
intel-16.4.258-Pthread_Serial-release build_time=1034 run_time=1998
intel-16.4.258-Serial-release build_time=706 run_time=963
intel-17.2.174-OpenMP-release build_time=869 run_time=566
intel-17.2.174-OpenMP_Serial-release build_time=1225 run_time=1444
intel-17.2.174-Pthread-release build_time=814 run_time=899
intel-17.2.174-Pthread_Serial-release build_time=1131 run_time=1803
intel-17.2.174-Serial-release build_time=778 run_time=891

RIDE checks:
#######################################################
PASSED TESTS
#######################################################
cuda-9.2.88-Cuda_OpenMP-release build_time=517 run_time=418
cuda-9.2.88-Cuda_Serial-release build_time=507 run_time=527
gcc-6.4.0-OpenMP_Serial-release build_time=194 run_time=396
gcc-7.2.0-OpenMP-release build_time=120 run_time=128
gcc-7.2.0-OpenMP_Serial-release build_time=214 run_time=357
gcc-7.2.0-Serial-release build_time=112 run_time=226

srajama1

Thanks @brian-kelley! Please see comments below.

srajama1 · 2020-01-27T16:13:41Z

src/graph/impl/KokkosGraph_Distance1Color_impl.hpp

-        Kokkos::parallel_for ("KokkosGraph::GraphColoring::CreateNewWorkArray",
-            my_exec_space(0, current_vertexListLength_),
-            create_new_work_array<nnz_lno_temp_work_view_t>(current_vertexList_, next_iteration_recolorList_, pps_work_view));
+            ppsWorklistFunctorVB<nnz_lno_temp_work_view_t>(this->nv, current_vertexList_, next_iteration_recolorList_));


Do the values in recolorList need to be reinitialized?

@srajama1 It doesn't need to be initialized because the exact size of the next worklist is produced by a parallel_reduce of functorFindConflicts.

I meant reinitialized, from the second iteration to the third iteration.
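One way to read the reply above: because each iteration yields the exact length of the next worklist, only that many leading entries are ever read, so whatever is left over in the tail of a reused buffer is harmless. The serial C++ sketch below illustrates this under stated assumptions. `rebuild_worklist` is a hypothetical helper, and it folds the counting and the scatter into one loop, whereas the reply describes the count as coming from a separate parallel_reduce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: compacts the conflicted entries among the first
// `current_len` entries of `current` into the front of `next`, reusing
// next's existing storage, and returns the exact new length.
std::size_t rebuild_worklist(const std::vector<int>& current,
                             std::size_t current_len,
                             const std::vector<char>& conflicted,
                             std::vector<int>& next) {
  assert(current_len <= current.size());
  assert(current_len <= conflicted.size());
  assert(current_len <= next.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < current_len; ++i) {
    if (conflicted[i]) next[count++] = current[i];
  }
  // Entries next[count..] keep whatever they held before; callers must
  // only read the first `count` entries.
  return count;
}
```

When two buffers are swapped between iterations, the stale tail of the older buffer is simply overwritten or ignored, since every read is bounded by the returned length.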

srajama1

@brian-kelley gave me a lesson on why we don't need the parallel prefix array.

Thanks @brian-kelley

brian-kelley added 4 commits January 22, 2020 14:13

D1 color: use conflict resolution scheme enum

8c09b6a

instead of magic numbers 0,1,2

Removed the need for pps work view, fused kernels

79570ef

PPS worklist construction (VB and EB) now happens in a single scan, without using a temporary array to store indices. Faster and saves memory.

Fix shadow warning

f119752

Removed EB pps work view

3de3477

brian-kelley added the enhancement label Jan 23, 2020

brian-kelley requested review from srajama1 and ndellingwood January 23, 2020 22:39

brian-kelley self-assigned this Jan 23, 2020

srajama1 reviewed Jan 27, 2020

srajama1 approved these changes Jan 29, 2020

brian-kelley merged commit b0840bd into kokkos:develop Jan 29, 2020

brian-kelley deleted the Dist1PPS branch January 29, 2020 18:33

kokkos-devops-admin mentioned this pull request Nov 11, 2021

Add Batched CG and Batched GMRES #1155

Merged

kokkos-devops-admin mentioned this pull request May 23, 2024

Interface for LAPACK geqrf() #2205

Open
