Tpetra BCRS: Improve vectorization of small dense linear algebra operations #180
Comments
This is one option, but the easier and more portable one is to make GEMV aware of vector or CUDA threads. Use VectorRange (or whatever Kokkos calls it) and then give the team handles to GEMV. In that case we avoid explicit indexing and the specific storage format advocated above.
I should have said that I came here from #178.
Sure, but sometimes users really want to work on one block at a time. Plus, this could be a low-level building block for a team version of GEMV.
The above doesn't require a specific storage format, other than that the View is 3-D. Whatever, I'm not committed to this interface; just make it fast.
OK, I think we should delay this and think about what options there are for getting outer-loop vectorization (if we want that at all). I don't necessarily believe the proposed solution is our best way forward.
I assume you mean the proposed solution in the original issue. I am not a big fan either, as I said in my comment above. It is better to do it correctly once.
I changed the title to reflect the desired outcome rather than the suggested implementation strategy.
This issue is a little too abstract, so I'm closing it. I would prefer more concrete issues like #416, or epics with goals that have concrete metrics.
@trilinos/tpetra @trilinos/ifpack2 @crtrott @kyungjoo-kim @amklinv
Tpetra::Experimental::BlockCrsMatrix uses the small dense linear algebra operations currently implemented in Tpetra_Experimental_BlockView.hpp. These operations take Kokkos::View or LittleVector / LittleBlock. (Their interfaces are enough alike from the perspective of these operations, that we need only consider Kokkos::View in what follows, without loss of generality.) For example, Tpetra::Experimental::GEMV (small dense matrix times small dense vector) takes a rank-2 View (the matrix) and two rank-1 Views (input and output vectors).
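For readers unfamiliar with the operation in question, here is a minimal sketch of a small dense GEMV in the spirit of the one described above. It is not Tpetra's actual interface; plain std::vector with row-major indexing stands in for a rank-2 Kokkos::View, and the function name smallGemv is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: y = A * x for a single small block.
// A is stored row-major with extents numRows x numCols ("rank-2");
// x and y are flat arrays ("rank-1").
void smallGemv(const std::vector<double>& A,
               const std::vector<double>& x,
               std::vector<double>& y,
               std::size_t numRows, std::size_t numCols) {
  for (std::size_t i = 0; i < numRows; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < numCols; ++j) {
      sum += A[i * numCols + j] * x[j];  // A(i,j) in View notation
    }
    y[i] = sum;
  }
}
```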
Discussions a couple weeks ago with @nmhamster suggested that we could get outer loop vectorization by doing the following:
The routines wouldn't change, except that instead of writing A(i,j) or x(k) (for example), we would write A(i,j,whichBlock) or x(k,whichBlock). We have to rely on Kokkos::View::operator() to inline, but this is a much easier approach than explicit SIMD.
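A minimal sketch of that idea, assuming the block index whichBlock is the fastest-varying (stride-1) dimension so the compiler can vectorize over blocks without explicit SIMD. Again, plain std::vector stands in for a rank-3 Kokkos::View, and batchedGemv is an illustrative name, not Tpetra's interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: y(i,b) = sum_j A(i,j,b) * x(j,b) for all blocks b.
// A has extents numRows x numCols x numBlocks, x has numCols x numBlocks,
// y has numRows x numBlocks; the block index is stride-1 in each.
void batchedGemv(const std::vector<double>& A,
                 const std::vector<double>& x,
                 std::vector<double>& y,
                 std::size_t numRows, std::size_t numCols,
                 std::size_t numBlocks) {
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t b = 0; b < numBlocks; ++b) {
      y[i * numBlocks + b] = 0.0;
    }
    for (std::size_t j = 0; j < numCols; ++j) {
      // Innermost loop over blocks is unit-stride in both A and x,
      // which is what gives the outer-loop vectorization.
      for (std::size_t b = 0; b < numBlocks; ++b) {
        y[i * numBlocks + b] +=
            A[(i * numCols + j) * numBlocks + b]  // A(i,j,whichBlock)
            * x[j * numBlocks + b];               // x(j,whichBlock)
      }
    }
  }
}
```

The same loop body written against A(i,j) for one block at a time would leave the compiler only the short dimensions (block rows/columns) to vectorize over; moving the block index innermost trades that for a long, unit-stride loop.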
This depends on #177 and #179.