Tpetra BCRS: Improve vectorization of small dense linear algebra operations #180

Closed
mhoemmen opened this issue Mar 6, 2016 · 7 comments

@mhoemmen
Contributor

mhoemmen commented Mar 6, 2016

@trilinos/tpetra @trilinos/ifpack2 @crtrott @kyungjoo-kim @amklinv

Tpetra::Experimental::BlockCrsMatrix uses the small dense linear algebra operations currently implemented in Tpetra_Experimental_BlockView.hpp. These operations take either Kokkos::View or LittleVector / LittleBlock arguments. (From the perspective of these operations, the two interfaces are similar enough that we need only consider Kokkos::View in what follows, without loss of generality.) For example, Tpetra::Experimental::GEMV (small dense matrix times small dense vector) takes a rank-2 View (the matrix) and two rank-1 Views (input and output vectors).
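
For reference, the per-block operation described above might look roughly like the sketch below. It uses plain C++ so that it builds without Kokkos: `Matrix2`, `Vector1`, and `small_gemv` are hypothetical stand-ins for a rank-2 View, a rank-1 View, and the GEMV routine, and the update rule `y := y + alpha*A*x` is an assumption, not something stated in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-2 Kokkos::View (row-major storage).
// This is NOT the Kokkos or Tpetra type; it only mimics A(i,j) access.
struct Matrix2 {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix2(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Hypothetical stand-in for a rank-1 Kokkos::View, mimicking x(k) access.
struct Vector1 {
  std::vector<double> data;
  explicit Vector1(std::size_t n) : data(n, 0.0) {}
  std::size_t size() const { return data.size(); }
  double& operator()(std::size_t k) { return data[k]; }
  double operator()(std::size_t k) const { return data[k]; }
};

// Assumed semantics: y := y + alpha * A * x for a single small dense block.
template <class MatrixType, class InVecType, class OutVecType>
void small_gemv(double alpha, const MatrixType& A, const InVecType& x, OutVecType& y) {
  assert(A.cols == x.size() && A.rows == y.size());
  for (std::size_t i = 0; i < A.rows; ++i) {
    double sum = 0.0;  // accumulate row i of A times x
    for (std::size_t j = 0; j < A.cols; ++j) {
      sum += A(i, j) * x(j);
    }
    y(i) += alpha * sum;
  }
}
```

With blocks this small, the inner loops are typically too short to vectorize well on their own, which is what motivates looking at the loop over blocks instead.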

Discussions a couple of weeks ago with @nmhamster suggested that we could get outer loop vectorization by doing the following:

  1. Change the storage layout so that the (i,j) entries of consecutive blocks (or the (i) entries of consecutive vectors) are stored contiguously
  2. Linear algebra operations on those small dense blocks would then need to take a whichBlock / whichVector index argument, to tell which block / vector to use

The routines wouldn't change, except that instead of writing A(i,j) or x(k) (for example), we would write A(i,j,whichBlock) or x(k,whichBlock). We have to rely on Kokkos::View::operator() to inline, but this is a much easier approach than explicit SIMD.
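
A minimal sketch of this idea, again in plain C++ rather than Kokkos, is given below. `InterleavedBlocks`, `InterleavedVectors`, `gemv_one`, and `gemv_all_blocks` are hypothetical names introduced only for illustration; the update rule `y := y + alpha*A*x` is likewise an assumption. The containers make `whichBlock` the fastest-varying index, so the (i,j) entries of consecutive blocks sit next to each other in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rank-3 container: entry (i,j) of consecutive blocks is contiguous.
struct InterleavedBlocks {
  std::size_t rows, cols, numBlocks;
  std::vector<double> data;
  InterleavedBlocks(std::size_t r, std::size_t c, std::size_t nb)
      : rows(r), cols(c), numBlocks(nb), data(r * c * nb, 0.0) {}
  double& operator()(std::size_t i, std::size_t j, std::size_t b) {
    return data[(i * cols + j) * numBlocks + b];
  }
  double operator()(std::size_t i, std::size_t j, std::size_t b) const {
    return data[(i * cols + j) * numBlocks + b];
  }
};

// Hypothetical rank-2 container: entry (k) of consecutive vectors is contiguous.
struct InterleavedVectors {
  std::size_t len, numVectors;
  std::vector<double> data;
  InterleavedVectors(std::size_t n, std::size_t nv)
      : len(n), numVectors(nv), data(n * nv, 0.0) {}
  double& operator()(std::size_t k, std::size_t v) { return data[k * numVectors + v]; }
  double operator()(std::size_t k, std::size_t v) const { return data[k * numVectors + v]; }
};

// Same loop nest as a single-block GEMV; only the extra index argument is new.
inline void gemv_one(double alpha, const InterleavedBlocks& A,
                     const InterleavedVectors& x, InterleavedVectors& y,
                     std::size_t whichBlock) {
  for (std::size_t i = 0; i < A.rows; ++i) {
    for (std::size_t j = 0; j < A.cols; ++j) {
      y(i, whichBlock) += alpha * A(i, j, whichBlock) * x(j, whichBlock);
    }
  }
}

// Outer loop over blocks. Once gemv_one is inlined, every access is unit-stride
// in whichBlock, which makes this loop a candidate for compiler vectorization.
// Whether a given compiler actually vectorizes it is not guaranteed.
inline void gemv_all_blocks(double alpha, const InterleavedBlocks& A,
                            const InterleavedVectors& x, InterleavedVectors& y) {
  assert(A.cols == x.len && A.rows == y.len);
  assert(A.numBlocks == x.numVectors && A.numBlocks == y.numVectors);
  for (std::size_t whichBlock = 0; whichBlock < A.numBlocks; ++whichBlock) {
    gemv_one(alpha, A, x, y, whichBlock);
  }
}
```

The sketch hard-codes the interleaved layout to make the memory pattern visible. With a real rank-3 View, the same `A(i,j,whichBlock)` code would compile for other layouts too, though only a layout that keeps `whichBlock` contiguous would give unit-stride access in the outer loop.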

This depends on #177 and #179.

@srajama1
Contributor

srajama1 commented Mar 9, 2016

This is one option, but the easier and more portable one is to make GEMV aware of vector or CUDA threads. Use VectorRange or whatever Kokkos calls it and then give the team handles to GEMV. In that case we avoid the explicit indexing and the specific storage format advocated above.
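
The shape of such a team-aware interface could be sketched as follows. To keep the example buildable without Kokkos, `SerialTeamHandle` and its `vector_for` member are hypothetical stand-ins: in Kokkos the analogous construct would be a nested `Kokkos::parallel_for` over `Kokkos::ThreadVectorRange` using the team member handle. Here the "lanes" simply run one after another, and `team_gemv` with its `y := y + alpha*A*x` update is an assumed interface, not the Tpetra one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a team handle. A real backend would spread the
// iterations over vector lanes or CUDA threads; this one runs them serially.
struct SerialTeamHandle {
  template <class Functor>
  void vector_for(std::size_t n, const Functor& f) const {
    for (std::size_t i = 0; i < n; ++i) {
      f(i);
    }
  }
};

// GEMV on one ordinary row-major block; the team handle decides how the rows
// are distributed. No extra block index and no special storage layout needed.
template <class TeamHandle>
void team_gemv(const TeamHandle& team, double alpha,
               const std::vector<double>& A, std::size_t rows, std::size_t cols,
               const std::vector<double>& x, std::vector<double>& y) {
  assert(A.size() == rows * cols && x.size() == cols && y.size() == rows);
  team.vector_for(rows, [&](std::size_t i) {
    double sum = 0.0;  // each lane owns one row, so writes to y do not conflict
    for (std::size_t j = 0; j < cols; ++j) {
      sum += A[i * cols + j] * x[j];
    }
    y[i] += alpha * sum;
  });
}
```

The caller's code stays the same whichever handle is passed in, which is the portability argument made in this comment.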

@srajama1
Contributor

srajama1 commented Mar 9, 2016

I should have said that I came here from #178.

@mhoemmen
Contributor Author

mhoemmen commented Mar 9, 2016

This is one option, but the easier and more portable one is to make GEMV aware of vector or cuda threads. Use VectorRange or whatever Kokkos calls it and then give the team handles to GEMV.

Sure, but sometimes users really want to work on one block at a time. Plus, this could be a low-level building block for a team version of GEMV.

In that case we avoid explicit indexing and a specific storage format that is advocated above.

The above doesn't require a specific storage format, other than that the View is 3-D. Whatever; I'm not committed to this interface. Just make it fast.

@jwillenbring jwillenbring removed the ATDM label Mar 9, 2016
@crtrott
Member

crtrott commented Mar 15, 2016

OK, I think we should delay this and think about what options there are for getting outer loop vectorization (if we want that at all). I don't necessarily believe the proposed solution is our best way forward.

@srajama1
Contributor

I am assuming you mean the solution proposed in the original issue. I am not a big fan of it either, as I said in my comment above. It is better to do it correctly once.

@mhoemmen mhoemmen changed the title from "Tpetra BCRS: Add "which block / vector" argument to small dense linear algebra operations" to "Tpetra BCRS: Improve vectorization of small dense linear algebra operations" Mar 17, 2016
@mhoemmen
Contributor Author

I am assuming you mean the solution proposed in the original issue. I am not a big fan of it either, as I said in my comment above. It is better to do it correctly once.

I changed the title to reflect the desired outcome rather than the suggested implementation strategy.

@mhoemmen
Contributor Author

mhoemmen commented Jun 4, 2016

This issue is a little bit too abstract, so I'm closing it. I would prefer more concrete issues like #416, or epics with goals that have concrete metrics.


4 participants