Tpetra BCRS: Improve vectorization of small dense linear algebra operations #180
Comments
This is one option, but the easier and more portable one is to make GEMV aware of vector or CUDA threads. Use VectorRange (or whatever Kokkos calls it) and then give the team handles to GEMV. In that case we avoid explicit indexing and the specific storage format advocated above.
I should have said that I came here from #178.
Sure, but sometimes users really want to work on one block at a time. Plus, this could be a low-level building block for a team version of GEMV.
The above doesn't require a specific storage format, other than that the View is 3-D. Whatever, I'm not committed to this interface; just make it fast.
OK, I think we should delay this and think about what options there are for getting outer-loop vectorization (if we want that at all). I don't necessarily believe the proposed solution is our best way forward.
I assume you mean the proposed solution in the original issue. I am not a big fan either, as I said in my comment above. It is better to do it correctly once.
I changed the title to reflect the desired outcome rather than the suggested implementation strategy.
This issue is a little too abstract, so I'm closing it. I would prefer more concrete issues like #416, or epics with goals that have concrete metrics.
@trilinos/tpetra @trilinos/ifpack2 @crtrott @kyungjoo-kim @amklinv
Tpetra::Experimental::BlockCrsMatrix uses the small dense linear algebra operations currently implemented in Tpetra_Experimental_BlockView.hpp. These operations take Kokkos::View or LittleVector / LittleBlock. (Their interfaces are enough alike from the perspective of these operations, that we need only consider Kokkos::View in what follows, without loss of generality.) For example, Tpetra::Experimental::GEMV (small dense matrix times small dense vector) takes a rank-2 View (the matrix) and two rank-1 Views (input and output vectors).
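For readers unfamiliar with the operation in question, here is a minimal sketch of a small dense GEMV in the spirit of the one described above. It is not Tpetra's actual interface; plain std::vector with row-major indexing stands in for a rank-2 Kokkos::View, and the function name smallGemv is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: y = A * x for a single small block.
// A is stored row-major with extents numRows x numCols ("rank-2");
// x and y are flat arrays ("rank-1").
void smallGemv(const std::vector<double>& A,
               const std::vector<double>& x,
               std::vector<double>& y,
               std::size_t numRows, std::size_t numCols) {
  for (std::size_t i = 0; i < numRows; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < numCols; ++j) {
      sum += A[i * numCols + j] * x[j];  // A(i,j) in View notation
    }
    y[i] = sum;
  }
}
```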
Discussions a couple weeks ago with @nmhamster suggested that we could get outer loop vectorization by doing the following:
The routines wouldn't change, except that instead of writing A(i,j) or x(k) (for example), we would write A(i,j,whichBlock) or x(k,whichBlock). We have to rely on Kokkos::View::operator() to inline, but this is a much easier approach than explicit SIMD.
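A minimal sketch of that idea, assuming the block index whichBlock is the fastest-varying (stride-1) dimension so the compiler can vectorize over blocks without explicit SIMD. Again, plain std::vector stands in for a rank-3 Kokkos::View, and batchedGemv is an illustrative name, not Tpetra's interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: y(i,b) = sum_j A(i,j,b) * x(j,b) for all blocks b.
// A has extents numRows x numCols x numBlocks, x has numCols x numBlocks,
// y has numRows x numBlocks; the block index is stride-1 in each.
void batchedGemv(const std::vector<double>& A,
                 const std::vector<double>& x,
                 std::vector<double>& y,
                 std::size_t numRows, std::size_t numCols,
                 std::size_t numBlocks) {
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t b = 0; b < numBlocks; ++b) {
      y[i * numBlocks + b] = 0.0;
    }
    for (std::size_t j = 0; j < numCols; ++j) {
      // Innermost loop over blocks is unit-stride in both A and x,
      // which is what gives the outer-loop vectorization.
      for (std::size_t b = 0; b < numBlocks; ++b) {
        y[i * numBlocks + b] +=
            A[(i * numCols + j) * numBlocks + b]  // A(i,j,whichBlock)
            * x[j * numBlocks + b];               // x(j,whichBlock)
      }
    }
  }
}
```

The same loop body written against A(i,j) for one block at a time would leave the compiler only the short dimensions (block rows/columns) to vectorize over; moving the block index innermost trades that for a long, unit-stride loop.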
This depends on #177 and #179.