cuSOLVER provides a function to perform batched diagonalization of Hermitian matrices:
https://docs.nvidia.com/cuda/cusolver/index.html#cusolverdn-t-syevj

Performance benefits may depend strongly on matrix size, etc.: https://discourse.julialang.org/t/eigenvalues-for-lots-of-small-matrices-gpu-batched-vs-cpu-eigen/50792

We could consider using this for accelerating LSWT. Two concerns:

1. What we actually need is a batched *generalized* eigenvalue decomposition, i.e. `sygvj` rather than `syevj`, as documented at the link above. However, the existing batched eigensolvers do not seem to support the generalized case. We could implement a generalized eigensolver ourselves on top of batched Cholesky decomposition, which CUDA does support.

2. Many LSWT calculations will typically work on small matrices, especially in dipole mode, so the diagonalization subroutine itself may not be the dominant cost. To make GPU acceleration beneficial, we would probably need to move much of the surrounding calculation onto the GPU as well (e.g., the matrix builds for each q).
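For reference, the Cholesky-based reduction of a generalized Hermitian eigenproblem A x = λ B x to a standard one can be sketched as below. This is a minimal CPU sketch in NumPy, operating on stacks of matrices to mirror what batched cuSOLVER kernels (Cholesky + `syevj`) would do on the GPU; `batched_generalized_eigh` is a hypothetical helper name, not an existing API.

```python
import numpy as np

def batched_generalized_eigh(A, B):
    """Solve A x = lam * B x for stacks of Hermitian A and Hermitian
    positive-definite B with shape (..., n, n), by reducing to a standard
    Hermitian eigenproblem: B = L L^H, then C = L^-1 A L^-H has the same
    eigenvalues, with eigenvectors related by x = L^-H y."""
    L = np.linalg.cholesky(B)                 # batched Cholesky, B = L L^H
    Linv = np.linalg.inv(L)                   # (triangular solves would be cheaper in practice)
    LinvH = np.conj(np.swapaxes(Linv, -1, -2))
    C = Linv @ A @ LinvH                      # Hermitian standard-form matrix
    w, Y = np.linalg.eigh(C)                  # batched standard Hermitian eigensolver
    X = LinvH @ Y                             # back-transform eigenvectors
    return w, X
```

Each NumPy call here (`cholesky`, `inv`, `eigh`, batched matmul) has a batched GPU counterpart, which is why the reduction seems feasible even without a native `sygvj` batch routine.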