cuSOLVER provides a function to perform batched diagonalization of Hermitian matrices:
https://docs.nvidia.com/cuda/cusolver/index.html#cusolverdn-t-syevj

Performance benefits may depend strongly on matrix size, etc.: https://discourse.julialang.org/t/eigenvalues-for-lots-of-small-matrices-gpu-batched-vs-cpu-eigen/50792

We could consider using this for accelerating LSWT. Two concerns:

1. What we actually need is a batched *generalized* eigenvalue decomposition, i.e. `sygvj` rather than `syevj`, as documented at the link above. However, the existing batched eigensolvers do not seem to support the generalized case. We could implement a generalized eigensolver ourselves on top of batched Cholesky decomposition, which CUDA does support.

2. Many LSWT calculations will typically work on small matrices, especially in dipole mode, so the diagonalization subroutine itself may not be the dominant cost. To make GPU acceleration beneficial, we would probably need to move much of the surrounding calculation onto the GPU as well (e.g., the matrix builds for each q).
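For reference, the Cholesky-based reduction of a generalized Hermitian eigenproblem A x = λ B x to a standard one can be sketched as below. This is a minimal CPU sketch in NumPy, operating on stacks of matrices to mirror what batched cuSOLVER kernels (Cholesky + `syevj`) would do on the GPU; `batched_generalized_eigh` is a hypothetical helper name, not an existing API.

```python
import numpy as np

def batched_generalized_eigh(A, B):
    """Solve A x = lam * B x for stacks of Hermitian A and Hermitian
    positive-definite B with shape (..., n, n), by reducing to a standard
    Hermitian eigenproblem: B = L L^H, then C = L^-1 A L^-H has the same
    eigenvalues, with eigenvectors related by x = L^-H y."""
    L = np.linalg.cholesky(B)                 # batched Cholesky, B = L L^H
    Linv = np.linalg.inv(L)                   # (triangular solves would be cheaper in practice)
    LinvH = np.conj(np.swapaxes(Linv, -1, -2))
    C = Linv @ A @ LinvH                      # Hermitian standard-form matrix
    w, Y = np.linalg.eigh(C)                  # batched standard Hermitian eigensolver
    X = LinvH @ Y                             # back-transform eigenvectors
    return w, X
```

Each NumPy call here (`cholesky`, `inv`, `eigh`, batched matmul) has a batched GPU counterpart, which is why the reduction seems feasible even without a native `sygvj` batch routine.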