Batched diagonalization on CUDA GPUs #219

Open · kbarros opened this issue Jan 10, 2024 · 0 comments
Labels: enhancement (New feature or request)

kbarros (Member) commented Jan 10, 2024

cuSOLVER provides a batched Jacobi eigensolver (syevjBatched) for diagonalizing Hermitian matrices:
https://docs.nvidia.com/cuda/cusolver/index.html#cusolverdn-t-syevj

Performance benefits likely depend strongly on matrix size and batch count; see this benchmark discussion: https://discourse.julialang.org/t/eigenvalues-for-lots-of-small-matrices-gpu-batched-vs-cpu-eigen/50792
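For orientation, here is a minimal sketch of what the batched call could look like from Julia. It assumes the `CUSOLVER.syevjBatched!` wrapper in CUDA.jl (the routine exercised in the Discourse thread above), including its dispatch to heevjBatched for complex element types; the argument order and return values are assumed to follow the LAPACK `syev!` convention and should be checked against the installed CUDA.jl version.

```julia
using CUDA

# Batch of 10_000 Hermitian 8x8 matrices, one per slice A[:, :, k].
n, nbatch = 8, 10_000
A = CUDA.randn(Float32, n, n, nbatch) .+ im .* CUDA.randn(Float32, n, n, nbatch)
A .+= conj.(permutedims(A, (2, 1, 3)))  # add the conjugate transpose of each slice

# Batched Jacobi eigensolver: 'V' requests eigenvectors, 'U' means the
# upper triangle of each slice is referenced. Assuming the LAPACK-style
# convention, W[:, k] holds the eigenvalues of slice k, and the slices
# of V hold the corresponding eigenvectors.
W, V = CUDA.CUSOLVER.syevjBatched!('V', 'U', A)
```

The cuSOLVER docs target this batched Jacobi variant specifically at large numbers of small matrices, which matches the LSWT use case of one matrix per q.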

We could consider using this for accelerating LSWT. Two concerns:

  1. What we actually need is a batched generalized eigenvalue decomposition, i.e., sygvj rather than syevj (see the cuSOLVER documentation linked above). However, the existing batched eigensolvers do not appear to support the generalized case. We could implement a generalized eigensolver ourselves on top of batched Cholesky decomposition, which CUDA does support (see the sketch after this list).
  2. Given that many LSWT calculations typically operate on small matrices, especially in dipole mode, the diagonalization subroutine itself may not be the dominant cost. To make this beneficial, we would probably need to move much more of the calculation onto the GPU (e.g., the matrix builds for each q).
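On point 1, the textbook reduction of the generalized problem A x = λ B x (with B Hermitian positive definite) to a standard Hermitian eigenproblem goes through the Cholesky factorization B = L Lᴴ: solve C y = λ y with C = L⁻¹ A L⁻ᴴ, then back-transform x = L⁻ᴴ y. Below is a CPU reference sketch of the per-matrix steps; a batched GPU version would replicate them with the batched Cholesky (potrfBatched) and batched triangular solves, followed by the batched eigensolver above. The function name here is illustrative, not an existing API.

```julia
using LinearAlgebra

# Reference (CPU) reduction of A x = λ B x to a standard Hermitian
# eigenproblem, for Hermitian A and Hermitian positive-definite B.
function generalized_eigen(A::Hermitian, B::Hermitian)
    L = cholesky(B).L                    # B = L * L'
    C = Hermitian((L \ Matrix(A)) / L')  # C = L⁻¹ A L⁻ᴴ
    λ, Y = eigen(C)                      # standard Hermitian problem
    X = L' \ Y                           # back-transform: x = L⁻ᴴ y
    return λ, X
end

# Check against the defining equation A X = B X Diagonal(λ).
n = 6
H = randn(ComplexF64, n, n)
A = Hermitian(H + H')
M = randn(ComplexF64, n, n)
B = Hermitian(M * M' + I)                # positive definite by construction
λ, X = generalized_eigen(A, B)
@assert A * X ≈ B * X * Diagonal(λ)
```

This mirrors what LAPACK's sygv/hegv routines do internally for itype = 1.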
kbarros added the enhancement label Nov 16, 2024