
Various CUDA Optimizations #1

Open
wants to merge 51 commits into master

Conversation

lilinitsy

I've added a number of CUDA optimizations for LS/CE/AOV.

The main change is lightcurve batching in Lomb-Scargle, combined with CUDA asynchronous streams (including async memory transfers) to overlap data movement with compute, which gives some nice performance improvements; a rough sketch of the pattern is below.
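For reference, here's a minimal sketch of that stream/batching pattern. This is not the actual PR code; the kernel and helper names (`process_batch_kernel`, `run_batched`, `points_per_batch`, etc.) are made up for illustration.

```cuda
#include <cuda_runtime.h>

// Placeholder per-point kernel; the real work would be the per-lightcurve computation.
__global__ void process_batch_kernel(const float *times, const float *mags,
                                     float *out, int n_points)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_points)
        out[i] = times[i] + mags[i]; // stand-in for the actual per-point work
}

void run_batched(const float *h_times, const float *h_mags, float *h_out,
                 int n_batches, int points_per_batch)
{
    const int n_streams = 4;
    size_t batch_bytes = points_per_batch * sizeof(float);

    cudaStream_t streams[n_streams];
    float *d_times[n_streams], *d_mags[n_streams], *d_out[n_streams];
    for (int s = 0; s < n_streams; s++)
    {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&d_times[s], batch_bytes);
        cudaMalloc(&d_mags[s], batch_bytes);
        cudaMalloc(&d_out[s], batch_bytes);
    }

    for (int b = 0; b < n_batches; b++)
    {
        int s = b % n_streams;
        size_t offset = (size_t) b * points_per_batch;

        // Each batch gets its own stream, so H2D copies, the kernel, and the
        // D2H copy of one batch can overlap with other batches' work.
        // Note: cudaMemcpyAsync only truly overlaps when the host buffers are
        // pinned (cudaMallocHost).
        cudaMemcpyAsync(d_times[s], h_times + offset, batch_bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        cudaMemcpyAsync(d_mags[s], h_mags + offset, batch_bytes,
                        cudaMemcpyHostToDevice, streams[s]);

        int threads = 256;
        int blocks = (points_per_batch + threads - 1) / threads;
        process_batch_kernel<<<blocks, threads, 0, streams[s]>>>(
            d_times[s], d_mags[s], d_out[s], points_per_batch);

        cudaMemcpyAsync(h_out + offset, d_out[s], batch_bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    for (int s = 0; s < n_streams; s++)
    {
        cudaStreamSynchronize(streams[s]);
        cudaStreamDestroy(streams[s]);
        cudaFree(d_times[s]);
        cudaFree(d_mags[s]);
        cudaFree(d_out[s]);
    }
}
```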

Smaller improvements come from marking non-aliasing pointers with restrict and from replacing slow math calls with GPU intrinsics (e.g., __sincosf); see the second sketch below.
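A tiny illustrative kernel showing both of those micro-optimizations (again, not the PR's actual Lomb-Scargle kernel; `phase_fold_kernel` and its parameters are hypothetical):

```cuda
// __restrict__ tells the compiler the pointers don't alias, enabling better
// scheduling/caching; __sincosf computes sin and cos of the phase in one
// fast intrinsic instead of separate sinf/cosf calls.
__global__ void phase_fold_kernel(const float *__restrict__ times,
                                  float *__restrict__ sin_out,
                                  float *__restrict__ cos_out,
                                  float omega, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        float s, c;
        __sincosf(omega * times[i], &s, &c);
        sin_out[i] = s;
        cos_out[i] = c;
    }
}
```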

All testing was done on a V100 on the SDSC Expanse GPU cluster.

### Timings

| Method | Version   | Machine | Data                             | Time               | % Gain  |
|--------|-----------|---------|----------------------------------|--------------------|---------|
| LS     | Baseline  | Expanse | all time, all mags, 100k periods | 139.75323605537415 |         |
| LS     | Optimized | Expanse | all time, all mags, 100k periods | 122.99888670444    | +11.98% |
| CE     | Baseline  | Expanse | all time, all mags, all          | 237.9152114391327  |         |
| CE     | Optimized | Expanse | all time, all mags, all          | 224.75576400756836 | +5.53%  |
| AOV    | Baseline  | Expanse | all time, all mags, 100k periods | 244.3776876926422  |         |
| AOV    | Optimized | Expanse | all time, all mags, 100k periods | 214.41236901283264 | +12.26% |

| Method | Version   | Machine | Data                          | Time               | % Gain  |
|--------|-----------|---------|-------------------------------|--------------------|---------|
| LS     | Baseline  | Expanse | 1000 lightcurves, all periods | 17.119052171707153 |         |
| LS     | Optimized | Expanse | 1000 lightcurves, all periods | 16.61055612564087  | +2.97%  |
| CE     | Baseline  | Expanse | 1000 lightcurves, all periods | 28.453676223754883 |         |
| CE     | Optimized | Expanse | 1000 lightcurves, all periods | 27.26391577720642  | +4.18%  |
| AOV    | Baseline  | Expanse | 1000 lightcurves, all periods | 30.421194076538086 |         |
| AOV    | Optimized | Expanse | 1000 lightcurves, all periods | 26.356033086776733 | +13.36% |

The performance gains shrink as GPU memory bandwidth increases; on a GTX 1080, the Lomb-Scargle gains were in the mid-20% range.

@ejaszewski
Collaborator

Happy to review the changes this weekend if you would like another pair of eyes on it!

@ejaszewski
Collaborator

The changed .clang-format makes it very difficult to tell what has actually changed, because the diff picks up all of the whitespace changes. If possible, could you re-format this with the original clang-format so the diff is meaningful?

@lilinitsy
Author

@ejaszewski Sure thing. I'll try and do that this weekend.
