This repository has been archived by the owner on Jul 16, 2019. It is now read-only.
We could also look at roofline plots. In general, we expect our problems to be bandwidth-limited rather than flop-limited. So, in an ideal world, the only cost we'd see is the cost of reading and writing memory, with all the floating-point ops hidden behind that memory traffic.
Whether this is achievable is another question...
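To make the roofline idea concrete, here is a minimal Python sketch of the bound itself. The function names and all hardware numbers are hypothetical placeholders chosen for illustration, not measurements from our codes or machines. It assumes the simplest form of the model, where attainable performance is the smaller of the compute peak and arithmetic intensity times memory bandwidth.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Simple roofline bound in GFLOP/s.

    arithmetic_intensity: flops per byte moved to/from memory
    peak_gflops:          compute roof (hypothetical value supplied by caller)
    peak_bandwidth_gbs:   memory bandwidth in GB/s (hypothetical value)
    """
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def is_bandwidth_limited(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """True if the kernel sits left of the ridge point (memory-bound side)."""
    ridge_point = peak_gflops / peak_bandwidth_gbs  # flops per byte
    return arithmetic_intensity < ridge_point


def min_runtime_seconds(bytes_moved, flops, peak_gflops, peak_bandwidth_gbs):
    """Lower bound on runtime if memory traffic and flops overlap perfectly."""
    t_mem = bytes_moved / (peak_bandwidth_gbs * 1e9)
    t_flop = flops / (peak_gflops * 1e9)
    return max(t_mem, t_flop)


if __name__ == "__main__":
    # Assumed machine: 1000 GFLOP/s peak, 100 GB/s bandwidth (made-up numbers).
    peak, bw = 1000.0, 100.0
    # Assumed kernel: y = a*x + y in double precision,
    # 2 flops per element and 24 bytes moved (read x, read y, write y).
    ai = 2.0 / 24.0
    print("attainable GFLOP/s:", attainable_gflops(ai, peak, bw))
    print("bandwidth limited: ", is_bandwidth_limited(ai, peak, bw))
```

If the measured runtime of a kernel is close to the memory-only time returned by `min_runtime_seconds`, the floating-point work is effectively hidden; the gap between the two is a rough estimate of the performance left on the table.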
We haven't done these calculations yet for our more optimized codes, but this is something we plan to do in the coming weeks to get a sense of how much performance we're leaving on the table.
Personally, I (and maybe all of us at NPS?) have more experience optimizing GPU codes than CPU codes, so this will be a good exercise for me/us as well.