
For pm-gpu: use small kernels as default for shoc/p3 #7099

Merged
ndkeen merged 2 commits into master from ndk/machinefiles/pm-gpu-make-SMALL_KERNELS_default
Mar 10, 2025


ndkeen (Contributor) commented Mar 7, 2025

For pm-gpu, we see the small-kernel path (SMALL_KERNELS) perform faster in all situations.
It is also BFB with the monolithic kernels.
Currently, this only impacts SHOC/P3.

[bfb]

ndkeen added the Machine Files, Performance, and pm-gpu (Perlmutter machine at NERSC (GPU nodes)) labels Mar 7, 2025
ndkeen self-assigned this Mar 7, 2025
ndkeen requested a review from bartgol March 7, 2025 17:10
ndkeen (Contributor, Author) commented Mar 7, 2025

doh! Yep, I was testing with it ON/OFF, and committed before turning it back ON. Fixed.

mahf708 (Contributor) left a comment

if this runs for you, then gucci 🍏

mahf708 linked an issue Mar 7, 2025 that may be closed by this pull request
mahf708 added the EAMxx (PRs focused on capabilities for EAMxx) label Mar 7, 2025
ndkeen added a commit that referenced this pull request Mar 8, 2025
…o next (PR #7099)

ndkeen (Contributor, Author) commented Mar 8, 2025

merged to next. Late Friday, but what's the worst that could happen?

ndkeen merged commit c617ec5 into master Mar 10, 2025
7 of 19 checks passed
ndkeen deleted the ndk/machinefiles/pm-gpu-make-SMALL_KERNELS_default branch March 10, 2025 16:16
ndkeen (Contributor, Author) commented Mar 14, 2025

Noting that in daily testing, all of the non-DEBUG cases are BFB, but all of the DEBUG cases failed the comparison with baselines.
We are trying to decide whether that is OK or whether to revert.

Labels: EAMxx (PRs focused on capabilities for EAMxx), Machine Files, Performance, pm-gpu (Perlmutter machine at NERSC (GPU nodes))
Development: linked issue that this pull request may close: Make the default for pm-gpu use SMALL_KERNELS for shoc/p3
3 participants