Use a more aggressive approach to unrolling in simd_gemm #518

Merged
merged 1 commit into from
Jan 5, 2025

Conversation

robertknight
Owner

@robertknight robertknight commented Jan 5, 2025

For most architectures, the compiler unrolls the main GEMM loop as requested. For AVX-512, however, it did not. To fix this, swap the unroll macro for one that forces unrolling more aggressively.

See https://gist.github.com/robertknight/948170d4edb048804105a99cb4831b4b for a before/after comparison of the assembly.

For most architectures, the compiler unrolls the main GEMM loop as requested. For
AVX-512, however, it did not. To fix this, swap the unroll macro for one that
forces unrolling more aggressively by duplicating the loop body 4x.

See https://gist.github.com/robertknight/948170d4edb048804105a99cb4831b4b
for a before/after comparison.
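
As a rough illustration of "duplicating the body 4x", here is a minimal sketch of a macro that pastes the loop body four times per iteration instead of relying on the optimizer to unroll. The names `unroll_loop_x4!` and `dot` are hypothetical and are not taken from this PR; the macro actually used in `simd_gemm` may differ in name, syntax and tail handling.

```rust
/// Hypothetical macro (not the one from this PR): run `$body` once for each
/// `$i` in `$range`, with the body textually duplicated four times in the
/// main loop so that the unrolling does not depend on compiler heuristics.
macro_rules! unroll_loop_x4 {
    ($i:ident in $range:expr, $body:block) => {{
        let range: std::ops::Range<usize> = $range;
        let end = range.end;
        let mut $i = range.start;

        // Main loop: four copies of the body per iteration.
        while $i + 4 <= end {
            $body;
            $i += 1;
            $body;
            $i += 1;
            $body;
            $i += 1;
            $body;
            $i += 1;
        }

        // Tail loop: handle the remaining 0..=3 iterations one at a time.
        while $i < end {
            $body;
            $i += 1;
        }
    }};
}

/// Illustrative stand-in for a GEMM inner loop: a scalar dot product over
/// the depth dimension. A real kernel would update SIMD accumulators here.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = 0.0f32;
    unroll_loop_x4!(k in 0..a.len(), {
        acc += a[k] * b[k];
    });
    acc
}
```

Because the duplication happens at macro-expansion time, the generated code contains four copies of the body regardless of the target features in effect. The trade-off is larger code size and a fixed unroll factor.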
@robertknight robertknight merged commit 2cd82d3 into main Jan 5, 2025
2 checks passed
@robertknight robertknight deleted the avx512-gemm-tweaks branch January 5, 2025 17:50