
polyval: match ideal assembly #44

Merged: 1 commit, Dec 21, 2019


tarcieri (Member)

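The previous implementation wrapped each `core::arch` intrinsic in its own `#[target_feature(...)]` function. LLVM won't inline a function that enables target features into a caller that lacks those features, so this thwarted the inliner and every intrinsic compiled to a `call` instruction.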
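For illustration, here is a hypothetical reconstruction of that "before" shape. It is not the crate's code, and the names `clmul_lo`, `clmul_hi`, `lo_xor_hi` and `lo_xor_hi_u128` are made up. Each intrinsic sits in its own `#[target_feature]` wrapper, and the caller does not enable `pclmulqdq` itself, so the wrappers cannot be inlined into it. The safe wrapper with runtime feature detection exists only so the sketch can be exercised.

```rust
// HYPOTHETICAL "before" shape (not the crate's code): one tiny
// #[target_feature] wrapper per intrinsic.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_clmulepi64_si128, _mm_xor_si128};

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "pclmulqdq")]
unsafe fn clmul_lo(a: __m128i, b: __m128i) -> __m128i {
    _mm_clmulepi64_si128(a, b, 0x00) // low qword of a times low qword of b
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "pclmulqdq")]
unsafe fn clmul_hi(a: __m128i, b: __m128i) -> __m128i {
    _mm_clmulepi64_si128(a, b, 0x11) // high qword of a times high qword of b
}

// Unless the whole crate is built with the feature (e.g. via -C target-cpu),
// this caller does not enable `pclmulqdq`, so LLVM cannot inline the wrappers
// above into it: each one stays a separate `call`.
#[cfg(target_arch = "x86_64")]
unsafe fn lo_xor_hi(a: __m128i, b: __m128i) -> __m128i {
    _mm_xor_si128(clmul_lo(a, b), clmul_hi(a, b))
}

/// Safe demo entry point; returns None if the CPU lacks PCLMULQDQ.
pub fn lo_xor_hi_u128(a: u128, b: u128) -> Option<u128> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("pclmulqdq") {
            use std::mem::transmute;
            // SAFETY: feature detected above; __m128i and u128 are both 16 bytes.
            return Some(unsafe { transmute(lo_xor_hi(transmute(a), transmute(b))) });
        }
    }
    let _ = (a, b); // avoid unused-variable warnings on other targets
    None
}
```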

This change moves the intrinsic calls directly into larger `#[target_feature(...)]`-gated functions, where they can be inlined.
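The "after" shape can be sketched under the same caveats: the names are hypothetical, and the multiply is a plain schoolbook 128×128-bit carry-less product rather than necessarily what the crate does. All four `_mm_clmulepi64_si128` calls and the XOR/shift glue live in one `#[target_feature(enable = "pclmulqdq")]` function, so none of them needs a `call`. A portable bit-by-bit version serves as the fallback and as a reference for checking.

```rust
/// Portable bit-by-bit reference: returns the (low, high) 128-bit halves of
/// a * b, treating both as polynomials over GF(2).
pub fn clmul128_soft(a: u128, b: u128) -> (u128, u128) {
    let (mut lo, mut hi) = (0u128, 0u128);
    for i in 0..128 {
        if (b >> i) & 1 == 1 {
            lo ^= a << i;
            if i > 0 {
                hi ^= a >> (128 - i);
            }
        }
    }
    (lo, hi)
}

// Every intrinsic sits inside this ONE feature-gated function, so LLVM can
// inline all of them here.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "pclmulqdq")]
unsafe fn clmul128_pclmul(a: u128, b: u128) -> (u128, u128) {
    use std::arch::x86_64::*;
    use std::mem::transmute;
    let (a, b): (__m128i, __m128i) = (transmute(a), transmute(b));
    // Four 64x64 partial products; imm8 bit 0 picks a's qword, bit 4 picks b's.
    let lo = _mm_clmulepi64_si128(a, b, 0x00); // a.lo * b.lo
    let hi = _mm_clmulepi64_si128(a, b, 0x11); // a.hi * b.hi
    let mid = _mm_xor_si128(
        _mm_clmulepi64_si128(a, b, 0x10), // a.lo * b.hi
        _mm_clmulepi64_si128(a, b, 0x01), // a.hi * b.lo
    );
    // `mid` sits at bit offset 64: split it across the two output halves
    // (the shifts count bytes, so 8 bytes = 64 bits).
    let lo = _mm_xor_si128(lo, _mm_slli_si128(mid, 8));
    let hi = _mm_xor_si128(hi, _mm_srli_si128(mid, 8));
    (transmute(lo), transmute(hi))
}

/// Safe entry point: uses PCLMULQDQ when the CPU has it, else the reference.
pub fn clmul128(a: u128, b: u128) -> (u128, u128) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("pclmulqdq") {
            // SAFETY: the required CPU feature was detected at runtime.
            return unsafe { clmul128_pclmul(a, b) };
        }
    }
    clmul128_soft(a, b)
}
```

Only the outer dispatch in `clmul128` crosses a feature boundary, so at most one `call` remains per multiplication instead of one per intrinsic.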

When compiling with `-C target-cpu=skylake`, the generated assembly matches the idealized version described in this Quarkslab blog post (at least for the Montgomery fast reduction):

https://blog.quarkslab.com/reversing-a-finite-field-multiplication-optimization.html

Their version:

[Screenshot: the assembly from the Quarkslab post]

Godbolt: https://godbolt.org/z/Zjuvwu

[Screenshot: the generated assembly]
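The arithmetic behind a Montgomery fast reduction can be sketched portably. This is not the crate's code: it uses plain `u128` math instead of intrinsics, and all names are hypothetical. It assumes POLYVAL's field polynomial p = x^128 + x^127 + x^126 + x^121 + 1 and its product dot(a, b) = a·b·x^-128 mod p, both from RFC 8452, with bit i of a `u128` holding the coefficient of x^i. Because p ≡ 1 mod x^64, each reduction step adds (low 64-bit word)·p, which clears that word, and then divides by x^64. Each `clmul64` call stands in for one `PCLMULQDQ`, so the reduction costs two of them by the constant `0xc200000000000000`, plus a swap of 64-bit halves per step.

```rust
/// 64x64 -> 128-bit carry-less multiply (what one PCLMULQDQ computes).
fn clmul64(a: u64, b: u64) -> u128 {
    let mut r = 0u128;
    for i in 0..64 {
        if (b >> i) & 1 == 1 {
            r ^= (a as u128) << i;
        }
    }
    r
}

/// 128x128 -> 256-bit carry-less multiply (schoolbook), as (low, high).
fn mul_wide(a: u128, b: u128) -> (u128, u128) {
    let (a0, a1, b0, b1) = (a as u64, (a >> 64) as u64, b as u64, (b >> 64) as u64);
    let mid = clmul64(a0, b1) ^ clmul64(a1, b0);
    (clmul64(a0, b0) ^ (mid << 64), clmul64(a1, b1) ^ (mid >> 64))
}

/// x^63 + x^62 + x^57, i.e. the x^127 + x^126 + x^121 terms of p shifted down by 64.
const C: u64 = 0xc200_0000_0000_0000;

/// One Montgomery step on the low 128 bits: add w0 * p (w0 = low word) so the
/// low word cancels, then divide by x^64. The result is t with its 64-bit
/// halves swapped, XOR w0 * C; the product's high half is added in later.
fn fold(t: u128) -> u128 {
    let w0 = t as u64;
    t.rotate_right(64) ^ clmul64(w0, C)
}

/// Montgomery reduction: returns (lo + hi * x^128) * x^-128 mod p.
fn mont_reduce(lo: u128, hi: u128) -> u128 {
    fold(fold(lo)) ^ hi
}

/// POLYVAL field multiplication: dot(a, b) = a * b * x^-128 mod p.
pub fn dot(a: u128, b: u128) -> u128 {
    let (lo, hi) = mul_wide(a, b);
    mont_reduce(lo, hi)
}
```

Since x^128 mod p = x^127 + x^126 + x^121 + 1 acts as the Montgomery "one", `dot(a, (1 << 127) | (1 << 126) | (1 << 121) | 1)` should return `a`, which makes a convenient self-check.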
