Sub-optimal codegen for float newtypes #32031
Comments
I'm working on this.
Probably the same issue as #24963.
Apparently it will be more useful to tackle this when @eddyb is done with his current work on trans.
Hey, the situation here seems to have improved recently. The nightly compiler on playpen now produces the same code for bare
@bsteinb Indeed, newtypes get unwrapped directly, for the Rust ABI at least. I think we might want a codegen test to avoid regressing on this matter.
This issue can be closed now. |
It seems that floating point types wrapped in newtypes are passed to functions in general purpose registers instead of SIMD registers the way plain f32 and f64 are. Consider this code example in playpen.
Without inlining, add_f32() and add_f64() compile to a single instruction (plus return), while add_newtype_{f32|f64}() first have to move their arguments from general purpose registers into SIMD registers, perform the addition, and move the result back to a GPR.
With inlining, the situation is better, but still not optimal. Once again, the functions defined on plain types work directly in SIMD registers, and the loop now gets unrolled by a factor of 10. For the newtypes, the accumulator is still kept in a GPR, and the loop is only unrolled by a factor of 5. Here, the accumulator is moved from the GPR to a SIMD register only at the start of a loop iteration and moved back at the end (after 5 additions have been performed).
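For readers without access to the playpen link, here is a minimal sketch of the kind of code being compared. It is a reconstruction, not the original example: the function names add_f32, add_f64 and add_newtype_{f32|f64} come from the description above, while the wrapper type names (NewtypeF32, NewtypeF64) and the sum_* loop helpers are assumptions made for illustration.

```rust
// Hypothetical newtype wrappers around the float primitives.
// The type names are assumptions; only the function names below
// (add_f32, add_f64, add_newtype_f32, add_newtype_f64) appear in the report.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NewtypeF32(pub f32);

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NewtypeF64(pub f64);

// Plain float additions: reported to compile to a single add plus return.
// Marking these #[inline(never)] would reproduce the "without inlining" case.
pub fn add_f32(a: f32, b: f32) -> f32 {
    a + b
}

pub fn add_f64(a: f64, b: f64) -> f64 {
    a + b
}

// The same additions on the wrappers: semantically identical, but reported
// to receive and return their values in general purpose registers.
pub fn add_newtype_f32(a: NewtypeF32, b: NewtypeF32) -> NewtypeF32 {
    NewtypeF32(a.0 + b.0)
}

pub fn add_newtype_f64(a: NewtypeF64, b: NewtypeF64) -> NewtypeF64 {
    NewtypeF64(a.0 + b.0)
}

// Hypothetical accumulator loops, standing in for the loop whose unrolling
// is discussed in the "with inlining" case.
pub fn sum_f64(n: u32) -> f64 {
    let mut acc = 0.0;
    for i in 0..n {
        acc = add_f64(acc, i as f64);
    }
    acc
}

pub fn sum_newtype_f64(n: u32) -> NewtypeF64 {
    let mut acc = NewtypeF64(0.0);
    for i in 0..n {
        acc = add_newtype_f64(acc, NewtypeF64(i as f64));
    }
    acc
}
```

Both variants compute the same values, so any difference is purely in the generated code. One way to inspect it is to compile with optimizations and emit assembly, for example with rustc -O --crate-type lib --emit asm, and compare the bodies of the plain and newtype functions.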