Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sub-optimal codegen for float newtypes #32031

Closed
bsteinb opened this issue Mar 3, 2016 · 7 comments
Closed

Sub-optimal codegen for float newtypes #32031

bsteinb opened this issue Mar 3, 2016 · 7 comments
Assignees
Labels
A-codegen Area: Code generation E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added.

Comments

@bsteinb
Copy link

bsteinb commented Mar 3, 2016

It seems that floating point types wrapped in newtypes are passed to functions in general purpose registers instead of SIMD registers the way plain f32 and f64 are. Consider this code example in playpen.

Without inlining, add_f32() and add_f64() compile to a single instruction (plus return) while add_newtype_{f32|f64}() first have to move their arguments from a general purpose register perform the addition and move the result back to the GPR.

With inlining, the situation is better, but still not optimal. Once again, the functions defined on plain types work directly in SIMD registers and the loop now gets unrolled by a factor of 10. For the newtypes, the accumulator is still kept in a GPR, however the loop is unrolled by a factor of 5. Here, the accumulator is only moved from the GPR to a SIMD register at the start of a loop iteration and moved back at the end (after 5 additions have been performed).

@dotdash dotdash self-assigned this Mar 4, 2016
@dotdash
Copy link
Contributor

dotdash commented Mar 4, 2016

I'm working on this

@sfackler sfackler added the A-codegen Area: Code generation label Mar 4, 2016
@bluss
Copy link
Member

bluss commented Mar 4, 2016

Probably the same issue as #24963

@dotdash dotdash assigned eddyb and unassigned dotdash Mar 4, 2016
@dotdash
Copy link
Contributor

dotdash commented Mar 4, 2016

Apparently it will be more useful to tackle this when @eddyb is done with his current work on trans.

@bsteinb
Copy link
Author

bsteinb commented Apr 9, 2016

Hey, the situation here seems to have improved recently. The nightly compiler on playpen now produces the same code for bare f32/f64 and their newtype wrapped counterparts.

@eddyb
Copy link
Member

eddyb commented Apr 9, 2016

@bsteinb Indeed, newtypes get unwrapped directly, for the Rust ABI at least. I think we might want a codegen test to avoid regressing on this matter.

@eddyb eddyb added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Apr 9, 2016
@dsprenkels
Copy link
Contributor

dsprenkels commented Jun 12, 2016

After a bisection using the test case from #34241, I found that the commit that fixed this was 9a8b807.

dsprenkels added a commit to dsprenkels/rust that referenced this issue Jun 12, 2016
bors added a commit that referenced this issue Jun 12, 2016
add a test case for issue #32031

I propose a test case to finish the fix for issue #32031. Please review this commit thoroughly, as I have never written a codegen test before.

r? @eddyb
@dsprenkels
Copy link
Contributor

This issue can be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added.
Projects
None yet
Development

No branches or pull requests

6 participants