Sub-optimal codegen for f32 tuples #32045
The reason for this is that we pass composite types less than a word in size as integers. In the vast majority of cases this results in much better code generation, so this is an unfortunate edge case where that isn't true. The only real thing I can see us doing here is passing two-field structs as two arguments, as long as both fit into registers, something we should probably be doing anyway.
Would it make sense to introduce a special case for composite types made up only of floating point types? They could be passed inside SIMD registers.
Yes, this should get passed as a 2xf32 vector IIRC. Same thing that the C
Hey, the situation here has changed slightly in the meantime. Without inlining the compiler (nightly on playpen) still crams the two
I would like to take this issue and see if I can fix it. My proposal is to just make an exception for aggregate types containing (only) floating point types when casting to an integer type. I am, however, not sure what to do with heterogeneous aggregate types (containing both
I have now looked at this issue for some time. I think "Rust" functions could be adjusted to follow the "C" ABI (code), which would fix the sub-optimal codegen. This, however, has some side effects which I do not have the experience to fix. cc @eddyb
Use ty::layout for ABI computation instead of LLVM types. This is the first step in creating a backend-agnostic library for computing call ABI details from signatures. I wanted to open the PR *before* attempting to move `cabi_*` from trans to avoid rebase churn in #39999. **EDIT**: As I suspected, #39999 needs this PR to fully work (see #39999 (comment)).

The first 3 commits add more APIs to `ty::layout` and replace non-ABI uses of `sizing_type_of`. These APIs are probably usable by other backends, and miri too (cc @stoklund @solson). The last commit rewrites `rustc_trans::cabi_*` to use `ty::layout` and new `rustc_trans::abi` APIs.

Also, during the process, a couple of trivial bugs were identified and fixed:
* `msp430`, `nvptx`, `nvptx64`: type sizes *in bytes* were compared with `32` and `64`
* `x86` (`fastcall`): `f64` was incorrectly not treated the same way as `f32`

Although not urgent, this PR also uses the more general "homogeneous aggregate" logic to fix #32045.
Similar to the behavior observed in #32031, tuples of `f32` and `f64` seem to be passed to functions in GPRs.

The `f32` tuple takes an especially large hit, since the two `f32` are passed inside a single 64-bit GPR and have to be extracted and compressed via `shift` and `or` instructions. Even with inlining turned on, this does not go away.

The `f64` tuple is not as bad as the `f32` tuple. Without inlining it does some `move`s to and from the SIMD registers, and with inlining turned on, the tuple is kept in a SIMD register and the loop is vectorized and unrolled.

EDIT: Forgot to link to the code example on playpen.