Constructing a Vector4 from 4 constant floats is very inefficient #10044

Closed
tannergooding opened this issue Mar 27, 2018 · 6 comments
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), optimization

@tannergooding
Member

Code of the form var value = new Vector4(1, 2, 3, 4); currently generates very inefficient code.

Today, this produces something similar to:

C4E17A100518010000   vmovss   xmm0, dword ptr [reloc @RWD00]
C4E17A100D13010000   vmovss   xmm1, dword ptr [reloc @RWD04]
C4E17A10150E010000   vmovss   xmm2, dword ptr [reloc @RWD08]
C4E17A101D09010000   vmovss   xmm3, dword ptr [reloc @RWD12]
C4E15857E4           vxorps   xmm4, xmm4
C4E15A10E3           vmovss   xmm4, xmm4, xmm3
C4E15973FC04         vpslldq  xmm4, 4
C4E15A10E2           vmovss   xmm4, xmm4, xmm2
C4E15973FC04         vpslldq  xmm4, 4
C4E15A10E1           vmovss   xmm4, xmm4, xmm1
C4E15973FC04         vpslldq  xmm4, 4
C4E15A10E0           vmovss   xmm4, xmm4, xmm0
C4E17828C4           vmovaps  xmm0, xmm4

The JIT should recognize this pattern and instead emit a single 16-byte read:

vmovaps xmm0, xmmword ptr [address]    ; Use vmovups if address is not 16-byte aligned

category:cq
theme:vector-codegen
skill-level:intermediate
cost:medium

@tannergooding
Member Author

CC @CarolEidt, @eerhardt

@mikedn
Contributor

mikedn commented Mar 27, 2018

Eh, I guess I need to look into emitting those 16-byte data sections a bit more; this issue keeps coming up one way or another :)

@mikedn
Contributor

mikedn commented Mar 28, 2018

Funny thing is that this needs 16 bytes of data section anyway to store the 4 FP constants, and then it generates ~75 bytes of extra code...

@ufcpp
Contributor

ufcpp commented Mar 28, 2018

Related to dotnet/csharplang#688?

@mikedn
Contributor

mikedn commented Mar 28, 2018

Yes. Except this one is actually doable. The csharplang suggestion, well, not so much...

@tannergooding
Member Author

> The csharplang suggestion, well, not so much...

Could you please comment on the language issue explaining why you believe this is not doable?

At least in the initial IL tests, using data declarations to emit structured constants worked as intended on multiple platforms. So, outside of language design issues (such as how to handle cross-assembly constant declarations), it should be doable, and it is in fact doable in IL today.

msftgits transferred this issue from dotnet/coreclr on Jan 31, 2020
msftgits added this to the Future milestone on Jan 31, 2020
ghost locked this issue as resolved and limited conversation to collaborators on Dec 17, 2020
4 participants