x64: Improve memory support in {insert,extract}lane
#5982
Conversation
This commit adds support to Cranelift to emit `pextr{b,w,d,q}` with a memory destination, merging a store-of-extract operation into one instruction. Additionally, AVX support is added for the `pextr*` instructions. I've also tried to ensure that codegen tests and runtests exist for all forms of these instructions.
This is a generalization of #5924 for more vector types and more lanes.
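To make the merged operation concrete, here is a small Rust sketch (purely illustrative, not Cranelift code) of what a `pextrw`-style store-of-extract computes: pick one 16-bit lane of a 128-bit vector and write just that lane to memory.

```rust
// Illustrative model of a store-of-extract such as `pextrw $n, %xmm0, (%rdi)`:
// extract lane `n` from a 128-bit vector viewed as eight little-endian u16
// lanes, and store only that lane. A semantics sketch, not backend code.
fn pextrw_store(vec: u128, lane: usize, mem: &mut [u8]) {
    assert!(lane < 8, "pextrw lane index must be 0..=7");
    let value = ((vec >> (16 * lane)) & 0xFFFF) as u16;
    mem[..2].copy_from_slice(&value.to_le_bytes());
}
```

Previously the `extractlane` would go to a general-purpose register and a separate store instruction would write it out; the point of the PR is to fuse those into the single memory-destination form.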
LGTM but see a couple comments below.
```
          address
          offset))
  (side_effect
    (x64_movsd_store (to_amode flags address offset) value)))

(rule 2 (lower (store flags
                      (has_type $I8 (extractlane value (u8_from_uimm8 n)))
```
I checked and `u8_from_uimm8` does not guarantee that `n` is in the right range (e.g., 0 to 15 here). Did you see any validation that would prevent a wrong `n` here at the CLIF level, or do we rely exclusively on the Wasm-level construction? If there is nothing and it ends up being too tricky to do here in ISLE, maybe `emit.rs` could gain some `assert!`s or something like that.
I think this is the block in the verifier which validates that lanes are always in bounds, so I think we should be good on that front. I can add some extra asserts to the backend too, though.
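For what it's worth, the kind of backend-side assertion suggested above could be as small as this (a hypothetical sketch, not the PR's actual `emit.rs` code):

```rust
// Hypothetical lane-bounds check of the sort emit.rs could gain: a 128-bit
// vector with `lane_count` lanes only has valid indices 0..lane_count, so a
// pextrb on an i8x16 would pass lane_count = 16, a pextrw on an i16x8
// would pass 8, and so on.
fn assert_lane_in_bounds(lane: u8, lane_count: u8) {
    assert!(
        lane < lane_count,
        "lane {lane} out of bounds for a {lane_count}-lane vector"
    );
}
```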
```
; movq %rsp, %rbp
; block0:
; vmovsd 0(%rdi), %xmm3
; vmovsd %xmm0, %xmm3, %xmm0
```
Looking at the other cases, I started to worry that one of these instructions actually zeroes bits we do not want to zero. I think it is this case; from the `MOVSD` documentation:

> Legacy version: When the source and destination operands are XMM registers, bits MAXVL:64 of the destination operand remains unchanged. When the source operand is a memory location and destination operand is an XMM registers, the quadword at bits 127:64 of the destination operand is cleared to all 0s, bits MAXVL:128 of the destination operand remains unchanged.
>
> VEX and EVEX encoded register-register syntax: Moves a scalar double precision floating-point value from the second source operand (the third operand) to the low quadword element of the destination operand (the first operand). Bits 127:64 of the destination operand are copied from the first source operand (the second operand). Bits (MAXVL-1:128) of the corresponding destination register are zeroed.

So it is the legacy SSE case where a `movsd 0(%rdi), ...` would zero too many bits, but in the AVX code we could actually merge these two `vmovsd` into one. Is there a way to do that?
Going by this, which I realize isn't official but has been what I've been using so far, my read is that `VEX.128.F2.0F 11 /r: VMOVSD xmm1, xmm2, xmm3` has different semantics than `VEX.128.F2.0F 10 /r: VMOVSD xmm1, m64`. So I think AVX and SSE match here: if you use the register-to-register form it preserves the upper bits, but if you use the memory-to-register form it always zeros the upper bits, so I don't think we can fuse?

I'll admit, though, that this is all real subtle, and if the official docs are different I wouldn't be too surprised.
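To restate the subtlety in executable form, here is a small Rust model (my own sketch, based on the semantics quoted above) of the two `movsd` forms, treating an XMM register as a `u128`:

```rust
const LOW64: u128 = u64::MAX as u128;

// Register-to-register form (SSE `movsd` and VEX `vmovsd` alike): the low
// quadword comes from the source, while bits 127:64 of the result are taken
// from the existing destination (SSE) or the first source (VEX) — either
// way, the upper bits are preserved.
fn movsd_reg_reg(upper_src: u128, low_src: u128) -> u128 {
    (upper_src & !LOW64) | (low_src & LOW64)
}

// Memory-to-register form (again SSE and VEX alike): the low quadword comes
// from memory and bits 127:64 are cleared to zero — which is why the
// load-then-merge pair cannot be fused into a single memory-operand vmovsd.
fn movsd_mem_reg(qword: u64) -> u128 {
    qword as u128
}
```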
```
; vmovsd 0(%rdi), %xmm3
; vmovsd %xmm0, %xmm3, %xmm0
; movsd 0(%rdi), %xmm3
; movsd %xmm0, %xmm3, %xmm0
```
👍