s390x: Support both big- and little-endian vector lane order

This implements the s390x back-end portion of the solution for
bytecodealliance#4566.

We now support both big- and little-endian vector lane order
in code generation. The order used for a function is determined
by the function's ABI: if it uses a Wasmtime ABI, it will use
little-endian lane order, and big-endian lane order otherwise.
(This ensures that all raw_bitcast instructions generated by
both wasmtime and other cranelift frontends can always be
implemented as a no-op.)

Lane order affects the implementation of a number of operations:
- Vector immediates
- Vector memory load / store (in big- and little-endian variants)
- Operations explicitly using lane numbers
  (insertlane, extractlane, shuffle, swizzle)
- Operations implicitly using lane numbers
  (iadd_pairwise, narrow/widen, promote/demote, fcvt_low, vhigh_bits)

In addition, when calling a function using a different lane order,
we need to lane-swap all vector values passed or returned in registers.

A small number of changes to common code were also needed:

- Ensure we always select a Wasmtime calling convention on s390x
  in crates/cranelift (func_signature).

- Fix vector immediates for filetests/runtests. In PR
  bytecodealliance#4427, I attempted to fix this by byte-swapping
  the V128 value, but with the new scheme, we'd instead need to
  perform a per-lane byte swap. Since we do not know the actual
  type in write_to_slice and read_from_slice, this isn't easily
  possible.

  Revert this part of PR bytecodealliance#4427 again, and instead
  just mark the memory buffer as little-endian when emitting the
  trampoline; the back-end will then emit correct code to load
  the constant.

- Change a runtest in simd-bitselect-to-vselect.clif to no longer
  make little-endian lane order assumptions.

- Remove runtests in simd-swizzle.clif that make little-endian
  lane order assumptions by relying on implicit type conversion
  when using a non-i16x8 swizzle result type (this feature should
  probably be removed anyway).

Tested with both wasmtime and cg_clif.
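
To illustrate why a per-lane byte swap needs the lane type while a whole-vector byte swap does not, here is a minimal standalone sketch. It is not code from this commit: the helper names (`per_lane_byte_swap`, `lane_swap`, `whole_vector_byte_swap`) are hypothetical, and the vector is modelled as a plain `[u8; 16]`. It shows that reversing all 16 bytes is the same as a per-lane byte swap followed by a lane swap, for any lane width.

```rust
/// Reverse the bytes inside each lane, leaving the lanes in place.
/// Hypothetical helper; `lane_bytes` is the lane width in bytes.
fn per_lane_byte_swap(v: [u8; 16], lane_bytes: usize) -> [u8; 16] {
    assert!(matches!(lane_bytes, 1 | 2 | 4 | 8 | 16));
    let mut out = v;
    for lane in out.chunks_exact_mut(lane_bytes) {
        lane.reverse();
    }
    out
}

/// Reverse the order of the lanes, leaving the bytes inside each lane as-is.
/// Hypothetical helper modelling a "lane swap" between lane orders.
fn lane_swap(v: [u8; 16], lane_bytes: usize) -> [u8; 16] {
    assert!(matches!(lane_bytes, 1 | 2 | 4 | 8 | 16));
    let mut out = [0u8; 16];
    let reversed_lanes = v.chunks_exact(lane_bytes).rev();
    for (dst, src) in out.chunks_exact_mut(lane_bytes).zip(reversed_lanes) {
        dst.copy_from_slice(src);
    }
    out
}

/// Reverse all 16 bytes. This needs no knowledge of the lane type.
fn whole_vector_byte_swap(mut v: [u8; 16]) -> [u8; 16] {
    v.reverse();
    v
}
```

With `v = [0, 1, ..., 15]`, `whole_vector_byte_swap(v)` equals `lane_swap(per_lane_byte_swap(v, n), n)` for every lane width `n`, whereas `per_lane_byte_swap(v, 4)` and `per_lane_byte_swap(v, 2)` differ. The per-lane variant therefore cannot be applied without knowing the lane width, which is the information the commit message says is unavailable in write_to_slice and read_from_slice.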
uweigand · Aug 10, 2022 · 658da3e
1 parent a25d520
Showing 29 changed files with 6,595 additions and 604 deletions.
diff --git a/cranelift/codegen/src/data_value.rs b/cranelift/codegen/src/data_value.rs
@@ -91,8 +91,8 @@ impl DataValue {
             DataValue::I128(i) => dst[..16].copy_from_slice(&i.to_ne_bytes()[..]),
             DataValue::F32(f) => dst[..4].copy_from_slice(&f.bits().to_ne_bytes()[..]),
             DataValue::F64(f) => dst[..8].copy_from_slice(&f.bits().to_ne_bytes()[..]),
-            DataValue::V128(v) => dst[..16].copy_from_slice(&u128::from_le_bytes(*v).to_ne_bytes()),
-            DataValue::V64(v) => dst[..8].copy_from_slice(&u64::from_le_bytes(*v).to_ne_bytes()),
+            DataValue::V128(v) => dst[..16].copy_from_slice(&v[..]),
+            DataValue::V64(v) => dst[..8].copy_from_slice(&v[..]),
             _ => unimplemented!(),
         };
     }
@@ -124,11 +124,9 @@ impl DataValue {
             }
             _ if ty.is_vector() => {
                 if ty.bytes() == 16 {
-                    DataValue::V128(
-                        u128::from_ne_bytes(src[..16].try_into().unwrap()).to_le_bytes(),
-                    )
+                    DataValue::V128(src[..16].try_into().unwrap())
                 } else if ty.bytes() == 8 {
-                    DataValue::V64(u64::from_ne_bytes(src[..8].try_into().unwrap()).to_le_bytes())
+                    DataValue::V64(src[..8].try_into().unwrap())
                 } else {
                     unimplemented!()
                 }