s390x: Support both big- and little-endian vector lane order
This implements the s390x back-end portion of the solution for
bytecodealliance#4566

We now support both big- and little-endian vector lane order
in code generation.  The order used for a function is determined
by the function's ABI: if it uses a Wasmtime ABI, it will use
little-endian lane order, and big-endian lane order otherwise.
(This ensures that all raw_bitcast instructions generated by
both wasmtime and other cranelift frontends can always be
implemented as a no-op.)

Lane order affects the implementation of a number of operations:
- Vector immediates
- Vector memory load / store (in big- and little-endian variants)
- Operations explicitly using lane numbers
  (insertlane, extractlane, shuffle, swizzle)
- Operations implicitly using lane numbers
  (iadd_pairwise, narrow/widen, promote/demote, fcvt_low, vhigh_bits)
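As a rough illustration of the ABI rule and of what "explicitly using lane numbers" means, the sketch below picks a lane order from the ABI and translates an IR lane number into an element number counted from the most-significant end of the register. The names `Abi`, `LaneOrder`, `lane_order_for` and `hardware_lane` are hypothetical and are not taken from the back-end sources; only the rule itself (Wasmtime ABI means little-endian lane order, anything else means big-endian) comes from the description above.

```rust
/// Hypothetical stand-in for a calling convention; the real set of
/// calling conventions is larger than these two cases.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Abi {
    Wasmtime,
    Other,
}

/// Hypothetical lane-order tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LaneOrder {
    LittleEndian,
    BigEndian,
}

/// Wasmtime ABIs use little-endian lane order; everything else uses
/// big-endian lane order.
fn lane_order_for(abi: Abi) -> LaneOrder {
    match abi {
        Abi::Wasmtime => LaneOrder::LittleEndian,
        Abi::Other => LaneOrder::BigEndian,
    }
}

/// Translate an IR lane number into an element number counted from the
/// most-significant end of the vector register.
/// Assumption: with little-endian lane order, lane 0 is the last element.
fn hardware_lane(order: LaneOrder, lane: usize, lane_count: usize) -> usize {
    assert!(lane < lane_count, "lane number out of range");
    match order {
        LaneOrder::BigEndian => lane,
        LaneOrder::LittleEndian => lane_count - 1 - lane,
    }
}
```

Under these assumptions, an `extractlane` of lane 0 from an i32x4 value would address element 3 in a Wasmtime-ABI function and element 0 otherwise.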

In addition, when calling a function using a different lane order,
we need to lane-swap all vector values passed or returned in registers.
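The lane swap mentioned above can be pictured as reversing the order of the lanes while leaving the bytes inside each lane untouched. The function below is a minimal model of that idea on a plain 16-byte array; `lane_swap` and its parameters are illustrative names, and the back-end performs the equivalent operation with vector instructions on registers rather than on memory.

```rust
/// Reverse the order of the lanes in a 16-byte vector.
/// Bytes within each lane keep their relative order.
/// `lane_bytes` is the lane width in bytes (1, 2, 4 or 8).
fn lane_swap(bytes: [u8; 16], lane_bytes: usize) -> [u8; 16] {
    assert!(matches!(lane_bytes, 1 | 2 | 4 | 8), "unsupported lane size");
    let lane_count = 16 / lane_bytes;
    let mut out = [0u8; 16];
    for lane in 0..lane_count {
        // Lane `lane` moves to position `lane_count - 1 - lane`.
        let src = lane * lane_bytes;
        let dst = (lane_count - 1 - lane) * lane_bytes;
        out[dst..dst + lane_bytes].copy_from_slice(&bytes[src..src + lane_bytes]);
    }
    out
}
```

Applying the swap twice gives back the original value, which is why the same operation can serve for both arguments and return values at a call boundary between the two lane orders.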

A small number of changes to common code were also needed:

- Ensure we always select a Wasmtime calling convention on s390x
  in crates/cranelift (func_signature).

- Fix vector immediates for filetests/runtests.  In PR bytecodealliance#4427,
  I attempted to fix this by byte-swapping the V128 value, but
  with the new scheme, we'd instead need to perform a per-lane
  byte swap.  Since we do not know the actual type in write_to_slice
  and read_from_slice, this isn't easily possible.

  Revert this part of PR bytecodealliance#4427 again, and instead just mark the
  memory buffer as little-endian when emitting the trampoline;
  the back-end will then emit correct code to load the constant.

- Change a runtest in simd-bitselect-to-vselect.clif to no longer
  make little-endian lane order assumptions.

- Remove runtests in simd-swizzle.clif that make little-endian
  lane order assumptions by relying on implicit type conversion
  when using a non-i16x8 swizzle result type (this feature should
  probably be removed anyway).
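To see why the vector-immediate fix could not simply be adjusted inside write_to_slice and read_from_slice, compare a byte swap of the whole 128-bit value with a per-lane byte swap. The sketch below uses hypothetical helper names (`swap_whole`, `swap_per_lane`); it only demonstrates that the per-lane variant needs the lane width, which is exactly the type information those two functions do not have.

```rust
/// Byte-swap the value as a single 128-bit integer: all 16 bytes are reversed.
fn swap_whole(v: [u8; 16]) -> [u8; 16] {
    u128::from_le_bytes(v).to_be_bytes()
}

/// Byte-swap each lane separately; the result depends on `lane_bytes`.
fn swap_per_lane(v: [u8; 16], lane_bytes: usize) -> [u8; 16] {
    assert!(lane_bytes > 0 && 16 % lane_bytes == 0, "lane size must divide 16");
    let mut out = v;
    for lane in out.chunks_exact_mut(lane_bytes) {
        lane.reverse();
    }
    out
}
```

The two functions agree only when the whole vector is treated as one 16-byte lane; for i32x4 or i16x8 they produce different byte layouts, so a type-agnostic swap cannot be correct for every vector type.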

Tested with both wasmtime and cg_clif.
uweigand committed Aug 10, 2022
1 parent a25d520 commit 658da3e
Showing 29 changed files with 6,595 additions and 604 deletions.
10 changes: 4 additions & 6 deletions cranelift/codegen/src/data_value.rs
@@ -91,8 +91,8 @@ impl DataValue {
             DataValue::I128(i) => dst[..16].copy_from_slice(&i.to_ne_bytes()[..]),
             DataValue::F32(f) => dst[..4].copy_from_slice(&f.bits().to_ne_bytes()[..]),
             DataValue::F64(f) => dst[..8].copy_from_slice(&f.bits().to_ne_bytes()[..]),
-            DataValue::V128(v) => dst[..16].copy_from_slice(&u128::from_le_bytes(*v).to_ne_bytes()),
-            DataValue::V64(v) => dst[..8].copy_from_slice(&u64::from_le_bytes(*v).to_ne_bytes()),
+            DataValue::V128(v) => dst[..16].copy_from_slice(&v[..]),
+            DataValue::V64(v) => dst[..8].copy_from_slice(&v[..]),
             _ => unimplemented!(),
         };
     }
@@ -124,11 +124,9 @@ impl DataValue {
             }
             _ if ty.is_vector() => {
                 if ty.bytes() == 16 {
-                    DataValue::V128(
-                        u128::from_ne_bytes(src[..16].try_into().unwrap()).to_le_bytes(),
-                    )
+                    DataValue::V128(src[..16].try_into().unwrap())
                 } else if ty.bytes() == 8 {
-                    DataValue::V64(u64::from_ne_bytes(src[..8].try_into().unwrap()).to_le_bytes())
+                    DataValue::V64(src[..8].try_into().unwrap())
                 } else {
                     unimplemented!()
                 }