Pseudo-Minimum and Pseudo-Maximum instructions #122
This actually seems like a very good idea. On the other hand, would going from scalar "regular" min/max to SIMD quasi min/max for the loop be an issue for optimizers? They would not be able to argue that the semantics are preserved. Should matching operations be added to the scalar instruction set?
In LLVM at least, the autovectorizer does not know what kind of min/max instructions are natively supported, so its behavior would not change. But if a user had a loop of min/max operations obeying the semantics of wasm's scalar min/max, the min/max semantics would not be changed by vectorization, so these new vector min/max instructions would not be used.
As a minor bikeshed, the word "quasi" involves nondeterminism in the quasi-fma proposal, so it's a little confusing that "quasi" here doesn't involve nondeterminism. What would you think about using the term "asymmetric" instead here?
It looks like emscripten resolves the scalar min and max builtins to calls, rather than to instructions. This makes sense given the semantics. For x86, Clang produces a max instruction from the same example.
There might be use in defining min/max with native semantics for Wasm in general, not only for Wasm SIMD. I do have some mild concerns about adding it only here, but I don't feel very strongly either way.
The precedent here has been set by the MVP. Having different semantics for the scalar and vector versions of these is problematic for engines that support a scalar fallback to vector operations, because the behavior of these instructions is subtly different on different code paths. I'm concerned about baking this type of non-conformity into the spec, though arguably the applications that depend on the scalarized code are not the primary target of this proposal. This is one of the cases where consistency with the MVP is at odds with the native semantics, and it sounds like a good candidate to bring up to the broader CG for feedback, and possibly a resolution with a vote.
@penzn Please note that I propose to add new instructions with the semantics of |
@sunfishcode "quasi" is Latin for "almost". Quasi-Fused Multiply-Add is almost fused, in the sense that it is fused on most processors (e.g. all ARM64 processors), but can be non-fused on some (e.g. low-end Intel processors). Quasi Minimum/Maximum is almost Minimum/Maximum, in the sense that it usually produces the minimum/maximum of two numbers, but may produce a "wrong" result in two cases: That said, I care more about having these functions in the SIMD specification than about their names, and I am open to alternative naming conventions. Unfortunately, "asymmetric" minimum/maximum would abbreviate to
@penzn C/C++ |
@dtig I agree that having inconsistency between scalar and SIMD operations would be concerning. However, my proposal is not about removing or modifying existing
@Maratyszcza you are right, fmin/fmax is different from std::min/std::max. My bad! Looks like std min/max gets lowered into a FP comparison followed by select. It does get vectorized and its vectorized form does not involve Wasm SIMD min/max operations.
This shows |
@penzn In lieu of |
It's not important to me what "quasi" means here, as long as it means something consistent within wasm. "different and nondeterministic rounding" and "different interpretation of NaN and -0" to me are different meanings. Fwiw,
x86 is really the only popular platform that can't implement wasm's The C/C++ situation is unfortunate, although on one hand, now that IEEE 754 has (Note: I don't have a strong opinion either way at this point; I want to raise these topics for discussion.) |
@sunfishcode, @tlively: renamed instructions to Pseudo-Minimum and Pseudo-Maximum to avoid confusion with Quasi-FMA |
My concern with the proposed operations is not limited to just the existing operations - i.e. adding new pmin/pmax operations still means that there isn't a good way to emulate the new operations using MVP operations. I agree that adding the scalar versions of these operations falls outside the scope of this proposal, but without these any scalar fallbacks have added complexity, or will be inaccurate. |
Just a note, I've hit this when trying to convert some code to WASM SIMD. I expected "max(0.f, v)" to result in an efficient codegen but instead it was really inefficient, and noticeably slower than using compare + and ( |
@dtig While these SIMD instructions don't have a direct equivalent in WAsm MVP, the scalar operation can be simulated with just two MVP instructions -- |
Note: I think the pmax lowering is incorrect:
say The lowering should be:
Same for f64x2. @Maratyszcza please take a look, thanks! |
@ngzhian You're right. |
Thanks! I have another piece of feedback on lowering on ARM. But the lowering suggested here does hide a bit of the slowness on ARM. (Also, if anyone has ideas on improving the f64x2lt implementation, please let me know, thanks!)
F64 versions are doomed to be slow on 32-bit ARM due to lack of SIMD capability. However, the same applies to the standard |
This patch implements f32x4.pmin, f32x4.pmax, f64x2.pmin, and f64x2.pmax for x64 and interpreter. Pseudo-min and Pseudo-max instructions were proposed in WebAssembly/simd#122. These instructions exactly match std::min and std::max in C++ STL, and thus have different semantics from the existing min and max. The instruction-selector for x64 switches the operands around, because it allows for defining the dst to be same as first (really the second input node), allowing better codegen. For example, b = f32x4.pmin(a, b) directly maps to vminps(b, b, a) or minps(b, a), as long as we can define dst == b, and switching the instruction operands around allows us to do that. Bug: v8:10501 Change-Id: I06f983fc1764caf673e600ac91d9c0ac5166e17e Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2186630 Commit-Queue: Zhi An Ng <[email protected]> Reviewed-by: Tobias Tebbi <[email protected]> Reviewed-by: Deepti Gandluri <[email protected]> Cr-Commit-Position: refs/heads/master@{#67688}
As specified in WebAssembly/simd#122.
Implement some of the experimental SIMD opcodes that are supported by all of V8, LLVM, and Binaryen, for maximum compatibility with test content we might be exposed to. Most/all of these will probably make it into the spec, as they lead to substantial speedups in some programs, and they are deterministic. For spec and cpu mapping details, see: WebAssembly/simd#122 (pmax/pmin) WebAssembly/simd#232 (rounding) WebAssembly/simd#127 (dot product) WebAssembly/simd#237 (load zero) The wasm bytecode values used here come from the binaryen changes that are linked from those tickets, that's the best documentation right now. Current binaryen opcode mappings are here: https://github.com/WebAssembly/binaryen/blob/master/src/wasm-binary.h Also: Drive-by fix for signatures of vroundss and vroundsd, these are unary operations and should follow the conventions for these with src/dest arguments, not src0/src1/dest. Also: Drive-by fix to add variants of vmovss and vmovsd on x64 that take Operand source and FloatRegister destination. Differential Revision: https://phabricator.services.mozilla.com/D85982 UltraBlame original commit: 2e7ddb00c8f9240e148cf5843b50a7ba7b913351
Implement some of the experimental SIMD opcodes that are supported by all of V8, LLVM, and Binaryen, for maximum compatibility with test content we might be exposed to. Most/all of these will probably make it into the spec, as they lead to substantial speedups in some programs, and they are deterministic. For spec and cpu mapping details, see: WebAssembly/simd#122 (pmax/pmin) WebAssembly/simd#232 (rounding) WebAssembly/simd#127 (dot product) WebAssembly/simd#237 (load zero) The wasm bytecode values used here come from the binaryen changes that are linked from those tickets, that's the best documentation right now. Current binaryen opcode mappings are here: https://github.com/WebAssembly/binaryen/blob/master/src/wasm-binary.h Also: Drive-by fix for signatures of vroundss and vroundsd, these are unary operations and should follow the conventions for these with src/dest arguments, not src0/src1/dest. Also: Drive-by fix to add variants of vmovss and vmovsd on x64 that take Operand source and FloatRegister destination. Differential Revision: https://phabricator.services.mozilla.com/D85982 UltraBlame original commit: 2d73a015caaa3e70c175172158a6548625dc6da3
This has been accepted into the proposal [0] during the sync on 2020-09-04. I have an outstanding PR [1] to renumber the opcodes based on what's currently reserved in NewOpcodes.md. There also seems to be a merge conflict to be resolved, I suggest a rebase, then we can merge this in. Also removed the "pending prototype data", since we have it in #122 (comment) and #122 (comment). [0] https://docs.google.com/document/d/138cF6aOUa9RZC2tOR7AhlIQWdmX5EtpzXRTVDAN3bfo/edit# see "3. Pseudo min/max" |
@ngzhian Rebased and updated opcodes as per your PR |
F32x4 and F64x2 pmin and pmax were accepted into the proposal [0], this removes all the ifdefs and todo guarding the prototypes, and moves these instructions out of the post-mvp flag. [0] WebAssembly/simd#122 Bug: v8:10904 Change-Id: I4e0c2f29ddc5d7fc19a209cd02b3d369617574a0 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2405802 Reviewed-by: Bill Budge <[email protected]> Commit-Queue: Zhi An Ng <[email protected]> Cr-Commit-Position: refs/heads/master@{#69855}
Port 3ba4431 Original Commit Message: F32x4 and F64x2 pmin and pmax were accepted into the proposal [0], this removes all the ifdefs and todo guarding the prototypes, and moves these instructions out of the post-mvp flag. [0] WebAssembly/simd#122 [email protected], [email protected], [email protected], [email protected] BUG= LOG=N Change-Id: I8b2ae60240f769e1f4c0b00e98d53846519b305e Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2410806 Reviewed-by: Junliang Yan <[email protected]> Reviewed-by: Milad Farazmand <[email protected]> Commit-Queue: Milad Farazmand <[email protected]> Cr-Commit-Position: refs/heads/master@{#69893}
…ed status. r=jseward Background: WebAssembly/simd#122 For all the pseudo-min/max SIMD instructions: - remove the internal 'Experimental' opcode suffix in the C++ code - remove the guard on experimental Wasm instructions in all the C++ decoders - move the test cases from simd/experimental.js to simd/ad-hack.js I have checked that current V8 and wasm-tools use the same opcode mappings. V8 in turn guarantees the correct mapping for LLVM and binaryen. Differential Revision: https://phabricator.services.mozilla.com/D92928
…structions This patch implements, for aarch64, the following wasm SIMD extensions Floating-point rounding instructions WebAssembly/simd#232 Pseudo-Minimum and Pseudo-Maximum instructions WebAssembly/simd#122 The changes are straightforward: * `build.rs`: the relevant tests have been enabled * `cranelift/codegen/meta/src/shared/instructions.rs`: new CLIF instructions `fmin_pseudo` and `fmax_pseudo`. The wasm rounding instructions do not need any new CLIF instructions. * `cranelift/wasm/src/code_translator.rs`: translation into CLIF; this is pretty much the same as any other unary or binary vector instruction (for the rounding and the pmin/max respectively) * `cranelift/codegen/src/isa/aarch64/lower_inst.rs`: - `fmin_pseudo` and `fmax_pseudo` are converted into a two instruction sequence, `fcmpgt` followed by `bsl` - the CLIF rounding instructions are converted to a suitable vector `frint{n,z,p,m}` instruction. * `cranelift/codegen/src/isa/aarch64/inst/mod.rs`: minor extension of `pub enum VecMisc2` to handle the rounding operations. And corresponding `emit` cases.
Introduction

The `f32x4.min`/`f32x4.max`/`f64x2.min`/`f64x2.max` instructions in WebAssembly SIMD are the natural extensions of the scalar WebAssembly MVP instructions to SIMD instruction sets. These instructions follow JavaScript rules on NaN propagation, i.e. if any operand is NaN, the output is NaN. However, the `min`/`max` operations as defined in the WebAssembly specification have several drawbacks:

- While `f32x4.min` maps to a single instruction (`FMIN Vd.4S, Vn.4S, Vm.4S`) on ARM64, it doesn't have a direct or even a close equivalent in x86 instruction sets. As a result, V8 has to use 8 SSE2 instructions to lower this WebAssembly instruction.
- They do not match the semantics of `std::min` and `std::max` in C++, so compilers can't generate `min`/`max` instructions from WAsm SIMD by auto-vectorizing scalar code.
- `{ f32x4.min(a, b), f32x4.max(a, b) }` is not identically equivalent to the set of values `{ a, b }`. Consequently, sorting networks, which underlie SIMD-friendly algorithms for sorting (e.g. Bitonic sort) and partial ordering, can't be implemented on top of `min`/`max` operations from WebAssembly SIMD.

New instructions
This PR introduces Pseudo-Minimum (`f32x4.pmin` and `f64x2.pmin`) and Pseudo-Maximum (`f32x4.pmax` and `f64x2.pmax`) instructions, which implement Pseudo-Minimum and Pseudo-Maximum operations with slightly different semantics than the Minimum and Maximum in the current spec. Pseudo-Minimum is defined as `pmin(a, b) := b < a ? b : a` and Pseudo-Maximum is defined as `pmax(a, b) := a < b ? b : a`. "Pseudo" in the name refers to the fact that these operations may not return the minimum or maximum in case of signed zero inputs, in particular:

- `pmin(+0.0, -0.0) == +0.0`
- `pmax(-0.0, +0.0) == -0.0`
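The two definitions above can be tried out lane by lane in plain C++. The following is only a minimal scalar sketch: the names `pmin_lane`, `pmax_lane` and `check_pseudo_special_cases` are made up for illustration and are not part of any engine or toolchain API. It also shows the NaN behavior that follows from the definitions, since every comparison with NaN is false.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar model of one lane, written exactly as the definitions read.
// (Illustrative names, not an existing API.)
inline float pmin_lane(float a, float b) { return b < a ? b : a; }
inline float pmax_lane(float a, float b) { return a < b ? b : a; }

// Asserts the special cases that give the instructions their "pseudo" name.
inline void check_pseudo_special_cases() {
    const float nan = std::numeric_limits<float>::quiet_NaN();

    // +0.0 and -0.0 compare equal, so the first operand is returned as is.
    assert(!std::signbit(pmin_lane(+0.0f, -0.0f)));  // result is +0.0
    assert(std::signbit(pmax_lane(-0.0f, +0.0f)));   // result is -0.0

    // A comparison involving NaN is always false, so the first operand
    // is returned whether or not it is the NaN.
    assert(std::isnan(pmin_lane(nan, 1.0f)));
    assert(pmin_lane(1.0f, nan) == 1.0f);

    // For ordinary inputs the results are the usual minimum and maximum.
    assert(pmin_lane(3.0f, 2.0f) == 2.0f);
    assert(pmax_lane(3.0f, 2.0f) == 3.0f);
}
```

Calling `check_pseudo_special_cases()` from a test or from `main` exercises each case; the operand order matters only for the signed-zero and NaN inputs.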
The new instructions fix some of the issues with WebAssembly `min`/`max` instructions:

- On x86 they map directly to the `MINPS`/`MAXPS`/`MINPD`/`MAXPD` instructions. ARM processors don't have an exact equivalent, but can implement the same operation with just two instructions. The table below compares the cost of the new `f32x4.pmin` instruction to the currently available alternatives:
  - `f32x4.min`
  - `v128.bitselect(b, a, f32x4.lt(b, a))`
  - `f32x4.pmin`
- They match the semantics of the `std::min<T>` and `std::max<T>` functions in the C++ standard template library. Thus, optimizing compilers are more likely to find opportunities for auto-vectorization in existing scalar code.
- They satisfy `{ pmin(a, b), pmax(a, b) } == { a, b }`. Thus, they are suitable for efficient implementation of sorting networks, and in particular the bitonic sort algorithm.

Mapping to Common Instruction Sets
This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. However, these patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.
x86/x86-64 processors with AVX instruction set

- `y = f32x4.pmin(a, b)` is lowered to `VMINPS xmm_y, xmm_b, xmm_a`
- `y = f32x4.pmax(a, b)` is lowered to `VMAXPS xmm_y, xmm_b, xmm_a`
- `y = f64x2.pmin(a, b)` is lowered to `VMINPD xmm_y, xmm_b, xmm_a`
- `y = f64x2.pmax(a, b)` is lowered to `VMAXPD xmm_y, xmm_b, xmm_a`
x86/x86-64 processors with SSE2 instruction set

- `b = f32x4.pmin(a, b)` is lowered to `MINPS xmm_b, xmm_a`
- `y = f32x4.pmin(a, b)` is lowered to `MOVAPS xmm_y, xmm_b` + `MINPS xmm_y, xmm_a`
- `b = f32x4.pmax(a, b)` is lowered to `MAXPS xmm_b, xmm_a`
- `y = f32x4.pmax(a, b)` is lowered to `MOVAPS xmm_y, xmm_b` + `MAXPS xmm_y, xmm_a`
- `b = f64x2.pmin(a, b)` is lowered to `MINPD xmm_b, xmm_a`
- `y = f64x2.pmin(a, b)` is lowered to `MOVAPD xmm_y, xmm_b` + `MINPD xmm_y, xmm_a`
- `b = f64x2.pmax(a, b)` is lowered to `MAXPD xmm_b, xmm_a`
- `y = f64x2.pmax(a, b)` is lowered to `MOVAPD xmm_y, xmm_b` + `MAXPD xmm_y, xmm_a`
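The x86 lowerings above pass `b` as the destination and `a` as the source. A rough way to see why is to model one lane of `MINPS`/`MAXPS` in scalar C++ and compare it with the definitions of `pmin`/`pmax`. This is a sketch under the assumption that the x86 instructions return the source (second) operand whenever the comparison is false, which covers NaN inputs and a pair of zeros; all function names below are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// One-lane models of MINPS/MAXPS: the destination is the first operand and
// the source (second operand) is returned when the comparison is false.
inline float minps_lane(float dst, float src) { return dst < src ? dst : src; }
inline float maxps_lane(float dst, float src) { return dst > src ? dst : src; }

// Definitions of the pseudo operations, as given in the PR description.
inline float pmin_ref(float a, float b) { return b < a ? b : a; }
inline float pmax_ref(float a, float b) { return a < b ? b : a; }

// Compares raw bit patterns so that -0.0 vs +0.0 and NaN are told apart.
inline bool same_bits(float x, float y) {
    std::uint32_t xb, yb;
    std::memcpy(&xb, &x, sizeof xb);
    std::memcpy(&yb, &y, sizeof yb);
    return xb == yb;
}

// Asserts that MINPS b, a and MAXPS b, a (operands swapped) reproduce
// pmin(a, b) and pmax(a, b) on a few ordinary and special inputs.
inline void check_swapped_operands() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float samples[] = {-1.0f, -0.0f, +0.0f, 2.5f, nan};
    for (float a : samples) {
        for (float b : samples) {
            assert(same_bits(minps_lane(b, a), pmin_ref(a, b)));
            assert(same_bits(maxps_lane(b, a), pmax_ref(a, b)));
        }
    }
}
```

With the operands in the other order (`a` as destination), the signed-zero and NaN cases would return `b` instead of `a`, which is why the lowering swaps them.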
Other processors and instruction sets

- `y = f32x4.pmin(a, b)` is lowered like `v128.bitselect(b, a, f32x4.lt(b, a))`
- `y = f32x4.pmax(a, b)` is lowered like `v128.bitselect(b, a, f32x4.lt(a, b))`
- `y = f64x2.pmin(a, b)` is lowered like `v128.bitselect(b, a, f64x2.lt(b, a))`
- `y = f64x2.pmax(a, b)` is lowered like `v128.bitselect(b, a, f64x2.lt(a, b))`
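The compare-and-select pattern in this last list can be emulated in portable C++ to see how the mask drives the result. The sketch below models a `v128` as four 32-bit lanes; the types `F32x4`/`U32x4` and the helper names are assumptions made for illustration, not an existing API. It relies only on the documented meaning of `v128.bitselect(v1, v2, c)`: take bits from `v1` where `c` is 1 and from `v2` where `c` is 0.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using F32x4 = std::array<float, 4>;          // four float lanes
using U32x4 = std::array<std::uint32_t, 4>;  // the same 128 bits as raw lanes

inline U32x4 to_bits(const F32x4& v) {
    U32x4 r{};
    std::memcpy(r.data(), v.data(), 4 * sizeof(float));
    return r;
}

inline F32x4 from_bits(const U32x4& v) {
    F32x4 r{};
    std::memcpy(r.data(), v.data(), 4 * sizeof(float));
    return r;
}

// Model of f32x4.lt: an all-ones lane where x < y, an all-zeros lane otherwise.
inline U32x4 lt(const F32x4& x, const F32x4& y) {
    U32x4 m{};
    for (int i = 0; i < 4; ++i) m[i] = x[i] < y[i] ? 0xFFFFFFFFu : 0u;
    return m;
}

// Model of v128.bitselect(v1, v2, c): bits of v1 where c is 1, else bits of v2.
inline U32x4 bitselect(const U32x4& v1, const U32x4& v2, const U32x4& c) {
    U32x4 r{};
    for (int i = 0; i < 4; ++i) r[i] = (v1[i] & c[i]) | (v2[i] & ~c[i]);
    return r;
}

// The two f32x4 lowerings from the list above.
inline F32x4 pmin_lowered(const F32x4& a, const F32x4& b) {
    return from_bits(bitselect(to_bits(b), to_bits(a), lt(b, a)));
}
inline F32x4 pmax_lowered(const F32x4& a, const F32x4& b) {
    return from_bits(bitselect(to_bits(b), to_bits(a), lt(a, b)));
}

// Asserts the lowered results lane by lane, including a signed-zero lane.
inline void check_lowering() {
    const F32x4 a = {1.0f, -0.0f, 5.0f, 2.0f};
    const F32x4 b = {2.0f, +0.0f, 3.0f, 2.0f};
    const F32x4 lo = pmin_lowered(a, b);
    const F32x4 hi = pmax_lowered(a, b);
    assert(lo[0] == 1.0f && lo[2] == 3.0f && lo[3] == 2.0f);
    assert(hi[0] == 2.0f && hi[2] == 5.0f && hi[3] == 2.0f);
    // Lane 1 holds -0.0 and +0.0: the mask is zero, so `a` (-0.0) is kept.
    assert(to_bits(lo)[1] == 0x80000000u);
    assert(to_bits(hi)[1] == 0x80000000u);
}
```

The f64x2 variants follow the same shape with two 64-bit lanes and a 64-bit mask per lane.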