This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Pseudo-Minimum and Pseudo-Maximum instructions #122

Merged
merged 1 commit into from
Sep 11, 2020

Conversation

Maratyszcza
Contributor

@Maratyszcza Maratyszcza commented Oct 21, 2019

Introduction

The f32x4.min/f32x4.max/f64x2.min/f64x2.max instructions in WebAssembly SIMD are the natural extensions of the scalar WebAssembly MVP instructions to SIMD instruction sets. These instructions follow JavaScript rules for NaN propagation: if any operand is NaN, the output is NaN. However, the min/max operations as defined in the WebAssembly specification have several drawbacks:

  1. They have very asymmetric cost across popular architectures. E.g. while f32x4.min maps to a single instruction (FMIN Vd.4S, Vn.4S, Vm.4S) on ARM64, it has no direct or even close equivalent in x86 instruction sets. As a result, V8 has to use 8 SSE2 instructions to lower this one WebAssembly instruction.
  2. No operator or function in C or C++ defines a minimum or maximum operation with the same semantics as WebAssembly. Therefore, optimizing compilers can't generate the min/max instructions of WAsm SIMD by auto-vectorizing scalar code.
  3. They lose information. In particular, the set of values { f32x4.min(a, b), f32x4.max(a, b) } is not always equivalent to the set of values { a, b }. Consequently, sorting networks, which underlie SIMD-friendly algorithms for sorting (e.g. bitonic sort) and partial ordering, can't be implemented on top of the min/max operations of WebAssembly SIMD.

New instructions

This PR introduces Pseudo-Minimum (f32x4.pmin and f64x2.pmin) and Pseudo-Maximum (f32x4.pmax and f64x2.pmax) instructions, which implement Pseudo-Minimum and Pseudo-Maximum operations with slightly different semantics than the Minimum and Maximum in the current spec. Pseudo-Minimum is defined as pmin(a, b) := b < a ? b : a and Pseudo-Maximum is defined as pmax(a, b) := a < b ? b : a. "Pseudo" in the name refers to the fact that these operations may not return the true minimum/maximum in case of signed zero inputs, in particular:

  • pmin(+0.0, -0.0) == +0.0
  • pmax(-0.0, +0.0) == -0.0

The new instructions fix some of the issues with WebAssembly min/max instructions:

  1. They have much more uniform cost across different architectures. On x86 processors with the SSE2 instruction set, these instructions map directly to the MINPS/MAXPS/MINPD/MAXPD instructions. ARM processors don't have an exact equivalent, but can implement the same operation with just two instructions. The table below compares the cost (in instructions) of the new f32x4.pmin instruction to the currently available alternatives:

| Instruction sequence | x86 with SSE2 | ARM NEON | ARM64 |
| --- | --- | --- | --- |
| f32x4.min | 8 | 1 | 1 |
| f32x4.bitselect(b, a, f32x4.lt(b, a)) | 4 | 2 | 2 |
| f32x4.pmin | 1 | 2 | 2 |

  2. The definitions of the Pseudo-Minimum and Pseudo-Maximum operations exactly match the std::min<T> and std::max<T> functions in the C++ standard template library. Thus, optimizing compilers are more likely to find opportunities for auto-vectorization in existing scalar code.
  3. Pseudo-Minimum and Pseudo-Maximum operations jointly preserve information about their inputs, i.e. { pmin(a, b), pmax(a, b) } == { a, b }. Thus, they are suitable for efficient implementation of sorting networks, and in particular the bitonic sort algorithm.

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. However, these patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX instruction set

  • f32x4.pmin
    • y = f32x4.pmin(a, b) is lowered to VMINPS xmm_y, xmm_b, xmm_a
  • f32x4.pmax
    • y = f32x4.pmax(a, b) is lowered to VMAXPS xmm_y, xmm_b, xmm_a
  • f64x2.pmin
    • y = f64x2.pmin(a, b) is lowered to VMINPD xmm_y, xmm_b, xmm_a
  • f64x2.pmax
    • y = f64x2.pmax(a, b) is lowered to VMAXPD xmm_y, xmm_b, xmm_a

x86/x86-64 processors with SSE2 instruction set

  • f32x4.pmin
    • b = f32x4.pmin(a, b) is lowered to MINPS xmm_b, xmm_a
    • y = f32x4.pmin(a, b) is lowered to MOVAPS xmm_y, xmm_b + MINPS xmm_y, xmm_a
  • f32x4.pmax
    • b = f32x4.pmax(a, b) is lowered to MAXPS xmm_b, xmm_a
    • y = f32x4.pmax(a, b) is lowered to MOVAPS xmm_y, xmm_b + MAXPS xmm_y, xmm_a
  • f64x2.pmin
    • b = f64x2.pmin(a, b) is lowered to MINPD xmm_b, xmm_a
    • y = f64x2.pmin(a, b) is lowered to MOVAPD xmm_y, xmm_b + MINPD xmm_y, xmm_a
  • f64x2.pmax
    • b = f64x2.pmax(a, b) is lowered to MAXPD xmm_b, xmm_a
    • y = f64x2.pmax(a, b) is lowered to MOVAPD xmm_y, xmm_b + MAXPD xmm_y, xmm_a

Other processors and instruction sets

  • f32x4.pmin
    • y = f32x4.pmin(a, b) is lowered like v128.bitselect(b, a, f32x4.lt(b, a))
  • f32x4.pmax
    • y = f32x4.pmax(a, b) is lowered like v128.bitselect(b, a, f32x4.lt(a, b))
  • f64x2.pmin
    • y = f64x2.pmin(a, b) is lowered like v128.bitselect(b, a, f64x2.lt(b, a))
  • f64x2.pmax
    • y = f64x2.pmax(a, b) is lowered like v128.bitselect(b, a, f64x2.lt(a, b))

@penzn
Contributor

penzn commented Oct 29, 2019

This actually seems like a very good idea. On the other hand, would going from scalar "regular" min/max to SIMD quasi min/max for a loop be an issue for optimizers? They would not be able to argue that the semantics are preserved. Should matching operations be added to the scalar instruction set?

@tlively
Member

tlively commented Oct 29, 2019

In LLVM at least, the autovectorizer does not know what kind of min/max instructions are natively supported, so its behavior would not change. But if a user had a loop of min/max operations obeying the semantics of wasm's scalar min/max, those min/max semantics would not be changed by vectorization, so these new vector min/max instructions would not be used.

@sunfishcode
Member

sunfishcode commented Oct 29, 2019

As a minor bikeshed, the word "quasi" involves nondeterminism in the quasi-fma proposal, so it's a little confusing that "quasi" here doesn't involve nondeterminism. What would you think about using the term "asymmetric" instead here?

@penzn
Contributor

penzn commented Oct 30, 2019

It looks like emscripten resolves min and max scalar builtins to calls, rather than to instructions. This makes sense given the semantics. For x86, Clang produces a max instruction from the same example.

$ cat max.c
#include <math.h>

float get_max(float a, float b) {
  return fmax(a, b);
}
$ ~/emscripten/emcc -O3 --target=wasm32 -S max.c -o -
get_max:                                # @get_max
        .functype       get_max (f32, f32) -> (f32)
# %bb.0:                                # %entry
        local.get       0
        f64.promote_f32
        local.get       1
        f64.promote_f32
        f64.call        fmax
        f32.demote_f64
                                        # fallthrough-return-value
        end_function

There might be use in defining min/max with native semantics for Wasm in general, not only for Wasm SIMD. I do have some mild concerns about adding it just here, but I don't feel very strongly either way.

@dtig
Member

dtig commented Oct 31, 2019

The precedent here has been set by the MVP. Having different semantics for the scalar and vector versions of these is problematic for engines that support a scalar fallback for vector operations, because the behavior of these instructions would be subtly different on different code paths. I'm concerned about baking this type of non-conformity into the spec, though arguably the applications that depend on the scalarized code are not the primary target of this proposal. This is one of the cases where consistency with the MVP is at odds with the native semantics, and it sounds like a good candidate to bring up to the broader CG for feedback, and possibly a resolution with a vote.

@Maratyszcza
Contributor Author

@penzn Please note that I propose to add new instructions with the semantics of std::min and std::max in C++. I don't propose to remove or modify the existing f32x4.min/f32x4.max/f64x2.min/f64x2.max instructions. Thus, an optimizing compiler would vectorize f32.min/f32.max/f64.min/f64.max as f32x4.min/f32x4.max/f64x2.min/f64x2.max instructions, and vectorize std::min<float>/std::max<float>/std::min<double>/std::max<double> as f32x4.qmin/f32x4.qmax/f64x2.qmin/f64x2.qmax instructions. The operations of std::min<float>/std::max<float>/std::min<double>/std::max<double> indeed don't have equivalent instructions in WAsm MVP and will be represented as a sequence of instructions. While it would be useful to have scalar equivalents, AFAIU they are out of scope for the SIMD spec.

@Maratyszcza
Contributor Author

@sunfishcode "quasi" is Latin for "almost". Quasi-Fused Multiply-Add is almost fused, in the sense that it is fused on most processors (e.g. all ARM64 processors), but can be non-fused on some (e.g. low-end Intel processors). Quasi Minimum/Maximum is almost Minimum/Maximum, in the sense that it usually produces the minimum/maximum of two numbers, but may produce a "wrong" result in two cases: qmin(+0.0, -0.0) == +0.0 and qmax(-0.0, +0.0) == -0.0.

That said, I care more about having these operations in the SIMD specification than about their names, and am open to alternative naming conventions. Unfortunately, "asymmetric" minimum/maximum would abbreviate to amin/amax, which could be confused with the absolute minimum/maximum operations present in some SIMD instruction sets (e.g. x86 AVX512 and MIPS MSA).

@Maratyszcza
Contributor Author

@penzn C/C++ fmin and fmax have different semantics than floating-point min/max instructions in WebAssembly (and also different semantics than the proposed qmin/qmax instructions): if one of the operands is NaN, fmin and fmax return the other operand. This operation is called minNum/maxNum in IEEE754 specification.

@Maratyszcza
Contributor Author

@dtig I agree that having an inconsistency between scalar and SIMD operations would be concerning. However, my proposal is not about removing or modifying the existing f32x4.min/f32x4.max/f64x2.min/f64x2.max operations, but rather about adding new f32x4.qmin/f32x4.qmax/f64x2.qmin/f64x2.qmax instructions alongside the existing ones. Of course, it would be helpful to have equivalents of the Quasi-Minimum/Maximum operations as scalar instructions, but I'm afraid that falls outside the scope of the SIMD specification.

@penzn
Contributor

penzn commented Oct 31, 2019

@Maratyszcza you are right, fmin/fmax is different from std::min/std::max. My bad!

Looks like std min/max gets lowered into a FP comparison followed by select. It does get vectorized and its vectorized form does not involve Wasm SIMD min/max operations.

$ cat max.cc
#include <algorithm>

float get_max(float a, float b) {
  return std::max<float>(a, b);
}

void get_many(float * a1, float * a2, float * res, unsigned sz) {
  for (unsigned i = 0; i < sz; ++i) {
    *res = std::max<float>(*a1, *a2);
    ++a1;
    ++a2;
    ++res;
  }
}
$ emcc -msimd128 -O3 -S max.cc -o -

This shows std::max getting vectorized into f32x4.lt followed by v128.bitselect. The LLVM IR likewise shows a vector compare followed by a vector select.

@Maratyszcza
Contributor Author

@penzn In lieu of f32x4.qmin, f32x4.lt followed by v128.bitselect would be lowered into 4 instructions (1 for f32x4.lt and 3 for v128.bitselect).

@sunfishcode
Member

It's not important to me what "quasi" means here, as long as it means something consistent within wasm. "different and nondeterministic rounding" and "different interpretation of NaN and -0" to me are different meanings.

Fwiw, minNum and maxNum were removed in the recently-published IEEE 754-2019. IEEE 754 now defines:

  • minimum and maximum, which wasm's min and max correspond to
  • minimumNumber and maximumNumber, which are similar to the old minNum and maxNum, but
    • correct a mistake in the handling of signalling NaN
    • make the handling of -0 deterministic (in the same way wasm does, by interpreting 0 to be "greater" than -0)

x86 is really the only popular platform that can't implement wasm's min and max in a single instruction today. And now that these are standardized in IEEE 754, not to mention JavaScript and other popular languages, it's entirely possible that future x86 extensions will add them. While this would take a while even if true, waiting for it is a plausible strategy, if we consider WebAssembly to be around for a long time.

The C/C++ situation is unfortunate, although on one hand, now that IEEE 754 has minimumNumber and maximumNumber, wasm could probably add operators corresponding to those without too much trouble, in which case C's fmin and fmax could compile to those. And on the other, even with a<b?a:b or std::min, with compiler flags, users can override strict NaN and -0 semantics and recover the optimizations.

(Note: I don't have a strong opinion either way at this point; I want to raise these topics for discussion.)

@Maratyszcza Maratyszcza changed the title from "Quasi-Minimum and Quasi-Maximum instructions" to "Pseudo-Minimum and Pseudo-Maximum instructions" on Nov 11, 2019
@Maratyszcza
Contributor Author

@sunfishcode, @tlively: renamed instructions to Pseudo-Minimum and Pseudo-Maximum to avoid confusion with Quasi-FMA

@dtig
Member

dtig commented Nov 18, 2019

@dtig I agree that having an inconsistency between scalar and SIMD operations would be concerning. However, my proposal is not about removing or modifying the existing f32x4.min/f32x4.max/f64x2.min/f64x2.max operations, but rather about adding new f32x4.qmin/f32x4.qmax/f64x2.qmin/f64x2.qmax instructions alongside the existing ones. Of course, it would be helpful to have equivalents of the Quasi-Minimum/Maximum operations as scalar instructions, but I'm afraid that falls outside the scope of the SIMD specification.

My concern with the proposed operations is not limited to just the existing operations - i.e. adding new pmin/pmax operations still means that there isn't a good way to emulate the new operations using MVP operations. I agree that adding scalar versions of these operations falls outside the scope of this proposal, but without them any scalar fallbacks have added complexity, or will be inaccurate.

@zeux
Contributor

zeux commented Jan 12, 2020

Just a note: I hit this when trying to convert some code to WASM SIMD. I expected max(0.f, v) to result in efficient codegen, but instead it was really inefficient, and noticeably slower than using compare + and (and(v, ge(v, 0.f))).

@Maratyszcza
Contributor Author

@dtig While these SIMD instructions don't have a direct equivalent in WAsm MVP, the scalar operation can be simulated with just two MVP instructions -- f32.lt + f32.select.

@ngzhian
Member

ngzhian commented May 7, 2020

Note: I think the pmax lowering is incorrect:

y = f32x4.pmax(a, b) is lowered like v128.bitselect(a, b, f32x4.lt(b, a))

say b == a; f32x4.lt(b, a) would then be 0, which would select b. But it should select a, since pmax always returns the first input (like std::max) when the inputs are equal.

The lowering should be:

y = f32x4.pmax(a, b) is lowered like v128.bitselect(a, b, f32x4.le(b, a))
or
y = f32x4.pmax(a, b) is lowered like v128.bitselect(b, a, f32x4.gt(b, a))

Same for f64x2. @Maratyszcza please take a look, thanks!

@Maratyszcza
Contributor Author

Maratyszcza commented May 7, 2020

@ngzhian You're right. std::max(a, b) := (a < b) ? b : a, and thus y = f32x4.pmax(a, b) is lowered like v128.bitselect(b, a, f32x4.lt(a, b)). Updated the PR description.

@ngzhian
Member

ngzhian commented May 7, 2020

Thanks! I have some more feedback, on the lowering for ARM.
F64x2Lt is not efficient on ARM at all. The current implementation compares lane by lane and uses a few conditional moves (see https://source.chromium.org/chromium/chromium/src/+/master:v8/src/compiler/backend/arm/code-generator-arm.cc;l=1960;drc=3795f5bbfcf5f8c3f6f740b08a513e09ca818697).
I think pmin and pmax will do something similar, and perhaps don't need the bitselect.

But the lowering suggested here does hide a bit of the slowness on ARM.

(Also, if anyone has ideas on improving the f64x2.lt implementation, please let me know, thanks!)

@Maratyszcza
Contributor Author

The f64 versions are doomed to be slow on 32-bit ARM due to its lack of double-precision SIMD capability. However, the same applies to the standard f64x2.min/f64x2.max ops, so I don't think we're making it worse than it already is.

pull bot pushed a commit to p-g-krish/v8 that referenced this pull request May 8, 2020
This patch implements f32x4.pmin, f32x4.pmax, f64x2.pmin, and f64x2.pmax
for x64 and interpreter.

Pseudo-min and Pseudo-max instructions were proposed in
WebAssembly/simd#122. These instructions
exactly match std::min and std::max in C++ STL, and thus have different
semantics from the existing min and max.

The instruction-selector for x64 switches the operands around, because
it allows for defining the dst to be same as first (really the second
input node), allowing better codegen.

For example, b = f32x4.pmin(a, b) directly maps to vminps(b, b, a) or
minps(b, a), as long as we can define dst == b, and switching the
instruction operands around allows us to do that.

Bug: v8:10501
Change-Id: I06f983fc1764caf673e600ac91d9c0ac5166e17e
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2186630
Commit-Queue: Zhi An Ng <[email protected]>
Reviewed-by: Tobias Tebbi <[email protected]>
Reviewed-by: Deepti Gandluri <[email protected]>
Cr-Commit-Position: refs/heads/master@{#67688}
tlively added a commit to tlively/binaryen that referenced this pull request May 12, 2020
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Aug 16, 2020
Implement some of the experimental SIMD opcodes that are supported by
all of V8, LLVM, and Binaryen, for maximum compatibility with test
content we might be exposed to.  Most/all of these will probably make
it into the spec, as they lead to substantial speedups in some
programs, and they are deterministic.

For spec and cpu mapping details, see:

WebAssembly/simd#122 (pmax/pmin)
WebAssembly/simd#232 (rounding)
WebAssembly/simd#127 (dot product)
WebAssembly/simd#237 (load zero)

The wasm bytecode values used here come from the binaryen changes that
are linked from those tickets, that's the best documentation right
now.  Current binaryen opcode mappings are here:
https://github.com/WebAssembly/binaryen/blob/master/src/wasm-binary.h

Also: Drive-by fix for signatures of vroundss and vroundsd, these are
unary operations and should follow the conventions for these with
src/dest arguments, not src0/src1/dest.

Also: Drive-by fix to add variants of vmovss and vmovsd on x64 that
take Operand source and FloatRegister destination.

Differential Revision: https://phabricator.services.mozilla.com/D85982

UltraBlame original commit: 2e7ddb00c8f9240e148cf5843b50a7ba7b913351
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Aug 16, 2020
@ngzhian
Member

ngzhian commented Sep 11, 2020

This has been accepted into the proposal [0] during the sync on 2020-09-04. I have an outstanding PR [1] to renumber the opcodes based on what's currently reserved in NewOpcodes.md. There also seems to be a merge conflict to be resolved; I suggest a rebase, and then we can merge this in.

Also removed the "pending prototype data", since we have it in #122 (comment) and #122 (comment).

[0] https://docs.google.com/document/d/138cF6aOUa9RZC2tOR7AhlIQWdmX5EtpzXRTVDAN3bfo/edit# see "3. Pseudo min/max"
[1] Maratyszcza#1

@Maratyszcza
Contributor Author

@ngzhian Rebased and updated opcodes as per your PR

@ngzhian ngzhian merged commit e1ff82e into WebAssembly:master Sep 11, 2020
pull bot pushed a commit to Alan-love/v8 that referenced this pull request Sep 11, 2020
F32x4 and F64x2 pmin and pmax were accepted into the proposal [0], this
removes all the ifdefs and todo guarding the prototypes, and moves these
instructions out of the post-mvp flag.

[0] WebAssembly/simd#122

Bug: v8:10904
Change-Id: I4e0c2f29ddc5d7fc19a209cd02b3d369617574a0
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2405802
Reviewed-by: Bill Budge <[email protected]>
Commit-Queue: Zhi An Ng <[email protected]>
Cr-Commit-Position: refs/heads/master@{#69855}
pull bot pushed a commit to Alan-love/v8 that referenced this pull request Sep 14, 2020
Port 3ba4431

Original Commit Message:

    F32x4 and F64x2 pmin and pmax were accepted into the proposal [0], this
    removes all the ifdefs and todo guarding the prototypes, and moves these
    instructions out of the post-mvp flag.

    [0] WebAssembly/simd#122

[email protected], [email protected], [email protected], [email protected]
BUG=
LOG=N

Change-Id: I8b2ae60240f769e1f4c0b00e98d53846519b305e
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2410806
Reviewed-by: Junliang Yan <[email protected]>
Reviewed-by: Milad Farazmand <[email protected]>
Commit-Queue: Milad Farazmand <[email protected]>
Cr-Commit-Position: refs/heads/master@{#69893}
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Oct 14, 2020
…ed status. r=jseward

Background: WebAssembly/simd#122

For all the pseudo-min/max SIMD instructions:

- remove the internal 'Experimental' opcode suffix in the C++ code
- remove the guard on experimental Wasm instructions in all the C++ decoders
- move the test cases from simd/experimental.js to simd/ad-hack.js

I have checked that current V8 and wasm-tools use the same opcode
mappings.  V8 in turn guarantees the correct mapping for LLVM and
binaryen.

Differential Revision: https://phabricator.services.mozilla.com/D92928
jamienicol pushed a commit to jamienicol/gecko that referenced this pull request Oct 15, 2020
julian-seward1 added a commit to julian-seward1/wasmtime that referenced this pull request Oct 23, 2020
…structions

This patch implements, for aarch64, the following wasm SIMD extensions

  Floating-point rounding instructions
  WebAssembly/simd#232

  Pseudo-Minimum and Pseudo-Maximum instructions
  WebAssembly/simd#122

The changes are straightforward:

* `build.rs`: the relevant tests have been enabled

* `cranelift/codegen/meta/src/shared/instructions.rs`: new CLIF instructions
  `fmin_pseudo` and `fmax_pseudo`.  The wasm rounding instructions do not need
  any new CLIF instructions.

* `cranelift/wasm/src/code_translator.rs`: translation into CLIF; this is
  pretty much the same as any other unary or binary vector instruction (for
  the rounding and the pmin/max respectively)

* `cranelift/codegen/src/isa/aarch64/lower_inst.rs`:
  - `fmin_pseudo` and `fmax_pseudo` are converted into a two instruction
    sequence, `fcmpgt` followed by `bsl`
  - the CLIF rounding instructions are converted to a suitable vector
    `frint{n,z,p,m}` instruction.

* `cranelift/codegen/src/isa/aarch64/inst/mod.rs`: minor extension of `pub
  enum VecMisc2` to handle the rounding operations.  And corresponding `emit`
  cases.
julian-seward1 added a commit to julian-seward1/wasmtime that referenced this pull request Oct 23, 2020
julian-seward1 added a commit to julian-seward1/wasmtime that referenced this pull request Oct 23, 2020
julian-seward1 added a commit to julian-seward1/wasmtime that referenced this pull request Oct 24, 2020
julian-seward1 added a commit to julian-seward1/wasmtime that referenced this pull request Oct 26, 2020
julian-seward1 added a commit to bytecodealliance/wasmtime that referenced this pull request Oct 26, 2020
ambroff pushed a commit to ambroff/gecko that referenced this pull request Nov 4, 2020
ambroff pushed a commit to ambroff/gecko that referenced this pull request Nov 4, 2020
cfallin pushed a commit to bytecodealliance/wasmtime that referenced this pull request Nov 30, 2020
7 participants