Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect execution of openjph compiled file with SIMD #4315

Closed
yurydelendik opened this issue Jun 24, 2022 · 5 comments
Closed

Incorrect execution of openjph compiled file with SIMD #4315

yurydelendik opened this issue Jun 24, 2022 · 5 comments

Comments

@yurydelendik
Copy link
Contributor

yurydelendik commented Jun 24, 2022

While working on the SIMD tests at bytecodealliance/sightglass#189, I found that this file is not executed properly. The app informs:

Size: 400x400
ojph error 0x000300A1 at ojph_codestream.cpp:4067: Error decoding a codeblock

Though expected something like:

Size: 400x400
Line 100:
 [0]:  37 36 35 36 37....

It appears to be running on Node and SM, so I assume it is an issue of correctness. Attaching the compiled benchmark (minus bench_start/_end):

test-case.zip

P.S. on Intel CPU x64

@yurydelendik
Copy link
Contributor Author

Also, works on wasmtime for ARM64

@alexcrichton
Copy link
Member

I've done some work to narrow this down but unfortunately I'm not being super successful at the moment. My strategy for narrowing this down has been:

  • Modify the CLI to print the sha256 of memory after running the wasm
  • Modify fuel to get checked before every single instruction instead of just loop headers and function entries

Next I have a script which runs the binary on aarch64 and x86_64 with a binary-search-of-sorts. This pins down the precise fuel count where with one more fuel the execution differs. My goal here is to find the precise wasm instruction at which execution diverges here. This quickly-ish found results and I started sprinkling "debug stores" within the *.wat. Basically I'd take a value of the stack, shove it in a temporary global, store the global to linear memory address 0, and then load the global back onto the stack. The goal here was a quick-and-dirty way of debugging the wasm value stack.

In doing this I've narrowed this down to function 40 in the input file above. Specifically these two instructions:

  0x4942 | fd 0c 00 01 | V128Const { value: V128([0, 1, 0, 1, 0, 1, 0, 1, 4, 5, 4, 5, 4, 5, 4, 5]) }
         | 00 01 00 01
         | 00 01 04 05
         | 04 05 04 05
         | 04 05
  0x4954 | fd 0e       | I8x16Swizzle

Unfortunately this v128.const value is not actually going into i8x16.swizzle. With my "debug store" inserted between these two instructions it turns out that the value stored in memory is indeed not the v128.const here. Instead some other bit-pattern gets stored in memory and naturally when that gets fed into i8x16.swizzle everything goes awry.

Other than that though I have now hit a wall and am unable to make progress debugging this. Function 40 is huge and it's tough to explore the disassembly. Using rr I can find the portion of the function that actually creates the value which eventually gets fed into i8x16.swizzle. This should feed in the v128.const value but it is not. Unfortunately I can't figure out how to translate this random snippet of assembly back to the source file.

I have confirmed, though, that 0.35.0 reproduces this issue so I don't believe this is a problem with recent developments like regalloc2, recent ISLE migrations, or the alias analysis pass added.

Others who are more familiar with the backend may know more about how to debug this though. IIRC there's some optimizations around constants, constant pools, and trying to not always reify something into a register or something like that, and something may be going awry there.


... aaaand as I am writing this up I took another look at the definition of the Swizzle instruction. This looks suspicious, specifically treating the mask as a writable reg since I think in this case it's the mask that's getting corrupted going into the swizzle instruction.

I added a move from the input to a temporary register and that didn't actually fix the original program but it did surprisingly allow it to make more progress. I think that means there's actually at least two bugs here!

Doing my bisection again I think I found a much simpler reproduction, namely:

(module
  (func (param v128 v128 i32) (result v128)
        local.get 0
        local.get 1
        local.get 2
        select
        ))

I don't think that our codegen for the select instruction is correct for v128 values. Namely this produces:

0000000000000000 <_wasm_function_0>:
       0:       55                      push   %rbp
       1:       48 89 e5                mov    %rsp,%rbp
       4:       85 d2                   test   %edx,%edx
       6:       0f 84 04 00 00 00       je     10 <_wasm_function_0+0x10>
       c:       f3 0f 10 c8             movss  %xmm0,%xmm1
      10:       66 0f 6f c1             movdqa %xmm1,%xmm0
      14:       48 89 ec                mov    %rbp,%rsp
      17:       5d                      pop    %rbp
      18:       c3                      retq

where the movss I don't believe is correct, I think that needs to be movdqa or something sized larger.

I'm running out of steam for tonight so I think this is as far as I'll get today. I'm actually quite worried that fuzzing hasn't discovered anything in this area. These are somewhat trivial bugs which in theory should be found almost immediately by any halfway decent fuzzing.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this issue Jun 25, 2022
This commit fixes a mistake in the `Swizzle` opcode implementation in
the x64 backend of Cranelift. Previously an input register was casted to
a writable register and then modified, which I believe instructions are
not supposed to do. This was discovered as part of my investigation
into bytecodealliance#4315.
@alexcrichton
Copy link
Member

Ok I can confirm that with #4317 and #4318 the original test case now has the same behavior on arm64 and on x86_64

alexcrichton added a commit that referenced this issue Jun 27, 2022
This commit fixes a mistake in the `Swizzle` opcode implementation in
the x64 backend of Cranelift. Previously an input register was casted to
a writable register and then modified, which I believe instructions are
not supposed to do. This was discovered as part of my investigation
into #4315.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this issue Jun 27, 2022
…e#4318)

This commit fixes a mistake in the `Swizzle` opcode implementation in
the x64 backend of Cranelift. Previously an input register was casted to
a writable register and then modified, which I believe instructions are
not supposed to do. This was discovered as part of my investigation
into bytecodealliance#4315.
@alexcrichton
Copy link
Member

I realize now I forgot to say this earlier but thanks @yurydelendik for taking the time to file this! We'll be making an 0.38.1 release with these fixes soon and a few other fixes for Cranelift (unrelated to this).

We also plan on filing a CVE about these micompiles because while they don't affect hosts themselves this could affect in-wasm execution which is often just as important in some environments.

alexcrichton added a commit that referenced this issue Jun 27, 2022
* x64: Fix codegen for the `i8x16.swizzle` instruction (#4318)

This commit fixes a mistake in the `Swizzle` opcode implementation in
the x64 backend of Cranelift. Previously an input register was casted to
a writable register and then modified, which I believe instructions are
not supposed to do. This was discovered as part of my investigation
into #4315.

* x64: Fix codegen for the `select` instruction with v128 (#4317)

This commit fixes a bug in the previous codegen for the `select`
instruction when the operations of the `select` were of the `v128` type.
Previously teh `XmmCmove` instruction only stored an `OperandSize` of 32
or 64 for a 64 or 32-bit move, but this was also used for these 128-bit
types which meant that when used the wrong move instruction was
generated. The fix applied here is to store the whole `Type` being moved
so the 128-bit variant can be selected as well.
@alexcrichton
Copy link
Member

I believe everything is now merged so I'm going to close this.

afonso360 pushed a commit to afonso360/wasmtime that referenced this issue Jun 30, 2022
…e#4318)

This commit fixes a mistake in the `Swizzle` opcode implementation in
the x64 backend of Cranelift. Previously an input register was casted to
a writable register and then modified, which I believe instructions are
not supposed to do. This was discovered as part of my investigation
into bytecodealliance#4315.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants