Spec-interpreter fuzzing: modify to generate and test single-instruction or no-control-flow cases #3251

cfallin · 2021-08-26T18:03:44Z

In #3186 we found that the WebAssembly spec interpreter may not be suitable for high-throughput fuzzing because of certain performance characteristics. While we were able to locally patch one source of quadratic runtime, there are likely others, and the patch was not really a good fit to upstream. We are looking into other peer implementations to fuzz against for general programs (aside from existing differential_wasmi fuzzing).

However, there is another way we could use the reference interpreter: we could validate single instructions' semantics more closely by generating short test cases that just execute an instruction (or a straight-line sequence with no control flow or calls) and return. This would be very valuable for instructions with more complex semantics, such as many of the SIMD instructions.

This is sort of a breadth-vs-depth situation: the "depth" comes from longer-running programs and tests interactions between instructions, and a good oracle there is a more optimized implementation; while the "breadth" comes from exhaustive many-inputs testing of very short sequences and tests mainly that we got the instruction semantics right. The combination seems like it should give pretty good coverage.

cc @abrown @alexcrichton

The text was updated successfully, but these errors were encountered:

This commit removes the `differential_spec` fuzz target for now, although this removal is intended to be temporary. We have bytecodealliance#3251 to track re-enabling the spec interpreter in a way that it won't time out, and additionally the spec interpreter is also failing to build with ocaml on oss-fuzz so that will also need to be investigated when re-enabling.

This commit removes the `differential_spec` fuzz target for now, although this removal is intended to be temporary. We have #3251 to track re-enabling the spec interpreter in a way that it won't time out, and additionally the spec interpreter is also failing to build with ocaml on oss-fuzz so that will also need to be investigated when re-enabling.

github-actions · 2021-11-19T16:23:38Z

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "fuzzing"

Thus the following users have been cc'd because of the following labels:

fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

fitzgen · 2021-11-19T18:04:05Z

My intuition is that combining multiple instructions, even if it is just two or three, would give us much more bang for our buck than testing single instructions.

cfallin · 2021-11-19T18:39:34Z

Ah, old issue, paging back in context (I'm not sure why the labeler decided to tag you just now on a discussion from August!)...

Yup, agreed, that's the "or a straight-line sequence with no control flow or calls" above :-)

Adding some more thought just now: IMHO we should do it only up to the max depth of our instruction combining patterns. The point of this suggestion was specifically to target the fuzz throughput in a different way, at the input-value space rather than the input-program space. As we test straightline sequences of n instructions we have O(|wasm opcodes|^n) possibilities, and cut our time spent (on exploring argument values) per individual test program exponentially.

In other words, if my goal is to say that all cases for (say) SIMD instruction X have been covered, then separately fuzzing vector_add(X(x, y, z), [1.0, 1.0, 1.0, 1.0]) and vector_add(X(x, y, z), [2.0, 2.0, 2.0, 2.0]) doesn't really add any coverage. Testing vector_add(X, ...) and vector_mul(X, ...) might, if we have combining patterns that differ for those. But combining instructions may also mask some of the value space, for any operation that isn't bijective (e.g. extract one lane, or multiply by zero, or min/max).

So, said another way, what I think would be useful is a unit-test-inspired approach where we take each individual lowering and then throw a long test-vector of argument values at it. (In other words the Arbitrary type is a (short inst sequence, Vec<ArgValues>).) That gets us "each individual lowering is good" coverage in a way that might hide when we're testing n (for n larger) instructions at a time.

cc @abrown , any interest in looking at this as a way to get the spec interpreter in active fuzzing use?

abrown · 2021-11-23T23:49:47Z

I would eventually like to look into this more but if anyone else has the bandwidth feel free to jump in first (read: I'm not sure when I will get to this).

In looking into bytecodealliance/wasmtime#3251, I created a mechanism for restricting what kinds of instructions wasm-smith can generate. The [WebAssembly specification](https://webassembly.github.io/spec/core/syntax/instructions.html) organizes its instructions into several categories (e.g., numeric, vector, reference, control, etc.) and this change allows the user to configure the module generation based on these categories: ``` ``` There is some related configuration in wasm-smith to restrict what instructions are available. Currently, the wasm-smith configuration is organized around "proposals," which can be enabled or disabled. In theory, a user could be confused if the proposal was disabled and they explicitly enabled an instruction kind (e.g. reference)--"why aren't reference instructions being generated?" But this accident seems unlikely: `--allowed-instructions` defaults to enabling all kinds, so the user would have to explicitly filter out some kind, deliberately shooting themselves in the foot. Despite some risk of confusion (mitigated by the documentation in this PR), this filtering of instructions kinds ends up being useful in a general way: not only is it a start at fixing the issue above, it is useful for work I am doing to generate fuzz only parts of the spec.

In looking into bytecodealliance/wasmtime#3251, I created a mechanism for restricting what kinds of instructions wasm-smith can generate. The [WebAssembly specification](https://webassembly.github.io/spec/core/syntax/instructions.html) organizes its instructions into several categories (e.g., numeric, vector, reference, control, etc.) and this change allows the user to configure the module generation based on these categories: ``` head -c 10000 /dev/urandom | cargo run --bin wasm-smith -- --allowed-instructions memory,parametric -o test.wasm && wasm2wat test.wasm ``` There is some related configuration in wasm-smith to restrict what instructions are available. Currently, the wasm-smith configuration is organized around "proposals," which can be enabled or disabled. In theory, a user could be confused if the proposal was disabled and they explicitly enabled an instruction kind (e.g. reference)--"why aren't reference instructions being generated?" But this accident seems unlikely: `--allowed-instructions` defaults to enabling all kinds, so the user would have to explicitly filter out some kind, deliberately shooting themselves in the foot. Despite some risk of confusion (mitigated by the documentation in this PR), this filtering of instructions kinds ends up being useful in a general way: not only is it a start at fixing the issue above, it is useful for work I am doing to generate fuzz only parts of the spec.

In looking into bytecodealliance/wasmtime#3251, I created a mechanism for restricting what kinds of instructions wasm-smith can generate. The [WebAssembly specification](https://webassembly.github.io/spec/core/syntax/instructions.html) organizes its instructions into several categories (e.g., numeric, vector, reference, control, etc.) and this change allows the user to configure the module generation based on these categories: ``` head -c 10000 /dev/urandom | cargo run --bin wasm-smith -- --allowed-instructions memory,parametric -o test.wasm && wasm2wat test.wasm ``` There is some related configuration in wasm-smith to restrict what instructions are available. Currently, the wasm-smith configuration is organized around "proposals," which can be enabled or disabled. In theory, a user could be confused if the proposal was disabled and they explicitly enabled an instruction kind (e.g. reference)--"why aren't reference instructions being generated?" But this accident seems unlikely: `--allowed-instructions` defaults to enabling all kinds, so the user would have to explicitly filter out some kind, deliberately shooting themselves in the foot. Despite some risk of confusion (mitigated by the documentation in this PR), this filtering of instructions kinds ends up being useful in a general way: not only is it a start at fixing the issue above, it is useful for work I am doing to fuzz only parts of the spec.

abrown · 2022-01-05T23:58:11Z

@cfallin, in a separate project I did something using proptest that reminded me of this. The basic idea, adapted to Wasmtime, would be to:

construct a single-instruction WebAssembly module
create random inputs using proptest
run the module's sole function in two different engines and check the results for equality

I wasn't too concerned with multiple-instruction sequences and other things mentioned above but if we have a place to put this proptest-guided fuzz test then I would be happy to contribute the code.

@cfallin

As proposed by @cfallin in bytecodealliance#3251, this change adds a way to generate a Wasm module for a single instruction. It captures the necessary parameter and result types so that fuzzing can not only choose which instruction to check but also generate values to pass to the instruction. Not all instructions are available yet, but a significant portion of scalar instructions are implemented in this change. This does not wire the generator up to any fuzz targets.

@cfallin

As proposed by @cfallin in bytecodealliance#3251, this change adds a way to generate a Wasm module for a single instruction. It captures the necessary parameter and result types so that fuzzing can not only choose which instruction to check but also generate values to pass to the instruction. Not all instructions are available yet, but a significant portion of scalar instructions are implemented in this change. This does not wire the generator up to any fuzz targets.

@cfallin

* fuzz: add a single instruction module generator As proposed by @cfallin in #3251, this change adds a way to generate a Wasm module for a single instruction. It captures the necessary parameter and result types so that fuzzing can not only choose which instruction to check but also generate values to pass to the instruction. Not all instructions are available yet, but a significant portion of scalar instructions are implemented in this change. This does not wire the generator up to any fuzz targets. * review: use raw string in test * review: remove once_cell, use slices * review: refactor macros to use valtype! * review: avoid cloning when choosing a SingleInstModule

abrown · 2022-08-23T20:28:21Z

I think this is closed by #4409 which is now in use by the new differential fuzz target (#4515).

alexcrichton mentioned this issue Sep 3, 2021

Timeouts in spec interpreter fuzzing #3186

Closed

alexcrichton added the fuzz-bug Bugs found by a fuzzer label Sep 3, 2021

alexcrichton mentioned this issue Sep 30, 2021

Remove spec interpreter fuzz target temporarily #3399

Merged

alexcrichton added fuzzing Issues related to our fuzzing infrastructure and removed fuzz-bug Bugs found by a fuzzer labels Nov 19, 2021

abrown mentioned this issue Nov 30, 2021

wasm-smith: restrict which kinds of instructions can be generated bytecodealliance/wasm-tools#400

Merged

cfallin mentioned this issue Jun 26, 2022

Differential fuzzing with wasm-smith isn't the most effective #4322

Closed

abrown mentioned this issue Jul 7, 2022

fuzz: add a single instruction module generator #4409

Merged

abrown closed this as completed Aug 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spec-interpreter fuzzing: modify to generate and test single-instruction or no-control-flow cases #3251

Spec-interpreter fuzzing: modify to generate and test single-instruction or no-control-flow cases #3251

cfallin commented Aug 26, 2021

github-actions bot commented Nov 19, 2021

fitzgen commented Nov 19, 2021

cfallin commented Nov 19, 2021

abrown commented Nov 23, 2021

abrown commented Jan 5, 2022

abrown commented Aug 23, 2022

Spec-interpreter fuzzing: modify to generate and test single-instruction or no-control-flow cases #3251

Spec-interpreter fuzzing: modify to generate and test single-instruction or no-control-flow cases #3251

Comments

cfallin commented Aug 26, 2021

github-actions bot commented Nov 19, 2021

Subscribe to Label Action

fitzgen commented Nov 19, 2021

cfallin commented Nov 19, 2021

abrown commented Nov 23, 2021

abrown commented Jan 5, 2022

abrown commented Aug 23, 2022