Less basic fuzzer #21246

ProkopRandacek · 2024-08-29T22:38:46Z

This PR improves the current fuzzer implementation. It is not yet competetive with other fuzzers but it is working and ready for a first round of review (which I assume is better than to drop a huge PR later). I plan to keep improving the fuzzer in the near future.

The fuzzer stores a corpus of (once) interesting inputs inside a pair of memory mapped files, ready to be shared with other fuzzing processes (which there are none currently since I am not sure where is the best place to spawn them).

It randomly picks a input, mutates it, sees if it hits any new features in the instrumented program and if so, adds it to the corpus. The mutations are taken from llvm's libfuzzer.

My next plan is to:

track feature rarity
improve input selection based on rare features
implement more mutations
track values from cmp instrumentation and use in mutations
use inplace mutation
spawn more threads
improve feature capture
improve how crashing inputs are reported

closes #20814
closes #20803
wip on #20804

andrewrk · 2024-08-29T22:51:15Z

Exciting! This is great timing. I was about to embark on a similar branch, so instead I will cooperate with your efforts here.

Can you share your testing methodology?

lib/fuzzer/main.zig

lib/fuzzer/dump_corpus.zig

lib/fuzzer/InputPoolPosix.zig

lib/fuzzer/util.zig

andrewrk · 2024-08-30T00:18:29Z

When I ran this locally, it seemed to find a test failure rather quickly:

[nix-shell:~/dev/zig/build-release]$ stage3/bin/zig build test-std  -Dtest-filter="tokenizer" -Dskip-release-small -Dskip-release-fast -Dskip-debug -Dskip-non-native -Dskip-libc -Dskip-single-threaded --fuzz  --port 41099
info: web interface listening at http://127.0.0.1:41099/
test-std
└─ run test std-x86_64-linux.6.10.2...6.10.2-gnu.2.39-znver4-ReleaseSafe failure
failed with error.TestUnexpectedResult
error: the following command exited with error code 1:
/home/andy/dev/zig/.zig-cache/o/29911edb78c1004a6c983079ca845abb/test --seed=0x4ca436b6 --cache-dir=/home/andy/dev/zig/.zig-cache --listen=- 
error: all fuzz workers crashed
info: source changes detected; rebuilt wasm component
^C

Total Runs: 57649
Unique Runs: 64 (0.1%)
Coverage: 451 / 11895 (3.8%)

However, when I used the dump_corpus tool to extract strings

--- a/lib/fuzzer/dump_corpus.zig
+++ b/lib/fuzzer/dump_corpus.zig
@@ -39,6 +39,6 @@ pub fn main() void {
         // volatile was trying to achieve in the first place
         const str2: []const u8 = @volatileCast(str);
 
-        std.log.info("\"{s}\"", .{str2});
+        std.log.info("\"{}\"", .{std.zig.fmtEscapes(str2)});
     }
 }

--- a/lib/std/zig/tokenizer.zig
+++ b/lib/std/zig/tokenizer.zig
@@ -1841,10 +1841,7 @@ fn testTokenize(source: [:0]const u8, expected_token_tags: []const Token.Tag) !v
     try std.testing.expectEqual(source.len, last_token.loc.end);
 }
 
-test "fuzzable properties upheld" {
-    const source = std.testing.fuzzInput(.{});
-    const source0 = try std.testing.allocator.dupeZ(u8, source);
-    defer std.testing.allocator.free(source0);
+fn testProperties(source0: [:0]const u8) anyerror!void {
     var tokenizer = Tokenizer.init(source0);
     var tokenization_failed = false;
     while (true) {
@@ -1885,3 +1882,339 @@ test "fuzzable properties upheld" {
         }
     };
 }
+
+test "fuzzable properties upheld" {
+    const source = std.testing.fuzzInput(.{});
+    const source0 = try std.testing.allocator.dupeZ(u8, source);
+    defer std.testing.allocator.free(source0);
+    return testProperties(source0);
+}
+
+test "fuzzable properties upheld - corpus" {
+    for (corpus) |one| {
+        try testProperties(one);
+    }
+}
+
+const corpus = [_][:0]const u8{

the crashing input didn't seem to be present:

[nix-shell:~/dev/zig/build-release]$ stage3/bin/zig test ../lib/std/std.zig --test-filter "fuzzable properties upheld - corpus"
All 61 tests passed.

ProkopRandacek · 2024-08-30T20:54:26Z

the crashing input didn't seem to be present

haha that is because it just calls exit when a test fails. I totally forgot to report found bad inputs :D

ProkopRandacek · 2024-08-30T20:59:48Z

Can you share your testing methodology?

I am ashamed to admit that my testing methodology was to just run it on my toy bencode parser and see how quickly the coverage number goes up. This PR is more focused on setting up a working fuzzer-shaped project and less focused on tuning the fuzzer to get better results.

andrewrk · 2024-08-30T21:52:35Z

haha that is because it just calls exit when a test fails. I totally forgot to report found bad inputs :D

Ah yes, that is intentional, because another way for it to find a bad input is to segfault, abort, or crash in another unexpected manner. So that's why I thought it should write the input before running the user's test code. This way, a test failure and a crashing process look the same to the fuzzer.

I am ashamed to admit that my testing methodology was to just run it on my toy bencode parser and see how quickly the coverage number goes up.

No shame in this! We're both starting with toy examples and working our way up.

ProkopRandacek · 2024-08-31T10:04:08Z

haha that is because it just calls exit when a test fails. I totally forgot to report found bad inputs :D

Ah yes, that is intentional, because another way for it to find a bad input is to segfault, abort, or crash in another unexpected manner. So that's why I thought it should write the input before running the user's test code. This way, a test failure and a crashing process look the same to the fuzzer.

writing every input into the corpus before evaluating it doesn't work since deleting a string from the corpus in the current design is expensive and requires exclusive access to the entire corpus.

It should probably write the bad input from a panic handler (not into the corpus, just into some file) and somehow recover and keep fuzzing. The only way i can see how to recover from a panic handler back into the fuzzer is using longjump but that might be controversial. Maybe we can just start the fuzzer again. Crashing inputs should be rare..

ProkopRandacek · 2024-09-03T18:31:15Z

Thank you for the feedback! I appreciate it. I am currently unfortunately overwhelmed with other work so I'll get back to this in approximately 3 weeks. I hope that is OK. I look forward to getting this merged.

andrewrk · 2024-09-18T17:46:35Z

Welcome back. I have some local changes I expect to push today and you'll need to rebase on them. I did things a little differently than you.

ProkopRandacek · 2024-09-30T06:12:43Z

By local changes you mean that the fuzzer now gets a function pointer instead of being called from the test or is there something more? I like this change btw.

I'm aaking because I rebased on top of master and didn't get any conflicts.

lib/fuzzer/main.zig

lib/fuzzer.zig

lib/fuzzer/mutate.zig

lib/std/Thread/Futex.zig

lib/std/testing.zig

andrewrk

The PR writeup says "closes #20803" but then it doesn't do it, because the entire point is to handle abnormal process termination.

The issue also says,

to close it, interesting inputs that are found should be displayed in a reproducible manner, where re-running a particular command will in fact reproduce the crash.

and this PR also doesn't do that.

Please see #22862 for the direction the fuzzer is going.

andrewrk · 2025-02-22T22:30:26Z

Closing for now. I would still welcome collaboration on this, however I do want to veto the design direction this PR takes the fuzzer. I think plenty of it could be salvageable however.

ProkopRandacek · 2025-02-23T08:21:33Z

Sad to see this closed but hope it means that the zig fuzzer project moves forward stronger and better :D

Would you mind giving some more specific feedback (no rush)? I guess the limit of 2GiB of good inputs is unfortunate and relying on Linux lazy memory commit is not portable and could be designed better. But I think that the storage chosen is solid, or at least better than dumping one string per file as most other implementations do.

I am also quite unhappy with the testing, or rather my lack of ideas on how to test the performance of the entire system. Couldn't think of anything better than trying out the fuzzer on buggy code and seeing if it finds them. Automating this would probably be the first thing I would do if I were to write this again from scratch.

Looking forward to contributing to other fuzzer implementation asap. I really believe this is something that would catch a lot of bugs for a lot of people, especially when bundled with the compiler.

andrewrk requested changes Aug 29, 2024

View reviewed changes

lib/fuzzer/main.zig Outdated Show resolved Hide resolved

andrewrk reviewed Aug 29, 2024

View reviewed changes

lib/fuzzer/dump_corpus.zig Outdated Show resolved Hide resolved

andrewrk requested changes Aug 29, 2024

View reviewed changes

lib/fuzzer/InputPoolPosix.zig Outdated Show resolved Hide resolved

lib/fuzzer/InputPoolPosix.zig Outdated Show resolved Hide resolved

lib/fuzzer/InputPoolPosix.zig Outdated Show resolved Hide resolved

andrewrk reviewed Aug 29, 2024

View reviewed changes

lib/fuzzer/util.zig Outdated Show resolved Hide resolved

ProkopRandacek force-pushed the prokop-fuzzer branch 2 times, most recently from b1f3351 to a32bf7c Compare September 18, 2024 11:43

ProkopRandacek force-pushed the prokop-fuzzer branch from a32bf7c to bc18acc Compare September 30, 2024 06:51

ProkopRandacek force-pushed the prokop-fuzzer branch from bc18acc to 3d05a46 Compare November 11, 2024 11:38

gooncreeper suggested changes Nov 23, 2024

View reviewed changes

gooncreeper suggested changes Nov 25, 2024

View reviewed changes

ProkopRandacek added 3 commits February 5, 2025 19:05

fuzzer: document entry point

cd11688

fuzzer: webui modifications

29840b5

fuzzer: less basic

5efee57

ProkopRandacek force-pushed the prokop-fuzzer branch from f545448 to 5efee57 Compare February 5, 2025 20:27

andrewrk requested changes Feb 11, 2025

View reviewed changes

andrewrk closed this Feb 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Less basic fuzzer #21246

Less basic fuzzer #21246

ProkopRandacek commented Aug 29, 2024

andrewrk commented Aug 29, 2024

andrewrk commented Aug 30, 2024

ProkopRandacek commented Aug 30, 2024

ProkopRandacek commented Aug 30, 2024

andrewrk commented Aug 30, 2024

ProkopRandacek commented Aug 31, 2024

ProkopRandacek commented Sep 3, 2024

andrewrk commented Sep 18, 2024

ProkopRandacek commented Sep 30, 2024 •

edited

Loading

andrewrk left a comment

andrewrk commented Feb 22, 2025

ProkopRandacek commented Feb 23, 2025

Less basic fuzzer #21246

Less basic fuzzer #21246

Conversation

ProkopRandacek commented Aug 29, 2024

andrewrk commented Aug 29, 2024

andrewrk commented Aug 30, 2024

ProkopRandacek commented Aug 30, 2024

ProkopRandacek commented Aug 30, 2024

andrewrk commented Aug 30, 2024

ProkopRandacek commented Aug 31, 2024

ProkopRandacek commented Sep 3, 2024

andrewrk commented Sep 18, 2024

ProkopRandacek commented Sep 30, 2024 • edited Loading

andrewrk left a comment

Choose a reason for hiding this comment

andrewrk commented Feb 22, 2025

ProkopRandacek commented Feb 23, 2025

ProkopRandacek commented Sep 30, 2024 •

edited

Loading