Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less basic fuzzer #21246

Closed
wants to merge 3 commits into from
Closed

Conversation

ProkopRandacek
Copy link
Contributor

This PR improves the current fuzzer implementation. It is not yet competetive with other fuzzers but it is working and ready for a first round of review (which I assume is better than to drop a huge PR later). I plan to keep improving the fuzzer in the near future.

The fuzzer stores a corpus of (once) interesting inputs inside a pair of memory mapped files, ready to be shared with other fuzzing processes (which there are none currently since I am not sure where is the best place to spawn them).

It randomly picks a input, mutates it, sees if it hits any new features in the instrumented program and if so, adds it to the corpus. The mutations are taken from llvm's libfuzzer.

My next plan is to:

  • track feature rarity
  • improve input selection based on rare features
  • implement more mutations
  • track values from cmp instrumentation and use in mutations
  • use inplace mutation
  • spawn more threads
  • improve feature capture
  • improve how crashing inputs are reported

closes #20814
closes #20803
wip on #20804

@andrewrk
Copy link
Member

Exciting! This is great timing. I was about to embark on a similar branch, so instead I will cooperate with your efforts here.

Can you share your testing methodology?

@andrewrk
Copy link
Member

When I ran this locally, it seemed to find a test failure rather quickly:

[nix-shell:~/dev/zig/build-release]$ stage3/bin/zig build test-std  -Dtest-filter="tokenizer" -Dskip-release-small -Dskip-release-fast -Dskip-debug -Dskip-non-native -Dskip-libc -Dskip-single-threaded --fuzz  --port 41099
info: web interface listening at http://127.0.0.1:41099/
test-std
└─ run test std-x86_64-linux.6.10.2...6.10.2-gnu.2.39-znver4-ReleaseSafe failure
failed with error.TestUnexpectedResult
error: the following command exited with error code 1:
/home/andy/dev/zig/.zig-cache/o/29911edb78c1004a6c983079ca845abb/test --seed=0x4ca436b6 --cache-dir=/home/andy/dev/zig/.zig-cache --listen=- 
error: all fuzz workers crashed
info: source changes detected; rebuilt wasm component
^C
Total Runs: 57649
Unique Runs: 64 (0.1%)
Coverage: 451 / 11895 (3.8%)

However, when I used the dump_corpus tool to extract strings

--- a/lib/fuzzer/dump_corpus.zig
+++ b/lib/fuzzer/dump_corpus.zig
@@ -39,6 +39,6 @@ pub fn main() void {
         // volatile was trying to achieve in the first place
         const str2: []const u8 = @volatileCast(str);
 
-        std.log.info("\"{s}\"", .{str2});
+        std.log.info("\"{}\"", .{std.zig.fmtEscapes(str2)});
     }
 }
--- a/lib/std/zig/tokenizer.zig
+++ b/lib/std/zig/tokenizer.zig
@@ -1841,10 +1841,7 @@ fn testTokenize(source: [:0]const u8, expected_token_tags: []const Token.Tag) !v
     try std.testing.expectEqual(source.len, last_token.loc.end);
 }
 
-test "fuzzable properties upheld" {
-    const source = std.testing.fuzzInput(.{});
-    const source0 = try std.testing.allocator.dupeZ(u8, source);
-    defer std.testing.allocator.free(source0);
+fn testProperties(source0: [:0]const u8) anyerror!void {
     var tokenizer = Tokenizer.init(source0);
     var tokenization_failed = false;
     while (true) {
@@ -1885,3 +1882,339 @@ test "fuzzable properties upheld" {
         }
     };
 }
+
+test "fuzzable properties upheld" {
+    const source = std.testing.fuzzInput(.{});
+    const source0 = try std.testing.allocator.dupeZ(u8, source);
+    defer std.testing.allocator.free(source0);
+    return testProperties(source0);
+}
+
+test "fuzzable properties upheld - corpus" {
+    for (corpus) |one| {
+        try testProperties(one);
+    }
+}
+
+const corpus = [_][:0]const u8{

the crashing input didn't seem to be present:

[nix-shell:~/dev/zig/build-release]$ stage3/bin/zig test ../lib/std/std.zig --test-filter "fuzzable properties upheld - corpus"
All 61 tests passed.

@ProkopRandacek
Copy link
Contributor Author

the crashing input didn't seem to be present

haha that is because it just calls exit when a test fails. I totally forgot to report found bad inputs :D

@ProkopRandacek
Copy link
Contributor Author

Can you share your testing methodology?

I am ashamed to admit that my testing methodology was to just run it on my toy bencode parser and see how quickly the coverage number goes up. This PR is more focused on setting up a working fuzzer-shaped project and less focused on tuning the fuzzer to get better results.

@andrewrk
Copy link
Member

haha that is because it just calls exit when a test fails. I totally forgot to report found bad inputs :D

Ah yes, that is intentional, because another way for it to find a bad input is to segfault, abort, or crash in another unexpected manner. So that's why I thought it should write the input before running the user's test code. This way, a test failure and a crashing process look the same to the fuzzer.

I am ashamed to admit that my testing methodology was to just run it on my toy bencode parser and see how quickly the coverage number goes up.

No shame in this! We're both starting with toy examples and working our way up.

@ProkopRandacek
Copy link
Contributor Author

haha that is because it just calls exit when a test fails. I totally forgot to report found bad inputs :D

Ah yes, that is intentional, because another way for it to find a bad input is to segfault, abort, or crash in another unexpected manner. So that's why I thought it should write the input before running the user's test code. This way, a test failure and a crashing process look the same to the fuzzer.

writing every input into the corpus before evaluating it doesn't work since deleting a string from the corpus in the current design is expensive and requires exclusive access to the entire corpus.

It should probably write the bad input from a panic handler (not into the corpus, just into some file) and somehow recover and keep fuzzing. The only way i can see how to recover from a panic handler back into the fuzzer is using longjump but that might be controversial. Maybe we can just start the fuzzer again. Crashing inputs should be rare..

@ProkopRandacek
Copy link
Contributor Author

Thank you for the feedback! I appreciate it. I am currently unfortunately overwhelmed with other work so I'll get back to this in approximately 3 weeks. I hope that is OK. I look forward to getting this merged.

@ProkopRandacek ProkopRandacek force-pushed the prokop-fuzzer branch 2 times, most recently from b1f3351 to a32bf7c Compare September 18, 2024 11:43
@andrewrk
Copy link
Member

Welcome back. I have some local changes I expect to push today and you'll need to rebase on them. I did things a little differently than you.

@ProkopRandacek
Copy link
Contributor Author

ProkopRandacek commented Sep 30, 2024

By local changes you mean that the fuzzer now gets a function pointer instead of being called from the test or is there something more? I like this change btw.

I'm aaking because I rebased on top of master and didn't get any conflicts.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR writeup says "closes #20803" but then it doesn't do it, because the entire point is to handle abnormal process termination.

The issue also says,

to close it, interesting inputs that are found should be displayed in a reproducible manner, where re-running a particular command will in fact reproduce the crash.

and this PR also doesn't do that.

Please see #22862 for the direction the fuzzer is going.

@andrewrk
Copy link
Member

Closing for now. I would still welcome collaboration on this, however I do want to veto the design direction this PR takes the fuzzer. I think plenty of it could be salvageable however.

@andrewrk andrewrk closed this Feb 22, 2025
@ProkopRandacek
Copy link
Contributor Author

Sad to see this closed but hope it means that the zig fuzzer project moves forward stronger and better :D

Would you mind giving some more specific feedback (no rush)? I guess the limit of 2GiB of good inputs is unfortunate and relying on Linux lazy memory commit is not portable and could be designed better. But I think that the storage chosen is solid, or at least better than dumping one string per file as most other implementations do.

I am also quite unhappy with the testing, or rather my lack of ideas on how to test the performance of the entire system. Couldn't think of anything better than trying out the fuzzer on buggy code and seeing if it finds them. Automating this would probably be the first thing I would do if I were to write this again from scratch.

Looking forward to contributing to other fuzzer implementation asap. I really believe this is something that would catch a lot of bugs for a lot of people, especially when bundled with the compiler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants