-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Less basic fuzzer #21246
Less basic fuzzer #21246
Conversation
Exciting! This is great timing. I was about to embark on a similar branch, so instead I will cooperate with your efforts here. Can you share your testing methodology? |
When I ran this locally, it seemed to find a test failure rather quickly:
However, when I used the dump_corpus tool to extract strings --- a/lib/fuzzer/dump_corpus.zig
+++ b/lib/fuzzer/dump_corpus.zig
@@ -39,6 +39,6 @@ pub fn main() void {
// volatile was trying to achieve in the first place
const str2: []const u8 = @volatileCast(str);
- std.log.info("\"{s}\"", .{str2});
+ std.log.info("\"{}\"", .{std.zig.fmtEscapes(str2)});
}
} --- a/lib/std/zig/tokenizer.zig
+++ b/lib/std/zig/tokenizer.zig
@@ -1841,10 +1841,7 @@ fn testTokenize(source: [:0]const u8, expected_token_tags: []const Token.Tag) !v
try std.testing.expectEqual(source.len, last_token.loc.end);
}
-test "fuzzable properties upheld" {
- const source = std.testing.fuzzInput(.{});
- const source0 = try std.testing.allocator.dupeZ(u8, source);
- defer std.testing.allocator.free(source0);
+fn testProperties(source0: [:0]const u8) anyerror!void {
var tokenizer = Tokenizer.init(source0);
var tokenization_failed = false;
while (true) {
@@ -1885,3 +1882,339 @@ test "fuzzable properties upheld" {
}
};
}
+
+test "fuzzable properties upheld" {
+ const source = std.testing.fuzzInput(.{});
+ const source0 = try std.testing.allocator.dupeZ(u8, source);
+ defer std.testing.allocator.free(source0);
+ return testProperties(source0);
+}
+
+test "fuzzable properties upheld - corpus" {
+ for (corpus) |one| {
+ try testProperties(one);
+ }
+}
+
+const corpus = [_][:0]const u8{ the crashing input didn't seem to be present:
|
haha that is because it just calls exit when a test fails. I totally forgot to report found bad inputs :D |
I am ashamed to admit that my testing methodology was to just run it on my toy bencode parser and see how quickly the coverage number goes up. This PR is more focused on setting up a working fuzzer-shaped project and less focused on tuning the fuzzer to get better results. |
Ah yes, that is intentional, because another way for it to find a bad input is to segfault, abort, or crash in another unexpected manner. So that's why I thought it should write the input before running the user's test code. This way, a test failure and a crashing process look the same to the fuzzer.
No shame in this! We're both starting with toy examples and working our way up. |
writing every input into the corpus before evaluating it doesn't work since deleting a string from the corpus in the current design is expensive and requires exclusive access to the entire corpus. It should probably write the bad input from a panic handler (not into the corpus, just into some file) and somehow recover and keep fuzzing. The only way i can see how to recover from a panic handler back into the fuzzer is using longjump but that might be controversial. Maybe we can just start the fuzzer again. Crashing inputs should be rare.. |
Thank you for the feedback! I appreciate it. I am currently unfortunately overwhelmed with other work so I'll get back to this in approximately 3 weeks. I hope that is OK. I look forward to getting this merged. |
b1f3351
to
a32bf7c
Compare
Welcome back. I have some local changes I expect to push today and you'll need to rebase on them. I did things a little differently than you. |
By local changes you mean that the fuzzer now gets a function pointer instead of being called from the test or is there something more? I like this change btw. I'm aaking because I rebased on top of master and didn't get any conflicts. |
a32bf7c
to
bc18acc
Compare
bc18acc
to
3d05a46
Compare
f545448
to
5efee57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR writeup says "closes #20803" but then it doesn't do it, because the entire point is to handle abnormal process termination.
The issue also says,
to close it, interesting inputs that are found should be displayed in a reproducible manner, where re-running a particular command will in fact reproduce the crash.
and this PR also doesn't do that.
Please see #22862 for the direction the fuzzer is going.
Closing for now. I would still welcome collaboration on this, however I do want to veto the design direction this PR takes the fuzzer. I think plenty of it could be salvageable however. |
Sad to see this closed but hope it means that the zig fuzzer project moves forward stronger and better :D Would you mind giving some more specific feedback (no rush)? I guess the limit of 2GiB of good inputs is unfortunate and relying on Linux lazy memory commit is not portable and could be designed better. But I think that the storage chosen is solid, or at least better than dumping one string per file as most other implementations do. I am also quite unhappy with the testing, or rather my lack of ideas on how to test the performance of the entire system. Couldn't think of anything better than trying out the fuzzer on buggy code and seeing if it finds them. Automating this would probably be the first thing I would do if I were to write this again from scratch. Looking forward to contributing to other fuzzer implementation asap. I really believe this is something that would catch a lot of bugs for a lot of people, especially when bundled with the compiler. |
This PR improves the current fuzzer implementation. It is not yet competetive with other fuzzers but it is working and ready for a first round of review (which I assume is better than to drop a huge PR later). I plan to keep improving the fuzzer in the near future.
The fuzzer stores a corpus of (once) interesting inputs inside a pair of memory mapped files, ready to be shared with other fuzzing processes (which there are none currently since I am not sure where is the best place to spawn them).
It randomly picks a input, mutates it, sees if it hits any new features in the instrumented program and if so, adds it to the corpus. The mutations are taken from llvm's libfuzzer.
My next plan is to:
closes #20814
closes #20803
wip on #20804