-
-
Notifications
You must be signed in to change notification settings - Fork 406
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
As a follow-up to #1304, this PR introduces additional fuzz targets, fuzz test dictionaries, and `fuzzing/fuzz-targets/test_utils.py` which includes test utilities to help DRY fuzzing test code. The changes here should increase fuzzing coverage from ~2% to ~17% based on the results of my local testing. The commit messages in this PR should describe the specific changes, but the most significant information detailed below: ## New Fuzz Targets **fuzzing/fuzz-targets/fuzz_bundle.py** - Tests the `Bundle` related functionality using fuzzer provided data. - This test is based on [`test_bundle.py`](https://github.com/jelmer/dulwich/blob/9d13065fab6bdc0251d25bda79bb013d01f42f24/tests/test_bundle.py), the unit test of the same functionality. **fuzzing/fuzz-targets/fuzz_object_store.py** - Tests the `Blob`, `Tree`, and `Commit` classes using fuzzer provided data. - This test is based on the example code in the [Object Store tutorial](https://www.dulwich.io/docs/tutorial/object-store.html), `fuzz_object_store.py` uses a `MemoryRepo` to avoid disk IO where possible, in the interest of test execution efficiency. **fuzzing/fuzz-targets/fuzz_repo.py** - Tests basic functionality of the `Repo` class. - This test must perform actual disk IO to effectively test all functionality, so it is somewhat slow compared to other fuzz targets in this repo. There might be ways to improve this, but as of this PR it works well enough. ## `fuzzing/fuzz-targets/test_utils.py` - Adds a `EnhancedFuzzedDataProvider` class that extends `atheris.FuzzedDataProvider` to abstract some common use-cases into DRY method calls. - The `is_expected_error` helper function was extracted from `fuzz_configfile.py` into this dedicated test utility file so it can be reused by other fuzz harnesses in `fuzz-targets/`. - Also renamed and better documented the `is_expected_error` function now that it is shared. ## Other Notes I've tested all of the changes proposed here extensively in my local environment. They are working well enough that I feel they are a net value add to the fuzz test suite, but **these tests can likely be further optimized to improve coverage and efficiency**. I plan to keep an eye on their performance and further optimize the tests & supporting code as needed.
- Loading branch information
Showing
10 changed files
with
343 additions
and
11 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
"# v2 git bundle" | ||
"# v3 git bundle" | ||
"\\001\\000\\000\\000" | ||
"\\001\\000" | ||
"\\377\\377\\377\\377\\377\\377\\377\\377" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
"_tu" | ||
"old" | ||
"\\007\\000\\000\\000\\000\\000\\000\\000" | ||
"rename_detector" | ||
"\\000\\000\\000\\000\\000\\000\\000\\000" | ||
"\\031\\000\\000\\000\\000\\000\\000\\000" | ||
"\\032\\000\\000\\000\\000\\000\\000\\000" | ||
"\\001\\000\\000\\000" | ||
"\\000\\000\\000\\000" | ||
"\\001\\000" | ||
"\\032\\000\\000\\000\\000\\000\\000\\000" | ||
"\\002\\000\\000\\000\\000\\000\\000\\000" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
"{self.size}" | ||
"win32" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
import sys | ||
from io import BytesIO | ||
|
||
import atheris | ||
|
||
with atheris.instrument_imports(): | ||
# We instrument `test_utils` as well, so it doesn't block coverage analysis in Fuzz Introspector: | ||
from test_utils import EnhancedFuzzedDataProvider, is_expected_exception | ||
|
||
from dulwich.bundle import Bundle, read_bundle, write_bundle | ||
from dulwich.pack import PackData, write_pack_objects | ||
|
||
|
||
def TestOneInput(data): | ||
fdp = EnhancedFuzzedDataProvider(data) | ||
bundle = Bundle() | ||
bundle.version = fdp.PickValueInList([2, 3, None]) | ||
bundle.references = {fdp.ConsumeRandomString(): fdp.ConsumeBytes(20)} | ||
bundle.prerequisites = [(fdp.ConsumeBytes(20), fdp.ConsumeRandomBytes())] | ||
bundle.capabilities = { | ||
fdp.ConsumeRandomString(): fdp.ConsumeRandomString(), | ||
} | ||
|
||
b = BytesIO() | ||
write_pack_objects(b.write, []) | ||
b.seek(0) | ||
bundle.pack_data = PackData.from_file(b) | ||
|
||
# Test __repr__ method | ||
_ = repr(bundle) | ||
|
||
try: | ||
bundle_file = BytesIO() | ||
write_bundle(bundle_file, bundle) | ||
_ = read_bundle(bundle_file) | ||
except (AttributeError, UnicodeEncodeError, AssertionError) as e: | ||
expected_exceptions = [ | ||
"'bytes' object has no attribute 'encode'", | ||
"surrogates not allowed", | ||
"unsupported bundle format header", | ||
] | ||
if is_expected_exception(expected_exceptions, e): | ||
return -1 | ||
else: | ||
raise e | ||
|
||
|
||
def main(): | ||
atheris.Setup(sys.argv, TestOneInput) | ||
atheris.Fuzz() | ||
|
||
|
||
if __name__ == "__main__": | ||
main() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
import stat | ||
import sys | ||
from io import BytesIO | ||
|
||
import atheris | ||
|
||
with atheris.instrument_imports(): | ||
# We instrument `test_utils` as well, so it doesn't block coverage analysis in Fuzz Introspector: | ||
from test_utils import EnhancedFuzzedDataProvider, is_expected_exception | ||
|
||
from dulwich.errors import ObjectFormatException | ||
from dulwich.objects import S_IFGITLINK, Blob, Commit, Tree | ||
from dulwich.patch import write_tree_diff | ||
from dulwich.repo import ( | ||
InvalidUserIdentity, | ||
MemoryRepo, | ||
) | ||
|
||
|
||
def TestOneInput(data): | ||
fdp = EnhancedFuzzedDataProvider(data) | ||
repo = MemoryRepo() | ||
blob = Blob.from_string(fdp.ConsumeRandomBytes()) | ||
tree = Tree() | ||
tree.add( | ||
fdp.ConsumeRandomBytes(), | ||
fdp.PickValueInList([stat.S_IFREG, stat.S_IFLNK, stat.S_IFDIR, S_IFGITLINK]), | ||
blob.id, | ||
) | ||
commit = Commit() | ||
commit.tree = tree.id | ||
commit.author = fdp.ConsumeRandomBytes() | ||
commit.committer = fdp.ConsumeRandomBytes() | ||
commit.commit_time = fdp.ConsumeRandomInt() | ||
commit.commit_timezone = fdp.ConsumeRandomInt() | ||
commit.author_time = fdp.ConsumeRandomInt() | ||
commit.author_timezone = fdp.ConsumeRandomInt() | ||
commit.message = fdp.ConsumeRandomBytes() | ||
|
||
object_store = repo.object_store | ||
|
||
try: | ||
object_store.add_object(blob) | ||
object_store.add_object(tree) | ||
object_store.add_object(commit) | ||
except (InvalidUserIdentity, ObjectFormatException): | ||
return -1 | ||
except ValueError as e: | ||
expected_exceptions = [ | ||
"subsection not found", | ||
"Unable to handle non-minute offset", | ||
] | ||
if is_expected_exception(expected_exceptions, e): | ||
return -1 | ||
else: | ||
raise e | ||
|
||
commit2 = Commit() | ||
commit2.tree = tree.id | ||
commit2.parents = [commit.id] | ||
commit2.author = commit.author | ||
commit2.committer = commit.committer | ||
commit2.commit_time = fdp.ConsumeRandomInt() | ||
commit2.commit_timezone = fdp.ConsumeRandomInt() | ||
commit2.author_time = fdp.ConsumeRandomInt() | ||
commit2.author_timezone = fdp.ConsumeRandomInt() | ||
commit2.message = fdp.ConsumeRandomBytes() | ||
|
||
try: | ||
blob.data = fdp.ConsumeRandomBytes() | ||
repo.object_store.add_object(blob) | ||
repo.object_store.add_object(tree) | ||
repo.object_store.add_object(commit2) | ||
out = BytesIO() | ||
write_tree_diff(out, repo.object_store, commit.tree, tree.id) | ||
except (InvalidUserIdentity, ObjectFormatException): | ||
return -1 | ||
except ValueError as e: | ||
expected_exceptions = [ | ||
"Unable to handle non-minute offset", | ||
] | ||
if is_expected_exception(expected_exceptions, e): | ||
return -1 | ||
else: | ||
raise e | ||
|
||
|
||
def main(): | ||
atheris.Setup(sys.argv, TestOneInput) | ||
atheris.Fuzz() | ||
|
||
|
||
if __name__ == "__main__": | ||
main() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
import os | ||
import sys | ||
import tempfile | ||
|
||
import atheris | ||
|
||
with atheris.instrument_imports(): | ||
# We instrument `test_utils` as well, so it doesn't block coverage analysis in Fuzz Introspector: | ||
from test_utils import EnhancedFuzzedDataProvider | ||
|
||
from dulwich.repo import ( | ||
InvalidUserIdentity, | ||
Repo, | ||
) | ||
|
||
|
||
def TestOneInput(data): | ||
fdp = EnhancedFuzzedDataProvider(data) | ||
with tempfile.TemporaryDirectory() as temp_dir: | ||
repo = Repo.init(temp_dir) | ||
repo.set_description(fdp.ConsumeRandomBytes()) | ||
repo.get_description() | ||
|
||
# Generate a minimal set of files based on fuzz data to minimize I/O operations. | ||
file_paths = [ | ||
os.path.join(temp_dir, f"File{i}") | ||
for i in range(min(3, fdp.ConsumeIntInRange(1, 3))) | ||
] | ||
for file_path in file_paths: | ||
with open(file_path, "wb") as f: | ||
f.write(fdp.ConsumeRandomBytes()) | ||
|
||
try: | ||
repo.do_commit( | ||
message=fdp.ConsumeRandomBytes(), | ||
committer=fdp.ConsumeRandomBytes(), | ||
author=fdp.ConsumeRandomBytes(), | ||
commit_timestamp=fdp.ConsumeRandomInt(), | ||
commit_timezone=fdp.ConsumeRandomInt(), | ||
author_timestamp=fdp.ConsumeRandomInt(), | ||
author_timezone=fdp.ConsumeRandomInt(), | ||
) | ||
except InvalidUserIdentity: | ||
return -1 | ||
|
||
for file_path in file_paths: | ||
with open(file_path, "wb") as f: | ||
f.write(fdp.ConsumeRandomBytes()) | ||
|
||
repo.stage(file_paths) | ||
repo.do_commit( | ||
message=fdp.ConsumeRandomBytes(), | ||
) | ||
|
||
|
||
def main(): | ||
atheris.Setup(sys.argv, TestOneInput) | ||
atheris.Fuzz() | ||
|
||
|
||
if __name__ == "__main__": | ||
main() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
from typing import List # pragma: no cover | ||
|
||
import atheris # pragma: no cover | ||
|
||
|
||
@atheris.instrument_func | ||
def is_expected_exception( | ||
error_message_list: List[str], exception: Exception | ||
): # pragma: no cover | ||
"""Checks if the message of a given exception matches any of the expected error messages. | ||
Args: | ||
error_message_list (List[str]): A list of error message substrings to check against the exception's message. | ||
exception (Exception): The exception object raised during execution. | ||
Returns: | ||
bool: True if the exception's message contains any of the substrings from the error_message_list, otherwise False. | ||
""" | ||
for error in error_message_list: | ||
if error in str(exception): | ||
return True | ||
return False | ||
|
||
|
||
class EnhancedFuzzedDataProvider(atheris.FuzzedDataProvider): # pragma: no cover | ||
"""Extends atheris.FuzzedDataProvider to offer additional methods to make fuzz testing slightly more DRY.""" | ||
|
||
def __init__(self, data): | ||
"""Initializes the EnhancedFuzzedDataProvider with fuzzing data from the argument provided to TestOneInput. | ||
Args: | ||
data (bytes): The binary data used for fuzzing. | ||
""" | ||
super().__init__(data) | ||
|
||
def ConsumeRemainingBytes(self) -> bytes: | ||
"""Consume the remaining bytes in the bytes container. | ||
Returns: | ||
bytes: Zero or more bytes. | ||
""" | ||
return self.ConsumeBytes(self.remaining_bytes()) | ||
|
||
def ConsumeRandomBytes(self, max_length=None) -> bytes: | ||
"""Consume a random count of bytes from the bytes container. | ||
Args: | ||
max_length (int, optional): The maximum length of the string. Defaults to the number of remaining bytes. | ||
Returns: | ||
bytes: Zero or more bytes. | ||
""" | ||
if max_length is None: | ||
max_length = self.remaining_bytes() | ||
else: | ||
max_length = min(max_length, self.remaining_bytes()) | ||
|
||
return self.ConsumeBytes(self.ConsumeIntInRange(0, max_length)) | ||
|
||
def ConsumeRandomString(self, max_length=None) -> str: | ||
"""Consume bytes to produce a Unicode string. | ||
Args: | ||
max_length (int, optional): The maximum length of the string. Defaults to the number of remaining bytes. | ||
Returns: | ||
str: A Unicode string. | ||
""" | ||
if max_length is None: | ||
max_length = self.remaining_bytes() | ||
else: | ||
max_length = min(max_length, self.remaining_bytes()) | ||
|
||
return self.ConsumeUnicode(self.ConsumeIntInRange(0, max_length)) | ||
|
||
def ConsumeRandomInt(self, minimum=0, maximum=1234567890) -> int: | ||
"""Consume bytes to produce an integer. | ||
Args: | ||
minimum (int, optional): The minimum value of the integer. Defaults to 0. | ||
maximum (int, optional): The maximum value of the integer. Defaults to 1234567890. | ||
Returns: | ||
int: An integer. | ||
""" | ||
return self.ConsumeIntInRange(minimum, maximum) |