Fuzzing Improvements #1311

DaveLak · 2024-05-14T05:34:36Z

As a follow-up to #1304, this PR introduces additional fuzz targets, fuzz test dictionaries, and fuzzing/fuzz-targets/test_utils.py which includes test utilities to help DRY fuzzing test code.

The changes here should increase fuzzing coverage from ~2% to ~17% based on the results of my local testing.

The commit messages in this PR should describe the specific changes, but the most significant information detailed below:

New Fuzz Targets

fuzzing/fuzz-targets/fuzz_bundle.py

Tests the Bundle related functionality using fuzzer provided data.
This test is based on test_bundle.py, the unit test of the same functionality.

fuzzing/fuzz-targets/fuzz_object_store.py

Tests the Blob, Tree, and Commit classes using fuzzer provided data.
This test is based on the example code in the Object Store tutorial, fuzz_object_store.py uses a MemoryRepo to avoid disk IO where possible, in the interest of test execution efficiency.

fuzzing/fuzz-targets/fuzz_repo.py

Tests basic functionality of the Repo class.
This test must perform actual disk IO to effectively test all functionality, so it is somewhat slow compared to other fuzz targets in this repo. There might be ways to improve this, but as of this PR it works well enough.

`fuzzing/fuzz-targets/test_utils.py`

Adds a EnhancedFuzzedDataProvider class that extends atheris.FuzzedDataProvider to abstract some common use-cases into DRY method calls.
The is_expected_error helper function was extracted from fuzz_configfile.py into this dedicated test utility file so it can be reused by other fuzz harnesses in fuzz-targets/.
- Also renamed and better documented the is_expected_error function now that it is shared.

Other Notes

I've tested all of the changes proposed here extensively in my local environment. They are working well enough that I feel they are a net value add to the fuzz test suite, but these tests can likely be further optimized to improve coverage and efficiency. I plan to keep an eye on their performance and further optimize the tests & supporting code as needed.

jelmer

Changes look good, but please "ruff format ." the source code

DaveLak · 2024-05-14T07:25:53Z

please "ruff format ." the source code

Done. Sorry about that, I've been running ruff format fuzzing/fuzz-targets regularly, but it seems I've been using a globally installed version of Ruff. I ran make fix on 3899518 and that seems to have done the trick.

DaveLak · 2024-05-14T07:34:43Z

FWIW, running both ruff format . or make reformat on top of 3899518 reformat a hand full of files unrelated to the changes here:

❯ git status   
On branch fuzzing-improvements
Your branch is up to date with 'origin/fuzzing-improvements'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   dulwich/cli.py
        modified:   dulwich/config.py
        modified:   dulwich/diff_tree.py
        modified:   dulwich/repo.py
        modified:   dulwich/tests/test_object_store.py
        modified:   tests/compat/utils.py
        modified:   tests/contrib/__init__.py
        modified:   tests/contrib/test_swift_smoke.py
        modified:   tests/test_client.py
        modified:   tests/test_pack.py
        modified:   tests/test_patch.py
        modified:   tests/test_refs.py
        modified:   tests/test_repository.py

Expand for diff

diff --git a/dulwich/cli.py b/dulwich/cli.py
index 89096db6..e856df45 100644
--- a/dulwich/cli.py
+++ b/dulwich/cli.py
@@ -624,7 +624,9 @@ class SuperCommand(Command):
 
     def run(self, args):
         if not args and not self.default_command:
-            print("Supported subcommands: {}".format(", ".join(self.subcommands.keys())))
+            print(
+                "Supported subcommands: {}".format(", ".join(self.subcommands.keys()))
+            )
             return False
         cmd = args[0]
         try:
diff --git a/dulwich/config.py b/dulwich/config.py
index c0a3de11..0d7a205a 100644
--- a/dulwich/config.py
+++ b/dulwich/config.py
@@ -175,12 +175,12 @@ class Config:
         raise NotImplementedError(self.get_multivar)
 
     @overload
-    def get_boolean(
-        self, section: SectionLike, name: NameLike, default: bool
-    ) -> bool: ...
+    def get_boolean(self, section: SectionLike, name: NameLike, default: bool) -> bool:
+        ...
 
     @overload
-    def get_boolean(self, section: SectionLike, name: NameLike) -> Optional[bool]: ...
+    def get_boolean(self, section: SectionLike, name: NameLike) -> Optional[bool]:
+        ...
 
     def get_boolean(
         self, section: SectionLike, name: NameLike, default: Optional[bool] = None
diff --git a/dulwich/diff_tree.py b/dulwich/diff_tree.py
index 5291f9b7..aada776d 100644
--- a/dulwich/diff_tree.py
+++ b/dulwich/diff_tree.py
@@ -240,7 +240,11 @@ def _all_same(seq, key):
 
 
 def tree_changes_for_merge(
-        store: BaseObjectStore, parent_tree_ids: List[ObjectID], tree_id: ObjectID, rename_detector=None):
+    store: BaseObjectStore,
+    parent_tree_ids: List[ObjectID],
+    tree_id: ObjectID,
+    rename_detector=None,
+):
     """Get the tree changes for a merge tree relative to all its parents.
 
     Args:
diff --git a/dulwich/repo.py b/dulwich/repo.py
index e2054f89..63af3208 100644
--- a/dulwich/repo.py
+++ b/dulwich/repo.py
@@ -829,9 +829,7 @@ class BaseRepo:
           KeyError: when the specified ref or object does not exist
         """
         if not isinstance(name, bytes):
-            raise TypeError(
-                f"'name' must be bytestring, not {type(name).__name__:.80}"
-            )
+            raise TypeError(f"'name' must be bytestring, not {type(name).__name__:.80}")
         if len(name) in (20, 40):
             try:
                 return self.object_store[name]
@@ -1473,9 +1471,7 @@ class Repo(BaseRepo):
                     del index[tree_path]
                     continue
                 except KeyError as exc:
-                    raise KeyError(
-                        f"file '{tree_path.decode()}' not in index"
-                    ) from exc
+                    raise KeyError(f"file '{tree_path.decode()}' not in index") from exc
 
             st = None
             try:
diff --git a/dulwich/tests/test_object_store.py b/dulwich/tests/test_object_store.py
index a33bf678..313be5d3 100644
--- a/dulwich/tests/test_object_store.py
+++ b/dulwich/tests/test_object_store.py
@@ -294,6 +294,3 @@ class PackBasedObjectStoreTests(ObjectStoreTests):
         self.assertEqual(2, self.store.repack())
         self.assertEqual(1, len(self.store.packs))
         self.assertEqual(0, self.store.pack_loose_objects())
-
-
-
diff --git a/tests/compat/utils.py b/tests/compat/utils.py
index 69cecede..7ce52c27 100644
--- a/tests/compat/utils.py
+++ b/tests/compat/utils.py
@@ -40,9 +40,7 @@ from .. import SkipTest, TestCase
 _DEFAULT_GIT = "git"
 _VERSION_LEN = 4
 _REPOS_DATA_DIR = os.path.abspath(
-    os.path.join(
-        os.path.dirname(__file__), os.pardir, os.pardir, "testdata", "repos"
-    )
+    os.path.join(os.path.dirname(__file__), os.pardir, os.pardir, "testdata", "repos")
 )
 
 
diff --git a/tests/contrib/__init__.py b/tests/contrib/__init__.py
index 92b1e5b9..a67c44cb 100644
--- a/tests/contrib/__init__.py
+++ b/tests/contrib/__init__.py
@@ -18,6 +18,7 @@
 # License, Version 2.0.
 #
 
+
 def test_suite():
     import unittest
 
diff --git a/tests/contrib/test_swift_smoke.py b/tests/contrib/test_swift_smoke.py
index 725e8e3b..9ce7e7a0 100644
--- a/tests/contrib/test_swift_smoke.py
+++ b/tests/contrib/test_swift_smoke.py
@@ -179,9 +179,7 @@ class SwiftRepoSmokeTest(unittest.TestCase):
         )
         swift_repo = swift.SwiftRepo("fakerepo", self.conf)
         for branch in ("master", "mybranch", "pullr-108"):
-            remote_shas[branch] = swift_repo.refs.read_loose_ref(
-                f"refs/heads/{branch}"
-            )
+            remote_shas[branch] = swift_repo.refs.read_loose_ref(f"refs/heads/{branch}")
         self.assertDictEqual(local_shas, remote_shas)
 
     def test_push_data_branch(self):
diff --git a/tests/test_client.py b/tests/test_client.py
index 90e24956..4e8484b1 100644
--- a/tests/test_client.py
+++ b/tests/test_client.py
@@ -772,9 +772,9 @@ class SSHGitClientTests(TestCase):
         )
 
     def test_alternative_command_path_spaces(self):
-        self.client.alternative_paths[b"upload-pack"] = (
-            b"/usr/lib/git/git-upload-pack -ibla"
-        )
+        self.client.alternative_paths[
+            b"upload-pack"
+        ] = b"/usr/lib/git/git-upload-pack -ibla"
         self.assertEqual(
             b"/usr/lib/git/git-upload-pack -ibla",
             self.client._get_cmd_path(b"upload-pack"),
@@ -1207,9 +1207,7 @@ class HttpGitClientTests(TestCase):
                 preload_content=True,
             ):
                 response = HTTPResponse(
-                    headers={
-                        "Content-Type": "application/x-git-upload-pack-result"
-                    },
+                    headers={"Content-Type": "application/x-git-upload-pack-result"},
                     request_method=method,
                     request_url=url,
                     preload_content=preload_content,
diff --git a/tests/test_pack.py b/tests/test_pack.py
index efe90904..5c1e2b9b 100644
--- a/tests/test_pack.py
+++ b/tests/test_pack.py
@@ -256,7 +256,9 @@ class TestPackData(PackTests):
         self.get_pack_data(pack1_sha).close()
 
     def test_from_file(self):
-        path = os.path.join(self.datadir, "pack-{}.pack".format(pack1_sha.decode("ascii")))
+        path = os.path.join(
+            self.datadir, "pack-{}.pack".format(pack1_sha.decode("ascii"))
+        )
         with open(path, "rb") as f:
             PackData.from_file(f, os.path.getsize(path))
 
diff --git a/tests/test_patch.py b/tests/test_patch.py
index 2a1a84a6..854cb0e2 100644
--- a/tests/test_patch.py
+++ b/tests/test_patch.py
@@ -234,8 +234,7 @@ diff --git a/dulwich/tests/test_patch.py b/dulwich/tests/test_patch.py
  
  class DiffTests(TestCase):
 """
-        text = (
-            f"""\
+        text = f"""\
 From [email protected] \
 Mon Nov 29 00:58:18 2010
 Date: Sun, 28 Nov 2010 17:57:27 -0600
@@ -255,7 +254,6 @@ Unsubscribe : https://launchpad.net/~dulwich-users
 More help   : https://help.launchpad.net/ListHelp
 
 """
-        )
         c, diff, version = git_am_patch_split(BytesIO(text))
         self.assertEqual(expected_diff, diff)
         self.assertEqual(None, version)
diff --git a/tests/test_refs.py b/tests/test_refs.py
index 572be9d8..7529d775 100644
--- a/tests/test_refs.py
+++ b/tests/test_refs.py
@@ -316,9 +316,9 @@ class RefsContainerTests:
         self.assertNotIn(b"refs/tags/refs-0.2", self._refs)
 
     def test_import_refs_name(self):
-        self._refs[b"refs/remotes/origin/other"] = (
-            b"48d01bd4b77fed026b154d16493e5deab78f02ec"
-        )
+        self._refs[
+            b"refs/remotes/origin/other"
+        ] = b"48d01bd4b77fed026b154d16493e5deab78f02ec"
         self._refs.import_refs(
             b"refs/remotes/origin",
             {b"master": b"42d06bd4b77fed026b154d16493e5deab78f02ec"},
@@ -333,9 +333,9 @@ class RefsContainerTests:
         )
 
     def test_import_refs_name_prune(self):
-        self._refs[b"refs/remotes/origin/other"] = (
-            b"48d01bd4b77fed026b154d16493e5deab78f02ec"
-        )
+        self._refs[
+            b"refs/remotes/origin/other"
+        ] = b"48d01bd4b77fed026b154d16493e5deab78f02ec"
         self._refs.import_refs(
             b"refs/remotes/origin",
             {b"master": b"42d06bd4b77fed026b154d16493e5deab78f02ec"},
diff --git a/tests/test_repository.py b/tests/test_repository.py
index 46c9252e..d28f1742 100644
--- a/tests/test_repository.py
+++ b/tests/test_repository.py
@@ -405,9 +405,7 @@ class RepositoryRootTests(TestCase):
     def test_clone_no_head(self):
         temp_dir = self.mkdtemp()
         self.addCleanup(shutil.rmtree, temp_dir)
-        repo_dir = os.path.join(
-            os.path.dirname(__file__), "..", "testdata", "repos"
-        )
+        repo_dir = os.path.join(os.path.dirname(__file__), "..", "testdata", "repos")
         dest_dir = os.path.join(temp_dir, "a.git")
         shutil.copytree(os.path.join(repo_dir, "a.git"), dest_dir, symlinks=True)
         r = Repo(dest_dir)

jelmer · 2024-05-14T07:58:36Z

Ah, sorry - I was just going on the github action output. Fix pushed to master now for those other files. Please rebase and retry?

These values were recomended by LibFuzzer after several local runs of the `fuzz_configfile.py` fuzz target.

Extracts the `is_expected_error` helper function into a dedicated test utility file that can be reused by other fuzz harnesses in `fuzz-targets/`. Also renamed and better documented the function now that it is shared. The `test_utils.py` file deliberately does not include "fuzz" in the file name to help distinguish it from the fuzz test files.

`EnhancedFuzzedDataProvider` extends `atheris.FuzzedDataProvider` to abstract some common usecases into DRY method calls. `fuzz_bundle.py` tests the `Bundle` related functionality using fuzzer provided data. This test is based on the unit test of the same functionality (i.e., `test_bundle.py`.

Based off of the example code in the object store tutorial, `fuzz_object_store.py` uses a `MemoryRepo` to improve efficency of the test by avoiding disk IO. A `ConsumeRandomInt` method was added to `EnhancedFuzzedDataProvider` to make getting an `int` easier to do when we don't care about the upper and lower bounds of the value in the test.

This test must perform actual disk IO to effectively test all functionality, so it is somewhat slow compared to other fuzz targets in this repo. There might be ways to improve this, but as of this commit it works well enough.

The equality check was low value as it didn't do more than check some simple value comparisons in the source file, and it consumed a large amout of input data which made the test less efficient. Also adds `None` to the possible bundle version values to test source code branches executed when `Bundle.version = None`. Finally, adds new dict entries suggested by LibFuzzer during local runs.

This increases coverage and brings `fuzz_object_store.py` into full alignment with the Object Store tutorial in the Dulwich docs.

Values were suggested by LibFuzzer after local test runs.

This tells LibFuzzer that the input should not be added to the fuzz target corpus. We do not want to add expected error triggers to the corpus; instead we want to have a corpus consisting of inputs known to trigger more interesting code paths.

DaveLak · 2024-05-14T12:08:12Z

Please rebase and retry

Done. My changes in fuzzing/ are still passing ruff check .

Edit: disregard the comments below; it was caused by an outdated ruff version. See issue #1314 for details.

~~but it looks like #1312 missed a few files unrelated to this PR.~~

~~After running make reformat I see:~~

❯ git status
On branch fuzzing-improvements
Your branch is up to date with 'origin/fuzzing-improvements'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   dulwich/config.py
        modified:   tests/test_client.py
        modified:   tests/test_refs.py

The diff:

diff --git a/dulwich/config.py b/dulwich/config.py
index c0a3de11..0d7a205a 100644
--- a/dulwich/config.py
+++ b/dulwich/config.py
@@ -175,12 +175,12 @@ class Config:
         raise NotImplementedError(self.get_multivar)
 
     @overload
-    def get_boolean(
-        self, section: SectionLike, name: NameLike, default: bool
-    ) -> bool: ...
+    def get_boolean(self, section: SectionLike, name: NameLike, default: bool) -> bool:
+        ...
 
     @overload
-    def get_boolean(self, section: SectionLike, name: NameLike) -> Optional[bool]: ...
+    def get_boolean(self, section: SectionLike, name: NameLike) -> Optional[bool]:
+        ...
 
     def get_boolean(
         self, section: SectionLike, name: NameLike, default: Optional[bool] = None
diff --git a/tests/test_client.py b/tests/test_client.py
index d219bce0..4e8484b1 100644
--- a/tests/test_client.py
+++ b/tests/test_client.py
@@ -772,9 +772,9 @@ class SSHGitClientTests(TestCase):
         )
 
     def test_alternative_command_path_spaces(self):
-        self.client.alternative_paths[b"upload-pack"] = (
-            b"/usr/lib/git/git-upload-pack -ibla"
-        )
+        self.client.alternative_paths[
+            b"upload-pack"
+        ] = b"/usr/lib/git/git-upload-pack -ibla"
         self.assertEqual(
             b"/usr/lib/git/git-upload-pack -ibla",
             self.client._get_cmd_path(b"upload-pack"),
diff --git a/tests/test_refs.py b/tests/test_refs.py
index 572be9d8..7529d775 100644
--- a/tests/test_refs.py
+++ b/tests/test_refs.py
@@ -316,9 +316,9 @@ class RefsContainerTests:
         self.assertNotIn(b"refs/tags/refs-0.2", self._refs)
 
     def test_import_refs_name(self):
-        self._refs[b"refs/remotes/origin/other"] = (
-            b"48d01bd4b77fed026b154d16493e5deab78f02ec"
-        )
+        self._refs[
+            b"refs/remotes/origin/other"
+        ] = b"48d01bd4b77fed026b154d16493e5deab78f02ec"
         self._refs.import_refs(
             b"refs/remotes/origin",
             {b"master": b"42d06bd4b77fed026b154d16493e5deab78f02ec"},
@@ -333,9 +333,9 @@ class RefsContainerTests:
         )
 
     def test_import_refs_name_prune(self):
-        self._refs[b"refs/remotes/origin/other"] = (
-            b"48d01bd4b77fed026b154d16493e5deab78f02ec"
-        )
+        self._refs[
+            b"refs/remotes/origin/other"
+        ] = b"48d01bd4b77fed026b154d16493e5deab78f02ec"
         self._refs.import_refs(
             b"refs/remotes/origin",
             {b"master": b"42d06bd4b77fed026b154d16493e5deab78f02ec"},

DaveLak · 2024-05-14T12:44:59Z

Edit: disregard the comments below; it was caused by an outdated ruff version. See issue #1314 for details.

~~I put a separate PR up to address the formatting I mentioned in my previous comment: #1313~~

DaveLak requested a review from jelmer as a code owner May 14, 2024 05:34

jelmer requested changes May 14, 2024

View reviewed changes

DaveLak requested a review from jelmer May 14, 2024 07:26

DaveLak added 11 commits May 14, 2024 07:58

Add more fuzzing engine recomended byte sequences to dictionary

78c2905

These values were recomended by LibFuzzer after several local runs of the `fuzz_configfile.py` fuzz target.

Add Repo class fuzz target

5c1b413

This test must perform actual disk IO to effectively test all functionality, so it is somewhat slow compared to other fuzz targets in this repo. There might be ways to improve this, but as of this commit it works well enough.

Test tree diffing in fuzz_object_store fuzz target

e535d58

This increases coverage and brings `fuzz_object_store.py` into full alignment with the Object Store tutorial in the Dulwich docs.

Add dict for fuzz_repo.py

4558a48

Values were suggested by LibFuzzer after local test runs.

Update NEWS with fuzzing coverage improvements note

cf90471

Fix formatting with Ruff via make fix

bf896a6

DaveLak force-pushed the fuzzing-improvements branch from 3899518 to bf896a6 Compare May 14, 2024 12:00

DaveLak mentioned this pull request May 14, 2024

Consider Pinning ruff to a Specific Version #1314

Closed

jelmer merged commit f9ad9a9 into jelmer:master May 14, 2024
19 of 22 checks passed

DaveLak deleted the fuzzing-improvements branch May 14, 2024 14:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fuzzing Improvements #1311

Fuzzing Improvements #1311

DaveLak commented May 14, 2024 •

edited

Loading

jelmer left a comment

DaveLak commented May 14, 2024

DaveLak commented May 14, 2024 •

edited

Loading

jelmer commented May 14, 2024

DaveLak commented May 14, 2024 •

edited

Loading

DaveLak commented May 14, 2024 •

edited

Loading

Fuzzing Improvements #1311

Fuzzing Improvements #1311

Conversation

DaveLak commented May 14, 2024 • edited Loading

New Fuzz Targets

fuzzing/fuzz-targets/test_utils.py

Other Notes

jelmer left a comment

Choose a reason for hiding this comment

DaveLak commented May 14, 2024

DaveLak commented May 14, 2024 • edited Loading

jelmer commented May 14, 2024

DaveLak commented May 14, 2024 • edited Loading

DaveLak commented May 14, 2024 • edited Loading

DaveLak commented May 14, 2024 •

edited

Loading

`fuzzing/fuzz-targets/test_utils.py`

DaveLak commented May 14, 2024 •

edited

Loading

DaveLak commented May 14, 2024 •

edited

Loading

DaveLak commented May 14, 2024 •

edited

Loading