Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzzing Improvements #1311

Merged
merged 11 commits into from
May 14, 2024
Merged

Fuzzing Improvements #1311

merged 11 commits into from
May 14, 2024

Conversation

DaveLak
Copy link
Contributor

@DaveLak DaveLak commented May 14, 2024

As a follow-up to #1304, this PR introduces additional fuzz targets, fuzz test dictionaries, and fuzzing/fuzz-targets/test_utils.py which includes test utilities to help DRY fuzzing test code.

The changes here should increase fuzzing coverage from ~2% to ~17% based on the results of my local testing.

The commit messages in this PR should describe the specific changes, but the most significant information detailed below:

New Fuzz Targets

fuzzing/fuzz-targets/fuzz_bundle.py

  • Tests the Bundle related functionality using fuzzer provided data.
  • This test is based on test_bundle.py, the unit test of the same functionality.

fuzzing/fuzz-targets/fuzz_object_store.py

  • Tests the Blob, Tree, and Commit classes using fuzzer provided data.
  • This test is based on the example code in the Object Store tutorial, fuzz_object_store.py uses a MemoryRepo to avoid disk IO where possible, in the interest of test execution efficiency.

fuzzing/fuzz-targets/fuzz_repo.py

  • Tests basic functionality of the Repo class.
  • This test must perform actual disk IO to effectively test all functionality, so it is somewhat slow compared to other fuzz targets in this repo. There might be ways to improve this, but as of this PR it works well enough.

fuzzing/fuzz-targets/test_utils.py

  • Adds a EnhancedFuzzedDataProvider class that extends atheris.FuzzedDataProvider to abstract some common use-cases into DRY method calls.
  • The is_expected_error helper function was extracted from fuzz_configfile.py into this dedicated test utility file so it can be reused by other fuzz harnesses in fuzz-targets/.
    • Also renamed and better documented the is_expected_error function now that it is shared.

Other Notes

I've tested all of the changes proposed here extensively in my local environment. They are working well enough that I feel they are a net value add to the fuzz test suite, but these tests can likely be further optimized to improve coverage and efficiency. I plan to keep an eye on their performance and further optimize the tests & supporting code as needed.

@DaveLak DaveLak requested a review from jelmer as a code owner May 14, 2024 05:34
Copy link
Owner

@jelmer jelmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, but please "ruff format ." the source code

@DaveLak
Copy link
Contributor Author

DaveLak commented May 14, 2024

please "ruff format ." the source code

Done. Sorry about that, I've been running ruff format fuzzing/fuzz-targets regularly, but it seems I've been using a globally installed version of Ruff. I ran make fix on 3899518 and that seems to have done the trick.

@DaveLak DaveLak requested a review from jelmer May 14, 2024 07:26
@DaveLak
Copy link
Contributor Author

DaveLak commented May 14, 2024

FWIW, running both ruff format . or make reformat on top of 3899518 reformat a hand full of files unrelated to the changes here:

❯ git status   
On branch fuzzing-improvements
Your branch is up to date with 'origin/fuzzing-improvements'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   dulwich/cli.py
        modified:   dulwich/config.py
        modified:   dulwich/diff_tree.py
        modified:   dulwich/repo.py
        modified:   dulwich/tests/test_object_store.py
        modified:   tests/compat/utils.py
        modified:   tests/contrib/__init__.py
        modified:   tests/contrib/test_swift_smoke.py
        modified:   tests/test_client.py
        modified:   tests/test_pack.py
        modified:   tests/test_patch.py
        modified:   tests/test_refs.py
        modified:   tests/test_repository.py
Expand for diff
diff --git a/dulwich/cli.py b/dulwich/cli.py
index 89096db6..e856df45 100644
--- a/dulwich/cli.py
+++ b/dulwich/cli.py
@@ -624,7 +624,9 @@ class SuperCommand(Command):
 
     def run(self, args):
         if not args and not self.default_command:
-            print("Supported subcommands: {}".format(", ".join(self.subcommands.keys())))
+            print(
+                "Supported subcommands: {}".format(", ".join(self.subcommands.keys()))
+            )
             return False
         cmd = args[0]
         try:
diff --git a/dulwich/config.py b/dulwich/config.py
index c0a3de11..0d7a205a 100644
--- a/dulwich/config.py
+++ b/dulwich/config.py
@@ -175,12 +175,12 @@ class Config:
         raise NotImplementedError(self.get_multivar)
 
     @overload
-    def get_boolean(
-        self, section: SectionLike, name: NameLike, default: bool
-    ) -> bool: ...
+    def get_boolean(self, section: SectionLike, name: NameLike, default: bool) -> bool:
+        ...
 
     @overload
-    def get_boolean(self, section: SectionLike, name: NameLike) -> Optional[bool]: ...
+    def get_boolean(self, section: SectionLike, name: NameLike) -> Optional[bool]:
+        ...
 
     def get_boolean(
         self, section: SectionLike, name: NameLike, default: Optional[bool] = None
diff --git a/dulwich/diff_tree.py b/dulwich/diff_tree.py
index 5291f9b7..aada776d 100644
--- a/dulwich/diff_tree.py
+++ b/dulwich/diff_tree.py
@@ -240,7 +240,11 @@ def _all_same(seq, key):
 
 
 def tree_changes_for_merge(
-        store: BaseObjectStore, parent_tree_ids: List[ObjectID], tree_id: ObjectID, rename_detector=None):
+    store: BaseObjectStore,
+    parent_tree_ids: List[ObjectID],
+    tree_id: ObjectID,
+    rename_detector=None,
+):
     """Get the tree changes for a merge tree relative to all its parents.
 
     Args:
diff --git a/dulwich/repo.py b/dulwich/repo.py
index e2054f89..63af3208 100644
--- a/dulwich/repo.py
+++ b/dulwich/repo.py
@@ -829,9 +829,7 @@ class BaseRepo:
           KeyError: when the specified ref or object does not exist
         """
         if not isinstance(name, bytes):
-            raise TypeError(
-                f"'name' must be bytestring, not {type(name).__name__:.80}"
-            )
+            raise TypeError(f"'name' must be bytestring, not {type(name).__name__:.80}")
         if len(name) in (20, 40):
             try:
                 return self.object_store[name]
@@ -1473,9 +1471,7 @@ class Repo(BaseRepo):
                     del index[tree_path]
                     continue
                 except KeyError as exc:
-                    raise KeyError(
-                        f"file '{tree_path.decode()}' not in index"
-                    ) from exc
+                    raise KeyError(f"file '{tree_path.decode()}' not in index") from exc
 
             st = None
             try:
diff --git a/dulwich/tests/test_object_store.py b/dulwich/tests/test_object_store.py
index a33bf678..313be5d3 100644
--- a/dulwich/tests/test_object_store.py
+++ b/dulwich/tests/test_object_store.py
@@ -294,6 +294,3 @@ class PackBasedObjectStoreTests(ObjectStoreTests):
         self.assertEqual(2, self.store.repack())
         self.assertEqual(1, len(self.store.packs))
         self.assertEqual(0, self.store.pack_loose_objects())
-
-
-
diff --git a/tests/compat/utils.py b/tests/compat/utils.py
index 69cecede..7ce52c27 100644
--- a/tests/compat/utils.py
+++ b/tests/compat/utils.py
@@ -40,9 +40,7 @@ from .. import SkipTest, TestCase
 _DEFAULT_GIT = "git"
 _VERSION_LEN = 4
 _REPOS_DATA_DIR = os.path.abspath(
-    os.path.join(
-        os.path.dirname(__file__), os.pardir, os.pardir, "testdata", "repos"
-    )
+    os.path.join(os.path.dirname(__file__), os.pardir, os.pardir, "testdata", "repos")
 )
 
 
diff --git a/tests/contrib/__init__.py b/tests/contrib/__init__.py
index 92b1e5b9..a67c44cb 100644
--- a/tests/contrib/__init__.py
+++ b/tests/contrib/__init__.py
@@ -18,6 +18,7 @@
 # License, Version 2.0.
 #
 
+
 def test_suite():
     import unittest
 
diff --git a/tests/contrib/test_swift_smoke.py b/tests/contrib/test_swift_smoke.py
index 725e8e3b..9ce7e7a0 100644
--- a/tests/contrib/test_swift_smoke.py
+++ b/tests/contrib/test_swift_smoke.py
@@ -179,9 +179,7 @@ class SwiftRepoSmokeTest(unittest.TestCase):
         )
         swift_repo = swift.SwiftRepo("fakerepo", self.conf)
         for branch in ("master", "mybranch", "pullr-108"):
-            remote_shas[branch] = swift_repo.refs.read_loose_ref(
-                f"refs/heads/{branch}"
-            )
+            remote_shas[branch] = swift_repo.refs.read_loose_ref(f"refs/heads/{branch}")
         self.assertDictEqual(local_shas, remote_shas)
 
     def test_push_data_branch(self):
diff --git a/tests/test_client.py b/tests/test_client.py
index 90e24956..4e8484b1 100644
--- a/tests/test_client.py
+++ b/tests/test_client.py
@@ -772,9 +772,9 @@ class SSHGitClientTests(TestCase):
         )
 
     def test_alternative_command_path_spaces(self):
-        self.client.alternative_paths[b"upload-pack"] = (
-            b"/usr/lib/git/git-upload-pack -ibla"
-        )
+        self.client.alternative_paths[
+            b"upload-pack"
+        ] = b"/usr/lib/git/git-upload-pack -ibla"
         self.assertEqual(
             b"/usr/lib/git/git-upload-pack -ibla",
             self.client._get_cmd_path(b"upload-pack"),
@@ -1207,9 +1207,7 @@ class HttpGitClientTests(TestCase):
                 preload_content=True,
             ):
                 response = HTTPResponse(
-                    headers={
-                        "Content-Type": "application/x-git-upload-pack-result"
-                    },
+                    headers={"Content-Type": "application/x-git-upload-pack-result"},
                     request_method=method,
                     request_url=url,
                     preload_content=preload_content,
diff --git a/tests/test_pack.py b/tests/test_pack.py
index efe90904..5c1e2b9b 100644
--- a/tests/test_pack.py
+++ b/tests/test_pack.py
@@ -256,7 +256,9 @@ class TestPackData(PackTests):
         self.get_pack_data(pack1_sha).close()
 
     def test_from_file(self):
-        path = os.path.join(self.datadir, "pack-{}.pack".format(pack1_sha.decode("ascii")))
+        path = os.path.join(
+            self.datadir, "pack-{}.pack".format(pack1_sha.decode("ascii"))
+        )
         with open(path, "rb") as f:
             PackData.from_file(f, os.path.getsize(path))
 
diff --git a/tests/test_patch.py b/tests/test_patch.py
index 2a1a84a6..854cb0e2 100644
--- a/tests/test_patch.py
+++ b/tests/test_patch.py
@@ -234,8 +234,7 @@ diff --git a/dulwich/tests/test_patch.py b/dulwich/tests/test_patch.py
  
  class DiffTests(TestCase):
 """
-        text = (
-            f"""\
+        text = f"""\
 From [email protected] \
 Mon Nov 29 00:58:18 2010
 Date: Sun, 28 Nov 2010 17:57:27 -0600
@@ -255,7 +254,6 @@ Unsubscribe : https://launchpad.net/~dulwich-users
 More help   : https://help.launchpad.net/ListHelp
 
 """
-        )
         c, diff, version = git_am_patch_split(BytesIO(text))
         self.assertEqual(expected_diff, diff)
         self.assertEqual(None, version)
diff --git a/tests/test_refs.py b/tests/test_refs.py
index 572be9d8..7529d775 100644
--- a/tests/test_refs.py
+++ b/tests/test_refs.py
@@ -316,9 +316,9 @@ class RefsContainerTests:
         self.assertNotIn(b"refs/tags/refs-0.2", self._refs)
 
     def test_import_refs_name(self):
-        self._refs[b"refs/remotes/origin/other"] = (
-            b"48d01bd4b77fed026b154d16493e5deab78f02ec"
-        )
+        self._refs[
+            b"refs/remotes/origin/other"
+        ] = b"48d01bd4b77fed026b154d16493e5deab78f02ec"
         self._refs.import_refs(
             b"refs/remotes/origin",
             {b"master": b"42d06bd4b77fed026b154d16493e5deab78f02ec"},
@@ -333,9 +333,9 @@ class RefsContainerTests:
         )
 
     def test_import_refs_name_prune(self):
-        self._refs[b"refs/remotes/origin/other"] = (
-            b"48d01bd4b77fed026b154d16493e5deab78f02ec"
-        )
+        self._refs[
+            b"refs/remotes/origin/other"
+        ] = b"48d01bd4b77fed026b154d16493e5deab78f02ec"
         self._refs.import_refs(
             b"refs/remotes/origin",
             {b"master": b"42d06bd4b77fed026b154d16493e5deab78f02ec"},
diff --git a/tests/test_repository.py b/tests/test_repository.py
index 46c9252e..d28f1742 100644
--- a/tests/test_repository.py
+++ b/tests/test_repository.py
@@ -405,9 +405,7 @@ class RepositoryRootTests(TestCase):
     def test_clone_no_head(self):
         temp_dir = self.mkdtemp()
         self.addCleanup(shutil.rmtree, temp_dir)
-        repo_dir = os.path.join(
-            os.path.dirname(__file__), "..", "testdata", "repos"
-        )
+        repo_dir = os.path.join(os.path.dirname(__file__), "..", "testdata", "repos")
         dest_dir = os.path.join(temp_dir, "a.git")
         shutil.copytree(os.path.join(repo_dir, "a.git"), dest_dir, symlinks=True)
         r = Repo(dest_dir)

@jelmer
Copy link
Owner

jelmer commented May 14, 2024

Ah, sorry - I was just going on the github action output. Fix pushed to master now for those other files. Please rebase and retry?

DaveLak added 11 commits May 14, 2024 07:58
These values were recomended by LibFuzzer after several local runs of
the `fuzz_configfile.py` fuzz target.
Extracts the `is_expected_error` helper function into a dedicated test
utility file that can be reused by other fuzz harnesses in
`fuzz-targets/`. Also renamed and better documented the function now
that it is shared.

The `test_utils.py` file deliberately does not include "fuzz" in the
file name to help distinguish it from the fuzz test files.
`EnhancedFuzzedDataProvider` extends `atheris.FuzzedDataProvider` to
abstract some common usecases into DRY method calls.

`fuzz_bundle.py` tests the `Bundle` related functionality using fuzzer
provided data. This test is based on the unit test of the same
functionality (i.e., `test_bundle.py`.
Based off of the example code in the object store tutorial,
`fuzz_object_store.py` uses a `MemoryRepo` to improve efficency of the
test by avoiding disk IO.

A `ConsumeRandomInt` method was added to `EnhancedFuzzedDataProvider` to
make getting an `int` easier to do when we don't care about the upper
and lower bounds of the value in the test.
This test must perform actual disk IO to effectively test all
functionality, so it is somewhat slow compared to other fuzz targets in
this repo. There might be ways to improve this, but as of this commit it
works well enough.
The equality check was low value as it didn't do more than check some
simple value comparisons in the source file, and it consumed a large
amout of input data which made the test less efficient.

Also adds `None` to the possible bundle version values to test source
code branches executed when `Bundle.version = None`.

Finally, adds new dict entries suggested by LibFuzzer during local runs.
This increases coverage and brings `fuzz_object_store.py` into full
alignment with the Object Store tutorial in the Dulwich docs.
Values were suggested by LibFuzzer after local test runs.
This tells LibFuzzer that the input should not be added to the fuzz
target corpus. We do not want to add expected error triggers to the
corpus; instead we want to have a corpus consisting of inputs known to
trigger more interesting code paths.
@DaveLak DaveLak force-pushed the fuzzing-improvements branch from 3899518 to bf896a6 Compare May 14, 2024 12:00
@DaveLak
Copy link
Contributor Author

DaveLak commented May 14, 2024

Please rebase and retry

Done. My changes in fuzzing/ are still passing ruff check .

Edit: disregard the comments below; it was caused by an outdated ruff version. See issue #1314 for details.


but it looks like #1312 missed a few files unrelated to this PR.

After running make reformat I see:

❯ git status
On branch fuzzing-improvements
Your branch is up to date with 'origin/fuzzing-improvements'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   dulwich/config.py
        modified:   tests/test_client.py
        modified:   tests/test_refs.py

The diff:

diff --git a/dulwich/config.py b/dulwich/config.py
index c0a3de11..0d7a205a 100644
--- a/dulwich/config.py
+++ b/dulwich/config.py
@@ -175,12 +175,12 @@ class Config:
         raise NotImplementedError(self.get_multivar)
 
     @overload
-    def get_boolean(
-        self, section: SectionLike, name: NameLike, default: bool
-    ) -> bool: ...
+    def get_boolean(self, section: SectionLike, name: NameLike, default: bool) -> bool:
+        ...
 
     @overload
-    def get_boolean(self, section: SectionLike, name: NameLike) -> Optional[bool]: ...
+    def get_boolean(self, section: SectionLike, name: NameLike) -> Optional[bool]:
+        ...
 
     def get_boolean(
         self, section: SectionLike, name: NameLike, default: Optional[bool] = None
diff --git a/tests/test_client.py b/tests/test_client.py
index d219bce0..4e8484b1 100644
--- a/tests/test_client.py
+++ b/tests/test_client.py
@@ -772,9 +772,9 @@ class SSHGitClientTests(TestCase):
         )
 
     def test_alternative_command_path_spaces(self):
-        self.client.alternative_paths[b"upload-pack"] = (
-            b"/usr/lib/git/git-upload-pack -ibla"
-        )
+        self.client.alternative_paths[
+            b"upload-pack"
+        ] = b"/usr/lib/git/git-upload-pack -ibla"
         self.assertEqual(
             b"/usr/lib/git/git-upload-pack -ibla",
             self.client._get_cmd_path(b"upload-pack"),
diff --git a/tests/test_refs.py b/tests/test_refs.py
index 572be9d8..7529d775 100644
--- a/tests/test_refs.py
+++ b/tests/test_refs.py
@@ -316,9 +316,9 @@ class RefsContainerTests:
         self.assertNotIn(b"refs/tags/refs-0.2", self._refs)
 
     def test_import_refs_name(self):
-        self._refs[b"refs/remotes/origin/other"] = (
-            b"48d01bd4b77fed026b154d16493e5deab78f02ec"
-        )
+        self._refs[
+            b"refs/remotes/origin/other"
+        ] = b"48d01bd4b77fed026b154d16493e5deab78f02ec"
         self._refs.import_refs(
             b"refs/remotes/origin",
             {b"master": b"42d06bd4b77fed026b154d16493e5deab78f02ec"},
@@ -333,9 +333,9 @@ class RefsContainerTests:
         )
 
     def test_import_refs_name_prune(self):
-        self._refs[b"refs/remotes/origin/other"] = (
-            b"48d01bd4b77fed026b154d16493e5deab78f02ec"
-        )
+        self._refs[
+            b"refs/remotes/origin/other"
+        ] = b"48d01bd4b77fed026b154d16493e5deab78f02ec"
         self._refs.import_refs(
             b"refs/remotes/origin",
             {b"master": b"42d06bd4b77fed026b154d16493e5deab78f02ec"},

@DaveLak
Copy link
Contributor Author

DaveLak commented May 14, 2024

Edit: disregard the comments below; it was caused by an outdated ruff version. See issue #1314 for details.


I put a separate PR up to address the formatting I mentioned in my previous comment: #1313

@jelmer jelmer merged commit f9ad9a9 into jelmer:master May 14, 2024
19 of 22 checks passed
@DaveLak DaveLak deleted the fuzzing-improvements branch May 14, 2024 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants