Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding dumps and doc creation coverage for tomlkit #10046

Merged
merged 3 commits into from
Apr 26, 2023

Conversation

sg3-141-592
Copy link
Contributor

@sg3-141-592 sg3-141-592 commented Apr 8, 2023

Improving the coverage for the tomlkit by adding coverage for the dumps fuzzer. This is based on my existing improvements to the toml fuzzer #9834 , using a modified version of the atheris_dict.py approach to creating random dictionaries.

Doing some local runs this takes the coverage

  • Functions statically reachable by fuzzers 26% ->37%
  • Cyclomatic complexity statically reachable by fuzzers 31% -> 39%

@sg3-141-592 sg3-141-592 changed the title Adding dumps coverage for tomlkit Adding dumps and doc creation coverage for tomlkit Apr 8, 2023
@sg3-141-592 sg3-141-592 force-pushed the improving-tomlkit-coverage branch 3 times, most recently from 9978572 to 49ed5a3 Compare April 8, 2023 18:55
@sg3-141-592 sg3-141-592 marked this pull request as ready for review April 8, 2023 19:20
projects/tomlkit/atheris_tomlkit_dict.py Outdated Show resolved Hide resolved
projects/tomlkit/atheris_tomlkit_dict.py Outdated Show resolved Hide resolved
@sg3-141-592
Copy link
Contributor Author

Hey @DavidKorczynski , I've removed the document creation for now while I work out a more elegant way to generate them.

Following from an earlier comment, I've had some time to refine and improve the random dictionary generation, test it, make sure the results are deterministic and make it a self contained library pypi - dictgen. This is because there's good potential for using it to cover the other dumps methods for YAML, JSON, TOML encoders etc.

Do you mind having another look, and seeing if you think we can use this library to drive coverage for the tomlkit dumps method?

@sg3-141-592 sg3-141-592 force-pushed the improving-tomlkit-coverage branch from b6bfe44 to d76e4ba Compare April 22, 2023 15:26
@DavidKorczynski
Copy link
Collaborator

Do you mind having another look, and seeing if you think we can use this library to drive coverage for the tomlkit dumps method?

Yeah, I'll aim go over this tomorrow.

@@ -15,6 +15,7 @@
#
################################################################################
pip3 install .
pip3 install dictgen
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this to the Dockerfile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have moved this to the Dockerfile and tested locally running using introspector. All working.

@DavidKorczynski
Copy link
Collaborator

lgtm with the minor nit above.

Have you compared dictgen to a simple construct e.g.

d = json.loads(random_fuzz_str)
if isinstance(d, dict):
  # use the dict as fuzz input

in terms of how much code coverage is achieved?

@sg3-141-592
Copy link
Contributor Author

I tried an initial json.dumps comparison but it's not able to generate all of the toml datatypes like datetime so I used tomlkit.loads to be a bit fairer.

fdp = atheris.FuzzedDataProvider(input_bytes)
try:
    test_data = tomlkit.loads(fdp.ConsumeUnicodeNoSurrogates(sys.maxsize))
    tomlkit.api.dumps(test_data, sort_keys=fdp.ConsumeBool())
except Exception as e:
    if "tomlkit.exceptions." in str(e.__class__):
        pass
    else:
        raise(e)

For 100k runs we get total project coverage around 59%, and with the dictgen we get 67%. And for the main files involved in dumps (_utils.py, containers.py, items.py) its 42%, compared to 60% with dictgen. So not too much in it, with a good corpus it might catch up.

@sg3-141-592 sg3-141-592 force-pushed the improving-tomlkit-coverage branch from d76e4ba to 0d66ad6 Compare April 26, 2023 17:19
@DavidKorczynski DavidKorczynski merged commit 2bf275a into google:master Apr 26, 2023
DonggeLiu pushed a commit that referenced this pull request May 19, 2023
These are some additional improvements to the toml encoder.

- Remove unneeded fuzzers `fuzz_loads` and `fuzz_dumps`. These can both
be reached by the `fuzz_load` and `fuzz_dump` methods
- Cover additional encoders `TomlOrderedEncoder` & `TomlNumpyEncoder`
- Cover additional decoders `TomlOrderedDecoder` &
`TomlPreserveCommentDecoder`
- Use the `dictgen` library approach, the same as #10046 

Using local runs this gives us a coverage of an extra 8 methods.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants