Adding dumps and doc creation coverage for tomlkit #10046

sg3-141-592 · 2023-04-08T14:32:40Z

Improving the coverage for the tomlkit by adding coverage for the dumps fuzzer. This is based on my existing improvements to the toml fuzzer #9834 , using a modified version of the atheris_dict.py approach to creating random dictionaries.

Doing some local runs this takes the coverage

Functions statically reachable by fuzzers 26% ->37%
Cyclomatic complexity statically reachable by fuzzers 31% -> 39%

projects/tomlkit/atheris_tomlkit_dict.py

sg3-141-592 · 2023-04-22T15:26:28Z

Hey @DavidKorczynski , I've removed the document creation for now while I work out a more elegant way to generate them.

Following from an earlier comment, I've had some time to refine and improve the random dictionary generation, test it, make sure the results are deterministic and make it a self contained library pypi - dictgen. This is because there's good potential for using it to cover the other dumps methods for YAML, JSON, TOML encoders etc.

Do you mind having another look, and seeing if you think we can use this library to drive coverage for the tomlkit dumps method?

DavidKorczynski · 2023-04-24T22:04:59Z

Do you mind having another look, and seeing if you think we can use this library to drive coverage for the tomlkit dumps method?

Yeah, I'll aim go over this tomorrow.

DavidKorczynski · 2023-04-26T13:17:46Z

projects/tomlkit/build.sh

@@ -15,6 +15,7 @@
 #
 ################################################################################
 pip3 install .
+pip3 install dictgen


Could you move this to the Dockerfile?

Have moved this to the Dockerfile and tested locally running using introspector. All working.

DavidKorczynski · 2023-04-26T13:21:37Z

lgtm with the minor nit above.

Have you compared dictgen to a simple construct e.g.

d = json.loads(random_fuzz_str)
if isinstance(d, dict):
  # use the dict as fuzz input

in terms of how much code coverage is achieved?

sg3-141-592 · 2023-04-26T17:17:13Z

I tried an initial json.dumps comparison but it's not able to generate all of the toml datatypes like datetime so I used tomlkit.loads to be a bit fairer.

fdp = atheris.FuzzedDataProvider(input_bytes)
try:
    test_data = tomlkit.loads(fdp.ConsumeUnicodeNoSurrogates(sys.maxsize))
    tomlkit.api.dumps(test_data, sort_keys=fdp.ConsumeBool())
except Exception as e:
    if "tomlkit.exceptions." in str(e.__class__):
        pass
    else:
        raise(e)

For 100k runs we get total project coverage around 59%, and with the dictgen we get 67%. And for the main files involved in dumps (_utils.py, containers.py, items.py) its 42%, compared to 60% with dictgen. So not too much in it, with a good corpus it might catch up.

These are some additional improvements to the toml encoder. - Remove unneeded fuzzers `fuzz_loads` and `fuzz_dumps`. These can both be reached by the `fuzz_load` and `fuzz_dump` methods - Cover additional encoders `TomlOrderedEncoder` & `TomlNumpyEncoder` - Cover additional decoders `TomlOrderedDecoder` & `TomlPreserveCommentDecoder` - Use the `dictgen` library approach, the same as #10046 Using local runs this gives us a coverage of an extra 8 methods.

sg3-141-592 changed the title ~~Adding dumps coverage for tomlkit~~ Adding dumps and doc creation coverage for tomlkit Apr 8, 2023

sg3-141-592 force-pushed the improving-tomlkit-coverage branch 3 times, most recently from 9978572 to 49ed5a3 Compare April 8, 2023 18:55

sg3-141-592 marked this pull request as ready for review April 8, 2023 19:20

DavidKorczynski requested changes Apr 11, 2023

View reviewed changes

projects/tomlkit/atheris_tomlkit_dict.py Outdated Show resolved Hide resolved

projects/tomlkit/atheris_tomlkit_dict.py Outdated Show resolved Hide resolved

sg3-141-592 and others added 2 commits April 22, 2023 16:26

Adding dumps coverage for tomlkit

abd895a

Tomlkit: Removing use of random from dict generation

644181e

sg3-141-592 force-pushed the improving-tomlkit-coverage branch from b6bfe44 to d76e4ba Compare April 22, 2023 15:26

DavidKorczynski requested changes Apr 26, 2023

View reviewed changes

Using dictgen library with tomlkit dumps

0d66ad6

sg3-141-592 force-pushed the improving-tomlkit-coverage branch from d76e4ba to 0d66ad6 Compare April 26, 2023 17:19

sg3-141-592 requested a review from DavidKorczynski April 26, 2023 17:20

DavidKorczynski approved these changes Apr 26, 2023

View reviewed changes

DavidKorczynski merged commit 2bf275a into google:master Apr 26, 2023

This was referenced Apr 27, 2023

Improving toml coverage #10183

Merged

Request to be added to Tomlkit OSS-Fuzzer project members python-poetry/tomlkit#286

Open

sg3-141-592 mentioned this pull request May 10, 2023

Updating tomlkit maintainers list #10286

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding dumps and doc creation coverage for tomlkit #10046

Adding dumps and doc creation coverage for tomlkit #10046

sg3-141-592 commented Apr 8, 2023 •

edited

Loading

sg3-141-592 commented Apr 22, 2023

DavidKorczynski commented Apr 24, 2023

DavidKorczynski Apr 26, 2023

sg3-141-592 Apr 26, 2023

DavidKorczynski commented Apr 26, 2023

sg3-141-592 commented Apr 26, 2023

Adding dumps and doc creation coverage for tomlkit #10046

Adding dumps and doc creation coverage for tomlkit #10046

Conversation

sg3-141-592 commented Apr 8, 2023 • edited Loading

sg3-141-592 commented Apr 22, 2023

DavidKorczynski commented Apr 24, 2023

DavidKorczynski Apr 26, 2023

Choose a reason for hiding this comment

sg3-141-592 Apr 26, 2023

Choose a reason for hiding this comment

DavidKorczynski commented Apr 26, 2023

sg3-141-592 commented Apr 26, 2023

sg3-141-592 commented Apr 8, 2023 •

edited

Loading