This project is a toolkit for fuzzing Python functions with Atheris. It includes a large collection of LLM-generated seed inputs and Jupyter notebooks for running the fuzzers and analyzing the coverage these inputs achieve.
- run_driver.ipynb: The primary notebook for executing fuzzing drivers with the provided seed corpora. It collects and displays coverage results.
- linear_regression.ipynb: Fits both linear regression models and dummy average-based models to the resulting datasets and generates the relevant statistics and formulas.
- harness_pipeline.py: Orchestrates the execution of Atheris fuzzers for a specified number of trials.
- datalogger.py: Generates dataframes to log and analyze the fuzzing results.
- seed_data_mapper.py: Maps driver names to their corresponding original function definitions for detailed logging and result analysis.
- drivers/: Contains Atheris fuzzing drivers for Python libraries, focusing on internet protocols and input handling.
- corpora/: Houses over 38,000 seeds generated for the drivers from various models including ChatGPT-3.5, ChatGPT-4, Claude-Opus, Claude-Instant, and Gemini Pro 1.0.
- results/: Stores raw results from the coverage experiments in both HTML and pandas dataframe formats.
To set up and run this project, follow these steps:
- Clone the repository to your local machine.
- Create a virtual environment and activate it:
python -m venv env
source env/bin/activate # On Windows use env\Scripts\activate
- Install the required dependencies, which include those for all 50 drivers:
pip install -r requirements.txt
- Navigate to the project directory and launch Jupyter Notebook.
- Open run_driver.ipynb and execute the cells to start fuzzing.
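Before opening the notebook, it can help to confirm that the virtual environment is active and that the key packages were installed. The snippet below is a minimal sketch of such a check. The package names passed to it (`atheris`, `pandas`) are assumptions based on the tools mentioned in this README, not a listing of requirements.txt, and the helper names are hypothetical.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # Assumed package names; adjust to match requirements.txt.
    print("missing packages:", missing_packages(["atheris", "pandas"]))
```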
To use the fuzzing drivers, follow the steps in run_driver.ipynb. The notebook will guide you through the process of loading seed data, running the fuzzers, and collecting coverage data.
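To make the seed-loading step concrete, the sketch below replays every file in a corpus directory through a fuzz target and counts the inputs that raise. It is an illustration only and does not reproduce the notebook's code: `replay_corpus` is a hypothetical helper, the one-file-per-seed layout is an assumption, and no coverage is measured here.

```python
from pathlib import Path
from typing import Callable, Dict


def replay_corpus(corpus_dir: Path, target: Callable[[bytes], None]) -> Dict[str, int]:
    """Run each seed file in corpus_dir through target once."""
    stats = {"seeds": 0, "crashes": 0}
    for seed_path in sorted(corpus_dir.iterdir()):
        if not seed_path.is_file():
            continue  # skip nested directories
        stats["seeds"] += 1
        try:
            target(seed_path.read_bytes())
        except Exception:
            # Any uncaught exception counts as a crash for that seed.
            stats["crashes"] += 1
    return stats
```

A replay like this is a quick way to check that a corpus and a driver's target function match before starting a longer fuzzing run.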
This project is licensed under the MIT License - see the LICENSE file for details.