Official repository for our paper:
VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs
Bigvul_train, Bigvul_test, Bigvul_val
VGX full dataset and Vulgen full dataset, from the VGX paper
All pair matchings, including the mutation-based and random matchings used for RQ2
Filtered Datasets for All RQs
Unfiltered Datasets for All RQs
The unfiltered datasets contain samples produced by the Generator that have not gone through the Verification phase. They also include extra metadata showing which clean_vul pair was used to generate each sample, along with the vulnerable lines (vul lines).
Go to the models directory; the README for each model explains how to use that model.