PSO-KDVA: A Lightweight Software Vulnerability Assessment Model Using Particle Swarm Optimization and Knowledge Distillation
This repository contains the implementation of PSO-KDVA, which covers fine-tuning a teacher model, compressing it into a lightweight student model, and searching the architecture space with Particle Swarm Optimization (PSO) for software vulnerability assessment (SVA). It also provides datasets and tools to facilitate research in SVA.
Because the models are large, they are stored on Google Drive: Google Drive Link
The compress folder (CodeBERT/sva/compress) contains scripts for compressing the teacher model into a lightweight student model using techniques such as knowledge distillation:
- BPE_1000.json and BPE_6000.json: Byte Pair Encoding (BPE) vocabulary files used for tokenization.
- distill.py: Implements the knowledge distillation process for model compression.
- lstm_baseline.py: A baseline model using LSTM for comparison.
- models.py: Contains model definitions for the student model.
- run.py: Main script for training and evaluating the compressed student model.
- utils.py: Utility functions used throughout the compression process.
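distill.py implements the distillation step. As a generic illustration only (not the code in distill.py), the standard Hinton-style distillation loss mixes a temperature-softened KL divergence between teacher and student outputs with an ordinary cross-entropy on the true label. The temperature, the weight alpha, and the function names below are assumptions chosen for the sketch.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            label: int,
            temperature: float = 2.0,
            alpha: float = 0.5) -> float:
    """Hypothetical per-example distillation loss (generic technique, not distill.py).

    alpha weights the soft-target term: KL(teacher || student) on temperature-softened
    distributions, scaled by T^2 so its gradient magnitude stays comparable across T.
    (1 - alpha) weights the hard-label cross-entropy at temperature 1.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(max(ps, 1e-12)))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = kl * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


# Example: a student that disagrees with the teacher on a 3-class severity label.
print(kd_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], label=2))
```

With alpha=1.0 the student learns only from the teacher's soft labels; with alpha=0.0 the loss reduces to plain cross-entropy on the ground-truth label.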
The finetune folder (CodeBERT/sva/finetune) contains scripts for fine-tuning the teacher model on software vulnerability assessment tasks:
- main.py: Main script for fine-tuning the teacher model.
- model.py: Contains model definitions for the teacher model.
- run2.py: Alternative script for running experiments with different configurations.
- utils.py: Utility functions for model fine-tuning.
- LICENSE: Licensing information for the project.
- README.md: A detailed explanation of the fine-tuning module.
This folder contains the dataset used for software vulnerability assessment:
- Dataset Details: Includes vulnerability data formatted for training, validation, and testing purposes.
- The dataset supports tasks such as vulnerability classification, severity prediction, and more.
The repository root contains the search and efficiency scripts:
- pso.py: Performs architecture space search using the Particle Swarm Optimization algorithm. It must be run first to determine the optimal architecture before proceeding with fine-tuning or compression.
- ga.py: Implements a Genetic Algorithm for optimization.
- flops.py: Counts the floating-point operations (FLOPs) of a model to evaluate its efficiency.
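pso.py is described here only at a high level. The following is a minimal sketch of how PSO can search a discrete architecture space, assuming each particle holds one continuous coordinate per hyperparameter that is rounded to an index into that hyperparameter's list of allowed values. SEARCH_SPACE, the coefficients w, c1 and c2, and toy_fitness are all hypothetical; in a real search the fitness would come from training and evaluating each candidate student.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical search space: illustrative dimensions and values, not those of pso.py.
SEARCH_SPACE: Dict[str, List[int]] = {
    "num_layers": [1, 2, 3, 4, 6, 8, 12],
    "hidden_size": [16, 32, 64, 128, 256, 512, 768],
    "num_heads": [1, 2, 4, 8, 12],
}


def decode(position: Sequence[float]) -> Dict[str, int]:
    """Round each coordinate to the nearest valid index of its choice list."""
    arch = {}
    for x, (name, choices) in zip(position, SEARCH_SPACE.items()):
        idx = min(max(int(round(x)), 0), len(choices) - 1)
        arch[name] = choices[idx]
    return arch


def pso_search(fitness: Callable[[Dict[str, int]], float],
               n_particles: int = 10, n_iters: int = 30,
               w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               seed: int = 0) -> Tuple[Dict[str, int], float]:
    """Global-best PSO that maximizes `fitness` over decoded architectures."""
    rng = random.Random(seed)
    upper = [len(c) - 1 for c in SEARCH_SPACE.values()]
    dims = len(upper)
    pos = [[rng.uniform(0, u) for u in upper] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_score = [fitness(decode(p)) for p in pos]  # personal best scores
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the index range of each dimension.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), float(upper[d]))
            score = fitness(decode(pos[i]))
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return decode(gbest), gbest_score


def toy_fitness(arch: Dict[str, int]) -> float:
    """Hypothetical stand-in objective: prefers a mid-sized, valid configuration."""
    if arch["hidden_size"] % arch["num_heads"] != 0:
        return float("-inf")  # heads must divide the hidden size
    return -(abs(arch["num_layers"] - 4)
             + abs(arch["hidden_size"] - 128) / 32
             + abs(arch["num_heads"] - 4))


best_arch, best_score = pso_search(toy_fitness)
print(best_arch, best_score)
```

Rounding continuous positions is only one way to adapt PSO to discrete choices; it keeps the standard velocity update unchanged, at the cost of many positions mapping to the same architecture.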
- Clone the repository:
  git clone https://github.com/judeomg/PSO-KDVA.git
  cd PSO-KDVA
- Install dependencies:
  pip install -r requirements.txt
- Prepare the dataset: place the dataset files in the data/ directory, following the expected format.
- Run the pso.py script to perform architecture space search. It uses the Particle Swarm Optimization algorithm to search for the optimal model architecture based on performance and efficiency:
  python pso.py
- Navigate to the finetune/ directory:
  cd CodeBERT/sva/finetune
- Run the fine-tuning script:
  python main.py
- Return to the repository root and navigate to the compress/ directory:
  cd CodeBERT/sva/compress
- Run the distillation script:
  python distill.py
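The pso.py step searches for an architecture "based on performance and efficiency", and flops.py measures efficiency in floating-point operations. The sketch below shows one hypothetical way to connect the two: a back-of-the-envelope FLOPs estimate for a BERT-style encoder, plus a fitness score that penalizes candidates exceeding a FLOPs budget. The counting convention, the function names, and the budget penalty are assumptions, not taken from flops.py or pso.py.

```python
def encoder_flops(num_layers: int, hidden_size: int,
                  intermediate_size: int, seq_len: int) -> int:
    """Approximate forward-pass FLOPs of a BERT-style encoder for one sequence.

    Counts only matrix multiplications, with one multiply-add = 2 FLOPs.
    Softmax, LayerNorm, biases, embeddings and the classifier head are ignored.
    """
    n, h, i = seq_len, hidden_size, intermediate_size
    qkv = 3 * 2 * n * h * h      # query, key and value projections
    scores = 2 * n * n * h       # Q @ K^T
    context = 2 * n * n * h      # attention weights @ V
    out_proj = 2 * n * h * h     # attention output projection
    ffn = 2 * (2 * n * h * i)    # two feed-forward linear layers
    return num_layers * (qkv + scores + context + out_proj + ffn)


def fitness(accuracy: float, flops: float, flops_budget: float,
            penalty: float = 1.0) -> float:
    """Hypothetical fitness: reward accuracy, penalize relative FLOPs over budget."""
    overshoot = max(0.0, flops / flops_budget - 1.0)
    return accuracy - penalty * overshoot


print(encoder_flops(12, 768, 3072, 512))
```

For a CodeBERT-base-sized encoder (12 layers, hidden size 768, intermediate size 3072) on 512 tokens, encoder_flops returns 96,636,764,160, roughly 97 GFLOPs per sequence under these simplifications. Candidates within the budget are ranked purely by accuracy, while larger ones lose score in proportion to how far they exceed it.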
- Architecture Space Search: Use PSO to find optimal model architectures for SVA.
- Fine-Tuning a Teacher Model: Train a robust teacher model for SVA tasks.
- Compressing a Student Model: Reduce model size while maintaining high performance using knowledge distillation.
- SVA Dataset: Provide a high-quality dataset for evaluating vulnerability assessment models.
Feel free to contribute or raise issues for further improvements!