Model development
Digital image analysis was chosen as the method of detecting changes. There are two main ways to check whether the changes are legitimate:
- Anomaly detection using an autoencoder or an Isolation Forest (a minimal sketch follows this list)
- Image classification with a Convolutional Neural Network (the approach we went with)
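For contrast, here is a minimal sketch of the anomaly-detection alternative using scikit-learn's IsolationForest. The flattened-image features and placeholder arrays are assumptions for illustration, not the project's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical: X_clean holds flattened screenshots of known-good pages.
X_clean = np.random.rand(100, 250 * 250 * 3)  # placeholder data

# Fit only on "normal" examples so deviations score as anomalies.
forest = IsolationForest(contamination="auto", random_state=42)
forest.fit(X_clean)

# New screenshots are scored 1 (normal) or -1 (anomalous / possibly defaced).
X_new = np.random.rand(5, 250 * 250 * 3)
print(forest.predict(X_new))
```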
The process of building the model can be split into 5 main steps:
- Data collection
- Data preprocessing
- Model development
- Model optimization
- Model evaluation
Normal website data was collected from:
- moz.com/top500
- github.com/GSA/govt-urls
Defaced website data was collected from (a hypothetical collection sketch follows this list):
- mirror-h.org
- zone.kurd-h.org
- www.zone-h.org
- www.xatrix.org/defac.php
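The wiki does not document the collection pipeline itself; the following is a hypothetical sketch of how screenshots could be captured from such URL lists, assuming Selenium with a headless Chrome driver.

```python
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.set_window_size(1280, 1024)

os.makedirs("screenshots", exist_ok=True)
urls = ["https://example.com"]  # placeholder; replace with the source lists above
for i, url in enumerate(urls):
    try:
        driver.get(url)
        driver.save_screenshot(f"screenshots/{i}.png")  # one image per site
    except Exception:
        pass  # defaced/mirrored pages are often short-lived or unreachable
driver.quit()
```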
The dataset is separated into two categories of images:
- Clean (normal) websites - 6333 images
- Defaced websites - 4815 images
These images are then downscaled to 250x250px and split into two parts: 80% are used for training and 20% for validation. After that, the data is augmented by rotating, flipping and cropping the images, and then normalized by rescaling pixel values from the [0, 255] RGB range down to the [0, 1] range that neural networks work best with.
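A minimal sketch of this preprocessing using tf.keras's ImageDataGenerator, assuming a data/ directory with one subfolder per class; the exact augmentation parameters are assumptions, and zoom_range stands in for the cropping step.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,     # map [0, 255] RGB values into [0, 1]
    rotation_range=20,     # augmentation: random rotations (assumed value)
    horizontal_flip=True,  # augmentation: random horizontal flips
    zoom_range=0.1,        # stands in for random cropping (assumed value)
    validation_split=0.2,  # 80/20 train/validation split
)

train = datagen.flow_from_directory(
    "data", target_size=(250, 250), class_mode="binary", subset="training")
val = datagen.flow_from_directory(
    "data", target_size=(250, 250), class_mode="binary", subset="validation")
```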
Overfitting can be addressed with the following techniques (sketched after this list):
- The addition of dropout layers
- Data augmentation
- The addition of batch normalization layers
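A minimal sketch of how dropout and batch normalization layers could be added to a small Keras CNN for this binary task; the architecture and hyperparameters here are assumptions, not the project's actual model.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(250, 250, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.BatchNormalization(),  # stabilizes training, mild regularizer
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),          # randomly drops units to curb overfitting
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # clean vs. defaced
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```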