Model development
Digital image analysis was chosen as the method of detecting changes. There are two main ways to check whether the changes are legitimate:
- Anomaly detection using an autoencoder or an Isolation Forest (a minimal sketch follows this list)
- Image classification with a Convolutional Neural Network (the approach we went with)
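For contrast, here is a minimal sketch of the anomaly-detection alternative using scikit-learn's IsolationForest. The flattened-image features and placeholder arrays are assumptions for illustration, not the project's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical: X_clean holds flattened screenshots of known-good pages.
X_clean = np.random.rand(100, 250 * 250 * 3)  # placeholder data

# Fit only on "normal" examples so deviations score as anomalies.
forest = IsolationForest(contamination="auto", random_state=42)
forest.fit(X_clean)

# New screenshots are scored 1 (normal) or -1 (anomalous / possibly defaced).
X_new = np.random.rand(5, 250 * 250 * 3)
print(forest.predict(X_new))
```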
The process of building the model can be split into 5 main steps:
- Data collection
- Data preprocessing
- Model development
- Model optimization
- Model evaluation
Normal website data was collected from:
- moz.com/top500
- github.com/GSA/govt-urls
Defaced website data was collected from (a hypothetical collection sketch follows this list):
- mirror-h.org
- zone.kurd-h.org
- www.zone-h.org
- www.xatrix.org/defac.php
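The wiki does not document the collection pipeline itself; the following is a hypothetical sketch of how screenshots could be captured from such URL lists, assuming Selenium with a headless Chrome driver.

```python
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.set_window_size(1280, 1024)

os.makedirs("screenshots", exist_ok=True)
urls = ["https://example.com"]  # placeholder; replace with the source lists above
for i, url in enumerate(urls):
    try:
        driver.get(url)
        driver.save_screenshot(f"screenshots/{i}.png")  # one image per site
    except Exception:
        pass  # defaced/mirrored pages are often short-lived or unreachable
driver.quit()
```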
The dataset is separated into two categories of images:
- Clean (normal) websites - 6333 images
- Defaced websites - 4815 images
These images are then downscaled to 250x250px and split into two parts: 80% are used for training and 20% for validation. After that, the data is augmented by rotating, flipping and cropping the images, and then normalized by rescaling pixel values from the [0, 255] RGB range down to the [0, 1] range that neural networks work best with.
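A minimal sketch of this preprocessing using tf.keras's ImageDataGenerator, assuming a data/ directory with one subfolder per class; the exact augmentation parameters are assumptions, and zoom_range stands in for the cropping step.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,     # map [0, 255] RGB values into [0, 1]
    rotation_range=20,     # augmentation: random rotations (assumed value)
    horizontal_flip=True,  # augmentation: random horizontal flips
    zoom_range=0.1,        # stands in for random cropping (assumed value)
    validation_split=0.2,  # 80/20 train/validation split
)

train = datagen.flow_from_directory(
    "data", target_size=(250, 250), class_mode="binary", subset="training")
val = datagen.flow_from_directory(
    "data", target_size=(250, 250), class_mode="binary", subset="validation")
```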
Overfitting can be addressed with the following techniques (sketched after this list):
- The addition of dropout layers
- Data augmentation
- The addition of batch normalization layers
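A minimal sketch of how dropout and batch normalization layers could be added to a small Keras CNN for this binary task; the architecture and hyperparameters here are assumptions, not the project's actual model.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(250, 250, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.BatchNormalization(),  # stabilizes training, mild regularizer
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),          # randomly drops units to curb overfitting
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # clean vs. defaced
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```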