Feature extraction on pre-trained and fine-tuned Convolutional Neural Networks for enhanced accuracy on the BreakHis dataset through application of a novel Genetic Algorithm
This is a Python-based project that aims to improve the accuracy of predictions made by highly complex, state-of-the-art CNNs by using a Genetic Algorithm (GA). The popular BreakHis dataset is used to verify the feature extraction, classifying tumors seen in histopathological images of affected tissue as benign or malignant. For feature extraction, three Convolutional Neural Networks are used, with their pre-final layer modified to suit the objective. As the models are trained and evaluated on the respective datasets, the features with the best accuracy on the validation set are retained for further processing. The final features then pass through several stages of the Genetic Algorithm, which eliminates redundant and uninformative features from the total feature space. The filtered features yield a significant increase in accuracy on the validation dataset.
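The feature-filtering step can be illustrated with a small genetic algorithm over binary feature masks. This is only a sketch under stated assumptions, not the project's implementation: the population size, mutation rate, tournament selection, one-point crossover, elitism, and the toy fitness function are all hypothetical choices made for illustration. In the project, the fitness would instead be the validation accuracy of the chosen classifier on the masked features.

```python
import random


def evolve_mask(fitness, n_features, pop_size=20, generations=10,
                mutation_rate=0.05, seed=0):
    """Return the best binary feature mask found by a simple GA."""
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags: 1 keeps a feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best mask forward
        while len(next_population) < pop_size:
            # Tournament selection: the fitter of two random individuals wins.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            # One-point crossover between the two parents.
            cut = rng.randint(1, n_features - 1)
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation on each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


def toy_fitness(mask):
    """Toy stand-in for validation accuracy (an assumption for this sketch).

    Features 0-4 count as useful; every other kept feature is treated as
    redundant and costs a small penalty.
    """
    return sum(mask[:5]) - 0.1 * sum(mask[5:])


best_mask = evolve_mask(toy_fitness, n_features=20)
print(best_mask, toy_fitness(best_mask))
```

Because the best mask of each generation is copied into the next one, the best fitness never decreases from one generation to the next in this sketch.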
The Breast Cancer Histopathological Image Classification (BreakHis) dataset is composed of 9,109 microscopic images of breast tumor tissue collected from 82 patients using different magnifying factors (40X, 100X, 200X, and 400X). To date, it contains 2,480 benign and 5,429 malignant samples (700X460 pixels, 3-channel RGB, 8-bit depth in each channel, PNG format).
The dataset is available at:
https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/
In this project, the histopathological image samples of human breast tissue are classified into two categories:
Benign
Malignant
Three CNN models may be applied, one at a time, to the dataset for feature extraction:
Visual Geometry Group (VGG-19)
ResNet-18
GoogLeNet
Three types of classifiers are employed for fitness evaluation:
Support Vector Machine (RBF kernel)
K-Nearest Neighbours (2 neighbours)
Multi-layer Perceptron
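The three classifiers map naturally onto scikit-learn estimators. The sketch below shows one way a fitness function could score a feature mask by validation accuracy. The helper names `make_classifier` and `mask_fitness`, the MLP hidden-layer size and iteration cap, and the synthetic data are assumptions made for illustration; only the RBF kernel and the 2 neighbours come from this README.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def make_classifier(name):
    """Map a classifier token ('SVM', 'KNN', 'MLP') to an estimator."""
    if name == "SVM":
        return SVC(kernel="rbf")
    if name == "KNN":
        return KNeighborsClassifier(n_neighbors=2)
    if name == "MLP":
        # Hidden-layer size and iteration cap are illustrative assumptions.
        return MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                             random_state=0)
    raise ValueError(f"unknown classifier: {name}")


def mask_fitness(mask, X_train, y_train, X_val, y_val, name="MLP"):
    """Validation accuracy of a classifier trained on the selected columns."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 0.0  # an empty feature set cannot be scored
    clf = make_classifier(name)
    clf.fit(X_train[:, selected], y_train)
    return float(clf.score(X_val[:, selected], y_val))


# Synthetic stand-in for extracted CNN features (not BreakHis data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mask = np.array([1, 1] + [0] * 14)  # keep only the two informative columns
scores = {
    name: mask_fitness(mask, X[:150], y[:150], X[150:], y[150:], name)
    for name in ("SVM", "KNN", "MLP")
}
print(scores)
```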
Pairing each extractor with the MLP classifier for GA classification gives three plots of accuracy vs. generations, produced with the following settings:
Epochs: 10
Generations: 10
Since the entire project is based on the Python programming language, Python must be installed on the system. Python version >=3.9 is recommended.
The Python packages used in this project are matplotlib, numpy, pandas, scikit-learn, torch and torchvision. All these dependencies can be installed with the following command:
pip install -r requirements.txt
Current directory
└── data
    ├── train
    │   ├── class 1
    │   ├── class 2
    │   ├── ...
    │   └── class n
    └── val
        ├── class 1
        ├── class 2
        ├── ...
        └── class n
- The folders train and val each contain one folder per class of histopathological image (the respective type of breast tissue tumor), with images in .jpg/.png format.
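Before starting a run, it can help to confirm that the data folder follows this layout. The following is a minimal sketch using only the standard library. The function name `summarize_layout` and the class folder names `benign` and `malignant` are assumptions for illustration, and the demo builds a throwaway directory instead of reading real data.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def summarize_layout(data_path):
    """Return {split: {class_name: image_count}} for the train/val layout."""
    root = Path(data_path)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {split_dir}")
        # Count .jpg/.png files inside every class folder of this split.
        summary[split] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return summary


# Demo on a temporary directory with one empty placeholder image per class.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "val"):
        for cls in ("benign", "malignant"):
            class_dir = Path(tmp) / split / cls
            class_dir.mkdir(parents=True)
            (class_dir / "sample.png").touch()
    demo_summary = summarize_layout(tmp)

print(demo_summary)
```

On a real dataset, calling `summarize_layout("data")` would report the per-class image counts, which makes class imbalance between the splits easy to spot.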
usage: main.py [-h] [-data DATA_PATH] [-classes NUM_CLASSES] [-ext EXT_TYPE] [-classif CLASSIF_TYPE]

Application of Genetic Algorithm

optional arguments:
  -h, --help            show this help message and exit
  -data DATA_PATH, --data_path DATA_PATH
                        Path to data
  -classes NUM_CLASSES, --num_classes NUM_CLASSES
                        Number of data classes
  -ext EXT_TYPE, --ext_type EXT_TYPE
                        Choice of extractor
  -classif CLASSIF_TYPE, --classif_type CLASSIF_TYPE
                        Choice of classifier for GA
python main.py -data data -classes n -ext resnet -classif MLP
GoogLeNet: 'googlenet'
VGG-19: 'vgg'
ResNet-18: 'resnet'
SVM: 'SVM'
MLP: 'MLP'
KNN: 'KNN'
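For reference, a parser with the same flags as the usage message can be declared with argparse as sketched below. This is a reconstruction for illustration, not the contents of main.py: the default values and the use of `choices` to restrict the extractor and classifier tokens are assumptions, and `choices` would change how the help text displays those two options.

```python
import argparse

# Tokens taken from the lists in this README.
EXTRACTORS = ("googlenet", "vgg", "resnet")
CLASSIFIERS = ("SVM", "MLP", "KNN")


def build_parser():
    parser = argparse.ArgumentParser(
        prog="main.py", description="Application of Genetic Algorithm")
    # Default values below are assumptions; the README does not state any.
    parser.add_argument("-data", "--data_path", default="data",
                        help="Path to data")
    parser.add_argument("-classes", "--num_classes", type=int, default=2,
                        help="Number of data classes")
    parser.add_argument("-ext", "--ext_type", choices=EXTRACTORS,
                        default="resnet", help="Choice of extractor")
    parser.add_argument("-classif", "--classif_type", choices=CLASSIFIERS,
                        default="MLP", help="Choice of classifier for GA")
    return parser


# Parse the example command line with 2 classes (benign and malignant).
args = build_parser().parse_args(
    ["-data", "data", "-classes", "2", "-ext", "resnet", "-classif", "MLP"])
print(args.data_path, args.num_classes, args.ext_type, args.classif_type)
```

With `choices` in place, an unsupported token such as `-ext alexnet` is rejected by the parser before any training starts.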