Welcome to our repo for the Deep Learning project, in which we apply deep learning techniques to chest X-ray images to predict whether a patient has been infected with COVID-19.
This report focuses on two tasks: a binary classification task that distinguishes healthy individuals from those infected with COVID-19, and a multi-class classification task that distinguishes COVID-19-positive patients from healthy individuals and from those with viral or bacterial pneumonia.
All the data used for this project has been placed on this drive. Our datasets are structured as follows:

| | COVID+ | COVID- | Viral Pneumonia | Bacterial Pneumonia |
|---|---|---|---|---|
| Train set | 112 | 112 | 112 | 112 |
| Small test set | 28 | 28 | 28 | 28 |
| Large test set | 28 | 1583 | 1494 | 2788 |

The subfolder "data/final_data" contains the cleaned and pre-processed dataset used for training the models, as well as the small test set. The raw data extracted from the sources is in the folder "data/kermany_OTHERS", which consists of chest X-ray scans from 2018 of patients who were healthy (forming our COVID- class) or had viral or bacterial pneumonia. The source of this dataset can be found on the dataset's website.
The COVID+ images used for training were retrieved from the chest-x-ray repo and placed in the subfolder "data/chestxray_COVID". Please note that this folder contains not only X-rays but also CT scans, as it is kept in its original form from the repo. A fourth and a fifth folder form a larger test set for evaluating the model on a more realistic scale. Please note that the COVID+ test set always consists of 28 images, whether testing on the large or the small scale. Balanced classes of COVID- and both types of pneumonia were randomly sampled to match the 140 COVID+ images. The script that does this sampling has been placed here.

Transfer learning with two pre-trained models has been used. The first model is VGG16 and the second is DenseNet201, which has fewer parameters. Binary and multi-class classification each have their own folders with the tuning scripts and the final re-training files. The models were tuned on Google Cloud AI Platform; they can be found in two subfolders, and their tuning runs have been placed in the runs folder. The models have also been placed on the same drive as the datasets; the model folder should be placed in the main projg05 directory.
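
For readers who want to see what the balanced sampling step could look like, here is a minimal sketch. It is not the project's actual sampling script: the function names `sample_balanced` and `split_train_test`, the accepted file extensions, and the fixed seed are all assumptions made for illustration.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the raw folders contain.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def sample_balanced(class_dirs, n_per_class, seed=0):
    """Randomly pick n_per_class image paths from each class folder.

    class_dirs maps a class name to a folder path (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = {}
    for name, folder in class_dirs.items():
        # Sort first so the result does not depend on filesystem ordering.
        images = sorted(
            p for p in Path(folder).iterdir()
            if p.suffix.lower() in IMAGE_SUFFIXES
        )
        if len(images) < n_per_class:
            raise ValueError(
                f"class {name!r} has only {len(images)} images, "
                f"need {n_per_class}"
            )
        sampled[name] = rng.sample(images, n_per_class)
    return sampled


def split_train_test(paths, n_test, seed=0):
    """Shuffle the paths and hold out n_test of them as a test set."""
    rng = random.Random(seed)
    shuffled = list(paths)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]
```

With `n_per_class=140` and `n_test=28`, this would give the 112/28 split per class shown in the table above.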
The final report can be found in the report folder, knitted to HTML using the rmarkdown package. Please note that all the testing runs can be carried out in the same folder.