Pneumonia Classification Using Chest X-Ray Images

DTU Machine Learning Operations

Team Members

Lukas Korinek (s246710)
Frederik Sartov Olsen (s204118)
Konstantinos-Athanasios Papagoras PhD (s230068)
Yessin Moakher (s250283)
Stiina Salumets (s250088)

Project Goal

The primary goal of this project is to detect pneumonia in chest X-ray images by classifying each image as Pneumonia or Normal.

Framework and Usage

We will use the PyTorch Image Models (timm) framework for this project.

Dataset Information

We classify chest X-ray images into two categories, Pneumonia and Normal, using the Kaggle dataset "Chest X-Ray Images (Pneumonia)". The dataset contains 5,863 pediatric chest X-rays from Guangzhou Women and Children’s Medical Center, all captured as part of routine clinical care and graded by expert physicians. In the initial stages of the project, we will use a subset of the dataset to verify that the pipeline runs end to end before scaling up to the full dataset. The dataset is already split into training, validation, and testing folders, so no custom split is needed to get started. We chose it because it is simple, well suited to beginner-level image classification projects in healthcare, and feasible to work with within a short timeframe.
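
As a quick sanity check on the folder layout described above, a sketch along these lines could count the images in each split. It assumes a layout of `<root>/<split>/<class>/<image>`; the layout, the file extensions, and the function name `count_images` are assumptions for illustration and are not taken from the project code.

```python
from collections import Counter
from pathlib import Path


def count_images(root: Path, extensions=(".jpeg", ".jpg", ".png")) -> Counter:
    """Count image files per (split, class) pair under `root`.

    Assumes a layout like root/<split>/<class>/<image>. This layout is an
    assumption for illustration, not read from the project code.
    """
    counts: Counter = Counter()
    for path in root.glob("*/*/*"):
        # Skip directories and non-image files such as notes or metadata.
        if path.is_file() and path.suffix.lower() in extensions:
            split, label = path.parts[-3], path.parts[-2]
            counts[(split, label)] += 1
    return counts
```

Calling `count_images(Path("data/raw"))` and printing the result would show at a glance whether every split contains both classes and how imbalanced they are.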

Models

We will begin with a baseline model, a simple convolutional neural network (CNN), to establish a reference performance. We will then use pre-trained models from PyTorch’s image models framework, such as ResNet50, VGG16, and DenseNet, to improve classification accuracy. These models will be fine-tuned for our task by adapting their final layers to output two classes: Pneumonia and Normal. We will use torchvision for accessing pre-trained models and data augmentation, and torch.optim for optimization. Models will be evaluated on accuracy, precision, recall, and F1-score.
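
To make the evaluation metrics concrete, here is a minimal sketch that computes them from lists of true and predicted labels. It assumes labels are encoded as 1 for Pneumonia and 0 for Normal; that encoding and the names `BinaryMetrics` and `binary_metrics` are illustrative assumptions, and the project itself may well rely on a metrics library instead.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1) -> BinaryMetrics:
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions
    # or no positive samples) instead of raising ZeroDivisionError.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)
```

Recall is usually the metric to watch for this task, since a missed pneumonia case (a false negative) tends to be costlier than a false alarm.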

Automation Tasks

The tasks.py file defines the project's automation tasks using the invoke library.

Prerequisites

Install invoke:
```
pip install invoke
```
Create a .env file with required environment variables (e.g., WANDB_API_KEY).
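
To check that a .env file actually provides the required variables before running a task, a small standard-library sketch like the following could be used. How the project itself loads .env is not stated here (it may use a dedicated library), so the function names `load_env_file` and `missing_variables` and the simple KEY=VALUE parsing are assumptions.

```python
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_variables(values: dict[str, str], required=("WANDB_API_KEY",)) -> list[str]:
    """Return the required names that are absent or empty in `values`."""
    return [name for name in required if not values.get(name)]
```

For example, `missing_variables(load_env_file(Path(".env")))` returns an empty list when WANDB_API_KEY is set to a non-empty value.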

Commands

Setup Tasks

Create Environment
```
invoke create-environment
```
Creates a new Conda environment for the project. Activate it before installing requirements.

Install Requirements
```
invoke requirements
```
Installs the project dependencies from requirements.txt and local pip configuration.

Install Development Requirements
```
invoke dev-requirements
```
Installs development dependencies.

Core Tasks

Preprocess Data
```
invoke preprocess-data --percentage=<float>
```
Preprocesses the raw data and stores the result in the processed directory. Use the --percentage argument to process only a fraction of the data (default is 1.0, meaning all of it).

Train Model
```
invoke train
```
Executes the model training script.

Run Tests
```
invoke test
```
Runs tests using pytest and generates a coverage report.

Test Coverage Report
```
invoke test-coverage
```
Runs tests and displays a detailed coverage report.
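
The --percentage option of preprocess-data above suggests sampling a fraction of the raw files. How the project selects that fraction (randomly or in order, per class or overall) is not documented here, so the following is only one plausible interpretation: a seeded random sample. The function name `select_fraction` and the default seed are assumptions.

```python
import random


def select_fraction(items, percentage=1.0, seed=42):
    """Return a reproducible random subset holding `percentage` of `items`."""
    if not 0.0 < percentage <= 1.0:
        raise ValueError("percentage must be in the interval (0, 1]")
    # Sort first so the result depends only on the seed, not on the
    # order in which the file system happened to list the files.
    ordered = sorted(items)
    if not ordered:
        return []
    keep = max(1, round(len(ordered) * percentage))
    return random.Random(seed).sample(ordered, keep)
```

Applying this separately to each class folder would keep the class ratio of the subset close to that of the full dataset.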

Docker Tasks

Build Docker Image
```
invoke docker-build --progress=<plain|auto>
```
Builds the Docker image for the project using the specified Dockerfile. The --progress argument selects Docker's build output mode (plain or auto).

Run Docker Training
```
invoke docker-train
```
Runs the training process in a Docker container. Requires WANDB_API_KEY in the environment.
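
One way to hand WANDB_API_KEY to the container without writing the secret into the command line is Docker's `-e NAME` form, which copies the variable from the calling environment. The sketch below only builds the command; the function name `docker_train_command` and the image tag `train:latest` used in the usage note are hypothetical, and the actual docker-train task may be implemented differently.

```python
import os


def docker_train_command(image: str, env_var: str = "WANDB_API_KEY", environ=None) -> list[str]:
    """Build a `docker run` command that forwards `env_var` into the container."""
    environ = os.environ if environ is None else environ
    if not environ.get(env_var):
        raise RuntimeError(f"{env_var} is not set; export it or add it to .env first")
    # `-e NAME` without a value tells Docker to copy NAME from the
    # environment of the calling process, so the key itself is never
    # part of the command line.
    return ["docker", "run", "--rm", "-e", env_var, image]
```

The returned list can be passed to `subprocess.run`, for example `subprocess.run(docker_train_command("train:latest"), check=True)`.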

W&B Tasks

Run W&B Sweep
```
invoke wandb-sweep --config-path=<path>
```
Creates a Weights & Biases sweep from the specified config file and programmatically starts the agent. The default configuration path is configs/sweep.yaml; pass --config-path to use a different file.
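
For orientation, a W&B sweep configuration is a mapping with a search `method`, an optional `metric`, and a `parameters` section. The sketch below shows such a mapping as a Python dict together with a deliberately simplified sanity check. The hyperparameter names (`lr`, `batch_size`), the metric name `val_loss`, and the helper `validate_sweep_config` are assumptions; the contents of the project's configs/sweep.yaml are not shown in this README.

```python
ALLOWED_METHODS = {"grid", "random", "bayes"}

# Hypothetical example; the real configs/sweep.yaml may look different.
example_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"min": 0.0001, "max": 0.01},
        "batch_size": {"values": [16, 32, 64]},
    },
}


def validate_sweep_config(config: dict) -> list[str]:
    """Return a list of problems found in a sweep configuration dict.

    This is a simplified check, not a full implementation of the W&B schema.
    """
    problems = []
    if config.get("method") not in ALLOWED_METHODS:
        problems.append("method must be one of: bayes, grid, random")
    parameters = config.get("parameters")
    if not isinstance(parameters, dict) or not parameters:
        problems.append("parameters must be a non-empty mapping")
    else:
        for name, spec in parameters.items():
            if not isinstance(spec, dict) or not spec:
                problems.append(f"parameter {name!r} needs a non-empty specification")
    # Bayesian search needs a metric to optimize.
    if config.get("method") == "bayes" and "metric" not in config:
        problems.append("bayes sweeps require a metric")
    return problems
```

Once a config like this has been loaded from YAML, it could be passed to `wandb.sweep` to obtain a sweep ID, which `wandb.agent` then uses to launch runs.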

Formatting Tasks

Format Code with Ruff
```
invoke ruff-format
```
Formats the project files using ruff.

Docs Tasks

Build Documentation
```
invoke build-docs
```
Builds the project documentation using mkdocs.

Serve Documentation
```
invoke serve-docs
```
Serves the project documentation locally for preview.

Notes

Ensure all required tools and dependencies are properly installed before running tasks.
For additional details, refer to the tasks.py source code.

Directory Structure

The directory structure of the project looks like this:

├── .dvc/                     # Data version control
├── .github/                  # GitHub actions and dependabot
│   ├── dependabot.yaml
│   └── workflows/
│       └── tests.yaml
├── configs/                  # Configuration files
├── data/                     # Data directory
│   ├── processed
│   └── raw
├── dockerfiles/              # Dockerfiles
├── docs/                     # Documentation
├── models/                   # Trained models
├── notebooks/                # Jupyter notebooks
├── reports/                  # Reports
│   └── figures/
├── src/                      # Source code
│   └── mlops_project/
├── tests/                    # Tests
├── .dvcignore
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── pyproject.toml            # Python project file
├── README.md                 # Project README
├── requirements.txt          # Project requirements
├── requirements_dev.txt      # Development requirements
└── tasks.py                  # Project tasks

lkorinek/mlops-project
