Cancer Prediction

A replication of the paper "Histopathology images predict multi-omics aberrations and prognoses in colorectal cancer patients".

Requirements

python==3.10
pytorch==1.12.0 
torchvision==0.13.0
tensorflow==2.10

In our testing with CUDA 11.6 and 11.7, the following commands work:

conda create --name your_env_name python=3.10
conda activate your_env_name
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install tensorflow==2.10
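After installation, a quick way to confirm that the environment matches the pins above is to compare the installed distribution versions. The sketch below is a standard-library-only check using importlib.metadata. It assumes PyTorch is registered under its pip distribution name "torch"; if a conda install reports it as missing, check with conda list instead. A patch release such as tensorflow 2.10.1, or a local build tag such as 1.12.0+cu116, counts as a match. It does not check CUDA.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins from the Requirements list above ("torch" is PyTorch's pip distribution name).
PINS = {"torch": "1.12.0", "torchvision": "0.13.0", "tensorflow": "2.10"}


def matches(installed: str, pinned: str) -> bool:
    """True if installed equals the pin or extends it (2.10 -> 2.10.1, 1.12.0 -> 1.12.0+cu116)."""
    public = installed.split("+")[0]  # drop a local build tag such as "+cu116"
    return public == pinned or public.startswith(pinned + ".")


def check_environment(pins=PINS, python=(3, 10)):
    """Return a list of mismatches; an empty list means everything matches the pins."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(python):
        problems.append(
            f"python {sys.version_info[0]}.{sys.version_info[1]} != {python[0]}.{python[1]}"
        )
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} not installed")
            continue
        if not matches(installed, pinned):
            problems.append(f"{name} {installed} != {pinned}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["all pinned versions match"]:
        print(line)
```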

OpenSlide

pip install openslide-python

For Windows:

Download the Windows build from OpenSlide, then add its bin and lib directories to your environment variables.

If import openslide fails, open the file lowlevel.py named in the error message and add the following code before the OpenSlide library is loaded:

import os
# Windows only (Python 3.8+): let Python find the OpenSlide DLLs in this folder
os.add_dll_directory("<your openslide bin path>")
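Instead of editing lowlevel.py inside site-packages, a possible alternative (as far as we know, the pattern the openslide-python documentation suggests for Windows) is to register the DLL directory in your own code right before importing openslide. OPENSLIDE_BIN and import_openslide are hypothetical names; replace the path with the bin folder of your OpenSlide download.

```python
import os

# Hypothetical path: replace with the bin folder of your OpenSlide Windows download.
OPENSLIDE_BIN = r"C:\path\to\openslide-win64\bin"


def import_openslide(bin_path=OPENSLIDE_BIN):
    """Import openslide, making the OpenSlide DLLs visible first on Windows."""
    # os.add_dll_directory exists only on Windows with Python 3.8+. The search
    # path entry is removed when the with-block exits, which is fine because the
    # DLL stays loaded in the process once openslide has been imported.
    if hasattr(os, "add_dll_directory"):
        with os.add_dll_directory(bin_path):
            import openslide
    else:
        import openslide
    return openslide
```

Usage: openslide = import_openslide() at the top of a preprocessing script, with no changes to the installed package.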

For Ubuntu:

apt install python3-openslide

Data Preparation

TCGA

Data source: GDC Data Portal

Filter:

Cases
- Primary Site: colon, rectum
- Program: TCGA
- Project: TCGA-COAD, TCGA-READ
Files
- Data Type: Slide Image
- Experimental Strategy: Tissue Slide

Download tool: GDC Data Transfer Tool (gdc-client)
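The same file selection can also be expressed as a query against the GDC REST API files endpoint. The sketch below only builds the query URL from the filters listed above. The field names follow the GDC API documentation as we understand it and should be double-checked; the primary-site filter is left out because its exact value spelling in the API should be verified first; and size=2000 is an arbitrary upper bound. Sending the URL (for example with requests.get(url).json()["data"]["hits"]) should return file records whose file_id values can then be passed to gdc-client download.

```python
import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def in_filter(field, values):
    """A single GDC 'in' clause: field must equal one of values."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def build_filters():
    """Combine the portal filters listed above (primary site omitted) with 'and'."""
    return {
        "op": "and",
        "content": [
            in_filter("cases.project.program.name", ["TCGA"]),
            in_filter("cases.project.project_id", ["TCGA-COAD", "TCGA-READ"]),
            in_filter("data_type", ["Slide Image"]),
            in_filter("experimental_strategy", ["Tissue Slide"]),
        ],
    }


def build_slide_query(size=2000):
    """Return a GET URL for the files endpoint; size is an arbitrary upper bound."""
    params = {
        "filters": json.dumps(build_filters()),
        "fields": "file_id,file_name,cases.submitter_id",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"
```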

Data Preprocessing

Refactoring

Example:

python ./data_preprocessing/refactor_tcga_files.py --json_name E:/frb-cancer-prediction/TCGA20/metadata.cart.2023-11-20.json --input_path E:/frb-cancer-prediction/TCGA20 --output_path ./dataset/TCGA20/origin
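As a rough illustration of this step, assume its job is to flatten the gdc-client download layout, where each slide sits in a folder named after its file UUID, into a single directory. The sketch below works under that assumption. It also assumes the metadata.cart JSON is a list of records with file_id and file_name keys. plan_copies and flatten_gdc_download are hypothetical names, not functions from refactor_tcga_files.py.

```python
import json
import shutil
from pathlib import Path


def plan_copies(records, input_path, output_path):
    """Map each cart record to a (source, destination) pair of paths.

    Assumed layout: gdc-client stores a file as <input_path>/<file_id>/<file_name>.
    """
    return [
        (Path(input_path) / r["file_id"] / r["file_name"], Path(output_path) / r["file_name"])
        for r in records
    ]


def flatten_gdc_download(json_name, input_path, output_path):
    """Copy every downloaded slide listed in the cart metadata into output_path."""
    records = json.loads(Path(json_name).read_text(encoding="utf-8"))
    Path(output_path).mkdir(parents=True, exist_ok=True)
    copied = []
    for src, dst in plan_copies(records, input_path, output_path):
        if src.is_file():  # skip entries whose file is missing from the download
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```

With the paths from the example command, this would be called as flatten_gdc_download("E:/frb-cancer-prediction/TCGA20/metadata.cart.2023-11-20.json", "E:/frb-cancer-prediction/TCGA20", "./dataset/TCGA20/origin").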

Tiling

Example:

python ./data_preprocessing/tiling.py --input_path ./dataset/TCGA20/origin --output_path ./dataset/TCGA20/tiled
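Tiling walks a fixed-size window over the whole-slide image. The sketch below only computes the tile corners. With OpenSlide, each corner could then be read at full resolution with slide.read_region((x, y), 0, (tile_size, tile_size)), taking the width and height from slide.dimensions. The tile size, the optional stride, and dropping partial edge tiles are assumptions for illustration, not necessarily what tiling.py does.

```python
from typing import Iterator, Optional, Tuple


def tile_origins(
    width: int, height: int, tile_size: int, stride: Optional[int] = None
) -> Iterator[Tuple[int, int]]:
    """Yield top-left (x, y) corners of full tiles covering a width x height image.

    stride defaults to tile_size (non-overlapping tiles). Partial tiles at the
    right and bottom edges are dropped, a hypothetical choice for this sketch.
    """
    stride = stride or tile_size
    for y in range(0, height - tile_size + 1, stride):
        for x in range(0, width - tile_size + 1, stride):
            yield x, y
```

A stride smaller than tile_size gives overlapping tiles at the cost of more images to store and normalize.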

Normalization

The normalization code still has some serious unresolved problems. I have not found any tile-filtering procedure in the paper or the original code, so for now I selected the normalized images manually.

Example:

python ./data_preprocessing/normalization.py --input_path ./dataset/TCGA20/tiled --output_path ./dataset/TCGA20/normalized --folders True
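Since neither the paper nor the original code seems to specify a filtering step, one hypothetical way to replace the manual selection is to drop tiles that are mostly background. The sketch scores a tile by the fraction of near-white pixels. The intensity threshold of 220 and the 50% cut-off are made-up defaults that would need tuning on real tiles. A tile loaded with PIL can be passed in as np.asarray(Image.open(path)).

```python
import numpy as np


def background_fraction(tile: np.ndarray, white_threshold: int = 220) -> float:
    """Fraction of pixels whose R, G and B values are all above white_threshold."""
    rgb = tile[..., :3]  # ignore an alpha channel if present
    return float(np.all(rgb > white_threshold, axis=-1).mean())


def keep_tile(tile: np.ndarray, max_background: float = 0.5, white_threshold: int = 220) -> bool:
    """Hypothetical rule: keep a tile only if at most max_background of it is near-white."""
    return background_fraction(tile, white_threshold) <= max_background
```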

Feature Extraction

TODO

Survival Prediction

Temporary notes by wfy

I think prediction.py runs, but my test data does not have enough dimensions, so the model raises an error.

In addition, to get it running, I set some parameters manually.

In utils/prediction.py:

loss_function = weibull_loglik_discrete
optimizer = Adam(learning_rate=0.0001)
epochs = 1

where loss_function can be any of the three loss functions defined in utils/prediction_model.py.
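The name weibull_loglik_discrete matches the discrete Weibull loss from WTTE-RNN. Assuming, without having checked, that the version in utils/prediction_model.py follows that formulation, it expects y_true = [time, event] and a predicted [alpha, beta] per sample. The NumPy sketch below (discrete_weibull_nll is a hypothetical name) reproduces the formula so that a loss value can be checked by hand. An observed event contributes log P(T = t) and a censored case contributes log P(T > t).

```python
import numpy as np


def discrete_weibull_nll(y_true: np.ndarray, ab_pred: np.ndarray, eps: float = 1e-35) -> float:
    """Negative mean discrete-Weibull log-likelihood (WTTE-RNN style).

    y_true[:, 0] is the observed time t, y_true[:, 1] is 1 for an observed event
    and 0 for a censored case; ab_pred[:, 0] and ab_pred[:, 1] are the Weibull
    scale (alpha) and shape (beta).
    """
    t, u = y_true[:, 0], y_true[:, 1]
    a, b = ab_pred[:, 0], ab_pred[:, 1]
    hazard0 = np.power((t + eps) / a, b)  # cumulative hazard at t
    hazard1 = np.power((t + 1.0) / a, b)  # cumulative hazard at t + 1
    # Event:    log P(T = t) = log(exp(-h0) - exp(-h1)) = log(exp(h1 - h0) - 1) - h1
    # Censored: log P(T > t) = log S(t + 1) = -h1
    loglik = u * np.log(np.expm1(hazard1 - hazard0)) - hazard1
    return float(-np.mean(loglik))
```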

In utils/prediction_data_gen.py:

n = 1
xi = 0
xj = 1

I don't know what these parameters are used for.

I have not yet applied the code that comes after 'evaluation'.

Code Commit Specification

It is recommended that the commit message be written in the following format:

The message types are:

feat - new features
fix - bug fixes
docs - documentation or comments
style - code formatting
refactor - refactoring or optimization that neither adds features nor fixes bugs
perf - performance optimization
test - adding tests
chore - changes to the build process or auxiliary tools
revert - reverting a previous commit
build - building packages
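The format line itself is not reproduced above. The sketch below assumes the common Conventional Commits header shape, type(scope): subject, with an optional scope and one of the types listed above. It could serve, for example, as the body of a hypothetical commit-msg hook; check_commit_header is a hypothetical name.

```python
import re

# Types from the list above.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "chore", "revert", "build")

# Assumed header shape: type(optional scope): subject
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?: (?P<subject>\S.*)$")


def check_commit_header(message: str) -> bool:
    """True if the first line of message looks like 'type(scope): subject' with a known type."""
    lines = message.splitlines()
    if not lines:
        return False
    match = HEADER_RE.match(lines[0])
    return bool(match) and match.group("type") in COMMIT_TYPES
```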

Hibiki33/cancer-prediction

Folders and files

Latest commit

History

Repository files navigation

Cancer Prediction

Requirements

Openslide

For Windows:

For Ubuntu:

Data Preparation

TCGA

Data Proprocessing

Refactoring

Tiling

Normalization

Feature Extraction

Survival Prediction

Temporary notes by wfy

Code Commit Specification

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages