Predict heart disease

The goal of this lab is to train a model for the diagnosis of coronary artery disease.

Dataset

The dataset is provided by the Cleveland Clinic Foundation for Heart Disease (more information). The dataset file to use is available here. Each row describes a patient. Below is a description of each column.

Column	Description	Feature Type	Data Type
Age	Age in years	Numerical	integer
Sex	(1 = male; 0 = female)	Categorical	integer
CP	Chest pain type (0, 1, 2, 3, 4)	Categorical	integer
Trestbps	Resting blood pressure (in mm Hg on admission to the hospital)	Numerical	integer
Chol	Serum cholesterol in mg/dl	Numerical	integer
FBS	Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)	Categorical	integer
RestECG	Resting electrocardiographic results (0, 1, 2)	Categorical	integer
Thalach	Maximum heart rate achieved	Numerical	integer
Exang	Exercise induced angina (1 = yes; 0 = no)	Categorical	integer
Oldpeak	ST depression induced by exercise relative to rest	Numerical	float
Slope	The slope of the peak exercise ST segment	Numerical	integer
CA	Number of major vessels (0-3) colored by fluoroscopy	Numerical	integer
Thal	3 = normal; 6 = fixed defect; 7 = reversible defect	Categorical	string
Target	Diagnosis of heart disease (1 = true; 0 = false)	Classification	integer
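Here is a minimal sketch of loading the file and preparing the features according to the "Feature Type" column above, using pandas and scikit-learn. It assumes that the CSV uses lowercase column names (`age`, `sex`, `cp`, `trestbps`, ...) and a label column named `target`. The three data rows are hand-written placeholders, not rows from the real file. Standardizing numerical columns and one-hot encoding categorical ones is one common choice among several.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumption: lowercase column names in the CSV and a label column called "target".
# Adjust these names if your file differs.
TARGET_COLUMN = "target"
# Feature groups follow the "Feature Type" column of the table.
CATEGORICAL_FEATURES = ["sex", "cp", "fbs", "restecg", "exang", "thal"]
NUMERICAL_FEATURES = ["age", "trestbps", "chol", "thalach", "oldpeak", "slope", "ca"]


def load_heart_data(source):
    """Read the CSV (path, URL or file-like object) and split it into features X and labels y."""
    df = pd.read_csv(source)
    X = df.drop(columns=[TARGET_COLUMN])  # every column except the label
    y = df[TARGET_COLUMN]                 # 1 = heart disease, 0 = no heart disease
    return X, y


def make_preprocessor():
    """Scale numerical columns and one-hot encode categorical ones."""
    return ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), NUMERICAL_FEATURES),
            # handle_unknown="ignore": categories unseen during fit are encoded as all zeros
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_FEATURES),
        ]
    )


# Hand-written placeholder rows standing in for the real CSV file
sample_csv = io.StringIO(
    "age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target\n"
    "63,1,1,145,233,1,2,150,0,2.3,3,0,fixed,0\n"
    "67,1,4,160,286,0,2,108,1,1.5,2,3,normal,1\n"
    "37,0,3,130,250,0,0,187,0,3.5,3,0,normal,0\n"
)
X, y = load_heart_data(sample_csv)
X_encoded = make_preprocessor().fit_transform(X)
print(X.shape)          # (3, 13): 13 feature columns
print(X_encoded.shape)  # (3, 20): 7 scaled numerical + 13 one-hot columns
```

With the real data, pass the path or URL of the dataset file to `load_heart_data` instead of `sample_csv`.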

Platform

You may use either a local or remote Python environment for this lab.

The easiest way to obtain a working Python setup is by using a cloud-based Jupyter notebook execution platform like Google Colaboratory, Paperspace or Kaggle Notebooks.
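Whichever platform you choose, a short check like the one below confirms that the three libraries used in this lab are installed and shows their versions. Versions matter because some scikit-learn parameter values have been renamed across releases. For example, the logistic regression loss of `SGDClassifier` is named `"log_loss"` since version 1.1 and was previously `"log"`.

```python
import sys

import numpy as np
import pandas as pd
import sklearn

# Each library exposes its version string as __version__
print("Python:      ", sys.version.split()[0])
print("NumPy:       ", np.__version__)
print("pandas:      ", pd.__version__)
print("scikit-learn:", sklearn.__version__)
```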

Tools

This lab is designed to introduce you to three essential libraries of the Python ecosystem for machine learning: NumPy, pandas and scikit-learn.

The following tutorials cover the basics you need to start using these tools in your projects.

- NumPy: the absolute basics for beginners
- 10 minutes (or maybe a bit more 😊) to pandas
  If you're short on time, you may skip the following parts: Selection, Merge, Grouping, Reshaping and Time Series.
- Getting Started with scikit-learn

While studying these tutorials, make sure to run every code example yourself.

When done with the tutorials, take this test to check your understanding.
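The following sketch previews how the three libraries fit together. It builds a NumPy array of made-up values, wraps it in a pandas DataFrame, and fits a scikit-learn estimator with the `fit`/`predict` pattern that all estimators share. `LogisticRegression` and the column names are illustrative choices only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# NumPy: a 2-D array of made-up values (rows = samples, columns = features)
values = np.array([[1.0, 20.0], [2.0, 22.0], [3.0, 35.0], [4.0, 40.0]])
labels = np.array([0, 0, 1, 1])

# pandas: wrap the array in a DataFrame with named columns and inspect it
df = pd.DataFrame(values, columns=["feature_a", "feature_b"])
print(df.describe())  # count, mean, std, min, quartiles and max for each column

# scikit-learn: every estimator follows the same fit / predict pattern
model = LogisticRegression()
model.fit(df, labels)
predictions = model.predict(df)
print(predictions)
```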

Training process

You may train any binary classification model on this task, for example a basic SGDClassifier implementing the logistic regression algorithm.

To implement the training process, you should take inspiration from the project workflow and classification performance lectures.
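Common classification performance measures can be computed with `sklearn.metrics`, as in this sketch. The labels are hand-written for illustration and are not model output. For a diagnosis task, false negatives (sick patients predicted as healthy) are often the costliest error, so recall deserves attention alongside accuracy.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hand-written example labels: 1 = heart disease, 0 = no heart disease
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0, 0, 1]

# Rows are true classes and columns are predicted classes (order: 0, then 1)
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)              # [[3 2]
                       #  [1 4]]
print(tn, fp, fn, tp)  # 3 2 1 4

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # 0.70 = (tp + tn) / total
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 0.67 = tp / (tp + fp)
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # 0.80 = tp / (tp + fn)
print(f"F1 score:  {f1_score(y_true, y_pred):.2f}")         # 0.73 = harmonic mean of precision and recall
```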

Extra work

Try another model, for example a decision tree, and compare its performance with that of your first model.
