
CSfromCS/Water-Quality-Classification-and-Clustering


Water Potability Classification and Quality Clustering

Data analysis experimentation using the dataset from Kadiwal, A. (2020)


Methodology

  • Data Source Collection
  • Data Cleaning and Pre-processing
    • Train Test Split of 80/20
    • Creating 3 sets: Original, MinMax-scaled, and Standard-scaled
  • Modelling
    • Nearest Neighbors
    • Decision Tree
    • K-means Clustering
  • Result analysis
    • Best classification model
      • Metrics: Accuracy, Precision, Recall, F1 Score
    • Highest-scoring clustering
      • Silhouette Score
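
The pre-processing steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the helper names and the toy rows are hypothetical, and fitting the scalers on the training split only is an assumption (the README does not say how the scaled sets were built). scikit-learn's `train_test_split`, `MinMaxScaler`, and `StandardScaler` cover the same ground.

```python
import random
import statistics

def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split them 80/20."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

def fit_min_max(train_rows):
    """Learn the per-column (min, max) from the training rows."""
    return [(min(col), max(col)) for col in zip(*train_rows)]

def apply_min_max(rows, params):
    """Rescale each column so the training range maps to [0, 1].
    Test values outside the training range fall outside [0, 1]."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, params)]
        for row in rows
    ]

def fit_standard(train_rows):
    """Learn the per-column (mean, population std) from the training rows."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*train_rows)]

def apply_standard(rows, params):
    """Centre each column on the training mean and divide by the training std."""
    return [
        [(v - mean) / std if std > 0 else 0.0 for v, (mean, std) in zip(row, params)]
        for row in rows
    ]

# Made-up rows with two features; these are not values from the dataset.
rows = [[float(i), float(100 - 2 * i)] for i in range(10)]
train, test = train_test_split_80_20(rows)

mm_params = fit_min_max(train)
std_params = fit_standard(train)

# The three sets named in the methodology, each as a (train, test) pair.
datasets = {
    "Original": (train, test),
    "MinMax": (apply_min_max(train, mm_params), apply_min_max(test, mm_params)),
    "Standard": (apply_standard(train, std_params), apply_standard(test, std_params)),
}
```

Each model can then be fitted once per entry in `datasets`, so the effect of scaling is compared on the same split.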


Tech stack: Jupyter for the modelling and charting, Canva for the images

Results

Classification


The best-performing model is Nearest Neighbors (k=10) on the Standard-scaled data, with an accuracy of 69.65% and a precision of 71.43%.
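
To make the classifier and the reported metrics concrete, here is a small sketch of a nearest-neighbours vote and of accuracy, precision, recall, and F1 computed from prediction counts. The function names and the toy points are hypothetical, the example uses k=3 rather than 10 to stay small, and treating label 1 as the positive class is an assumption. scikit-learn's `KNeighborsClassifier` and its metric functions would be the usual choice in a notebook.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, point, k):
    """Return the majority label among the k training points closest to `point`."""
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up, already-scaled points; these are not values from the dataset.
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.1, 0.1], [1.0, 0.9], [0.8, 0.8], [0.3, 0.3]]
test_y = [0, 1, 1, 1]

y_pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
metrics = classification_metrics(test_y, y_pred)
```

Because the vote depends on Euclidean distance, features with large numeric ranges dominate unless the data is scaled first, which is why the same model is evaluated on each of the three sets.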

Clustering

The visualisation plots only two features (solids vs turbidity), but the model was fitted on all features.

K-means clustering with K=2 on the Original dataset performed best, with a silhouette score of 0.571.
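
For reference, the silhouette score measures how much closer each point is to its own cluster than to the nearest other cluster. It ranges from -1 to 1, and higher is better. The sketch below computes it from scratch on made-up points; the function name and data are hypothetical, and in a notebook scikit-learn's `silhouette_score` would normally be used instead.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention, a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by len - 1).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Made-up points forming two well-separated pairs; not values from the dataset.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]

good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the visible groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

Here `good` is close to 1 and `bad` is negative, which is the kind of comparison used to pick between clusterings of the same data.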
