This project involves the analysis of the Iris dataset and the implementation of a K-Nearest Neighbors (KNN) classifier to predict the species of iris flowers based on their features.
The project is organized into several steps, each represented by a cell in the Jupyter Notebook:
- Import Libraries: Import necessary libraries such as `numpy`, `pandas`, `seaborn`, `matplotlib`, and `sklearn`.
- Load and Prepare Data: Load the Iris dataset and prepare it for analysis by creating a DataFrame and mapping species labels.
- Data Exploration: Display the first few rows of the dataset and create various plots to visualize the data:
  - Boxplots for petal length and petal width by species.
  - Scatter plots for sepal length vs. sepal width and petal length vs. petal width, colored by species.
  - Pairplot to visualize relationships between all features.
  - Correlation heatmap to show the correlation between features.
- Train-Test Split: Split the data into training and testing sets.
- KNN Classifier: Implement and train a KNN classifier:
  - Train the model with a specified number of neighbors.
  - Predict the labels for the test set.
  - Calculate and print the accuracy of the model.
- Hyperparameter Tuning: Use `GridSearchCV` to find the best hyperparameters for the KNN classifier:
  - Define a parameter grid.
  - Perform grid search with cross-validation.
  - Print the best parameters and the best score.
  - Evaluate the best model on the test set.
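
The non-plotting steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the split ratio, `random_state`, the initial `n_neighbors=5`, and the contents of `param_grid` are all assumptions chosen for the example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the dataset and build a DataFrame with readable species labels.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Series(iris.target).map(dict(enumerate(iris.target_names)))

# First rows, plus the correlation matrix that a heatmap would display.
print(df.head())
print(df.drop(columns="species").corr().round(2))

# Train-test split (assumed: 80/20, stratified by species, fixed seed).
X = df[iris.feature_names]
y = df["species"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Baseline KNN classifier (assumed: 5 neighbors).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, knn.predict(X_test))
print(f"Baseline test accuracy: {baseline_accuracy:.3f}")

# Grid search with 5-fold cross-validation (assumed parameter grid).
param_grid = {
    "n_neighbors": list(range(1, 16)),
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print(f"Best cross-validation accuracy: {grid.best_score_:.3f}")

# Evaluate the refitted best model on the held-out test set.
tuned_accuracy = accuracy_score(y_test, grid.best_estimator_.predict(X_test))
print(f"Tuned test accuracy: {tuned_accuracy:.3f}")
```

Fitting the grid search on the training split only keeps the test set untouched until the final evaluation, so the reported test accuracy is not influenced by the tuning itself.
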
The notebook requires the following Python packages:
- numpy
- pandas
- seaborn
- matplotlib
- scikit-learn
To run the project, open the Jupyter Notebook and execute the cells in order. The notebook will guide you through the steps of loading the data, visualizing it, training the KNN classifier, and tuning its hyperparameters.
The project demonstrates the use of KNN for classification and the impact of hyperparameter tuning on model performance. The best model achieved an accuracy of 100% on the test set.
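
To see how the choice of `n_neighbors` affects performance, one option is to compare cross-validated accuracy across several values of k. The sketch below is an illustration under assumptions: the candidate values of k and the use of 5-fold cross-validation on the full dataset are choices made for the example, not taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate k (assumed values).
cv_accuracy = {}
for k in (1, 3, 5, 7, 9, 11, 15, 25):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    cv_accuracy[k] = scores.mean()
    print(f"k={k:2d}  mean accuracy={scores.mean():.3f}  std={scores.std():.3f}")

# The k with the highest mean cross-validated accuracy.
best_k = max(cv_accuracy, key=cv_accuracy.get)
print("Best k by cross-validation:", best_k)
```

Averaging over folds gives a steadier picture than a single train-test split, which on a dataset of 150 samples can shift noticeably with the random seed.
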
This project is licensed under the MIT License.