
K-Means-and-Auto-Encoder

The task of the project is to perform unsupervised learning on the CIFAR-10 dataset. There are two tasks: the first is to perform K-means clustering on the raw data from scratch; the second is to perform K-means clustering on a representation generated by the auto-encoder method using library functions. The code is written in Python using Keras.

The CIFAR-10 dataset has a training set of 50,000 examples and a test set of 10,000 examples. Each example is a 32x32 image associated with a label from one of 10 classes. Each image is 32 pixels high and 32 pixels wide, for a total of 1024 pixels. Each pixel value is an integer between 0 and 255. The training and test data sets therefore have 1025 columns each, including the label.

1.1 Implement K-Means Clustering

We will first load the CIFAR-10 dataset and store the train and test data.

1_1
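The loading step might look like the following minimal sketch using Keras's built-in dataset loader (variable names are illustrative):

```python
from tensorflow.keras.datasets import cifar10

# Download (on first use) and load the 50,000/10,000 train/test split.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape, x_test.shape)  # (50000, 32, 32, 3) (10000, 32, 32, 3)
```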

Next we convert the images from RGB to grayscale. The size of the last dimension becomes 1, containing the grayscale value of each pixel.

1_2
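One common way to do this is `tf.image.rgb_to_grayscale`; the same computation can be sketched in plain NumPy using the ITU-R BT.601 luma weights that function applies (the 4-image batch here is synthetic, just to show the shapes):

```python
import numpy as np

def rgb_to_grayscale(images):
    """Weighted sum over the RGB channel axis (ITU-R BT.601 weights),
    keeping a trailing channel dimension of size 1."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    return (images @ weights)[..., np.newaxis]

batch = np.random.randint(0, 256, size=(4, 32, 32, 3)).astype("float32")
gray = rgb_to_grayscale(batch)
print(gray.shape)  # (4, 32, 32, 1)
```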

To keep the computation simple we will normalize the values to the range 0 to 1. This is done by dividing the data by 255 (the maximum pixel value).

1_3

Now we will reshape each image from a 32 x 32 square into a flat vector of 1024 values.

1_4
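The two preprocessing steps above (scaling to [0, 1], then flattening 32 x 32 to 1024) can be sketched as follows on a synthetic batch:

```python
import numpy as np

x = np.random.randint(0, 256, size=(4, 32, 32, 1)).astype("float32")

x = x / 255.0                   # normalize pixel values to [0, 1]
x = x.reshape(len(x), 32 * 32)  # flatten each image to a 1024-vector

print(x.shape)  # (4, 1024)
```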

We will define 10 clusters and then initialize random centroids.

1_5

1_6
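A simple initialization, assuming the flattened data from the previous step, is to pick k = 10 distinct training points at random as the starting centroids (the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10                          # one cluster per CIFAR-10 class

data = rng.random((500, 1024))  # stand-in for the flattened images

# Choose k distinct rows of the data as the initial centroids.
centroids = data[rng.choice(len(data), size=k, replace=False)]
print(centroids.shape)  # (10, 1024)
```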

The next step is to define methods for updating and forming clusters.

1_7
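The two methods might look like this sketch (function names are my own; the screenshots define the repo's actual versions):

```python
import numpy as np

def assign_clusters(data, centroids):
    """Assign each point to its nearest centroid (Euclidean distance)."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def update_centroids(data, labels, centroids):
    """Move each centroid to the mean of the points assigned to it;
    a centroid with no points keeps its old position."""
    k = len(centroids)
    return np.stack([
        data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(k)])

# Tiny demo: two obvious groups in 2-D.
data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centroids = np.array([[0.0, 0.1], [5.0, 5.1]])
labels = assign_clusters(data, centroids)
print(labels)  # [0 0 1 1]
new_centroids = update_centroids(data, labels, centroids)
```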

We will now calculate the difference between the old and the new centroids. We keep updating the centroids, and re-forming clusters with the updated centroids, until this difference falls below a small threshold, i.e. the centroids converge.

1_81
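Put together, the loop might look like this self-contained sketch (the tolerance and iteration cap are illustrative choices, not the repo's exact values):

```python
import numpy as np

def kmeans(data, k, tol=1e-6, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Form clusters with the current centroids.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each centroid to the mean of its cluster.
        new_centroids = np.stack([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)])
        # Stop once the centroids barely move.
        converged = np.linalg.norm(new_centroids - centroids) < tol
        centroids = new_centroids
        if converged:
            break
    return labels, centroids

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.2, (50, 2)), rng.normal(5, 0.2, (50, 2))])
labels, centroids = kmeans(data, k=2)
print(np.bincount(labels))
```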

Now we will assess the quality of the clusters using the ASC (Average Silhouette Coefficient) and DI (Dunn’s Index) evaluation metrics.

1_8
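ASC can be computed with scikit-learn's `silhouette_score`; a sketch on toy, well-separated clusters (scores near +1 indicate tight, well-separated clusters, scores near 0 overlapping ones):

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs stand in for the clustered image vectors.
data = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(5, 0.1, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)

asc = silhouette_score(data, labels)
print(asc > 0.9)  # True for clusters this well separated
```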

For Dunn’s Index, I installed the validclust package separately.

1_82

1_9
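`validclust` exposes a `dunn` function that takes a precomputed pairwise-distance matrix and the cluster labels; in case that package is unavailable, the index itself is small enough to sketch directly (minimum inter-cluster distance divided by maximum intra-cluster diameter):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def dunn_index(dist, labels):
    """Dunn's index: min inter-cluster distance / max intra-cluster diameter.
    Higher is better (compact, well-separated clusters)."""
    clusters = np.unique(labels)
    inter = min(dist[np.ix_(labels == a, labels == b)].min()
                for i, a in enumerate(clusters) for b in clusters[i + 1:])
    intra = max(dist[np.ix_(labels == c, labels == c)].max() for c in clusters)
    return inter / intra

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(5, 0.1, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)

di = dunn_index(pairwise_distances(data), labels)
print(di > 1.0)  # well-separated blobs score well above 1
```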

1.2 Implement Auto-Encoder

The general idea of auto-encoders is to set up an encoder and a decoder as neural networks and learn the best encoding-decoding scheme through an iterative optimisation process. At each iteration we feed the autoencoder architecture (the encoder followed by the decoder) some data, compare the encoded-decoded output with the initial data, and back-propagate the error through the architecture to update the weights of the networks.

2_1

Here the encoder has two dense layers: one of size 1024, for which we have flattened the x_train data, and another of size 64. We then define the encoder model, which will later be used to encode the x_train data. The decoder is defined similarly but in reverse: its first layer has 64 units, then 1024.

2_2
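A sketch of the described architecture in Keras (the layer sizes follow the write-up, 1024 then 64 in the encoder, mirrored as 64 then 1024 in the decoder; the activations are my own assumption):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(1024,))            # flattened 32x32 image

# Encoder: 1024 -> 64.
h = layers.Dense(1024, activation="relu")(inputs)
encoded = layers.Dense(64, activation="relu")(h)

# Decoder: 64 -> 1024, mirroring the encoder.
h = layers.Dense(64, activation="relu")(encoded)
decoded = layers.Dense(1024, activation="sigmoid")(h)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)         # reused later for encoding

codes = encoder.predict(np.random.rand(2, 1024), verbose=0)
print(codes.shape)  # (2, 64)
```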

We will then compile and fit our autoencoder model using the Adam optimizer and mean-squared-error loss (loss='mse').

2_3

Now, using the encoder model defined previously, we will encode the x_train data. Using K-means, we then generate clusters from the compressed 64-dimensional representations produced by the auto-encoder.

2_4
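End to end, the compile/fit/encode/cluster steps above might look like the following (tiny random data and a single epoch, just to show the shapes; the repo trains on the real flattened images):

```python
import numpy as np
from sklearn.cluster import KMeans
from tensorflow import keras
from tensorflow.keras import layers

x_train = np.random.rand(64, 1024).astype("float32")  # stand-in for the images

inputs = keras.Input(shape=(1024,))
h = layers.Dense(1024, activation="relu")(inputs)
encoded = layers.Dense(64, activation="relu")(h)
h = layers.Dense(64, activation="relu")(encoded)
decoded = layers.Dense(1024, activation="sigmoid")(h)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)

# Train the network to reconstruct its own input: Adam optimizer, MSE loss.
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=1, batch_size=32, verbose=0)

# Encode, then cluster the 64-dimensional codes with K-means.
codes = encoder.predict(x_train, verbose=0)
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(codes)
print(codes.shape, clusters.shape)  # (64, 64) (64,)
```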

Lastly we will assess the quality of the clusters using the ASC (Average Silhouette Coefficient).

2_5
