
Describe the differences between supervised and unsupervised learning, including real-world examples of each. Preprocess data for unsupervised learning. Cluster data using the K-means algorithm. Determine the best number of centroids for K-means using the elbow curve. Use PCA to limit features and speed up the model.

CGFS93/Cryptocurrencies

Cryptocurrencies

Analysis Overview

  • Preprocessing the Data for PCA
  • Reducing Data Dimensions Using PCA
  • Clustering Cryptocurrencies Using K-means
  • Visualizing Classification Results with 2D and 3D Scatter Plots

Purpose

Accountability Accounting, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio for its customers. The company, however, is lost in the vast universe of cryptocurrencies. So, they’ve asked you to create a report that includes what cryptocurrencies are on the trading market and how they could be grouped to create a classification system for this new investment.

Resources

Data Source: crypto_data.csv.

Software: Python, Conda, Jupyter Notebook.

Results

Clustering Cryptocurrencies using K-Means - Elbow Curve

Because the output of this analysis is not known in advance, unsupervised machine learning is used to identify clusters of cryptocurrencies.
The K-Means method was run for k values from 1 to 10 to produce the elbow curve.

Based on the elbow curve, the best k value appears to be 4, so four clusters are used to categorize the cryptocurrencies.
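
The elbow search described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is the clustering library (the README does not name one) and using synthetic `make_blobs` data as a stand-in for the preprocessed `crypto_data.csv` features. It is not the notebook's actual code.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the preprocessed cryptocurrency features
# (assumption: the real input would be derived from crypto_data.csv).
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)

# Fit K-means for k = 1..10 and record the inertia
# (sum of squared distances from each point to its nearest centroid).
k_values = list(range(1, 11))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    inertias.append(model.inertia_)

# Print the curve; the "elbow" is the k after which inertia stops
# dropping sharply.
for k, inertia in zip(k_values, inertias):
    print(f"k={k:2d}  inertia={inertia:10.1f}")
```

Plotting `inertias` against `k_values` gives the elbow curve. Where the bend falls depends on the data, so the value of 4 reported here applies to this dataset only.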

Visualizing Cryptocurrencies

3D-Scatter plot with clusters

The 3-D scatter plot was obtained by using the PCA algorithm to reduce the cryptocurrency features to three principal components.
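
A minimal sketch of that reduction, followed by clustering with four groups, might look like this. It assumes scikit-learn, standardizes the features before PCA (a common preprocessing step, assumed here rather than taken from the README), and uses synthetic data with a hypothetical eight features in place of the real dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded cryptocurrency features
# (assumption: eight numeric features; the real count may differ).
X, _ = make_blobs(n_samples=200, centers=4, n_features=8, random_state=1)

# Standardize so that no single feature dominates the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to three principal components, as used for the 3-D plot.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# Cluster the reduced data into four groups.
labels = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X_pca)

print("reduced shape:", X_pca.shape)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

Changing `n_components` to 2 yields the input for a 2-D plot. The three columns of `X_pca` and the `labels` array are what a 3-D scatter plot would use for its axes and colors.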

2D-Scatter plot with clusters

The 2-D scatter plot was obtained by using the PCA algorithm to reduce the cryptocurrency features to two principal components.

Both scatter plots show the distribution of the cryptocurrencies across the four clusters.
The outlying point in each plot is the only cryptocurrency assigned to class #2.

Tradable Cryptocurrencies Table

Most of the cryptocurrencies belong to classes #0 and #1.
BitTorrent is the only cryptocurrency in class #2.
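
Class sizes like these can be read directly from the cluster labels. The sketch below uses NumPy and a made-up label array of ten entries. The values are illustrative only and are not the report's actual assignments.

```python
import numpy as np

# Hypothetical cluster labels for ten coins (illustrative only,
# not the assignments produced by the analysis).
labels = np.array([0, 1, 0, 0, 1, 3, 2, 1, 0, 3])

# Count how many coins fall into each class.
classes, counts = np.unique(labels, return_counts=True)
for c, n in zip(classes, counts):
    print(f"class #{c}: {n} coins")

# Classes with exactly one member are candidates for outliers.
singletons = classes[counts == 1]
print("single-member classes:", singletons.tolist())
```

A single-member class is worth checking by hand, since K-means can assign an extreme point its own centroid.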

2D-Scatter plot with TotalCoinMined vs TotalCoinSupply

A scatter plot of two raw features, TotalCoinMined against TotalCoinSupply, does not separate the classes clearly. Reducing the features with the PCA algorithm first gives a better visualization of the clusters.

Summary

The identified classification of 532 cryptocurrencies is based on similarities among their features.
The particularities of each group need to be analyzed to determine their performance and potential interest for the investment bank's clients.
