- Built a simple model that analyses the sentiment of product reviews, helping companies understand what customers think of their products.
- Cleaned the data and performed exploratory data analysis (EDA) to better understand it.
- Trained Naive Bayes and logistic regression models, both of which reached over 93% accuracy.
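For readers unfamiliar with Naive Bayes on text, the sketch below shows a minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing in plain Python. It illustrates the technique only and is not the project's code: the project most likely relied on scikit-learn, and the `SimpleMultinomialNB` class, the toy reviews and the `"pos"`/`"neg"` labels are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


class SimpleMultinomialNB:
    """Hypothetical, minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "SimpleMultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior: Dict[str, float] = {}
        self.log_likelihood: Dict[str, Dict[str, float]] = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # P(class) estimated from how often the class appears in training.
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d)
            # Add-one smoothing: every vocabulary word gets one pseudo-count.
            denom = sum(counts.values()) + len(self.vocab)
            self.log_likelihood[c] = {
                tok: math.log((counts[tok] + 1) / denom) for tok in self.vocab
            }
        return self

    def predict(self, doc: List[str]) -> str:
        def log_posterior(c: str) -> float:
            # Unnormalized log-posterior; tokens unseen in training are skipped.
            return self.log_prior[c] + sum(
                self.log_likelihood[c][tok] for tok in doc if tok in self.vocab
            )

        return max(self.classes, key=log_posterior)


# Toy, hypothetical training data (already cleaned and tokenized).
train_docs = [["great", "value"], ["love", "great"], ["broke", "terrible"], ["terrible", "waste"]]
train_labels = ["pos", "pos", "neg", "neg"]

model = SimpleMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["great", "product"]))   # → pos
print(model.predict(["terrible", "broke"]))  # → neg
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and smoothing keeps a single unseen class/word pairing from zeroing out a prediction.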
- Python version: 3.7
- Packages: pandas, numpy, scikit-learn (sklearn), matplotlib, plotly, nltk, wordcloud, pillow, string (standard library)
- Dataset: Kaggle
- Cleaned the text by removing punctuation and stopwords.
- Tokenized the text and converted it into token-count vectors (count vectorization).
- Converted the categorical variables into dummy variables, then split the data into training and test sets (80% training, 20% test).
- Trained two models and evaluated them using the F1 score.
- Chose the F1 score over accuracy because, with an uneven class distribution, a model can reach high accuracy simply by favoring the majority class, whereas F1 accounts for both precision and recall.
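The cleaning and count-vectorization steps listed above can be sketched with the standard library alone. This is a hypothetical illustration, not the project's code: the stopword set is a tiny stand-in for NLTK's English stopword list, and `clean_text` and `count_vectorize` are illustrative names. In practice scikit-learn's `CountVectorizer` performs the counting step and returns a sparse matrix rather than nested lists.

```python
import string
from collections import Counter
from typing import List, Tuple

# Hypothetical stand-in for NLTK's English stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "this", "to", "was"}

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> List[str]:
    """Lowercase, strip punctuation, split on whitespace and drop stopwords."""
    words = text.lower().translate(PUNCT_TABLE).split()
    return [w for w in words if w not in STOPWORDS]


def count_vectorize(docs: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted vocabulary and one token-count vector per document."""
    vocab = sorted({w for doc in docs for w in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


reviews = [
    "This product is great, great value!",
    "Terrible. It broke after a week.",
]
tokens = [clean_text(r) for r in reviews]
vocab, X = count_vectorize(tokens)
print(vocab)  # → ['after', 'broke', 'great', 'product', 'terrible', 'value', 'week']
print(X)      # → [[0, 0, 2, 1, 0, 1, 0], [1, 1, 0, 0, 1, 0, 1]]
```

Each row of `X` is one review and each column one vocabulary word, which is the shape of input that Naive Bayes and logistic regression models consume.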
The Naive Bayes and logistic regression models performed at a similar level on the test data:

| Model | Accuracy |
|---|---|
| Naive Bayes | 94.44% |
| Logistic Regression | 95.39% |
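To illustrate why F1 was preferred over accuracy for model evaluation, the sketch below compares both metrics on a hypothetical, imbalanced set of 100 labels (90 of class 0 and 10 of class 1, with class 1 treated as the positive class). The labels are invented for illustration and are unrelated to the results in the table; `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score` compute the same quantities.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 90 majority-class (0) and 10 minority-class (1) examples.
y_true = [0] * 90 + [1] * 10

# Model A always predicts the majority class.
always_majority = [0] * 100

# Model B: 2 false positives, 6 true positives, 4 false negatives.
model_b = [0] * 88 + [1] * 2 + [1] * 6 + [0] * 4

print(accuracy(y_true, always_majority), f1(y_true, always_majority))  # → 0.9 0.0
print(accuracy(y_true, model_b), round(f1(y_true, model_b), 3))        # → 0.94 0.667
```

The two models differ by only four points of accuracy, yet the majority-class model scores an F1 of 0 because it never identifies the minority class, which is exactly the failure mode F1 is meant to expose.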