This project's aim is to predict the ratings using the users reviews in the spanish Amazon Review Corpus.
-> Project status: [ Completed ]
The importance of customer satisfaction is that it helps us to know the likelihood of a customer making a purchase in the future. Asking customers to rate the degree of satisfaction is a good way to see if they will become regular customers or even brand advocates.
Objective: Create a model of Machine Learning to analyse and predict the number of stars of any review based only on the text of it.
- Descriptive statistics
- Data visualization
- Corpus preprocessing
- Feature engineering
- Sentiment Analysis
- Machine learning
- Python
- Numpy, Pandas, Scipy
- NLTK, Spacy
- Matplotlib, Seaborn
- Scikit Learn: Naive Bayes, Linear SVC, etc.
With the modeling part we were able to obtain an accuracy of 0.55 in test, which is not a particularly high performance. The model that had the best performance was Linear SVC for a very short time. With the analysis of the coefficients we noticed that the most important features for this model are the ones related to the polarity and some of the corpus itself: perfect, good, good, great, bad, not good, meet - not meet, return, etc.
To improve the current project it would be convenient to reform the problem, since even if other machine learning algorithms are tested the result will not change much.
If you want to classify reviews of people who feel satisfied with a product in contrast to people to whom the product did not meet their expectations, it would be best to simplify the problem to a binary classification. That is, the model should predict whether the comment is positive or negative. And this would provide more information about the overall quality or satisfaction of the product.
You can visit my Personal Website, follow me on Twitter, connect with me on LinkedIn, or check out the rest of my projects on my GitHub.