This project is based on the Amazon Reviews for Sentiment Analysis dataset on Kaggle. Several models were built from scratch, achieving 91% accuracy on the test dataset.
- Tokenization was done with the Keras Tokenizer, and the models were trained on 400,000 text samples from the dataset (a tokenizer sketch follows this list).
- The fitted tokenizer was saved to a .pickle file for future use.
- The models used are Conv1D and LSTM architectures with different strengths of regularization (see the model sketch below).
- Pretrained GloVe embeddings were also used to train a model, reaching 92% accuracy on the test dataset (40,000 text samples); an embedding-loading sketch follows the list.
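A minimal sketch of the tokenization and pickling steps, assuming illustrative settings (the vocabulary size, sequence length, sample texts, and file name are not taken from the notebooks):

```python
import pickle
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE = 20000   # illustrative vocabulary cap
MAX_LEN = 100        # illustrative sequence length

# Tiny stand-in for the 400,000 training reviews
train_texts = ["great product, works as expected",
               "terrible quality, would not buy again"]

tokenizer = Tokenizer(num_words=VOCAB_SIZE, oov_token="<OOV>")
tokenizer.fit_on_texts(train_texts)

# Convert reviews to padded integer sequences for the models
X_train = pad_sequences(tokenizer.texts_to_sequences(train_texts), maxlen=MAX_LEN)

# Persist the fitted tokenizer so the same vocabulary can be reused later
with open("tokenizer.pickle", "wb") as f:
    pickle.dump(tokenizer, f, protocol=pickle.HIGHEST_PROTOCOL)

# Reload it when needed for inference or further training
with open("tokenizer.pickle", "rb") as f:
    tokenizer = pickle.load(f)
```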
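The Conv1D and LSTM models could look roughly like the sketch below, assuming binary sentiment labels; the layer sizes and regularization values are illustrative, not the ones actually used in the notebooks:

```python
from tensorflow.keras import layers, models, regularizers

VOCAB_SIZE = 20000
MAX_LEN = 100
EMBED_DIM = 100

def build_conv1d_model(l2_strength=1e-4, dropout_rate=0.3):
    """Conv1D text classifier; regularization strength is tunable."""
    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.Conv1D(64, 5, activation="relu",
                      kernel_regularizer=regularizers.l2(l2_strength)),
        layers.GlobalMaxPooling1D(),
        layers.Dropout(dropout_rate),
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(l2_strength)),
        layers.Dense(1, activation="sigmoid"),   # binary sentiment output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

def build_lstm_model(l2_strength=1e-4, dropout_rate=0.3):
    """LSTM text classifier with the same tunable regularization."""
    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.LSTM(64, dropout=dropout_rate, recurrent_dropout=dropout_rate,
                    kernel_regularizer=regularizers.l2(l2_strength)),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Varying `l2_strength` and `dropout_rate` is one way to realize the "different strengths of regularization" mentioned above.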
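And a sketch of how pretrained GloVe vectors can initialize the embedding layer; it reuses `tokenizer` and `VOCAB_SIZE` from the sketches above, and the glove.6B.100d.txt file path is an assumption:

```python
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras.initializers import Constant

EMBED_DIM = 100  # must match the dimension of the GloVe file used

# Build an index from GloVe words to their vectors
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vector = line.split()
        embeddings_index[word] = np.asarray(vector, dtype="float32")

# Map the tokenizer's vocabulary onto the GloVe vectors
num_words = min(VOCAB_SIZE, len(tokenizer.word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBED_DIM))
for word, i in tokenizer.word_index.items():
    if i < num_words and word in embeddings_index:
        embedding_matrix[i] = embeddings_index[word]

# Pretrained vectors initialize the Embedding layer; freezing it is one option
embedding_layer = layers.Embedding(
    num_words, EMBED_DIM,
    embeddings_initializer=Constant(embedding_matrix),
    trainable=False,
)
```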
In the second notebook, 'amazon review sentiment analysis with transformers', a transformer block was built from scratch with Keras on the same dataset. It reached decent performance (~89% accuracy on the test set), but did not match the previous models.
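A minimal sketch of such a from-scratch block (multi-head self-attention plus a position-wise feed-forward network), with illustrative hyperparameters rather than the ones actually used in the notebook:

```python
import tensorflow as tf
from tensorflow.keras import layers

class TransformerBlock(layers.Layer):
    """Single encoder-style transformer block: self-attention + feed-forward."""
    def __init__(self, embed_dim, num_heads, ff_dim, dropout_rate=0.1):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = tf.keras.Sequential([
            layers.Dense(ff_dim, activation="relu"),
            layers.Dense(embed_dim),
        ])
        self.norm1 = layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = layers.Dropout(dropout_rate)
        self.drop2 = layers.Dropout(dropout_rate)

    def call(self, inputs, training=False):
        # Self-attention with a residual connection and layer norm
        attn_out = self.att(inputs, inputs)
        x = self.norm1(inputs + self.drop1(attn_out, training=training))
        # Position-wise feed-forward with a second residual connection
        ffn_out = self.ffn(x)
        return self.norm2(x + self.drop2(ffn_out, training=training))
```

In a typical setup, a block like this sits on top of token-plus-position embeddings, followed by global pooling and a dense sigmoid head for the binary sentiment output.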
A final step for the project would be to fine-tune BERT (Bidirectional Encoder Representations from Transformers) on the dataset using the Hugging Face ecosystem and evaluate its performance against the models built from scratch.
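A minimal sketch of what that fine-tuning could look like with the Hugging Face `transformers` and `datasets` libraries; the checkpoint, hyperparameters, and toy in-memory dataset below are assumptions for illustration, not results from the project:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy in-memory dataset; in practice this would be the Amazon reviews
train_ds = Dataset.from_dict({
    "text": ["great product, works as expected",
             "terrible quality, broke in a week"],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = train_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-amazon-sentiment",
    num_train_epochs=2,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```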