The data was collected with a custom Selenium WebDriver scraper for IMDb reviews, which can be found here.
The initial cleaning was done in Python in a Jupyter Notebook, and the cleaned data was stored in a .csv file. Further cleaning and analysis were done in R; the steps can be followed through the Table of Contents below.
The rendered notebook is hosted on Kaggle and can be accessed here.
I analyzed Spider-Man movie reviews from IMDb. In addition to the text analysis, I analyzed review-related variables such as Review Rating, Review Helpfulness and Review Date.
For the text analysis, I used basic Natural Language Processing techniques such as:
- Word Counts and Frequencies within a review
- TF-IDF Analysis
- Sentiment Analysis
- Topic Modelling with Latent Dirichlet Allocation (LDA)
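
To make the first two techniques concrete, here is a minimal Python sketch of word frequencies and TF-IDF. It is an illustration only, not the code used in this project (the actual analysis was done in R). The sample reviews and the helper names `tokenize`, `term_frequencies`, `inverse_document_frequencies` and `tf_idf` are hypothetical, and the sketch assumes the common definition of TF-IDF as term frequency times the natural log of (number of reviews / number of reviews containing the word).

```python
import math
import re
from collections import Counter

# Hypothetical toy reviews, not taken from the actual dataset.
reviews = [
    "Great movie, great villain",
    "The villain was weak",
    "Great soundtrack",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(tokens):
    """Return each word's share of all tokens in a single review."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


def inverse_document_frequencies(tokenized_docs):
    """Return ln(N / df) per word, where df is the number of reviews containing it."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each word once per review
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(docs):
    """Return one {word: tf-idf score} dictionary per review."""
    tokenized = [tokenize(doc) for doc in docs]
    idf = inverse_document_frequencies(tokenized)
    return [
        {word: tf * idf[word] for word, tf in term_frequencies(tokens).items()}
        for tokens in tokenized
    ]


scores = tf_idf(reviews)
for review, review_scores in zip(reviews, scores):
    top_word = max(review_scores, key=review_scores.get)
    print(f"{review!r}: most distinctive word is {top_word!r}")
```

In the first toy review, "great" is the most frequent word, but "movie" receives the higher TF-IDF score because it appears in no other review. This is the reason TF-IDF is useful alongside raw counts: it highlights words that are distinctive for a review rather than merely common.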