Movie Ratings Predictor

We analyzed a decade of movies, developed a machine learning model, and turned it into a Flask app that predicts whether a movie will be a hit or a flop. Just fill out the form with some basic details, such as your budget, expected running time, director, actors, and even the plot. We can predict with 71% confidence if your proposed movie will be successful.

Objective

Train a supervised model to predict ratings of proposed movies.
Create a historical data analysis with ten years of movie data.
Create an interactive web app with movie rating predictions and a dashboard of visualizations.

ETL

Extract using API calls to obtain a list of movies by year from OMDb.
Transform using Python/Pandas to merge dataframes, replace null values with ‘NA’ or ‘0’ as appropriate, remove duplicates, and normalize all headers.
Load the final output to CSV and our SQL database.
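
The transform step described above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook code: the column names (`IMDb ID`, `Budget`, and so on), the two toy dataframes, and the helper names `normalize_headers` and `transform` are all assumptions made for the example.

```python
import pandas as pd


def normalize_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Trim, lower-case, and underscore the column names (hypothetical helper)."""
    out = df.copy()
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    return out


def transform(movies: pd.DataFrame, financials: pd.DataFrame) -> pd.DataFrame:
    """Merge two frames, fill nulls, and drop duplicate rows (hypothetical helper)."""
    movies = normalize_headers(movies)
    financials = normalize_headers(financials)
    # Assumed join key: an IMDb identifier present in both frames.
    merged = movies.merge(financials, on="imdb_id", how="left")
    # Numeric gaps become 0 and text gaps become "NA".
    numeric_cols = merged.select_dtypes(include="number").columns
    text_cols = merged.columns.difference(numeric_cols)
    merged[numeric_cols] = merged[numeric_cols].fillna(0)
    merged[text_cols] = merged[text_cols].fillna("NA")
    return merged.drop_duplicates().reset_index(drop=True)


# Toy rows for illustration only.
movies = pd.DataFrame({
    "IMDb ID": ["tt001", "tt002", "tt002"],
    "Title": ["Alpha", "Beta", "Beta"],
    "Director": ["A. Person", None, None],
})
financials = pd.DataFrame({"IMDb ID": ["tt001"], "Budget": [1000000.0]})

clean = transform(movies, financials)
```

From here, `clean.to_csv(...)` and `clean.to_sql(...)` are the usual pandas calls for the load step; the target file name and database connection depend on your setup.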

Exploratory Analysis

We analyzed the data in Python/Pandas with matplotlib, pyplot, seaborn, and wordcloud.

Some observations:

Surprisingly, we discovered IMDb user votes and IMDb critical ratings aren't strongly correlated.
Budgets and IMDb ratings aren't strongly correlated.
Interestingly, budgets and box office gross amounts are correlated.
Ratings from Rotten Tomatoes and Metascore don't always reflect the same kind of feedback for the same movie.
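
Observations like the ones above are typically checked with a correlation matrix. The sketch below shows the pandas call on a handful of invented rows; the numbers are toy values chosen only to make the output easy to read, and the column names are assumptions rather than the project's actual schema.

```python
import pandas as pd

# Toy rows for illustration only; these are not the project's data.
df = pd.DataFrame({
    "budget": [10, 20, 30, 40, 50],
    "box_office": [25, 38, 70, 85, 110],
    "imdb_rating": [6.1, 7.4, 5.2, 6.8, 6.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")

budget_vs_gross = corr.loc["budget", "box_office"]
budget_vs_rating = corr.loc["budget", "imdb_rating"]
```

A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests a weak one. Passing `corr` to `seaborn.heatmap` is a common way to visualize the same matrix.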

Preprocessing

Create labels based on IMDb rating.
Select beneficial columns for the model.
Convert categorical data to numerical using one-hot encoding.
Additional cleanup of null and missing values.
Reduce the number of distinct values in some features, such as country, language, genre, director, writer, actor, star, and rating (e.g. PG vs. R).
Bin the IMDb rating into 3 distinct groups: <5, between 5 and 7, and >7.
Encode the rest of categorical data using get_dummies.
Process plot values into vectors.
Merge all features into a single dataframe.
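
Three of the steps above (binning the rating, reducing the number of distinct values, and encoding with get_dummies) can be sketched as follows. The helper names `bin_rating` and `keep_top_values`, the label names `low`/`mid`/`high`, the "Other" bucket, and the toy rows are assumptions for illustration. The sketch also assumes ratings of exactly 5 and 7 belong to the middle group, which the list above does not specify.

```python
import numpy as np
import pandas as pd


def bin_rating(ratings: pd.Series) -> pd.Series:
    """Map ratings to three groups: <5, 5 to 7 inclusive, and >7."""
    labels = np.select([ratings < 5, ratings <= 7], ["low", "mid"], default="high")
    return pd.Series(labels, index=ratings.index)


def keep_top_values(col: pd.Series, top_n: int) -> pd.Series:
    """Keep the top_n most frequent values and replace the rest with 'Other'."""
    top = col.value_counts().nlargest(top_n).index
    return col.where(col.isin(top), "Other")


# Toy rows for illustration only.
df = pd.DataFrame({
    "imdb_rating": [4.2, 5.0, 6.5, 7.0, 8.1],
    "rated": ["PG", "R", "R", "PG", "R"],
    "country": ["USA", "USA", "UK", "France", "USA"],
})

df["label"] = bin_rating(df["imdb_rating"])
df["country"] = keep_top_values(df["country"], top_n=1)

# One-hot encode the remaining categorical columns.
features = pd.get_dummies(df[["rated", "country"]])
```

Collapsing rare values before calling `pd.get_dummies` keeps the number of generated columns manageable for features such as director or actor, which can have thousands of distinct values.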

Our final dataset uses the following features: runtime (mins), budget (USD), rated, director, writer, actors, language, country, genre, star, and plot.

Building the Models

We trained a supervised machine learning classification model using a Random Forest Classifier. The greatest challenge was converting the plot data into a vector to join with the rest of the features. It took a multi-step process to clean and transform the words into a meaningful representation, but we did it!
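
One common way to join plot text with the other features is sketched below using scikit-learn. It assumes TF-IDF as the text representation, which is a choice made for this example; the toy plots, the two tabular columns (runtime and budget), and the labels are also invented for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy rows for illustration only.
plots = [
    "a heist crew plans one last job",
    "a lonely robot learns to love",
    "a family road trip goes wrong",
    "a detective hunts a clever killer",
    "two rivals fall in love in paris",
    "a haunted house terrorizes a family",
]
tabular = np.array([
    [100, 5.0e7], [120, 1.0e8], [95, 2.0e7],
    [110, 4.0e7], [105, 3.0e7], [90, 1.0e7],
])  # assumed columns: runtime (mins), budget (USD)
labels = ["mid", "high", "low", "mid", "high", "low"]

# Clean and vectorize the plot text (assumed technique: TF-IDF).
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
plot_vectors = vectorizer.fit_transform(plots)

# Join the tabular features and the plot vectors column-wise.
X = hstack([csr_matrix(tabular), plot_vectors]).tocsr()

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, labels)
predictions = model.predict(X)
```

The fitted vectorizer has to be saved alongside the model, because new plots must be transformed with the same vocabulary before prediction.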

Deploying the App

We used a Flask app to serve our website, which includes a dashboard highlighting interesting relationships among features of the data set and our prediction app. We wrote a function that brings the vectorized plot data together with the model, which produces the prediction.
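
A prediction function of the kind described above might look like the following sketch. It is self-contained so it can run on its own: the vectorizer and model are stand-ins trained on toy rows, whereas a real app would load previously fitted artifacts from disk. The route name `/predict`, the form fields, and the function `predict_rating_group` are hypothetical.

```python
import numpy as np
from flask import Flask, jsonify, request
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in artifacts trained on toy rows (illustration only).
_plots = [
    "a heist goes wrong",
    "a robot learns to love",
    "a family road trip",
    "a detective hunts a killer",
]
_tabular = np.array([[100, 5.0e7], [120, 1.0e8], [95, 2.0e7], [110, 4.0e7]])
_labels = ["mid", "high", "low", "mid"]

vectorizer = TfidfVectorizer().fit(_plots)
model = RandomForestClassifier(n_estimators=25, random_state=0).fit(
    hstack([csr_matrix(_tabular), vectorizer.transform(_plots)]).tocsr(),
    _labels,
)


def predict_rating_group(runtime, budget, plot):
    """Combine the form values with the vectorized plot and run the model."""
    tabular = csr_matrix([[float(runtime), float(budget)]])
    features = hstack([tabular, vectorizer.transform([plot])]).tocsr()
    return str(model.predict(features)[0])


app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    form = request.form
    label = predict_rating_group(form["runtime"], form["budget"], form["plot"])
    return jsonify({"prediction": label})
```

Keeping the feature assembly in a plain function, separate from the route, makes it possible to test the prediction logic without starting the web server.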

Our Presentation
