This repository contains all files relevant to the CA4022 MovieLens Analysis Assignment.
The repository contains the following files and directories:
- ml-latest-small: Contains the .csv data files used in the cleaning and analyses
- PIG Cleaning: Contains the .pig scripts used to clean and merge the data
- PIG Analysis: Contains the .pig scripts used to analyse the data
- HIVE Analysis: Contains the .sql scripts used to analyse the data
- Output Screenshots: Contains images of the outputs for both PIG and HIVE analyses
- Visualisation: Contains the files used to create visualisations and graphs of the data
- Documentation: A Markdown file that explains each PIG/HIVE query in more detail