ksko02/bigdata_project_team24

Path to the project repository on the cluster: ~/home/team24/project/bigdata_project_team24

This repository contains the following directories:

  • data/ contains the dataset files.
  • models/ contains the Spark ML models.
  • notebooks/ contains the project's Jupyter notebooks, used for learning purposes (interactive PDA).
  • output/ is the output directory for the project's results. It can contain CSV files, text files, images, and any other materials returned as output of the pipeline.
  • scripts/ stores the .sh and .py scripts of the pipeline.
  • sql/ is a folder for keeping all .sql and .hql files.
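As a quick sanity check of the layout above, a hypothetical helper like the one below (not part of this repository; the name `check_layout` is an illustrative assumption) reports which of the listed top-level folders are missing from a checkout.

```shell
#!/bin/bash
# Hypothetical helper, not part of the repository: print each top-level
# folder described in this README that is missing under the given root.
check_layout() {
    local root="$1"
    local missing=0
    local dir
    for dir in data models notebooks output scripts sql; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing: $dir/"
            missing=$((missing + 1))
        fi
    done
    # Exit status is 0 only when every expected folder exists.
    [ "$missing" -eq 0 ]
}
```

Running `check_layout .` from the repository root prints nothing and returns 0 when all six folders are present.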

requirements.txt lists the Python packages needed for running your Python scripts. Feel free to add more packages when necessary.

main.sh is the main script: it runs the scripts of all pipeline stages, executing the full pipeline and storing the results in the output/ folder. When checking the project repository, the grader will run only main.sh and check the results in output/.
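A minimal sketch of how a main.sh like this might chain the stages, assuming (hypothetically) that each stage is a separate script named scripts/stage<N>.sh. The actual stage names and order in this repository may differ, and `run_stages` is an illustrative name, not something the repository defines.

```shell
#!/bin/bash
# Hypothetical sketch of a stage runner for main.sh (not the team's actual script).
# Assumption: each pipeline stage is its own script named <dir>/stage<N>.sh.
set -eu

# Run every stage*.sh in the given directory in glob (lexical) order,
# stopping at the first stage that exits with a non-zero status.
run_stages() {
    local scripts_dir="$1"
    local stage
    local count=0
    for stage in "$scripts_dir"/stage*.sh; do
        # If nothing matches, the shell leaves the pattern unexpanded; skip it.
        [ -e "$stage" ] || continue
        count=$((count + 1))
        echo "Running $stage"
        if ! bash "$stage"; then
            echo "Stage failed: $stage" >&2
            return 1
        fi
    done
    # Treat an empty scripts directory as a misconfiguration.
    if [ "$count" -eq 0 ]; then
        echo "No stage scripts found in $scripts_dir" >&2
        return 1
    fi
}
```

In main.sh this would be followed by a single call such as `run_stages scripts`, executed from the repository root so that relative paths like output/ resolve as expected. Because glob order is lexical, stage10.sh would run before stage2.sh; zero-padded names (stage01.sh, stage02.sh, ...) avoid that. Stopping at the first failure keeps later stages from writing results to output/ based on incomplete inputs.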