
Path to the project repository on the cluster: ~/home/team24/project/bigdata_project_team24

This repository contains the following directories:

data/ contains the dataset files.
models/ contains the Spark ML models.
notebooks/ contains the project's Jupyter notebooks, used for learning purposes (interactive PDA).
output/ is the output directory for the project's results. It can contain CSV files, text files, images, and any other materials returned as output of the pipeline.
scripts/ stores the .sh and .py scripts of the pipeline.
sql/ is a folder for keeping all .sql and .hql files.
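As a quick sanity check before running the pipeline, the directory layout above can be verified with a small script. This is a minimal sketch, not part of the repository: the function name `missing_dirs` is hypothetical, and it assumes the six directories sit directly under the repository root.

```python
from pathlib import Path

# The six top-level directories described above, assumed to sit directly under the repo root.
EXPECTED_DIRS = ("data", "models", "notebooks", "output", "scripts", "sql")


def missing_dirs(repo_root):
    """Return the expected top-level directories that do not exist under repo_root."""
    # expanduser() lets callers pass paths that start with "~".
    root = Path(repo_root).expanduser()
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Called from the repository root, `missing_dirs(".")` returns an empty list when all six directories are present, and otherwise lists the ones that are missing.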

requirements.txt lists the Python packages needed for running your Python scripts. Feel free to add more packages when necessary.
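The usual way to install these packages is `pip install -r requirements.txt`. To confirm that an environment already has them, a sketch like the one below compares the file's contents against the installed distributions using the standard-library `importlib.metadata` module. The helper names are hypothetical, and the parser assumes simple `name<specifier>` lines: it skips comments and pip options such as `-r`, and it does not check versions.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_requirement_names(text):
    """Extract distribution names from requirements.txt content.

    Handles only simple lines such as "pandas>=2.0" (an illustrative example);
    comments, blank lines and pip options (lines starting with "-") are skipped.
    """
    names = []
    for line in text.splitlines():
        # Drop inline comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The name ends at the first version specifier, extra, marker, or space.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(text):
    """Return requirement names that are not installed in the current environment."""
    missing = []
    for name in parse_requirement_names(text):
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_packages(Path("requirements.txt").read_text())` (with `pathlib.Path` imported) lists any packages that still need to be installed before running main.sh.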

main.sh is the entry point of the project: it runs the scripts of every pipeline stage, executing the full pipeline and storing the results in the output/ folder. When checking the project repository, the grader runs only main.sh and inspects the results in output/.
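main.sh itself is a shell script; the Python sketch below only illustrates the control flow described above: run each stage in order and stop at the first failure, so a failing stage does not silently feed the ones after it. The function `run_stages` and any stage paths passed to it (for example `scripts/stage1.sh`) are hypothetical, not taken from this repository.

```python
import subprocess
import sys


def run_stages(stages, cwd="."):
    """Run stage commands in order, stopping at the first non-zero exit code.

    `stages` is a list of argument lists, e.g. [["bash", "scripts/stage1.sh"]]
    (a hypothetical stage path). Returns 0 if every stage succeeded, otherwise
    the exit code of the first failing stage; later stages are not run.
    """
    for cmd in stages:
        print("Running:", " ".join(cmd), flush=True)
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            print(f"Stage failed with exit code {result.returncode}", file=sys.stderr)
            return result.returncode
    return 0
```

Stopping on the first non-zero exit code is the behaviour a shell script typically gets from `set -e`; returning the code lets a caller pass it on with `sys.exit(run_stages(...))`.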


ksko02/bigdata_project_team24