Utilized AWS EMR to process a large dataset, filtering out the specific records I required using PySpark code; the resulting data was stored in an S3 bucket in CSV format.
AWS EMR Configuration: I set up the AWS EMR environment, configuring the EC2 instances and the Spark engine for optimal performance and resource utilization.
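A minimal sketch of how such a cluster could be provisioned programmatically with boto3; the cluster name, region, instance types, instance count, log bucket, and IAM roles shown here are hypothetical placeholders, not the actual configuration used.

```python
import boto3

# Region, names, and sizes below are illustrative assumptions.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="data-filtering-cluster",          # hypothetical cluster name
    ReleaseLabel="emr-6.15.0",              # example EMR release
    Applications=[{"Name": "Spark"}],       # install the Spark engine
    Instances={
        "MasterInstanceType": "m5.xlarge",  # example EC2 instance types
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,                 # 1 master + 2 core nodes
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    LogUri="s3://my-log-bucket/emr-logs/",  # hypothetical log bucket
    JobFlowRole="EMR_EC2_DefaultRole",      # default EMR instance profile
    ServiceRole="EMR_DefaultRole",          # default EMR service role
)
print("Cluster ID:", response["JobFlowId"])
```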
Dataset Filtering: Leveraging Spark's distributed computing capabilities, I designed and executed the data filtering operations on the dataset (see the PySpark sketch after the next item).
PySpark Development: I used the PySpark API to develop the filtering code, drawing on PySpark's built-in library ecosystem and functions to achieve efficient, scalable data processing.
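A minimal sketch of such a filtering job, assuming a CSV input; the S3 paths, column names, and filter condition are hypothetical stand-ins for the actual requirements.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataset-filter").getOrCreate()

# Read the raw dataset from S3 (path and header option are assumptions).
df = spark.read.option("header", "true").csv("s3://my-input-bucket/raw/")

# Apply the filtering criteria; this condition is illustrative only.
filtered = df.filter(
    (F.col("status") == "active") & (F.col("amount").cast("double") > 100)
)

# Write the filtered result back to S3 in CSV format, as described above.
filtered.write.mode("overwrite").option("header", "true").csv(
    "s3://my-output-bucket/filtered/"
)

spark.stop()
```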
Spark Submit Command: To execute the Spark application on the AWS EMR cluster, I used the spark-submit command for submission, and monitored execution through the Spark web UI.
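One way to issue that spark-submit programmatically is as an EMR step via boto3's command-runner.jar, sketched below; the cluster ID and script path are hypothetical, and the step is equivalent to running spark-submit --deploy-mode cluster on the master node.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",              # hypothetical cluster ID
    Steps=[
        {
            "Name": "filter-dataset",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",  # EMR's generic step runner
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    "s3://my-scripts-bucket/filter_job.py",  # hypothetical path
                ],
            },
        }
    ],
)
```

While the step runs, its progress can be followed from the Spark web UI linked on the cluster's application history page.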
Troubleshooting: I encountered several configuration difficulties while setting up AWS EMR, and resolved them with help from Stack Overflow and ChatGPT.