Data is increasingly valuable, and large companies want access to the cleanest data they can get. Maintaining data at that scale is difficult, so these companies need people who can manage it effectively and put it to use for the business.
# Project Definition

A data warehouse project built with Python, MS-SQL Server, Talend (ETL), and other data warehouse concepts.
For this project, we will use the Brazilian E-Commerce Public Dataset by Olist, which is available on Kaggle.
- Python
- MS-SQL Server
- Talend (ETL)
- Machine Learning
- Power BI
This project involves the following key steps:
- Data Loading in Python: Initial data loading and preprocessing using Python.
- Data Cleaning: Data is cleaned and prepared for further processing to ensure its quality.
- Data Loading in MS-SQL Server: Cleaned data is loaded into MS-SQL Server for storage and analysis.
- Creating ETL Jobs: Extract, Transform, Load (ETL) jobs are designed and implemented to facilitate data processing and integration.
- Data Modeling: Data modeling techniques are applied to structure and organize data effectively for analysis.
- Creating Star Schema: A star schema is designed to optimize query performance and facilitate data retrieval.
- Building Two Data Marts: Two separate data marts are constructed, one intended for Machine Learning models and the other for Power BI reporting.
- Implementing Data Marts: The data marts are put to practical use, with one dedicated to Machine Learning model applications and the other to Power BI, enabling business intelligence reporting.
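
The steps above can be sketched end to end in a few lines of Python. This is a minimal illustration and not the project's actual code. It makes three assumptions. The sample rows and column names (`order_id`, `customer_id`, `customer_city`, `order_status`, `price`) are hypothetical and only loosely modeled on an e-commerce order table. An in-memory SQLite database stands in for MS-SQL Server so the example runs without a server. The table names `dim_customer` and `fact_order` are made up for the sketch. The ETL step is done in plain Python here, whereas the project uses Talend.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: illustrative columns, not the real dataset schema.
RAW_CSV = """order_id,customer_id,customer_city,order_status,price
o1,c1,sao paulo,delivered,10.50
o2,c2,Rio de Janeiro ,delivered,20.00
o2,c2,Rio de Janeiro ,delivered,20.00
o3,c1,sao paulo,shipped,
o4,c3,curitiba,delivered,5.25
"""


def load_rows(text):
    """Data loading: read CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Data cleaning: trim whitespace, normalize city names,
    drop rows with no price, and drop duplicate order IDs."""
    seen, cleaned = set(), []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if not row["price"] or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        row["customer_city"] = row["customer_city"].lower()
        row["price"] = float(row["price"])
        cleaned.append(row)
    return cleaned


def build_star_schema(conn, rows):
    """Load one dimension table and one fact table (a tiny star schema)."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,
            customer_id TEXT UNIQUE,
            customer_city TEXT
        );
        CREATE TABLE fact_order (
            order_id TEXT PRIMARY KEY,
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            order_status TEXT,
            price REAL
        );
    """)
    for row in rows:
        # Insert the customer once; repeated customers are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer (customer_id, customer_city) "
            "VALUES (?, ?)",
            (row["customer_id"], row["customer_city"]),
        )
        # Look up the surrogate key that the fact row will reference.
        key = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE customer_id = ?",
            (row["customer_id"],),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO fact_order VALUES (?, ?, ?, ?)",
            (row["order_id"], key, row["order_status"], row["price"]),
        )
    conn.commit()


def revenue_by_city(conn):
    """Data-mart style query: join the fact table to its dimension."""
    return conn.execute("""
        SELECT d.customer_city, ROUND(SUM(f.price), 2)
        FROM fact_order AS f
        JOIN dim_customer AS d ON d.customer_key = f.customer_key
        GROUP BY d.customer_city
        ORDER BY d.customer_city
    """).fetchall()


# SQLite stands in for MS-SQL Server in this sketch.
conn = sqlite3.connect(":memory:")
build_star_schema(conn, clean_rows(load_rows(RAW_CSV)))
print(revenue_by_city(conn))
```

The fact table holds the measures (`price`) and a surrogate key pointing at the dimension, so a reporting query needs only a single join. A data mart for Power BI or for a Machine Learning model could then be a view or table derived from queries of this shape.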