Official documentation: Stacks Azure Data Platform.
The Ensono Stacks Azure Data Platform solution provides a framework for accelerating the deployment of a production-ready modern data platform in Azure.
- Use the Ensono Stacks CLI to generate a new data platform project (see the scaffolding example below).
- Build and deploy the data platform infrastructure into your Azure environment.
- Accelerate development of data workloads and ELT pipelines with the Datastacks CLI.
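For example, a new project can be scaffolded from a configuration file using the Stacks CLI. This is a minimal sketch: the config file name below is illustrative, and the `stacks-cli` directory in this repository holds an example configuration.

```bash
# Scaffold a new data platform project from a Stacks CLI config file.
# The file name below is illustrative; see the stacks-cli directory
# in this repository for an example configuration.
stacks-cli scaffold -c ./stacks.yml
```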
The Ensono Stacks Data Platform delivers a modern Lakehouse solution, based upon the medallion architecture, with Bronze, Silver and Gold layers for various stages of data preparation. The platform utilises tools including Azure Data Factory for data ingestion and orchestration, Databricks for data processing and Azure Data Lake Storage Gen2 for data lake storage. It provides a foundation for data analytics and reporting through Microsoft Fabric and Power BI.
Key elements of the solution include:
- Infrastructure as code (IaC) for all infrastructure components (Terraform).
- Deployment pipelines to enable CI/CD and DataOps for the platform and all data workloads.
- Sample data ingest pipelines that transfer data from a source into the landing (Bronze) data lake zone.
- Sample data processing pipelines performing data transformations from Bronze to Silver and Silver to Gold layers.
The solution utilises the Stacks Data Python library, which offers a suite of utilities to support:
- Data transformations using PySpark.
- Frameworks for data quality validations and automated testing.
- The Datastacks CLI: a tool enabling developers to quickly generate new data workloads (see the example below).
The repository is structured as follows:

```
stacks-azure-data
├── build                      # Deployment pipeline configuration for building and deploying the core infrastructure
├── de_build                   # Deployment pipeline configuration for building and deploying data engineering resources
├── de_workloads               # Data engineering workload resources, including data pipelines, tests and deployment configuration
│   ├── generate_examples      # Example config files for generating data engineering workloads using Datastacks
│   ├── ingest                 # Data ingestion workloads
│   ├── processing             # Data processing and transformation workloads
│   └── shared_resources       # Shared resources used across data engineering workloads
├── deploy                     # Terraform modules to deploy core Azure resources (used by the `build` directory)
├── docs                       # Documentation
├── stacks-cli                 # Example config to use when scaffolding a project using stacks-cli
├── utils                      # Python utilities package used across the solution for local testing
├── .pre-commit-config.yaml    # Configuration for pre-commit hooks
├── Makefile                   # Includes commands for environment setup
├── pyproject.toml             # Project dependencies
├── README.md                  # This file
├── stackscli.yml              # Tells the Stacks CLI what operations to perform when the project is scaffolded
├── taskctl.yaml               # Controls the independent runner
└── yamllint.conf              # Linter configuration for YAML files used by the independent runner
```
To get started with developing the platform, refer to the Local Development Quickstart documentation.
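As a rough sketch of local setup (assuming dependencies are managed with Poetry, which this README does not confirm), the typical steps are installing the Python dependencies and enabling the pre-commit hooks:

```bash
# Install Python dependencies declared in pyproject.toml
# (assumes Poetry; see the Local Development Quickstart for the authoritative steps).
poetry install

# Install the git hooks configured in .pre-commit-config.yaml.
pre-commit install
```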