Data mesh reference architectures for AWS.
- Create the producer, consumer, and central catalog accounts.
Producer Account:
- Create an S3 bucket and load data into it. Sample data can be found in the sample data folder of the repo.
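A minimal CDK sketch of this step might look like the following. The stack name, bucket name, and `sample_data` asset path are illustrative assumptions, not values from the repo:

```python
from aws_cdk import RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from aws_cdk import aws_s3_deployment as s3deploy
from constructs import Construct


class ProducerDataStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Bucket that holds the producer's raw sample data (name is assumed).
        data_bucket = s3.Bucket(
            self, "SampleDataBucket",
            bucket_name="producer-a-sample-data",
            removal_policy=RemovalPolicy.DESTROY,
        )

        # Copy the repo's sample data into the bucket at deploy time.
        s3deploy.BucketDeployment(
            self, "LoadSampleData",
            sources=[s3deploy.Source.asset("./sample_data")],  # assumed path
            destination_bucket=data_bucket,
        )
```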
Central Catalog Account:
- Establish a Data Lake Administrator role in IAM and in Lake Formation.
- Create a Glue Crawler role with permission to access S3, attaching the Glue Service Role managed policy and the Lake Formation Data Admin managed policy. An example policy can be found in the supplementary file folder of the repo; a CDK sketch also follows this list.
- Create a Glue Catalog permissions policy that allows the producer and consumer accounts to access data in the central catalog account through Lake Formation tags. This ensures the data cannot be shared with these accounts unless permission is granted in Lake Formation through a tag. An example policy can be found in the supplementary file folder of the repo; a boto3 sketch also follows this list.
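A rough sketch of the crawler role, inside the central-account stack's `__init__`. Construct ids and the bucket ARN are assumptions; the two managed policies are the AWS-managed `AWSGlueServiceRole` and `AWSLakeFormationDataAdmin` named above:

```python
from aws_cdk import aws_iam as iam

# Crawler role assumable by the Glue service.
crawler_role = iam.Role(
    self, "GlueCrawlerRole",
    assumed_by=iam.ServicePrincipal("glue.amazonaws.com"),
    managed_policies=[
        iam.ManagedPolicy.from_aws_managed_policy_name(
            "service-role/AWSGlueServiceRole"),
        iam.ManagedPolicy.from_aws_managed_policy_name(
            "AWSLakeFormationDataAdmin"),
    ],
)

# Read access to the producer's sample data bucket (ARN is a placeholder).
crawler_role.add_to_policy(iam.PolicyStatement(
    actions=["s3:GetObject", "s3:ListBucket"],
    resources=[
        "arn:aws:s3:::producer-a-sample-data",
        "arn:aws:s3:::producer-a-sample-data/*",
    ],
))
```

For the catalog permissions policy, one way to apply it is with boto3's `put_resource_policy`, using the `glue:EvaluatedByLakeFormationTags` condition key so that cross-account access only works for resources shared through Lake Formation tags. The account ids and region below are placeholders:

```python
import json
import boto3

glue = boto3.client("glue")  # run with central-account credentials

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": [
            "arn:aws:iam::111111111111:root",  # producer (placeholder)
            "arn:aws:iam::222222222222:root",  # consumer (placeholder)
        ]},
        "Action": ["glue:*"],
        "Resource": ["arn:aws:glue:us-east-1:333333333333:*"],  # central
        # Only allow access that is mediated by Lake Formation tags.
        "Condition": {"Bool": {"glue:EvaluatedByLakeFormationTags": "true"}},
    }],
}

# EnableHybrid lets this policy coexist with Lake Formation's own grants.
glue.put_resource_policy(PolicyInJson=json.dumps(policy), EnableHybrid="TRUE")
```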
Producer Account:
- Add a bucket policy to the sample data bucket that allows access to the Glue Crawler role and the Data Lake Administrator role (sketch below).
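Assuming `data_bucket` is the bucket from the earlier producer sketch, the policy could be attached in CDK as follows; the central-account role ARNs are placeholders:

```python
from aws_cdk import aws_iam as iam

# Cross-account read access for the central account's crawler and data lake
# administrator roles (ARNs are illustrative assumptions).
for principal_arn in [
    "arn:aws:iam::333333333333:role/GlueCrawlerRole",
    "arn:aws:iam::333333333333:role/DataLakeAdminRole",
]:
    data_bucket.add_to_resource_policy(iam.PolicyStatement(
        principals=[iam.ArnPrincipal(principal_arn)],
        actions=["s3:GetObject", "s3:ListBucket"],
        resources=[data_bucket.bucket_arn, data_bucket.arn_for_objects("*")],
    ))
```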
Central Catalog Account:
- Run the Glue Crawler; the tables should then appear in Lake Formation.
- Assign tags to the database, tables, and columns based on the required criteria (see the sketch after this list).
- Under Lake Formation permissions, grant the appropriate access to the producer and consumer accounts through Lake Formation tags.
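The tagging and grant steps can also be scripted with boto3, run with central-account credentials. The tag key/values, database name, and consumer account id below are assumptions for illustration:

```python
import boto3

lf = boto3.client("lakeformation")

# Define a Lake Formation tag and attach it to the crawled database.
lf.create_lf_tag(TagKey="domain", TagValues=["sales"])
lf.add_lf_tags_to_resource(
    Resource={"Database": {"Name": "producer-a-db"}},
    LFTags=[{"TagKey": "domain", "TagValues": ["sales"]}],
)

# Grant the consumer account access to every table carrying the tag.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "222222222222"},  # consumer
    Resource={"LFTagPolicy": {
        "ResourceType": "TABLE",
        "Expression": [{"TagKey": "domain", "TagValues": ["sales"]}],
    }},
    Permissions=["SELECT", "DESCRIBE"],
    PermissionsWithGrantOption=["SELECT", "DESCRIBE"],
)
```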
Producer and Consumer Account:
- Create a resource link that points to the databases/tables shared from the central catalog account (see the sketch after this list).
- Create an S3 bucket to store Athena query results.
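Creating the resource link can be scripted with boto3's `create_database` using a `TargetDatabase`; the link name, source database name, and central account id below are placeholders:

```python
import boto3

# Run in the producer or consumer account.
glue = boto3.client("glue")
glue.create_database(DatabaseInput={
    "Name": "producer_a_db_link",  # local name of the resource link
    "TargetDatabase": {
        "CatalogId": "333333333333",   # central catalog account
        "DatabaseName": "producer-a-db",
    },
})
```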
The following outlines the folder structure of the project:
- src
  - accounts
    - central
    - consumer
    - producer
  - ops
    - accounts
CDK v2 changes how alpha modules are used in a project. For this project, the following additional module must be installed with pip in order for the CDK deployment to work properly:

```sh
pip install aws-cdk.aws-glue-alpha
```
Once this command has been run, you should be able to run the deployments properly within this project. Additionally, note that there is a distinct import pattern for working with these alpha modules:
```python
from aws_cdk import (
    Duration,
    Stack,
    aws_glue as glue,
    aws_glue_alpha as glue_alpha,
    aws_iam as iam,
    aws_lambda as lambda_,
)
```
In the above imports, `aws_glue_alpha` is imported as `glue_alpha` and serves as a separate and distinct module within the Python execution context. So, in the code, you use that particular module for your 'alpha' components. For example:
```python
database = glue_alpha.Database(
    self, id='my_database_id',
    database_name='producer-a-db',
)
```
Deployment:
1. Download AWS keys for 3 separate accounts into named AWS profiles on your local machine: "producer", "central", and "consumer".
2. Create a Python virtual environment, activate it, and install the dependencies in src/ops/requirements.txt.
3. Go to src/ops and run deploy-all.sh.
4. Log in to the console and add the SSO user whose credentials you downloaded to the "Data lake administrators" list in the Lake Formation section.
5. Add the SSO user whose credentials you downloaded to the "Database creators" list in the Lake Formation section of the console.
6. Perform the first-time Lake Formation setup that removes the "IAM-only" credential settings so that tag-based access control (TBAC) works properly. The console should prompt for these changes when you enter the Lake Formation UI.
7. Perform steps 4-6 on both the central and consumer accounts.
8. Run the post-setup script: `python src/ops/post_setup.py`.
Known issues and future work:
- After all of the setup above is done, there are still some permission issues: the SSO user is unable to query the linked table in the consumer account.
- Everything done in post_setup.py should be moved to CDK if possible.
- Setup for the consumer and producer accounts should be refactored into a Service Catalog product, maintained in the central account and shared with the consumer and producer accounts. Adding new data sources to a producer or consumer could potentially be added as service actions to these products as well.