Skip to content

Latest commit

 

History

History
44 lines (33 loc) · 1.81 KB

README.md

File metadata and controls

44 lines (33 loc) · 1.81 KB

Glue dbt Project

A demo project using dbt with AWS Glue.

Environment Setup

1. Set up Terraform infra (optional)

This project includes a Terraform module which can be used to provision the required Glue resources in any region / partition. The terraform-infra project creates the necessary Terraform infrastructure (s3 bucket, dynamodb table for locking, IAM roles).

Update the params/default.tfvars file for your region, partition, and account before running:

cd terraform/projects/terraform-infra
terraform apply -var-file params/default.tfvars

2. Set up Glue infra

The glue project can be used to create the roles, buckets, and permissions necessary to run AWS Glue with dbt.

Update the params/default.tfvars file and the main.tf backend configuration to use the backend created in step 1.

cd terraform/projects/glue
terraform apply -var-file params/default.tfvars

Then download and stage HUDI JAR files - you will need to update the bucket and region in this script to copy the JARs to your staging bucket:

sh hudi.sh

If you don't use Terraform to set up the Glue infrastructure, you'll need to set up S3 buckets for HUDI, data, and logs manually, and provision IAM roles as described in this documentation, with read/write access to the S3 buckets you created.

3. Run dbt

Set the AWS_REGION, GLUE_CLIENT_ROLE, DATA_S3_BUCKET, JAR_S3_BICKET, and LOGS_S3_BUCKET environment variables with the infrastructure set up in step 2.

Run dbt:

cd dbt
dbt run --profiles-dir ..

References