From 18bf58e46718795b8b00334a4b0cd3741b71dee8 Mon Sep 17 00:00:00 2001
From: FastLee
Date: Wed, 18 Oct 2023 09:08:06 -0400
Subject: [PATCH 1/6] Created External HMS Doc.

---
 docs/external_hms.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 docs/external_hms.md

diff --git a/docs/external_hms.md b/docs/external_hms.md
new file mode 100644
index 0000000000..e69de29bb2

From 6bd457f19a343826d4b2a84a4bad7cb32ae14bab Mon Sep 17 00:00:00 2001
From: FastLee
Date: Wed, 18 Oct 2023 09:16:05 -0400
Subject: [PATCH 2/6] Created External Location Design

---
 docs/external_hms.md | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/docs/external_hms.md b/docs/external_hms.md
index e69de29bb2..d25a6186ec 100644
--- a/docs/external_hms.md
+++ b/docs/external_hms.md
@@ -0,0 +1,37 @@
+# External HMS Integration
+### TL;DR
+The UCX toolkit by default relies on the internal workspace HMS as a source for tables and views.
+Many DB users use an external HMS instead of the workspace HMS provided by DB.
+A popular external HMS is the AWS Glue Data Catalog.
+This document describes the considerations for UCX integration with an external HMS.
+
+### Current Considerations
+- Integration with external HMS is set up on individual clusters.
+- Theoretically we can integrate separate clusters in a workspace with different HMS repositories.
+- In reality most customers use a single (internal or external) HMS within a workspace.
+- When migrating from an external HMS we have to consider that it is used by more than one workspace.
+- Integration with external HMS has to be set on all DB Warehouses together.
+- HMS connectivity is set, usually, on cluster policy.
+- Typically external HMS setup relies on:
+  - Spark Config
+  - Instance Profiles
+  - Init scripts
+
+### Design Decisions
+- We should set up a single HMS for UCX
+- We should suggest copying the setup from an existing Cluster/Cluster policy
+- We shouldn't override the set up for the DB Warehouses (that may break functionality)
+- We should allow overriding cluster settings and instance profile setting to accommodate novel settings.
+
+### Challenges
+- We cannot support multiple HMS
+- Using an external HMS to persist UCX tables will break functionality for a second workspace using UCX
+- We should consider using a pattern similar to our integration testing to rename the target database to allow persisting from multiple workspaces. For example WS1 --> UCX_ABC, WS2 --> UCX_DEF.
+
+### Suggested flow
+1. Start the installer.
+2. The installer checks whether the workspace uses an external HMS by reviewing cluster policies and DBSQL warehouse settings.
+3. We alert the user that an external HMS is set and ask a YES/NO question on whether UCX should use it.
+4. We alert the user if they opted for external HMS but the DB Warehouses are not set up for it.
+5. We update the configuration file with the HMS settings.
+6. We configure the job clusters with the required external HMS settings.
From 5ed453f690479cfcd0bbac7f598d3c091f88fdf4 Mon Sep 17 00:00:00 2001
From: FastLee
Date: Thu, 19 Oct 2023 05:30:21 -0400
Subject: [PATCH 3/6] Added some context

---
 docs/external_hms.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/external_hms.md b/docs/external_hms.md
index d25a6186ec..d758e50268 100644
--- a/docs/external_hms.md
+++ b/docs/external_hms.md
@@ -27,6 +27,7 @@ The UCX toolkit by default relies on the internal workspace HMS as a source for
 - We cannot support multiple HMS
 - Using an external HMS to persist UCX tables will break functionality for a second workspace using UCX
 - We should consider using a pattern similar to our integration testing to rename the target database to allow persisting from multiple workspaces. For example WS1 --> UCX_ABC, WS2 --> UCX_DEF.
+- With external HMS it is likely that some of the tables will not be accessible by some of the workspaces. We may need to migrate certain databases from certain workspaces.
 
 ### Suggested flow
 1. Start the installer.

From 2368fc87e1b86bb98c32898afea4ad53c853f474 Mon Sep 17 00:00:00 2001
From: Serge Smertin <259697+nfx@users.noreply.github.com>
Date: Sat, 28 Oct 2023 16:27:45 +0200
Subject: [PATCH 4/6] Update external_hms.md

---
 docs/external_hms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/external_hms.md b/docs/external_hms.md
index d758e50268..5f3bd5bbf1 100644
--- a/docs/external_hms.md
+++ b/docs/external_hms.md
@@ -11,7 +11,7 @@ The UCX toolkit by default relies on the internal workspace HMS as a source for
 - In reality most customers use a single (internal or external) HMS within a workspace.
 - When migrating from an external HMS we have to consider that it is used by more than one workspace.
 - Integration with external HMS has to be set on all DB Warehouses together.
-- HMS connectivity is set, usually, on cluster policy.
+- HMS connectivity is usually set on the cluster policy, as well as in the global SQL Warehouse configuration.
 - Typically external HMS setup relies on:
   - Spark Config
   - Instance Profiles

From ee91dbe8409661ea83593e4d4af55c29a2f2a488 Mon Sep 17 00:00:00 2001
From: Serge Smertin <259697+nfx@users.noreply.github.com>
Date: Sat, 28 Oct 2023 16:27:54 +0200
Subject: [PATCH 5/6] Update external_hms.md

---
 docs/external_hms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/external_hms.md b/docs/external_hms.md
index 5f3bd5bbf1..3449bf316e 100644
--- a/docs/external_hms.md
+++ b/docs/external_hms.md
@@ -18,7 +18,7 @@ The UCX toolkit by default relies on the internal workspace HMS as a source for
   - Init scripts
 
 ### Design Decisions
-- We should set up a single HMS for UCX
+- Should we set up a single HMS for UCX?
 - We should suggest copying the setup from an existing Cluster/Cluster policy
 - We shouldn't override the set up for the DB Warehouses (that may break functionality)
 - We should allow overriding cluster settings and instance profile setting to accommodate novel settings.

From 7b332c4b5cff3ba75c7c7f36385fb2969096d7fb Mon Sep 17 00:00:00 2001
From: Serge Smertin <259697+nfx@users.noreply.github.com>
Date: Sat, 28 Oct 2023 16:28:01 +0200
Subject: [PATCH 6/6] Update external_hms.md

---
 docs/external_hms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/external_hms.md b/docs/external_hms.md
index 3449bf316e..4a8127695d 100644
--- a/docs/external_hms.md
+++ b/docs/external_hms.md
@@ -19,7 +19,7 @@ The UCX toolkit by default relies on the internal workspace HMS as a source for
 
 ### Design Decisions
 - Should we set up a single HMS for UCX?
-- We should suggest copying the setup from an existing Cluster/Cluster policy
+- Should we suggest copying the setup from an existing Cluster/Cluster policy?
 - We shouldn't override the set up for the DB Warehouses (that may break functionality)
 - We should allow overriding cluster settings and instance profile setting to accommodate novel settings.
 
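Steps 2 to 4 of the suggested flow, together with the note that HMS connectivity is usually set on cluster policies and the global SQL Warehouse config, could be sketched as pure functions over configuration the installer has already fetched. This is a minimal sketch, not the UCX implementation. Assumptions: the list of Spark keys that signal an external HMS is illustrative only, cluster policy definitions are JSON with Spark settings under `spark_conf.<key>` paths, and the warehouse config arrives as key/value pairs. All function names are hypothetical.

```python
import json
from typing import Iterable

# Assumption: Spark config keys that commonly indicate an external HMS.
# Illustrative only; real setups may rely on other keys or init scripts.
HMS_KEY_PREFIXES = ("spark.sql.hive.metastore.", "spark.hadoop.javax.jdo.option.")
GLUE_KEY = "spark.databricks.hive.metastore.glueCatalog.enabled"


def is_hms_setting(key: str, value: str) -> bool:
    """True if a single Spark config entry points at an external HMS."""
    if key == GLUE_KEY:
        # The Glue flag only matters when it is switched on.
        return value.strip().lower() == "true"
    return key.startswith(HMS_KEY_PREFIXES)


def hms_settings_in_policy(policy_definition: str) -> dict[str, str]:
    """Collect external HMS settings pinned by a cluster policy definition.

    Assumes the definition maps attribute paths such as "spark_conf.<key>"
    to rule objects that may carry a "value".
    """
    found: dict[str, str] = {}
    for path, rule in json.loads(policy_definition).items():
        if not path.startswith("spark_conf.") or "value" not in rule:
            continue
        key, value = path[len("spark_conf."):], str(rule["value"])
        if is_hms_setting(key, value):
            found[key] = value
    return found


def hms_settings_in_warehouse_config(pairs: Iterable[dict]) -> dict[str, str]:
    """Collect external HMS settings from SQL warehouse key/value config pairs."""
    return {p["key"]: p["value"] for p in pairs if is_hms_setting(p["key"], p["value"])}


def missing_on_warehouses(cluster: dict[str, str], warehouse: dict[str, str]) -> list[str]:
    """Keys set for clusters but absent or different on SQL warehouses (step 4)."""
    return sorted(k for k, v in cluster.items() if warehouse.get(k) != v)


if __name__ == "__main__":
    policy = json.dumps({
        "spark_conf." + GLUE_KEY: {"type": "fixed", "value": "true"},
        "spark_conf.spark.databricks.io.cache.enabled": {"type": "fixed", "value": "true"},
        "aws_attributes.instance_profile_arn": {"type": "fixed", "value": "arn:..."},
    })
    cluster_settings = hms_settings_in_policy(policy)
    print(cluster_settings)
    # No warehouse settings: the installer would alert the user here.
    print(missing_on_warehouses(cluster_settings, hms_settings_in_warehouse_config([])))
```

Keeping the checks free of API calls makes them easy to unit test; how the installer fetches policies and warehouse config is deliberately left out, since this design note does not specify it.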