Install a global init script to collect HMS lineage for a few weeks. #513

dipankarkush-db · 2023-10-26T22:09:52Z

Fixes #324

codecov · 2023-10-26T22:11:54Z

Codecov Report

Merging #513 (e13414e) into main (f9b6117) will increase coverage by 0.13%.
The diff coverage is 89.58%.

```diff
@@            Coverage Diff             @@
##             main     #513      +/-   ##
==========================================
+ Coverage   80.98%   81.12%   +0.13%     
==========================================
  Files          31       33       +2     
  Lines        3392     3438      +46     
  Branches      658      667       +9     
==========================================
+ Hits         2747     2789      +42     
- Misses        491      493       +2     
- Partials      154      156       +2
```
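As a sanity check, the percentages can be recomputed from the Hits/Misses/Partials rows. This sketch assumes Codecov's convention that coverage = hits / (hits + misses + partials) and its default of rounding down to two decimals. Under those assumptions, the reported +0.13% matches the difference of the unrounded values (about 0.138) rounded down. Subtracting the already-rounded 80.98 from 81.12 would give 0.14 instead.

```python
import math


def raw_coverage(hits: int, misses: int, partials: int) -> float:
    """Percentage of tracked lines that are hits; partials count as not covered."""
    return 100 * hits / (hits + misses + partials)


def round_down(value: float, digits: int = 2) -> float:
    """Round toward negative infinity at the given number of decimals."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


main = raw_coverage(2747, 491, 154)  # 3392 tracked lines on main
head = raw_coverage(2789, 493, 156)  # 3438 tracked lines on #513

print(round_down(main), round_down(head), round_down(head - main))
# prints: 80.98 81.12 0.13
```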

| Files | Coverage Δ |
|---|---|
| ...x/hive_metastore/hms_lineage_global_init_script.py | `100.00% <100.00%> (ø)` |
| src/databricks/labs/ucx/runtime.py | `46.34% <0.00%> (ø)` |
| .../databricks/labs/ucx/hive_metastore/hms_lineage.py | `92.30% <92.30%> (ø)` |
| src/databricks/labs/ucx/install.py | `81.85% <90.00%> (+0.29%)` ⬆️ |

dipankarkush-db · 2023-10-27T01:31:43Z

Fixes #324

FastLee · 2023-10-30T15:16:56Z

src/databricks/labs/ucx/hive_metastore/hms_lineage.py

```diff
+        return base64.b64encode(_init_script_content.encode()).decode()
+
+    def _add_global_init_script(self):
+        self._ws.global_init_scripts.create(
```


Do we check for existing init scripts? Can we append to existing scripts? Is there a risk?

+1, it's an easy add

Hi @FastLee and @nfx - are you suggesting we add this snippet to an existing global init script? If there are multiple global init scripts, which one should we choose? Keeping it separate lets the user disable it if needed.

Keep the scripts separate, yes. But don't add the lineage enabler if it's already there. Manual reproduction: run the installer twice and verify that UCX creates only one script.

Hi @nfx - this is what the code does:

- If the Spark config is present and the init script is enabled, it skips creating a new one.
- If the Spark config is present but the init script is disabled, it asks whether the user wants to enable it, and either enables it or leaves it as is based on the response.
- If the Spark config is not present, it creates a new, enabled global init script.
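The three cases above can be sketched as a single decision function. This is a minimal illustration, not the code in this PR: `FakeWorkspace`, `LineageScript`, and `install_hms_lineage` are hypothetical names, and the fake workspace treats every stored script as a lineage script instead of calling the Databricks API. It also checks for an existing script before asking anything, so the user is only prompted in the "present but disabled" case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class LineageScript:
    """Hypothetical record of a global init script that sets the lineage config."""
    script_id: str
    enabled: bool


@dataclass
class FakeWorkspace:
    """Hypothetical in-memory workspace; every stored script is a lineage script."""
    scripts: Dict[str, LineageScript] = field(default_factory=dict)

    def find_lineage_script(self) -> Optional[LineageScript]:
        return next(iter(self.scripts.values()), None)

    def create_enabled_script(self) -> str:
        script_id = str(len(self.scripts) + 1)
        self.scripts[script_id] = LineageScript(script_id, enabled=True)
        return script_id


def install_hms_lineage(ws: FakeWorkspace, confirm: Callable[[str], bool]) -> str:
    """Apply the three cases described above and return the lineage script id."""
    existing = ws.find_lineage_script()  # check first, before any prompt
    if existing is None:
        # No lineage config anywhere: create a new, enabled script.
        return ws.create_enabled_script()
    if not existing.enabled and confirm("Enable the existing HMS lineage init script?"):
        # Present but disabled, and the user agreed: enable it.
        existing.enabled = True
    # Already enabled, or the user declined: leave it as it is.
    return existing.script_id
```

Calling `install_hms_lineage` twice on the same workspace returns the same id and leaves exactly one script, which mirrors the "run the installer twice" reproduction requested in the review.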

nfx · 2023-10-30T19:56:33Z

src/databricks/labs/ucx/runtime.py

```diff
@@ -187,6 +188,14 @@ def crawl_mounts(cfg: WorkspaceConfig):
    mounts.inventorize_mounts()


+@task("assessment", depends_on=[setup_schema])
+def enable_hms_lineage(cfg: WorkspaceConfig):
```


It'll be a better fit to roll out the init script in the installer; that will be simpler to support. Ask about it as a question in the installation flow.

Added to `install.py`.

nfx · 2023-10-30T19:57:07Z

src/databricks/labs/ucx/hive_metastore/hms_lineage.py

```diff
+        return base64.b64encode(_init_script_content.encode()).decode()
+
+    def _add_global_init_script(self):
+        self._ws.global_init_scripts.create(
```


+1, it's an easy add

nfx

Don't mark review comments as resolved unless you have really addressed them.

nfx · 2023-10-31T07:41:48Z

src/databricks/labs/ucx/hive_metastore/hms_lineage.py

```diff
+        return created_script
+
+    def add_spark_config_for_hms_lineage(self):
+        created_script = self._add_global_init_script()
```


Add a mandatory check for any global init script containing the `spark.databricks.dataLineage.enabled` string, and skip adding this init script if one is found.

nfx · 2023-10-31T07:42:24Z

src/databricks/labs/ucx/hive_metastore/hms_lineage.py

```diff
+        self._add_sql_wh_config()
+        return created_script.script_id
+
+    def _add_sql_wh_config(self):
```


Don't add empty methods; they confuse people reading the code.

nfx · 2023-10-31T22:05:16Z

src/databricks/labs/ucx/hive_metastore/hms_lineage.py

```diff
+    def check_lineage_spark_config_exists(self) -> GlobalInitScriptDetailsWithContent:
+        for script in self._ws.global_init_scripts.list():
+            gscript = self._ws.global_init_scripts.get(script_id=script.script_id)
+            if gscript:
```


Nit: too many levels of nesting; rewrite with `if gscript is None: continue`.

Took care of this.
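A minimal sketch of the flattened loop the nit asks for, combined with the `spark.databricks.dataLineage.enabled` check requested earlier in the review. `InitScript`, `FakeInitScripts`, and `find_lineage_script` are hypothetical names. The fake only mimics the `list()` / `get(script_id=...)` call shape visible in the diff and is not the databricks-sdk client; it returns `None` from `get()` to exercise the early `continue`.

```python
import base64
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional

# Substring the reviewers asked to look for in existing global init scripts.
LINEAGE_MARKER = "spark.databricks.dataLineage.enabled"


@dataclass
class InitScript:
    """Hypothetical stand-in for an init script: id plus base64-encoded body."""
    script_id: str
    script: str


class FakeInitScripts:
    """Hypothetical in-memory client mimicking list() and get(script_id=...)."""

    def __init__(self, scripts: Iterable[InitScript]) -> None:
        self._by_id: Dict[str, InitScript] = {s.script_id: s for s in scripts}

    def list(self) -> Iterator[InitScript]:
        return iter(self._by_id.values())

    def get(self, script_id: str) -> Optional[InitScript]:
        return self._by_id.get(script_id)


def find_lineage_script(init_scripts: FakeInitScripts) -> Optional[InitScript]:
    """Return the first script whose decoded body contains LINEAGE_MARKER."""
    for listed in init_scripts.list():
        script = init_scripts.get(script_id=listed.script_id)
        if script is None:
            continue  # early exit keeps the loop body flat
        body = base64.b64decode(script.script).decode("utf-8")
        if LINEAGE_MARKER in body:
            return script
    return None
```

The guard clause keeps the decode-and-match logic at one indentation level, which is the readability point of the nit.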

FastLee · 2023-11-01T13:51:06Z

src/databricks/labs/ucx/hive_metastore/hms_lineage.py

```diff
+            if "spark.databricks.dataLineage.enabled" in base64.b64decode(gscript.script).decode("utf-8"):
+                return gscript
+
+    def _get_init_script_content(self):
```


Keep the init script content in a file.

Hi @FastLee - took care of this.
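A minimal sketch of keeping the init script as a standalone file and encoding it only at install time. The diff base64-encodes the script content before calling `global_init_scripts.create(`, which is why the file is encoded here. `load_init_script_b64` and any file name such as `hms_lineage_init_script.sh` are hypothetical and not taken from the PR.

```python
import base64
from pathlib import Path


def load_init_script_b64(path: Path) -> str:
    """Read a plain-text init script from disk and return it base64-encoded."""
    content = path.read_text(encoding="utf-8")
    return base64.b64encode(content.encode("utf-8")).decode("ascii")


# Hypothetical usage, with the script shipped next to the module:
# encoded = load_init_script_b64(Path(__file__).parent / "hms_lineage_init_script.sh")
```

Keeping the script in its own file lets it be read, linted, and diffed as shell code instead of as a Python string literal.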

FastLee · 2023-11-01T13:52:17Z

src/databricks/labs/ucx/install.py

```diff
+    def _install_spark_config_for_hms_lineage(self):
+        if (
+            self._prompts
+            and self._question(
```


Check whether the script exists before asking the question.

Hi @FastLee - took care of this.

nfx · 2023-11-02T14:44:06Z

Merging the PR; the two test failures are unrelated:

- #540
- #541

…513) Fixes #324

**Breaking changes** (existing installations need to reinstall UCX and re-run assessment jobs)

* Switched local group migration component to rename groups instead of creating backup groups ([#450](#450)).
* Mitigate permissions loss in Table ACLs by folding grants belonging to the same principal, object id and object type together ([#512](#512)).

**New features**

* Added support for the experimental Databricks CLI launcher ([#517](#517)).
* Added support for external Hive Metastores including AWS Glue ([#400](#400)).
* Added more views to assessment dashboard ([#474](#474)).
* Added rate limit for creating backup group to increase stability ([#500](#500)).
* Added deduplication for mount point list ([#569](#569)).
* Added documentation to describe interaction with external Hive Metastores ([#473](#473)).
* Added failure injection for job failure message propagation ([#591](#591)).
* Added uniqueness in the new warehouse name to avoid conflicts on installation ([#542](#542)).
* Added a global init script to collect Hive Metastore lineage ([#513](#513)).
* Added retry set/update permissions when possible and assess the changes in the workspace ([#519](#519)).
* Use `~/.ucx/state.json` to store the state of both dashboards and jobs ([#561](#561)).

**Bug fixes**

* Fixed handling for `OWN` table permissions ([#571](#571)).
* Fixed handling of keys with and without values ([#514](#514)).
* Fixed integration test failures related to concurrent group delete ([#584](#584)).
* Fixed issue with workspace listing process on None type `object_type` ([#481](#481)).
* Fixed missing group entitlement migration bug ([#583](#583)).
* Fixed entitlement application for account-level groups ([#529](#529)).
* Fixed assessment throwing an error when the owner of an object is empty ([#485](#485)).
* Fixed installer to migrate between different configuration file versions ([#596](#596)).
* Fixed cluster policy crawler to be aware of deleted policies ([#486](#486)).
* Improved error message for not null constraints violated ([#532](#532)).
* Improved integration test resiliency ([#597](#597), [#594](#594), [#586](#586)).
* Introduced safer access to workspace objects' properties ([#530](#530)).
* Mitigated permissions loss in Table ACLs by running appliers with single thread ([#518](#518)).
* Running apply permission task before assessment should display message ([#487](#487)).
* Split integration tests from blocking the merge queue ([#496](#496)).
* Support more than one dashboard per step ([#472](#472)).
* Update databricks-sdk requirement from ~=0.11.0 to ~=0.12.0 ([#505](#505)).
* Update databricks-sdk requirement from ~=0.12.0 to ~=0.13.0 ([#575](#575)).

Install a global init script to collect HMS lineage for a few weeks.

790ec92

Fixes #324

dipankarkush-db had a problem deploying to account-admin October 26, 2023 22:09 — with GitHub Actions Failure

Install a global init script to collect HMS lineage for a few weeks.

0a0e614

Fixes #324

dipankarkush-db had a problem deploying to account-admin October 26, 2023 22:12 — with GitHub Actions Failure

Install a global init script to collect HMS lineage for a few weeks.

bba445a

Fixes #324

dipankarkush-db had a problem deploying to account-admin October 27, 2023 00:47 — with GitHub Actions Failure

dipankarkush-db marked this pull request as ready for review October 27, 2023 00:50

dipankarkush-db requested a review from a team October 27, 2023 00:50

Install a global init script to collect HMS lineage for a few weeks.

43d8e12

Fixes #324

dipankarkush-db had a problem deploying to account-admin October 27, 2023 01:42 — with GitHub Actions Failure

Install a global init script to collect HMS lineage for a few weeks.

b3343f2

Fixes #324

dipankarkush-db had a problem deploying to account-admin October 27, 2023 01:44 — with GitHub Actions Failure

FastLee requested changes Oct 30, 2023

william-conti suggested changes Oct 30, 2023

tests/unit/hive_metastore/test_hms_lineage.py (outdated, resolved)

src/databricks/labs/ucx/hive_metastore/hms_lineage.py (outdated, resolved)

src/databricks/labs/ucx/runtime.py (outdated, resolved)

nfx requested changes Oct 30, 2023

Added spark config for hms lineage enablement.

6117de8

Fixes #324

dipankarkush-db had a problem deploying to account-admin October 30, 2023 23:18 — with GitHub Actions Failure

nfx requested changes Oct 31, 2023

dipankarkush-db added 3 commits October 31, 2023 07:09

Merge branch 'main' into feature/add-hms-lineage

316e9c3

Added spark config for hms lineage enablement.

08b13db

Fixes #324

Added spark config for hms lineage enablement.

5009887

Fixes #324

dipankarkush-db temporarily deployed to account-admin October 31, 2023 16:44 — with GitHub Actions Inactive

nfx approved these changes Oct 31, 2023

dipankarkush-db added 2 commits October 31, 2023 18:34

Merge branch 'main' into feature/add-hms-lineage

d1be8da

Added spark config for hms lineage enablement.

aea2fec

Fixes #324

dipankarkush-db had a problem deploying to account-admin October 31, 2023 22:55 — with GitHub Actions Failure

Added spark config for hms lineage enablement.

5a87ae9

Fixes #324

dipankarkush-db had a problem deploying to account-admin October 31, 2023 23:18 — with GitHub Actions Failure

Added spark config for hms lineage enablement.

2b1ba7a

Fixes #324

dipankarkush-db had a problem deploying to account-admin October 31, 2023 23:47 — with GitHub Actions Failure

dipankarkush-db had a problem deploying to account-admin November 1, 2023 10:58 — with GitHub Actions Failure

dipankarkush-db requested review from FastLee and william-conti November 1, 2023 11:00

dipankarkush-db had a problem deploying to account-admin November 1, 2023 11:54 — with GitHub Actions Failure

FastLee requested changes Nov 1, 2023

Added spark config for hms lineage enablement.

be428a5

Fixes #324

dipankarkush-db had a problem deploying to account-admin November 1, 2023 18:50 — with GitHub Actions Failure

dipankarkush-db requested a review from FastLee November 1, 2023 20:41

Added spark config for hms lineage enablement.

101d669

Fixes #324

dipankarkush-db temporarily deployed to account-admin November 1, 2023 21:05 — with GitHub Actions Inactive

Added spark config for hms lineage enablement.

bfe0fbf

Fixes #324

dipankarkush-db had a problem deploying to account-admin November 1, 2023 21:29 — with GitHub Actions Failure

Added spark config for hms lineage enablement.

e13414e

Fixes #324

dipankarkush-db had a problem deploying to account-admin November 1, 2023 23:00 — with GitHub Actions Failure

dipankarkush-db had a problem deploying to account-admin November 2, 2023 13:20 — with GitHub Actions Failure

FastLee approved these changes Nov 2, 2023

nfx added the ready to merge label Nov 2, 2023

nfx merged commit e13a399 into main Nov 2, 2023

nfx deleted the feature/add-hms-lineage branch November 2, 2023 14:44

FastLee pushed a commit that referenced this pull request Nov 8, 2023

Install a global init script to collect HMS lineage for a few weeks. (#…

46097ab

…513) Fixes #324

nfx mentioned this pull request Nov 17, 2023

Release v0.6.0 #598

Merged
