Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: create permissions on DB import #29802

Merged
merged 2 commits into from
Aug 6, 2024
Merged

Conversation

betodealmeida
Copy link
Member

SUMMARY

A few bugs related to catalogs and permissions:

  1. When a database is imported, catalog and schema permissions are not created. Ideally we should have the import functionality call the corresponding command (CreateDatabaseCommand in this case) to prevent duplicate logic (the import/export flow was created before the introduction of commands). For now this PR just adds the logic manually.
  2. When a database is updated, datasets and charts are not having their catalog_perms updated.
  3. Databricks has a weird behavior where get_default_catalog() ⊈ get_catalog_names(), so I fixed it.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Added and updated tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@dosubot dosubot bot added the data:databases Related to database configurations and connections label Jul 31, 2024
@betodealmeida betodealmeida force-pushed the create-perms-db-import branch 3 times, most recently from ed14750 to 3ebe186 Compare August 1, 2024 00:00
@betodealmeida betodealmeida added the merge-if-green If approved and tests are green, please go ahead and merge it for me label Aug 2, 2024
pyproject.toml Outdated
@@ -236,7 +236,7 @@ legacy_tox_ini = """
# Remember to start celery workers to run celery tests, e.g.
# celery --app=superset.tasks.celery_app:app worker -Ofair -c 2
[testenv]
basepython = python3.10
basepython = python3.11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double check: if this version change were to affect any other tests or logic, it should fail the GH Actions for this PR as well, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I need to revert this, it was for testing locally. Sorry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @betodealmeida I believe this should be good to be approved as soon as you revert this change 🙌

Comment on lines +452 to +454
catalogs = {catalog for (catalog,) in engine.execute("SHOW CATALOGS")}
if len(catalogs) == 1:
return catalogs.pop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if SHOW CATALOGS returns only spark_catalog, it seems we're returning it. Do we want to return spark_catalog, or do we need to return hive_metastore instead (when there's only spark_catalog)? I'm asking it because I think there are issues with other metadata calls if we try to use spark_catalog when the catalog selected in the form is hive_metastore (or if UC is not enabled).

Also, if multi_catalog is disabled, do we want to rely on SHOW CATALOGS? Should we validate if multi_catalog is disabled first, and if so return the value of current_catalog() directly (which should be the catalog from the form)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we need to create the permissions regardless of the state of multi_catalog. For Databricks, the migration process will create permissions for:

  1. Whatever get_default_catalog() returns.
  2. All the catalogs from get_catalog_names different from the default catalog from (1).

For Databricks, this means we're creating 2 permissions when Unity Catalog is not enabled:

  1. [Databricks].[hive_metastore]
  2. [Databricks].[spark_catalog]

This in itself is already a problem for the admin — which would should they assign roles to? Either one is problematic, depending on the status of multi_catalog:

With multi_catalog disabled the catalog sent by API calls is null, so it gets replaced with the default catalog, hive_metastore. If the admin assign roles to [Databricks].[spark_catalog] people won't be able to access datasets, because of the name mismatch.

With multi_catalog enabled the only option in the dropdown is spark_catalog. Which means if the admin assigned roles to [Databricks].[hive_metastore] people won't be able to access datasets.

With the change in this PR, everything is consistent, because the default catalog is part of the list of catalogs.

@sadpandajoe
Copy link
Member

@supersetbot label 4.1

@github-actions github-actions bot added the v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch label Aug 5, 2024
@betodealmeida betodealmeida force-pushed the create-perms-db-import branch from 3ebe186 to 3705e5d Compare August 5, 2024 19:50
@betodealmeida betodealmeida merged commit 61c0970 into master Aug 6, 2024
33 checks passed
sadpandajoe pushed a commit that referenced this pull request Aug 13, 2024
@rusackas rusackas deleted the create-perms-db-import branch September 27, 2024 20:54
@github-actions github-actions bot added 🍒 4.1.0 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels labels Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels data:databases Related to database configurations and connections merge-if-green If approved and tests are green, please go ahead and merge it for me size/L v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch 🍒 4.1.0 🍒 4.1.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants