
Improve logging when skipping interactive cluster grant in create-catalogs-schemas #2917

Merged
3 commits merged into `main` from `fix/improve-logging-when-applying-grant`
Oct 10, 2024

Conversation

JCZuurmond
Member

Improve logging when skipping a grant in create-catalogs-schemas

@JCZuurmond JCZuurmond added the tech debt chores and design flaws label Oct 10, 2024
@JCZuurmond JCZuurmond self-assigned this Oct 10, 2024
@JCZuurmond JCZuurmond requested a review from a team as a code owner October 10, 2024 11:52
Collaborator

@nfx nfx left a comment


lgtm


✅ 47/47 passed, 3 skipped, 1h36m11s total

Running from acceptance #6610

@nfx nfx merged commit 38a5df2 into main Oct 10, 2024
7 checks passed
@nfx nfx deleted the fix/improve-logging-when-applying-grant branch October 10, 2024 12:52
@JCZuurmond JCZuurmond restored the fix/improve-logging-when-applying-grant branch October 10, 2024 13:48
nfx added a commit that referenced this pull request Oct 10, 2024
* Added `google-cloud-storage` to known list ([#2827](#2827)). In this release, we have added the `google-cloud-storage` library, along with its various modules and sub-modules, to our project's known list in a JSON file. Additionally, we have included the `google-crc32c` and `google-resumable-media` libraries. These libraries provide functionalities such as content addressable storage, checksum calculation, and resumable media upload and download. This change partially resolves issue [#1931](#1931), which tracks adding libraries to the known list.
* Added `google-crc32c` to known list ([#2828](#2828)). With this commit, we have added the `google-crc32c` library to our system's known list, addressing part of issue [#1931](#1931). This addition enhances the overall functionality of the system by providing efficient and high-speed CRC32C computation when utilized. The `google-crc32c` library is known for its performance and reliability, and by incorporating it into our system, we aim to improve the efficiency and robustness of the CRC32C computation process. This enhancement is part of our ongoing efforts to optimize the system and ensure a more efficient experience for our end-users. With this change, users can expect faster and more reliable CRC32C computations in their applications.
* Added `holidays` to known list ([#2906](#2906)). In this release, we have expanded the known list in our open-source library to include a new `holidays` category, aimed at supporting tracking of holidays for different countries, religions, and financial institutions. This category includes several subcategories, such as calendars, countries, deprecation, financial holidays, groups, helpers, holiday base, mixins, observed holiday base, registry, and utils. Each subcategory contains an empty list, allowing for future data storage related to holidays. This change partially resolves issue [#1931](#1931), and represents a significant step towards supporting a more comprehensive range of holiday tracking needs in our library. Software engineers may utilize this new feature to build applications that require tracking and management of various holidays and related data.
* Added `htmlmin` to known list ([#2907](#2907)). In this update, we have added the `htmlmin` library to the `known.json` configuration file's list of known libraries. This addition enables the use and management of `htmlmin` and its components, including `htmlmin.command`, `htmlmin.decorator`, `htmlmin.escape`, `htmlmin.main`, `htmlmin.middleware`, `htmlmin.parser`, `htmlmin.python3html`, and `htmlmin.python3html.parser`. This change partially addresses issue [#1931](#1931), which tracks adding libraries to the known list. Software engineers can now utilize `htmlmin` and its features in their projects.
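The structure these known-list bullets describe can be sketched as a minimal, hypothetical `known.json` excerpt: each package maps its modules to a (possibly empty) list of known problems. The module names below are taken from the `htmlmin` entry above; the exact file layout in UCX may differ.

```python
import json

# Hypothetical excerpt mirroring the shape these entries describe:
# package -> module -> list of known problems (empty when none recorded).
known = {
    "htmlmin": {
        "htmlmin": [],
        "htmlmin.command": [],
        "htmlmin.parser": [],
    }
}

# Serialize the way a known-list file would be stored on disk.
text = json.dumps(known, indent=2)
```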
* Document preparing external locations when creating catalogs ([#2915](#2915)). Databricks Labs' UCX tool has been updated to incorporate the preparation of external locations when creating catalogs during the upgrade to Unity Catalog (UC). This enhancement involves the addition of new documentation outlining how to physically separate data in storage within UC, adhering to Databricks' best practices. The `create-catalogs-schemas` command has been updated to create UC catalogs and schemas based on a mapping file, allowing users to reuse previously created external locations or establish new ones outside of UCX. For data separation, users can leverage external locations when using subpaths, providing flexibility in data management during the upgrade process.
* Fixed `KeyError` from `assess_workflows` task ([#2919](#2919)). In this release, we have made significant improvements to error handling in our open-source library. We have fixed a `KeyError` in the `assess_workflows` task and modified the `_safe_infer_internal` and `_unsafe_infer_internal` methods to handle both `InferenceError` and `KeyError` during inference. When an error occurs, we now log the error message with the node and yield an `Uninferable` object. Additionally, we have updated the `do_infer_values` method of the `_LocalInferredValue` class to yield an iterator of iterables of `NodeNG` objects. We have added multiple unit tests for inferring values in Python code, including cases for handling externally defined values and their absence. These changes ensure that our library can handle errors more gracefully and provide more informative feedback during inference, making it more robust and easier to use in software engineering projects.
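The catch-and-degrade pattern described here can be sketched in isolation. `InferenceError`, `Uninferable`, and the inference callable below are simplified stand-ins for illustration, not UCX's actual classes:

```python
import logging

logger = logging.getLogger(__name__)


class InferenceError(Exception):
    """Stand-in for the failure raised by the real inference engine."""


class Uninferable:
    """Sentinel yielded when a node's value cannot be inferred."""


def safe_infer(node, infer):
    """Yield inferred values; on failure, log the node and degrade to Uninferable."""
    try:
        yield from infer(node)
    except (InferenceError, KeyError) as e:
        logger.warning(f"Could not infer value of {node!r}: {e!r}")
        yield Uninferable()


def broken_infer(node):  # simulates inference hitting a KeyError
    raise KeyError(node)
    yield  # unreachable; makes this function a generator


results = list(safe_infer("x + y", broken_infer))
```

Because the exception is caught inside the generator, callers always receive at least one value and never crash mid-analysis.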
* Fixed `OSError: [Errno 95]` bug in `assess_workflows` task by skipping GIT-sourced workflows from static code analysis ([#2924](#2924)). In this release, we have resolved the `OSError: [Errno 95]` bug in the `assess_workflows` task that occurred while performing static code analysis on GIT-sourced workflows. A new attribute `Source` has been introduced in the `jobs` module of the `databricks.sdk.service` package to identify the source of a notebook task. If the notebook task source is GIT, a new `DependencyProblem` is raised, indicating that notebooks in GIT should be analyzed using the `databricks labs ucx lint-local-code` CLI command. The `_register_notebook` method has been updated to check if the notebook task source is GIT and return an appropriate `DependencyProblem` message. This change enhances the reliability of the `assess_workflows` task by avoiding the aforementioned bug and provides a more informative message when notebooks are sourced from GIT. This change is part of our ongoing effort to improve the project's quality and reliability and benefits software engineers who adopt the project.
* Fixed absolute path normalisation in source code analysis ([#2920](#2920)). In this release, we have addressed an issue with the Workspace API not supporting relative subpaths such as "/a/b/../c", which has been resolved by resolving workspace paths before calling the API. This fix is backward compatible and ensures the correct behavior of the source code analysis. Additionally, we have added integration tests and co-authored this commit with Eric Vergnaud and Serge Smertin. Furthermore, we have added a new test case that supports relative grand-parent paths in the dependency graph construction, utilizing a new `NotebookLoader` class. This loader is responsible for loading the notebook content and metadata given a path, and this new test case exercises the path resolution logic when a notebook depends on another notebook located two levels up in the directory hierarchy. These changes improve the robustness and reliability of the source code analysis in the presence of relative paths.
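Client-side path resolution of the kind described can be done with the standard library alone; this is an illustrative sketch, not the UCX implementation:

```python
import posixpath


def normalize_workspace_path(path: str) -> str:
    """Collapse '.' and '..' segments before calling the Workspace API,
    which rejects relative subpaths such as '/a/b/../c'."""
    return posixpath.normpath(path)
```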
* Fixed downloading wheel libraries from DBFS on mounted Azure Storage fail with access denied ([#2918](#2918)). In this release, we have introduced enhancements to the library's handling of registering and downloading wheel libraries from DBFS on mounted Azure Storage, addressing an issue that resulted in access denied errors. The changes include improved error handling with the addition of a `try-except` block to handle potential `BadRequest` exceptions and the inclusion of three new methods to register different types of libraries. The `_register_requirements_txt` method reads requirements files and registers each library specified in the file, logging a warning message for any references to other requirements or constraints files. The `_register_whl` method creates a temporary copy of the given wheel file in the local file system and registers it, while the `_register_egg` method checks the runtime version and yields a `DependencyProblem` if the version is greater than (14, 0). These changes simplify the code and enhance error handling while addressing the reported issues related to registering libraries. The changes are implemented in the `jobs.py` file located in the `databricks/labs/ucx/source_code` directory, which also includes the import of the `BadRequest` exception class from `databricks.sdk.errors`.
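The runtime-version gate for egg libraries might look roughly like this; the function name and message are illustrative, not UCX's actual code:

```python
from collections.abc import Iterator


def register_egg(runtime_version: tuple[int, int]) -> Iterator[str]:
    """Yield a dependency problem instead of registering the egg when the
    runtime is newer than (14, 0), mirroring the version check described above."""
    if runtime_version > (14, 0):
        yield f"egg library is not supported on runtime {runtime_version}"
```

Tuple comparison gives the version ordering for free: `(15, 0) > (14, 0)` and `(14, 1) > (14, 0)` are both true, while `(13, 3)` passes the gate.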
* Fixed issue with migrating MANAGED hive_metastore table to UC ([#2892](#2892)). In this release, we have implemented changes to address the issue of migrating HMS (Hive Metastore) managed tables to UC (Unity Catalog) as EXTERNAL. Historically, deleting a managed table also removed the underlying data, leading to potential data loss and making the UC table unusable. The new approach provides options to mitigate these issues, including migrating as EXTERNAL or cloning the data to maintain integrity. These changes aim to prevent accidental data deletion, ensure data recoverability, and avoid inconsistencies when new data is added to either HMS or UC. We have introduced new class attributes, methods, and parameters in relevant modules such as `WorkspaceConfig`, `Table`, `migrate_tables`, and `install.py`. These modifications support the new migration strategies and allow for more flexibility in managing how tables are migrated and how data is handled. The upgrade process can be triggered using the `migrate-tables` UCX command or by running the table migration workflows deployed to the workspace. Thorough testing and documentation have been performed to minimize risks of data inconsistencies during migration. It is crucial to understand the implications of these changes and carefully consider the trade-offs before migrating managed tables to UC as EXTERNAL.
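The choice between the two migration strategies can be sketched as follows; the enum, function, and SQL are illustrative only, and the statements UCX actually emits may differ:

```python
from enum import Enum


class ManagedTableStrategy(Enum):
    AS_EXTERNAL = "keep data in place, register the UC table as EXTERNAL"
    CLONE = "copy the data so UC owns an independent managed copy"


def migration_sql(strategy: ManagedTableStrategy, src: str, dst: str) -> str:
    """Pick a statement per strategy: cloning duplicates the data (safe but
    costly), while syncing as EXTERNAL leaves the original files in place."""
    if strategy is ManagedTableStrategy.CLONE:
        return f"CREATE TABLE {dst} DEEP CLONE {src}"
    return f"SYNC TABLE {dst} FROM {src}"
```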
* Improve creating UC catalogs ([#2898](#2898)). In this release, the process of creating Unity Catalog (UC) catalogs has been significantly improved with the resolution of several issues discovered during manual testing. The `databricks labs ucx create-ucx-catalog/create-catalogs-schemas` command has been updated to ensure a better user experience and enhance consistency. Changes include requesting the catalog location even if the catalog already exists, eliminating multiple loops over storage locations, and improving logging and matching storage locations. The code now includes new checks to avoid requesting a catalog's storage location if it already exists and updates the behavior of the `_create_catalog_validate` and `_validate_location` methods. Additionally, new unit tests have been added to verify these changes. Under the hood, a new method, `get_catalog`, has been introduced to the `WorkspaceClient` class, and several test functions, such as `test_create_ucx_catalog_skips_when_ucx_catalogs_exists` and `test_create_all_catalogs_schemas_creates_catalogs`, have been implemented to ensure the proper functioning of the updated command. This release addresses issue [#2879](#2879) and enhances the overall process of creating UC catalogs, making it more efficient and reliable.
* Improve logging when skipping a grant in `create-catalogs-schemas` ([#2917](#2917)). In this release, the logging for skipping grants in the `_update_principal_acl` method of the `CatalogSchema` class has been improved. The code now logs a more detailed message when it cannot identify a UC grant for a specific grant object, indicating that the grant is a legacy grant that is not supported in UC, along with the grant's action type and associated object. This change provides more context for debugging and troubleshooting purposes. Additionally, the functionality of using a `DENY` grant instead of a `USAGE` grant for a specific principal and schema in the hive metastore has been introduced. The test case `test_catalog_schema_acl()` in the `test_catalog_schema.py` file has been updated to reflect this new behavior. A new test case `test_create_all_catalogs_schemas_logs_untranslatable_grant(caplog)` has also been added to verify the new logging behavior for skipping legacy grants that are not supported in UC. These changes improve the logging system and enhance the `CatalogSchema` class functionality in the open-source library.
* Verify migration progress prerequisites during UCX catalog creation ([#2912](#2912)). In this update, a new method `verify()` has been added to the `verify_progress_tracking` object in the `workspace_context` object to verify the prerequisites for UCX catalog creation. The prerequisites include the existence of a UC metastore, a UCX catalog, and a successful `assessment` job run. If the assessment job is pending or running, the code will wait up to 1 hour for it to finish before considering the prerequisites unmet. This feature includes modifications to the `create-ucx-catalog` CLI command and adds unit tests. This resolves issue [#2816](#2816) and ensures that the migration progress prerequisites are met before creating the UCX catalog. The `VerifyProgressTracking` class has been added to the `databricks.labs.ucx.progress.install` module and is used in the `Application` class. The changes include a new `timeout` argument to specify the waiting time for pending or running assessment jobs. The commit also includes several new unit tests for the `VerifyProgressTracking` class and modifications to the `test_install.py` file in the `tests/unit/progress` directory. The code has been manually tested and meets the requirements.
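The wait-for-assessment logic reduces to polling with a deadline; this is an illustrative sketch with hypothetical state names patterned on the Databricks job run life cycle:

```python
import time
from datetime import timedelta


def wait_for_assessment(get_state, timeout=timedelta(hours=1), poll_every=0.0):
    """Poll until the assessment job leaves PENDING/RUNNING or the timeout
    elapses; return the terminal state, or None if we gave up waiting."""
    deadline = time.monotonic() + timeout.total_seconds()
    while time.monotonic() < deadline:
        state = get_state()
        if state not in ("PENDING", "RUNNING"):
            return state
        time.sleep(poll_every)
    return None


# Simulate a run that finishes on the third poll.
states = iter(["PENDING", "RUNNING", "TERMINATED"])
final = wait_for_assessment(lambda: next(states))
```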
@nfx nfx mentioned this pull request Oct 10, 2024
nfx added a commit that referenced this pull request Oct 10, 2024
@JCZuurmond JCZuurmond changed the title Improve logging when skipping grant a in create-catalogs-schemas Improve logging when skipping interactive cluster grant in create-catalogs-schemas Oct 11, 2024
nfx pushed a commit that referenced this pull request Oct 11, 2024
Improve logging when skipping legacy grant in `create-catalogs-schemas` (#2933)

Follow up on #2917
Progresses #2932
nfx added a commit that referenced this pull request Oct 11, 2024
* Added `imageio` to known list ([#2942](#2942)). In this release, we have added `imageio` to our library's known list, which includes all its modules, sub-modules, testing, and typing packages. This change addresses issue [#1931](#1931), which may have been caused by a dependency or compatibility issue. The `imageio` library offers I/O functionality for scientific imaging data, and its addition is expected to expand the library's supported formats and functionality. As a result, software engineers can leverage the enhanced capabilities to handle scientific imaging data more effectively.
* Added `ipyflow-core` to known list ([#2945](#2945)). In this release, the project has expanded its capabilities by adding two open-source libraries to a known list contained in a JSON file. The first library, `ipyflow-core`, brings a range of modules for the data model, experimental features, frontend, kernel, patches, shell, slicing, tracing, types, and utils. The second library, `pyccolo`, offers fast and adaptable code transformation using abstract syntax trees, with functionalities including code rewriting, import hooks, syntax augmentation, and tracing, along with various utility functions. By incorporating these libraries into the project, we aim to enhance its overall efficiency and versatility, providing software engineers with access to a broader set of tools and capabilities.
* Added `isodate` to known list ([#2946](#2946)). In this release, we have added the `isodate` package to our library's known package list, which resolves part of issue [#1931](#1931). The `isodate` package provides several modules for parsing and manipulating ISO 8601 dated strings, including `isodate`, `isodate.duration`, `isodate.isodates`, `isodate.isodatetime`, `isodate.isoduration`, `isodate.isoerror`, `isodate.isostrf`, `isodate.isotime`, `isodate.isotzinfo`, and `isodate.tzinfo`. This addition enhances our compatibility and integration with the `isodate` package in the larger system, enabling users to utilize the full functionality of the `isodate` package in their applications.
* Experimental command for enabling HMS federation ([#2939](#2939)). In this release, we have introduced an experimental feature for enabling HMS (Hive Metastore) federation through a new `enable-hms-federation` command in the labs.yml file. This command, when enabled, will create a federated HMS catalog synced with the workspace HMS in a hierarchical manner, facilitating migration and integration of HMS models. Additionally, we have added an optional `enable_hms_federation` constructor argument to the `Locations` class in the locations.py file. Setting this flag to True enables a fallback mode for AWS resources to use HMS for data access. The `HiveMetastoreFederationEnabler` class is introduced with an `enable()` method to modify the workspace configuration and enable HMS federation. These changes aim to provide a more streamlined experience for users working with complex modeling systems, and careful testing and feedback are encouraged on this experimental feature.
* Experimental support for HMS federation ([#2283](#2283)). In this release, we introduce experimental support for Hive Metastore (HMS) federation in our open-source library. A new `HiveMetastoreFederation` class has been implemented, enabling the registration of an internal HMS as a federated catalog. This class utilizes the `WorkspaceClient` object from the `databricks.sdk` library to create necessary connections and handles permissions for successful federation. Additionally, a new file `test_federation.py` has been added, containing unit tests to demonstrate the functionality of HMS federation, including the creation of federated catalogs and handling of existing connections. As this is an experimental feature, users should expect potential issues and are encouraged to provide feedback to help improve its functionality.
* Fixed `InvalidParameterValue` failure for scanning jobs running on interactive clusters that got deleted ([#2935](#2935)). In this release, we have addressed an issue where an `InvalidParameterValue` error was not being handled properly during scanning jobs run on interactive clusters that were deleted. This error has now been added to the exceptions handled in the `_register_existing_cluster_id` and `_register_cluster_info` methods. These methods retrieve information about an existing cluster or its ID, and if the cluster is not found or an invalid parameter value is provided, they now yield a `DependencyProblem` object with an appropriate error message. This `DependencyProblem` object is used to indicate that there is a problem with the dependencies required for the job, preventing it from running successfully. By handling this error, the code ensures that the job can fail gracefully and provide a clear and informative error message to the user, avoiding any potential confusion or unexpected behavior.
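The pattern described above — catching the lookup error and yielding a problem object instead of raising — can be sketched as follows. This is a minimal, self-contained illustration: `NotFound` and `InvalidParameterValue` stand in for the Databricks SDK exceptions of the same names, and `register_cluster_info` is a simplified stand-in for the private `_register_cluster_info` method:

```python
from dataclasses import dataclass
from collections.abc import Iterable

# Local stand-ins for the SDK exceptions the real code catches
# (databricks.sdk.errors.NotFound / InvalidParameterValue).
class NotFound(Exception): ...
class InvalidParameterValue(Exception): ...

@dataclass
class DependencyProblem:
    code: str
    message: str

def register_cluster_info(get_cluster, cluster_id: str) -> Iterable[DependencyProblem]:
    """Yield a problem instead of raising, so the scan fails gracefully."""
    try:
        get_cluster(cluster_id)
    except (NotFound, InvalidParameterValue) as e:
        yield DependencyProblem(
            "cluster-not-found",
            f"Could not find cluster: {cluster_id} ({e})",
        )

def lookup(cluster_id):
    # Simulates scanning a job pinned to an interactive cluster
    # that has since been deleted.
    raise InvalidParameterValue(f"Cluster {cluster_id} does not exist")

problems = list(register_cluster_info(lookup, "0000-000000-deleted1"))
print(problems[0].code)  # → cluster-not-found
```

Because the caller collects `DependencyProblem`s rather than hitting an unhandled exception, the job can report every broken dependency in one pass.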
* Improve logging when skipping legacy grant in `create-catalogs-schemas` ([#2933](#2933)). In this update, the `create-catalogs-schemas` process logs skipped legacy grants more clearly. This change follows up on issue [#2917](#2917) and progresses issue [#2932](#2932). The `_apply_from_legacy_table_acls` and `_update_principal_acl` methods now log a more descriptive message when a legacy grant is skipped, stating the type of grant involved and clarifying that it is not supported in Unity Catalog. Additionally, a new method `get_interactive_cluster_grants` has been added to the `principal_acl` object, returning the grants specific to the interactive cluster, and the `hive_acl` object is now autospec'd after the `principal_acl.get_interactive_cluster_grants` call. The `test_catalog_schema_acl` function has been updated to reflect these changes: new grants were added to the `hive_grants` list, including a `USE` grant for `user1` on the `hive_metastore` catalog and a `USAGE` grant for `user2` on the `schema3` database. A `DENY` grant for `user4` on the `schema3` database was also added; it is skipped with a log message because `DENY` is not supported in UC, and skipped legacy `DENY` grants on the `catalog2` catalog and the `catalog2.schema2` database are covered as well. These updates make the logs clearer during the migration of grants to UC and ensure that unsupported grants are not inadvertently carried over.
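The skip-and-log behavior described above can be sketched as follows. This is an illustrative simplification, not the actual UCX code: `UC_ACTIONS` and `apply_grants` are hypothetical names, and the grant tuples mirror the test fixtures mentioned in the entry:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("create-catalogs-schemas")

# Action types that map onto Unity Catalog grants in this sketch;
# DENY is not among them, so legacy Hive DENY grants are skipped
# with an explanatory warning instead of failing silently.
UC_ACTIONS = {"USE", "USAGE", "SELECT", "MODIFY"}

def apply_grants(grants):
    applied = []
    for principal, action, securable in grants:
        if action not in UC_ACTIONS:
            logger.warning(
                "Skipping legacy grant that is not supported in UC: "
                f"{action} on {securable} for {principal}"
            )
            continue
        applied.append((principal, action, securable))
    return applied

grants = [("user2", "USAGE", "schema3"), ("user4", "DENY", "schema3")]
print(apply_grants(grants))  # → [('user2', 'USAGE', 'schema3')]
```

The warning names both the unsupported action and the securable, which is the extra context this PR's logging change adds.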
* Notebook linting: ensure path-type is preserved during linting ([#2923](#2923)). In this release, we have enhanced the type safety of the `NotebookResolver` class in the `loaders.py` module by introducing a new type variable `PathT`. This change includes an update to the `_adjust_path` method, which ensures the preservation of the original file suffix when adding the ".py" suffix for Python notebooks. This addresses a potential issue where a `WorkspacePath` instance could be incorrectly converted to a generic `Path` instance, causing downstream errors. Although this change may potentially resolve issue [#2888](#2888), the reproduction steps for that issue were not provided in the commit message. It is important to note that while this change has been manually tested, it does not include any new unit tests, integration tests, or staging environment verification.
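The type-preservation idea can be shown with a small sketch. This is not the UCX implementation, just an illustration of the bound-`TypeVar` pattern: whatever `Path` subclass comes in (such as `WorkspacePath`) is the declared type that comes out, so subclass-specific behavior survives the suffix adjustment:

```python
from pathlib import PurePosixPath
from typing import TypeVar

# Bound TypeVar: adjust_path is declared to return the same path
# subclass it was given, instead of collapsing to a generic Path.
PathT = TypeVar("PathT", bound=PurePosixPath)

def adjust_path(path: PathT) -> PathT:
    """Append ".py" only when the notebook path has no suffix yet."""
    if path.suffix:
        return path
    return path.with_suffix(".py")

print(adjust_path(PurePosixPath("/Workspace/nb")))   # → /Workspace/nb.py
print(adjust_path(PurePosixPath("/Workspace/a.py"))) # → /Workspace/a.py
```

Returning `path` unchanged when a suffix is already present is what preserves the original file extension, matching the `_adjust_path` behavior the entry describes.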
@nfx nfx mentioned this pull request Oct 11, 2024
nfx added a commit that referenced this pull request Oct 11, 2024
* Added `imageio` to known list ([#2942](#2942)). In this release, we have added `imageio` to our library's known list, including all its modules, sub-modules, testing, and typing packages. This change addresses issue [#1931](#1931). The `imageio` library offers I/O functionality for scientific imaging data, and its addition is expected to expand the library's supported formats and functionality, letting software engineers handle scientific imaging data more effectively.
* Added `ipyflow-core` to known list ([#2945](#2945)). In this release, the project has expanded its capabilities by adding two open-source libraries to a known list contained in a JSON file. The first library, `ipyflow-core`, brings a range of modules for the data model, experimental features, frontend, kernel, patches, shell, slicing, tracing, types, and utils. The second library, `pyccolo`, offers fast and adaptable code transformation using abstract syntax trees, with functionalities including code rewriting, import hooks, syntax augmentation, and tracing, along with various utility functions. By incorporating these libraries into the project, we aim to enhance its overall efficiency and versatility, providing software engineers with access to a broader set of tools and capabilities.
@JCZuurmond JCZuurmond deleted the fix/improve-logging-when-applying-grant branch October 16, 2024 11:26