v0.14.0

@nfx nfx released this 04 Mar 14:29
· 1250 commits to main since this release
1a60a8d
  • Added upgraded_from_workspace_id property to migrated tables to indicate the source workspace (#987). The _migrate_external_table, _migrate_dbfs_root_table, and _migrate_view methods in table_migrate.py now pass a new upgraded_from_ws parameter to the SQL commands that alter tables, views, or managed tables, so that each migrated object stores the ID of the workspace it was migrated from. The Table class in tables.py gains a utility method, sql_alter_from, that generates the SQL command with the new parameter, and a class-level attribute, UPGRADED_FROM_WS_PARAM, used to indicate the source workspace. These changes resolve issue #899 and are covered by manual testing, unit tests, and integration tests. No CLI commands, workflows, or tables have been added or modified, and user documentation is unchanged.
  • Added a command to create account level groups if they do not exist (#763). A new command, create-account-groups, has been added to the databricks labs ucx tool. It crawls all workspaces in the account and creates an account-level group for each workspace-local group that has no counterpart in the account. The command covers groups that exist in some workspaces but not in others, and it creates multiple account-level groups when groups share a name but have different members. New methods in account.py implement the feature, test_account.py has new tests for the create_account_level_groups method, and cli.py registers the new command. This helps keep account-level groups consistent across all workspaces in the account.
  • Added assessment for the incompatible RunSubmit API usages (#849). In clusters.py, the private methods check_spark_conf and check_cluster_failures have been renamed to _check_spark_conf and _check_cluster_failures, and the _assess_clusters method now calls the renamed _check_cluster_failures method to check cluster configurations. A new SubmitRunsCrawler class has been added to the databricks.labs.ucx.assessment.jobs module, built on the CrawlerBase, JobsMixin, and CheckClusterMixin classes. It crawls submitted job runs, assesses their compatibility, and records failures. A new configuration attribute, num_days_submit_runs_history, in the WorkspaceConfig class of the config.py module controls how many days of RunSubmit API submission history are retained. New JSON files for unit testing cover RunSubmit API usages in scenarios such as dbt task runs, Git source-based job runs, JAR file runs, and more.
  • Added group members difference to the output of validate-groups-membership cli command (#995). The validate-groups-membership command now compares group memberships at the account and workspace levels. The validate_group_membership function calculates the difference and reports it in a new group_members_difference column: the number of members in the workspace group minus the number of members in the account group, so a positive value means the workspace group has more members and a negative value means the account group has more. The table template in the labs.yml file includes the new column, and test_groups.py has a new test case that verifies the calculation. The functionality of the other commands remains unchanged.
  • Added handling for empty directory_id if managed identity encountered during the crawling of StoragePermissionMapping (#986). This PR adds a type field to the StoragePermissionMapping and Principal dataclasses to differentiate between service principals and managed identities, allowing None for the directory_id field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of StoragePermissionMapping, prevent errors when creating storage credentials with managed identities, and address issue #339. The changes are tested through unit tests, manual testing, and integration tests, and only affect the StoragePermissionMapping class and related methods, without introducing new commands, workflows, or tables.
  • Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials (#874). This release migrates Azure Service Principals whose secrets are stored in Databricks Secret to UC Storage Credentials. A new migrate_credentials command in the labs.yml file migrates credentials for storage access to UC storage credentials. The secrets.py module now handles the case where a secret has been removed from the backend and logs a warning for secrets with invalid Base64 bytes. The new StorageCredentialManager and ServicePrincipalMigration classes in credentials.py manage Azure Service Principals and their associated client secrets and migrate them to UC Storage Credentials. A new directory_id attribute in the Principal class and its associated dataclass in resources.py stores the directory ID needed to create UC storage credentials with a service principal. A new pytest fixture, make_storage_credential_spn, in fixtures.py simplifies writing tests that require Databricks Storage Credentials with Azure Service Principal auth. A new test file covers the Azure integration, with new classes, methods, and test cases for the migration of Azure Service Principals to UC Storage Credentials.
  • Added permission migration support for feature tables and the root permissions for models and feature tables (#997). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as feature_store_listing, feature_tables_root_page, models_root_page, and tokens_and_passwords have been added to facilitate population of a workspace access page with necessary permissions information. The factory function in manager.py has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing GenericPermissionsSupport, AccessControlRequest, and MigratedGroup classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include feature-tables in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables.
  • Added support for serving endpoints (#990). The fixtures.py file in the databricks.labs.ucx.mixins module has new classes and functions to create and manage serving endpoints, with integration tests to verify them. The assessment's permissions crawling has a new listing for serving endpoints, which uses the ws.serving_endpoints.list function and the serving-endpoints category. A new integration test, test_endpoints, verifies that assessments now crawl permissions for serving endpoints and demonstrates migrating permissions from one group to another. The test_manager.py file has been updated to match.
  • Expanded end-user documentation with detailed descriptions for workflows and commands (#999). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler.
  • Fixed config.yml upgrade from very old versions (#984). In this release, we've introduced enhancements to the configuration upgrading process for config.yml in our open-source library. We've replaced the previous v1_migrate class method with a new implementation that specifically handles migration from version 1. The new method retrieves the groups field, extracts the selected value, and assigns it to the include_group_names key in the configuration. The backup_group_prefix value from the groups field is assigned to the renamed_group_prefix key, and the groups field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the test_config.py file to ensure backward compatibility. Two new tests, test_v1_migrate_zeroconf and test_v1_migrate_some_conf, have been added, utilizing the MockInstallation class and loading the configuration using WorkspaceConfig. These tests enhance the robustness and reliability of the migration process for config.yml.
  • Renamed columns in assessment SQL queries to use actual names, not aliases (#983). Column references that went through aliases caused errors in certain setups. In the 05_0_all_tables assessment query, the definition of the is_delta column now uses the actual column name table_format instead of the alias format. The change introduces no new methods; it only modifies the existing query so that column references are interpreted consistently across setups.
  • Updated groups permissions validation to use Table ACL cluster (#979). The validate_groups_permissions task now runs on the Table ACL cluster, as indicated by the inclusion of job_cluster="tacl". This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the permission_manager.apply_group_permissions method during the migration state. If you run this workflow, review how the change affects your permissions validation process and adjust your workflows accordingly.
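
For #987, the kind of statement that sql_alter_from generates can be sketched as below. This is a minimal illustration and not the code from tables.py: the standalone function signature, the 'upgraded_from' property, and the example table names and workspace ID are assumptions made for the example. Only the names sql_alter_from, UPGRADED_FROM_WS_PARAM, and upgraded_from_workspace_id come from the notes above.

```python
# Name of the table property that stores the source workspace ID (from the release notes).
UPGRADED_FROM_WS_PARAM = "upgraded_from_workspace_id"


def sql_alter_from(kind: str, source_key: str, target_key: str, ws_id: int) -> str:
    """Build an ALTER statement that records where a migrated object came from.

    Hypothetical standalone version: in UCX this is described as a method on Table.
    The 'upgraded_from' property is an assumption for illustration.
    """
    return (
        f"ALTER {kind} {target_key} SET TBLPROPERTIES "
        f"('upgraded_from' = '{source_key}', "
        f"'{UPGRADED_FROM_WS_PARAM}' = '{ws_id}');"
    )


# Example values are made up.
statement = sql_alter_from(
    "TABLE", "hive_metastore.sales.orders", "main.sales.orders", 1234567890
)
print(statement)
```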
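
For #763, the decision of which account-level groups to create might look roughly like the sketch below. It works on plain dictionaries instead of the Databricks SDK, and the function name plan_account_groups and the "name_workspaceid" naming for groups that share a name but differ in members are assumptions for illustration, not the behavior confirmed by the notes above.

```python
def plan_account_groups(workspace_groups, account_group_names):
    """Return {account_group_name: members} for groups missing from the account.

    workspace_groups: {workspace_id: {group_name: set_of_members}}
    account_group_names: names of groups that already exist in the account
    """
    # Collect every workspace-local group that has no account-level counterpart.
    by_name = {}
    for ws_id, groups in workspace_groups.items():
        for name, members in groups.items():
            if name in account_group_names:
                continue
            by_name.setdefault(name, []).append((ws_id, frozenset(members)))

    plan = {}
    for name, entries in by_name.items():
        distinct_memberships = {members for _, members in entries}
        if len(distinct_memberships) == 1:
            # Same members everywhere: one account-level group is enough.
            plan[name] = set(entries[0][1])
        else:
            # Same name, different members: one group per workspace (assumed naming).
            for ws_id, members in entries:
                plan[f"{name}_{ws_id}"] = set(members)
    return plan


plan = plan_account_groups(
    {
        101: {"analysts": {"ann", "bob"}, "admins": {"eve"}},
        202: {"analysts": {"ann", "cy"}, "engineers": {"dan"}},
    },
    account_group_names={"admins"},
)
print(plan)
```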
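
For #995, the group_members_difference value is described as the workspace member count minus the account member count. A minimal sketch of that arithmetic, with a hypothetical helper name and made-up group data:

```python
def group_members_difference(workspace_members, account_members):
    """Positive: workspace group has more members. Negative: account group has more."""
    return len(workspace_members) - len(account_members)


# Made-up memberships for illustration.
more_in_workspace = group_members_difference({"ann", "bob", "cy"}, {"ann", "bob"})
more_in_account = group_members_difference({"ann"}, {"ann", "bob", "cy"})
print(more_in_workspace, more_in_account)
```

Note that a difference of zero only means the counts match; two groups of equal size can still contain different members.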
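
For #986, the shape of a Principal with a type field and an optional directory_id can be sketched as follows. The other field names, the type values "Application" and "ManagedIdentity", and the helper eligible_for_migration are assumptions for illustration; the notes above only state that a type field was added, that directory_id may be None for principals that are not service principals, and that managed identities are currently ignored during migration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Principal:
    client_id: str
    display_name: str
    type: str  # assumed values: "Application" (service principal) or "ManagedIdentity"
    directory_id: Optional[str] = None  # None when the principal is not a service principal


def eligible_for_migration(principals: List[Principal]) -> List[Principal]:
    """Keep service principals only; managed identities are skipped."""
    return [p for p in principals if p.type == "Application" and p.directory_id]


principals = [
    Principal("client-1", "etl-spn", "Application", directory_id="tenant-1"),
    Principal("client-2", "vm-identity", "ManagedIdentity"),
]
selected = eligible_for_migration(principals)
print([p.display_name for p in selected])
```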
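
For #984, the version 1 to version 2 config.yml upgrade can be sketched as a plain dictionary transformation. This is an illustration of the steps described above, not the code from config.py: it assumes the raw configuration is a dict, copies keys only when they are present, and uses inventory_database purely as an example of an unrelated key.

```python
def v1_migrate(raw: dict) -> dict:
    """Upgrade a version 1 configuration dict to version 2 (sketch)."""
    migrated = dict(raw)  # avoid mutating the caller's dict
    groups = migrated.pop("groups", None) or {}  # the groups field is removed
    if "selected" in groups:
        migrated["include_group_names"] = groups["selected"]
    if "backup_group_prefix" in groups:
        migrated["renamed_group_prefix"] = groups["backup_group_prefix"]
    migrated["version"] = 2
    return migrated


old_config = {
    "version": 1,
    "inventory_database": "ucx",
    "groups": {"selected": ["g1", "g2"], "backup_group_prefix": "backup_"},
}
new_config = v1_migrate(old_config)
print(new_config)
```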

Contributors: @nfx, @william-conti, @mwojtyczka, @FastLee, @qziyuan, @nkvuong, @larsgeorge-db