v0.14.0
- Added `upgraded_from_workspace_id` property to migrated tables to indicate the source workspace (#987). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue #899 and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation.
- Added a command to create account level groups if they do not exist (#763). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command,
`create-account-groups`, has been added to the `databricks labs ucx` tool. It crawls all workspaces in the account and creates any account-level groups that do not already exist for the workspace-local groups it finds. The feature supports various scenarios, including creating account-level groups for groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can manage account-level groups and keep them consistent across all workspaces in the account.
- Added assessment for the incompatible
`RunSubmit` API usages (#849). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The `clusters.py` file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API.
- Added group members difference to the output of
`validate-groups-membership` CLI command (#995). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and makes discrepancies between the account and workspace levels easy to identify. The corresponding unit test file, `test_groups.py`, has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the number of members in the workspace group minus the number of members in the account group, so a positive value indicates more members in the workspace group and a negative value indicates more members in the account group. The table template in the `labs.yml` file has also been updated to include the new column for the group membership difference.
- Added handling for empty
`directory_id` if a managed identity is encountered during the crawling of `StoragePermissionMapping` (#986). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue #339. The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables.
- Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials (#874). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new
`migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credential. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities.
- Added permission migration support for feature tables and the root permissions for models and feature tables (#997). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as
`feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables.
- Added support for serving endpoints (#990). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The
`fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, `test_endpoints`, has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file.
- Expanded end-user documentation with detailed descriptions for workflows and commands (#999). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler.
- Fixed `config.yml` upgrade from very old versions (#984). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`.
- Renamed columns in assessment SQL queries to use actual names, not aliases (#983). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the
`is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution. It ensures consistent interpretation of column references across various setups, thereby avoiding potential errors caused by aliases. This change does not introduce any new methods, but instead modifies existing functionality to use actual column names, ensuring a more reliable and consistent SQL query for the `05_0_all_tables` assessment.
- Updated groups permissions validation to use Table ACL cluster (#979). In this update, the
`validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality. If you run this workflow, review how the change affects your permissions validation and adjust your workflows accordingly.
Contributors: @nfx, @william-conti, @mwojtyczka, @FastLee, @qziyuan, @nkvuong, @larsgeorge-db