Current Behavior
The "[UCX] migrate-data-reconciliation" workflow compares schemas and data between Hive and UC. The column name comparison treats every UC column name as lowercase, even when the UC column actually contains uppercase characters. As a result, the source_column value keeps its capitalization while the target_column value is all lowercase, the two names are not paired up, and the job incorrectly flags the table with a schema mismatch.
Example output from ucx.recon_results:
{
  "is_matching": false,
  "data": [
    {
      "source_column": "PA_Commodity_Description",
      "source_datatype": "string",
      "target_column": null,
      "target_datatype": null,
      "is_matching": false,
      "notes": "Column is missing in target"
    },
    {
      "source_column": null,
      "source_datatype": null,
      "target_column": "pa_commodity_description",
      "target_datatype": "string",
      "is_matching": false,
      "notes": "Column is missing in source"
    }
  ]
}
Note: the migrate-tables command correctly retains the capitalization from Hive when it creates the table in UC. The problem is not with the table migration itself, only with the table comparison workflow. Example column below.
describe hive_metastore.{schema}.pa_commodity_mapping: PA_Commodity_ID int
describe lakehouse_dev.{schema}.pa_commodity_mapping: PA_Commodity_ID int
Expected Behavior
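The "[UCX] migrate-data-reconciliation" workflow should take capitalization into account when comparing column names, so that a column with the same name and datatype in Hive and UC is not reported as a schema mismatch.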
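To illustrate the difference, here is a minimal sketch of a column comparison in plain Python. It is not the UCX implementation: the function `compare_schemas`, its `case_sensitive` flag, and the "Data type mismatch" note are hypothetical names made up for this example. It only shows how lowercasing one side and then matching names exactly yields the two "missing" rows seen in the recon_results output above, while normalizing both sides pairs the columns correctly.

```python
from typing import Optional

Column = tuple[str, str]  # (column name, data type)


def compare_schemas(
    source: list[Column],
    target: list[Column],
    case_sensitive: bool = False,
) -> list[dict[str, Optional[object]]]:
    """Pair source and target columns by name and report mismatches.

    Hypothetical helper for illustration only, not part of UCX.
    """

    def key(name: str) -> str:
        # Normalize names unless an exact, case-sensitive match is requested.
        return name if case_sensitive else name.lower()

    source_by_key = {key(name): (name, dtype) for name, dtype in source}
    target_by_key = {key(name): (name, dtype) for name, dtype in target}

    rows = []
    # dict.fromkeys de-duplicates the keys while keeping first-seen order.
    for k in dict.fromkeys([*source_by_key, *target_by_key]):
        src = source_by_key.get(k)
        tgt = target_by_key.get(k)
        if src is None:
            notes = "Column is missing in source"
        elif tgt is None:
            notes = "Column is missing in target"
        elif src[1] != tgt[1]:
            notes = "Data type mismatch"  # hypothetical wording
        else:
            notes = None
        rows.append({
            "source_column": src[0] if src else None,
            "source_datatype": src[1] if src else None,
            "target_column": tgt[0] if tgt else None,
            "target_datatype": tgt[1] if tgt else None,
            "is_matching": notes is None,
            "notes": notes,
        })
    return rows


hive_columns = [("PA_Commodity_Description", "string")]
# Assumption: the target name arrives already lowercased, as in recon_results.
lowercased_uc_columns = [("pa_commodity_description", "string")]

strict = compare_schemas(hive_columns, lowercased_uc_columns, case_sensitive=True)
relaxed = compare_schemas(hive_columns, lowercased_uc_columns, case_sensitive=False)
```

With the exact-match comparison, `strict` contains two non-matching rows like the ones reported above. With both sides normalized, `relaxed` contains a single matching row. Whether the actual fix should normalize both sides or preserve the UC column case when reading the target schema is a design decision for the maintainers.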
Steps To Reproduce
1. Create a Hive table with uppercase characters in a column name.
2. Run the migrate-tables UCX command.
3. Run the "[UCX] migrate-data-reconciliation" workflow.
4. Query ucx.recon_results. It shows a schema mismatch even though the column names and datatypes are correct.
Cloud
Azure
Operating System
Windows
Version
latest via Databricks CLI
Relevant log output