[BUG]: Column name comparison needs better handling for upper/lower case #3568

JoelMorton · 2025-01-23T20:02:37Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

The "[UCX] migrate-data-reconciliation" workflow compares schemas and data between the Hive metastore and UC. The column name comparison treats all UC column names as lowercase, even when the UC column contains uppercase characters: the source_column value keeps its capitalization, while the target_column is all lowercase. As a result, the job incorrectly flags the table with a schema mismatch.

Example output from ucx.recon_results:
{
  "is_matching": false,
  "data": [
    {
      "source_column": "PA_Commodity_Description",
      "source_datatype": "string",
      "target_column": null,
      "target_datatype": null,
      "is_matching": false,
      "notes": "Column is missing in target"
    },
    {
      "source_column": null,
      "source_datatype": null,
      "target_column": "pa_commodity_description",
      "target_datatype": "string",
      "is_matching": false,
      "notes": "Column is missing in source"
    }
  ]
}

Note: the migrate-tables command correctly retains the capitalization from Hive when creating the table in UC. The issue is not with the table migration itself, only with the table comparison workflow. Example column below.

describe hive_metastore.{schema}.pa_commodity_mapping:
PA_Commodity_ID int

describe lakehouse_dev.{schema}.pa_commodity_mapping:
PA_Commodity_ID int
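
To check whether a flagged table in ucx.recon_results is affected by this issue, the "missing in target" and "missing in source" rows can be paired up by lowercased name. The following is only a rough sketch: it assumes the `data` field has the shape shown in the example output above, and the function name `find_case_only_mismatches` is hypothetical and not part of UCX.

```python
import json


def find_case_only_mismatches(data):
    """Pair rows reported as missing on one side whose names differ only by case.

    `data` is assumed to be a list of dicts shaped like the `data` array in
    the example output. This is a hypothetical helper, not a UCX function.
    """
    # Index the "missing in target" rows by lowercased source column name.
    missing_in_target = {}
    for row in data:
        if row["source_column"] is not None and row["target_column"] is None:
            missing_in_target[row["source_column"].lower()] = row

    pairs = []
    for row in data:
        if row["target_column"] is not None and row["source_column"] is None:
            match = missing_in_target.get(row["target_column"].lower())
            # Only report the pair when the datatypes also agree.
            if match is not None and match["source_datatype"] == row["target_datatype"]:
                pairs.append((match["source_column"], row["target_column"]))
    return pairs


sample = json.loads("""
{
  "is_matching": false,
  "data": [
    {"source_column": "PA_Commodity_Description", "source_datatype": "string",
     "target_column": null, "target_datatype": null,
     "is_matching": false, "notes": "Column is missing in target"},
    {"source_column": null, "source_datatype": null,
     "target_column": "pa_commodity_description", "target_datatype": "string",
     "is_matching": false, "notes": "Column is missing in source"}
  ]
}
""")

print(find_case_only_mismatches(sample["data"]))
```

Any pair returned here is a column that exists on both sides with the same datatype and was reported as a mismatch only because of casing.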

Expected Behavior

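The "[UCX] migrate-data-reconciliation" workflow should take capitalization into account when comparing column names, so that a column with the same name and datatype in Hive and UC is not reported as a schema mismatch.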
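
One way to picture the expected behavior is a comparison that normalizes names before matching. The sketch below is an illustration under stated assumptions, not the UCX implementation: `compare_schemas` and its `case_sensitive` flag are hypothetical names, and schemas are represented as plain lists of (column_name, datatype) tuples.

```python
def compare_schemas(source, target, case_sensitive=False):
    """Compare two schemas given as lists of (column_name, datatype) tuples.

    Hypothetical helper for illustration only. With case_sensitive=False,
    column names are lowercased before matching.
    """
    def key(name):
        return name if case_sensitive else name.lower()

    target_by_key = {key(name): (name, dtype) for name, dtype in target}
    seen = set()
    results = []

    for name, dtype in source:
        k = key(name)
        seen.add(k)
        hit = target_by_key.get(k)
        if hit is None:
            results.append({
                "source_column": name, "source_datatype": dtype,
                "target_column": None, "target_datatype": None,
                "is_matching": False, "notes": "Column is missing in target",
            })
        else:
            results.append({
                "source_column": name, "source_datatype": dtype,
                "target_column": hit[0], "target_datatype": hit[1],
                "is_matching": dtype == hit[1], "notes": None,
            })

    # Target columns that no source column matched.
    for k, (name, dtype) in target_by_key.items():
        if k not in seen:
            results.append({
                "source_column": None, "source_datatype": None,
                "target_column": name, "target_datatype": dtype,
                "is_matching": False, "notes": "Column is missing in source",
            })
    return results


hive_columns = [("PA_Commodity_ID", "int")]
uc_columns_as_seen = [("pa_commodity_id", "int")]  # lowercased, as in the report

relaxed = compare_schemas(hive_columns, uc_columns_as_seen)
strict = compare_schemas(hive_columns, uc_columns_as_seen, case_sensitive=True)
print(all(r["is_matching"] for r in relaxed))
print(len(strict))
```

With the case-insensitive default, the single column matches. With `case_sensitive=True`, the same input yields two unmatched rows, which mirrors the output reported above.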

Steps To Reproduce

1. Create a Hive table with an uppercase column name.
2. Run the migrate-tables UCX command.
3. Run the "[UCX] migrate-data-reconciliation" workflow.
4. Query ucx.recon_results, which shows a schema mismatch even though the column names and datatypes are correct.

Cloud

Azure

Operating System

Windows

Version

latest via Databricks CLI

Relevant log output

JoelMorton added the needs-triage label Jan 23, 2025

github-project-automation bot added this to UCX Jan 23, 2025

github-project-automation bot moved this to Todo in UCX Jan 23, 2025

FastLee self-assigned this Jan 24, 2025

FastLee linked a pull request Jan 28, 2025 that will close this issue

Case sensitive/insensitive table validation #3580

Open
