[BUG]: Column name comparison needs better handling for upper/lower case #3568

Open
JoelMorton opened this issue Jan 23, 2025 · 0 comments · May be fixed by #3580

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The "[UCX] migrate-data-reconciliation" workflow compares the schemas and data of Hive tables against their Unity Catalog (UC) counterparts. The column name comparison treats every UC column name as lowercase, even when the UC column name contains uppercase characters. As a result, source_column keeps its capitalization while target_column is all lowercase, so the workflow incorrectly flags the table with a schema mismatch.

Example output from ucx.recon_results:
{
  "is_matching": false,
  "data": [
    {
      "source_column": "PA_Commodity_Description",
      "source_datatype": "string",
      "target_column": null,
      "target_datatype": null,
      "is_matching": false,
      "notes": "Column is missing in target"
    },
    {
      "source_column": null,
      "source_datatype": null,
      "target_column": "pa_commodity_description",
      "target_datatype": "string",
      "is_matching": false,
      "notes": "Column is missing in source"
    }
  ]
}
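To illustrate how an exact-name comparison produces exactly these two "missing" entries, here is a minimal sketch. It is not the UCX implementation: `compare_columns_exact` is a hypothetical function, and it assumes the target column names arrive lowercased, as described above.

```python
# Hypothetical sketch, NOT the UCX code: an exact (case-sensitive) name
# comparison between two column-name -> datatype maps.

def compare_columns_exact(source: dict[str, str], target: dict[str, str]) -> list[dict]:
    """Compare columns by exact name, emitting rows shaped like recon_results data."""
    results = []
    for name, dtype in source.items():
        if name in target:
            results.append({
                "source_column": name, "source_datatype": dtype,
                "target_column": name, "target_datatype": target[name],
                "is_matching": dtype == target[name], "notes": None,
            })
        else:
            results.append({
                "source_column": name, "source_datatype": dtype,
                "target_column": None, "target_datatype": None,
                "is_matching": False, "notes": "Column is missing in target",
            })
    # Target columns with no exact-name source counterpart.
    for name, dtype in target.items():
        if name not in source:
            results.append({
                "source_column": None, "source_datatype": None,
                "target_column": name, "target_datatype": dtype,
                "is_matching": False, "notes": "Column is missing in source",
            })
    return results


source_schema = {"PA_Commodity_Description": "string"}
# Assumption: the target names are lowercased before comparison, as reported here.
target_schema = {name.lower(): dtype for name, dtype in source_schema.items()}

rows = compare_columns_exact(source_schema, target_schema)
for row in rows:
    print(row["source_column"], row["target_column"], row["notes"])
```

Because `"PA_Commodity_Description" != "pa_commodity_description"`, the same column is reported once as missing in the target and once as missing in the source.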

Note: the migrate-tables command correctly retains the capitalization from Hive when creating the table in UC. The problem is not with the table migration itself, only with the comparison workflow. Example column:

describe hive_metastore.{schema}.pa_commodity_mapping:
PA_Commodity_ID int

describe lakehouse_dev.{schema}.pa_commodity_mapping:
PA_Commodity_ID int

Expected Behavior

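The "[UCX] migrate-data-reconciliation" workflow should handle capitalization correctly: a column such as PA_Commodity_Description that exists with the same name and data type in both Hive and UC should be reported as matching, not as missing.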
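One possible way to get this behaviour is to match columns on a case-folded key while keeping each side's original spelling in the output. This is only a sketch under the assumption that names differing solely in letter case refer to the same column; `compare_columns_ci` and its row layout are hypothetical and not taken from the UCX code base or from PR #3580.

```python
# Hypothetical sketch of a case-insensitive column comparison.
# Not the UCX implementation; names and row layout are illustrative only.

def compare_columns_ci(source: dict[str, str], target: dict[str, str]) -> list[dict]:
    """Match columns whose names are equal after str.casefold()."""
    # Index target columns by case-folded name, keeping the original spelling.
    target_by_key = {name.casefold(): (name, dtype) for name, dtype in target.items()}
    matched = set()
    results = []
    for name, dtype in source.items():
        key = name.casefold()
        if key in target_by_key:
            t_name, t_dtype = target_by_key[key]
            matched.add(key)
            results.append({
                "source_column": name, "source_datatype": dtype,
                "target_column": t_name, "target_datatype": t_dtype,
                "is_matching": dtype == t_dtype, "notes": None,
            })
        else:
            results.append({
                "source_column": name, "source_datatype": dtype,
                "target_column": None, "target_datatype": None,
                "is_matching": False, "notes": "Column is missing in target",
            })
    # Any target column not claimed by a source column is genuinely missing in source.
    for key, (t_name, t_dtype) in target_by_key.items():
        if key not in matched:
            results.append({
                "source_column": None, "source_datatype": None,
                "target_column": t_name, "target_datatype": t_dtype,
                "is_matching": False, "notes": "Column is missing in source",
            })
    return results


rows = compare_columns_ci(
    {"PA_Commodity_Description": "string"},
    {"pa_commodity_description": "string"},
)
print(all(r["is_matching"] for r in rows))  # True
```

A caveat of this design: if one side had two columns whose names differ only in case, they would collapse onto the same key, so a real fix would need to decide how to report that situation.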

Steps To Reproduce

  1. Create a Hive table with an uppercase column name
  2. Run the ucx migrate-tables command
  3. Run the "[UCX] migrate-data-reconciliation" workflow
  4. Query ucx.recon_results; it shows a schema mismatch even though the column names and data types match
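Until the comparison itself is fixed, false positives like the one above could be spotted by pairing "missing in target" and "missing in source" entries whose names differ only in case and whose data types agree. The sketch below assumes the "data" field can be obtained as a JSON string shaped like the example output earlier in this issue; how it is read from ucx.recon_results is not shown, and `case_only_mismatches` is a hypothetical helper.

```python
import json

# Hypothetical helper: find column pairs reported as missing on each side
# that differ only in letter case and share the same data type.

def case_only_mismatches(recon_json: str) -> list[tuple[str, str]]:
    data = json.loads(recon_json)["data"]
    # Columns reported only on the source side, keyed by case-folded name.
    only_source = {
        row["source_column"].casefold(): (row["source_column"], row["source_datatype"])
        for row in data
        if row["source_column"] is not None and row["target_column"] is None
    }
    # Columns reported only on the target side, keyed the same way.
    only_target = {
        row["target_column"].casefold(): (row["target_column"], row["target_datatype"])
        for row in data
        if row["target_column"] is not None and row["source_column"] is None
    }
    pairs = []
    for key, (s_name, s_type) in only_source.items():
        if key in only_target and only_target[key][1] == s_type:
            pairs.append((s_name, only_target[key][0]))
    return pairs


example = """
{"is_matching": false, "data": [
  {"source_column": "PA_Commodity_Description", "source_datatype": "string",
   "target_column": null, "target_datatype": null,
   "is_matching": false, "notes": "Column is missing in target"},
  {"source_column": null, "source_datatype": null,
   "target_column": "pa_commodity_description", "target_datatype": "string",
   "is_matching": false, "notes": "Column is missing in source"}
]}
"""
print(case_only_mismatches(example))
# [('PA_Commodity_Description', 'pa_commodity_description')]
```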

Cloud

Azure

Operating System

Windows

Version

latest via Databricks CLI

Relevant log output

github-project-automation bot moved this to Todo in UCX on Jan 23, 2025
FastLee self-assigned this on Jan 24, 2025
FastLee linked a pull request that will close this issue on Jan 28, 2025