[590] Add Hudi Glue Catalog Sync Implementation #649

Merged: 1 commit into apache:main on Feb 24, 2025

@vamsikarnika vamsikarnika commented Feb 13, 2025

Important Read

  • Please ensure the GitHub issue is mentioned at the beginning of the PR

What is the purpose of the pull request

Add Implementation for Hudi Glue Catalog Sync

Brief change log

  • Add an implementation of HudiGlueCatalogTableBuilder to support Glue Catalog Sync
  • Add an implementation of GlueCatalogPartitionSyncOperations to support syncing Hudi partitions to Glue

Verify this pull request

This change added tests and can be verified as follows:

  • Added unit tests
  • Manually verified the change by running a job locally.

@vinishjail97
Contributor

@vamsikarnika Can you rebase onto the latest master? I will merge this tomorrow.

tableProperties.putAll(sparkTableProperties);
return tableProperties;
}

Author

Moved this code to HudiCatalogTablePropertiesExtractor since this was common to Glue as well

if (response.errors().stream()
.allMatch(
(error) ->
"AlreadyExistsException".equals(error.errorDetail().errorCode()))) {
Contributor

AlreadyExistsException - is there a constant or code you can import from the Glue SDK for this?

Author

@vamsikarnika vamsikarnika Feb 20, 2025

No, I couldn't find any error code constant for this exception. The Glue SDK uses the exception name itself as the error code.
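If the bare string literal is a concern, one option is to keep the error code in a single named constant inside the project. The following is a minimal sketch under that assumption: `GlueErrorCodes`, `ALREADY_EXISTS_ERROR_CODE`, and `allAlreadyExist` are hypothetical names, not part of the Glue SDK or of this PR, and the helper takes plain strings so that it stays self-contained. In the real code the strings would come from `error.errorDetail().errorCode()`.

```java
import java.util.List;

// Hypothetical helper, not part of the Glue SDK or of this PR.
final class GlueErrorCodes {
    // Assumption: the error code equals the exception name, as noted in the reply above.
    static final String ALREADY_EXISTS_ERROR_CODE = "AlreadyExistsException";

    private GlueErrorCodes() {}

    /**
     * Returns true when every error code reports an already-existing entity.
     * Note: allMatch on an empty list is true, so callers should first check
     * that the response actually contains errors.
     */
    static boolean allAlreadyExist(List<String> errorCodes) {
        return errorCodes.stream().allMatch(ALREADY_EXISTS_ERROR_CODE::equals);
    }
}
```

This keeps the literal in one place, so a future SDK constant could replace it with a one-line change.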

Table table =
GlueCatalogTableUtils.getTable(
glueClient, glueCatalogConfig.getCatalogId(), tableIdentifier);
;
Contributor

Remove this extra ;

Comment on lines -105 to -107
assertNotNull(table.getParameters());
assertFalse(table.getParameters().isEmpty());
assertEquals(table.getParameters().get(HUDI_METADATA_CONFIG), "true");
Contributor

Why remove the assertions in this test?

Author

I have moved the logic that fetches table properties to the HudiCatalogTablePropertiesExtractor class, which populates the fields we were asserting here. I have added tests in the TestHudiCatalogTablePropertiesExractor class to validate these.

@vinishjail97
Contributor

@vamsikarnika Can you squash this into a single commit?

@vinishjail97 vinishjail97 merged commit 03e5b02 into apache:main Feb 24, 2025
2 checks passed