-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[590] Add Hudi HMS Catalog Sync Implementation #648
[590] Add Hudi HMS Catalog Sync Implementation #648
Conversation
c778a19
to
619672e
Compare
@@ -43,7 +43,7 @@ | |||
import org.apache.xtable.model.storage.CatalogType; | |||
import org.apache.xtable.model.storage.TableFormat; | |||
|
|||
public class HMSCatalogSyncClientTestBase { | |||
public class HMSCatalogSyncTestBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed this class since this class used by not just test for HMS Client classes as well as partiton sync classes. This naming conventions also matches with GlueCatalogSyncTestBase
@@ -45,6 +45,7 @@ datasets: | |||
- sourceCatalogTableIdentifier: | |||
tableIdentifier: | |||
hierarchicalId: "source-database-1.source-1" | |||
partitionSpec: "cs_sold_date_sk:VALUE" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the source is not hudi, then this shouldn't be required right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need partitionSpec when target is HUDI, since during catalogSync we convert target table to source table before syncing to catalogs.
incubator-xtable/xtable-core/src/main/java/org/apache/xtable/conversion/ConversionController.java
Line 240 in 67946b9
conversionSourceProvider |
...le-hive-metastore/src/main/java/org/apache/xtable/hms/HMSCatalogPartitionSyncOperations.java
Outdated
Show resolved
Hide resolved
...le-hive-metastore/src/main/java/org/apache/xtable/hms/HMSCatalogPartitionSyncOperations.java
Outdated
Show resolved
Hide resolved
...le-hive-metastore/src/main/java/org/apache/xtable/hms/HMSCatalogPartitionSyncOperations.java
Outdated
Show resolved
Hide resolved
...le-hive-metastore/src/main/java/org/apache/xtable/hms/HMSCatalogPartitionSyncOperations.java
Outdated
Show resolved
Hide resolved
tableProperties.put(HUDI_METADATA_CONFIG, "true"); | ||
Map<String, String> sparkTableProperties = | ||
HudiCatalogTableUtils.getSparkTableProperties( | ||
partitionFields, "", hmsCatalogConfig.getSchemaLengthThreshold(), schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sparkVersion is passed as an empty string, should it have a value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sparkVersion is an optional metadata field. It's not actually used by spark while reading the table. Since. we don't have a clear way of fetching spark version, have set it to empty for now.
xtable-hive-metastore/src/main/java/org/apache/xtable/hms/table/HudiHMSCatalogTableBuilder.java
Outdated
Show resolved
Hide resolved
xtable-hive-metastore/src/main/java/org/apache/xtable/hms/table/HudiHMSCatalogTableBuilder.java
Outdated
Show resolved
Hide resolved
...hive-metastore/src/test/java/org/apache/xtable/hms/table/TestHudiHMSCatalogTableBuilder.java
Outdated
Show resolved
Hide resolved
...hive-metastore/src/test/java/org/apache/xtable/hms/table/TestHudiHMSCatalogTableBuilder.java
Outdated
Show resolved
Hide resolved
...e-core/src/main/java/org/apache/xtable/hudi/catalog/HudiCatalogTablePropertiesExtractor.java
Show resolved
Hide resolved
private CatalogPartitionSyncOperations mockHMSPartitionSyncOperations; | ||
|
||
void setupCommonMocks() { | ||
mockHMSPartitionSyncOperations = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still find the naming confusing here, why do we call the actual instance the mock here?
...ive-metastore/src/test/java/org/apache/xtable/hms/TestHMSCatalogPartitionSyncOperations.java
Outdated
Show resolved
Hide resolved
xtable-hive-metastore/src/test/java/org/apache/xtable/hms/TestHMSCatalogSyncClient.java
Outdated
Show resolved
Hide resolved
c3543fa
to
95f0329
Compare
CatalogPartitionSyncOperations hmsCatalogPartitionSyncOperations = | ||
new HMSCatalogPartitionSyncOperations(metaStoreClient, hmsCatalogConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid this line by passing
new HMSCatalogPartitionSyncOperations(metaStoreClient, hmsCatalogConfig)
inline. Is this called only during _init ? Then it should be part of the same function to avoid mis-use of getPartitionSyncTool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is only used from _init method now. I've moved it to the same function.
887aab8
to
2b4eef2
Compare
Important Read
What is the purpose of the pull request
Adds support for syncing Hudi table to HMS Catalog
Brief change log
Verify this pull request
This change added tests and can be verified as follows: