
[Bug] Kyuubi fails to read Iceberg tables of another cluster across domains #6890

Open
zxl-333 opened this issue Jan 14, 2025 · 13 comments
Labels: kind:bug, priority:major

Comments

zxl-333 commented Jan 14, 2025

Code of Conduct

  • I agree to follow this project's Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

I configured spark-sql to read an Iceberg table across domains, and it works normally. However, when I read the Iceberg table across domains through Kyuubi, the connection to the metastore fails.

Affects Version(s)

1.8.2

Kyuubi Server Log Output

The Kyuubi server logs are normal.

Kyuubi Engine Log Output

2025-01-14 09:15:09.229 INFO KyuubiSessionManager-exec-pool: Thread-81 org.apache.kyuubi.operation.ExecuteStatement: Query[d698c059-a241-4e81-8ddc-fb4646a07f44] in RUNNING_STATE
25/01/14 09:15:11 INFO metastore: Trying to connect to metastore with URI thrift://bigdata-1734405115-0xt70:9083
25/01/14 09:15:11 WARN metastore: Failed to connect to the MetaStore Server...
25/01/14 09:15:11 INFO metastore: Trying to connect to metastore with URI thrift://bigdata-1734405115-lhalh:9083
25/01/14 09:15:11 WARN metastore: Failed to connect to the MetaStore Server...
25/01/14 09:15:11 INFO metastore: Waiting 3 seconds before next connection attempt.
2025-01-14 09:15:14.232 INFO KyuubiSessionManager-exec-pool: Thread-81 org.apache.kyuubi.operation.ExecuteStatement: Query[d698c059-a241-4e81-8ddc-fb4646a07f44] in RUNNING_STATE
25/01/14 09:15:14 INFO DAGScheduler: Asked to cancel job group d698c059-a241-4e81-8ddc-fb4646a07f44
25/01/14 09:15:14 ERROR ExecuteStatement: Error operating ExecuteStatement: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
	at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
	at org.apache.iceberg.CachingCatalog.loadTable(CachingCatalog.java:166)
	at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:642)
	at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:160)
	at org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1197)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1196)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1188)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1059)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1023)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
	at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1023)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:982)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:231)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:227)
	at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:173)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:227)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:188)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:212)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:86)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:147)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:131)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1742)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:97)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:60)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:72)
	at org.apache.iceberg.common.DynMethods$StaticMethod.invoke(DynMethods.java:185)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:63)
	... 91 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1740)
	... 103 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: Peer indicated failure: DIGEST-MD5: IO error acquiring password
	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:199)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:277)
	at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:38)
	at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
	at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
	at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:245)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1740)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:97)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:60)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:72)
	at org.apache.iceberg.common.DynMethods$StaticMethod.invoke(DynMethods.java:185)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:63)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
	at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
	at org.apache.iceberg.CachingCatalog.loadTable(CachingCatalog.java:166)
	at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:642)
	at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:160)
	at org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1197)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1196)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1188)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1059)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1023)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
	at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1023)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:982)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:231)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:227)
	at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:173)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:227)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:188)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:212)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:86)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:147)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:131)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:527)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:245)
	... 108 more

Kyuubi Server Configurations

spark.master=yarn
spark.submit.deployMode=client
spark.yarn.queue=default
kyuubi.backend.server.event.loggers=JSON
kyuubi.backend.server.event.json.log.path=file:///var/log/kyuubi/events
kyuubi.metrics.reporters=PROMETHEUS
kyuubi.delegation.token.renew.interval=PT24H
kyuubi.session.idle.timeout=PT20M
kyuubi.session.engine.idle.timeout=PT20M
kyuubi.kinit.principal=hive/[email protected]
kyuubi.authentication=KERBEROS
kyuubi.ha.namespace=kyuubi_new
kyuubi.kinit.keytab=/etc/security/keytabs/hive.keytab
kyuubi.ha.addresses=bigdata-1734358521-u7gjy:2181,bigdata-1734358521-gsy9x:2181,bigdata-1734358521-cpg0o:2181

Kyuubi Engine Configurations

spark.kerberos.access.hadoopFileSystems  hdfs://myns,hdfs://mynsbackup

#local cluster metastore
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
#spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type=hive
spark.sql.catalog.spark_catalog.uri=thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083


#another cluster metastore
spark.sql.catalog.spark_catalog_ky=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog_ky.type=hive
spark.sql.catalog.spark_catalog_ky.uri=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
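For reference, with both catalogs registered this way, tables in the other cluster are addressed through the catalog prefix, e.g. select * from spark_catalog_ky.default.some_table (table name hypothetical), while unqualified table names resolve through spark_catalog.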

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
zxl-333 added the kind:bug and priority:major labels on Jan 14, 2025

Hello @zxl-333,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.

pan3793 (Member) commented Jan 14, 2025

The issue is unrelated to Kyuubi; you are expected to see the same behavior with spark-submit in cluster mode.

The root cause is that Iceberg does not implement the HadoopDelegationTokenProvider, so the secondary HMS client won't fetch the token properly.

To work around this issue, put https://mvnrepository.com/artifact/org.apache.kyuubi/kyuubi-spark-connector-hive in your $SPARK_HOME/jars and add the following configuration to your spark-defaults.conf:

spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.spark_catalog_ky.hive.metastore.uris=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083

See more technical details at #4560
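For background on the root cause above: Spark discovers extra delegation-token providers through the org.apache.spark.security.HadoopDelegationTokenProvider developer API, loaded via java.util.ServiceLoader from a META-INF/services/org.apache.spark.security.HadoopDelegationTokenProvider file. A minimal sketch of what such a provider for a secondary HMS could look like follows; the class name SecondaryHiveTokenProvider, the config key spark.secondary.hive.metastore.uris, and the token alias are hypothetical assumptions, not Iceberg or Kyuubi code, and error handling is omitted.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient
import org.apache.hadoop.io.Text
import org.apache.hadoop.security.{Credentials, UserGroupInformation}
import org.apache.hadoop.security.token.Token
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier
import org.apache.spark.SparkConf
import org.apache.spark.security.HadoopDelegationTokenProvider

// Hypothetical provider that obtains an HMS delegation token for a
// secondary metastore so executors can authenticate without a TGT.
class SecondaryHiveTokenProvider extends HadoopDelegationTokenProvider {

  override def serviceName: String = "secondary-hive"

  override def delegationTokensRequired(
      sparkConf: SparkConf, hadoopConf: Configuration): Boolean =
    UserGroupInformation.isSecurityEnabled &&
      sparkConf.contains("spark.secondary.hive.metastore.uris") // assumed key

  override def obtainDelegationTokens(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long] = {
    val hiveConf = new HiveConf(hadoopConf, classOf[HiveConf])
    hiveConf.set("hive.metastore.uris",
      sparkConf.get("spark.secondary.hive.metastore.uris"))
    val client = new HiveMetaStoreClient(hiveConf)
    try {
      val user = UserGroupInformation.getCurrentUser.getUserName
      // Ask the secondary HMS for a delegation token owned by the session user.
      val tokenStr = client.getDelegationToken(user, user)
      val token = new Token[AbstractDelegationTokenIdentifier]()
      token.decodeFromUrlString(tokenStr)
      // Token alias is an assumption; a real provider must use an alias the
      // downstream HMS client actually looks up.
      creds.addToken(new Text("secondary.hive.delegation.token"), token)
      None // this sketch does not report a renewal interval
    } finally {
      client.close()
    }
  }
}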

zxl-333 (Author) commented Jan 14, 2025

I have tested this by adding it. Once I add this configuration, Kyuubi cannot query the Iceberg table of the cluster.

pan3793 (Member) commented Jan 14, 2025

... kyuubi cannot query the iceberg table of the cluster

When you say something does not work, provide the concrete configuration and stack trace; otherwise, you should not expect active responses to your question.

zxl-333 (Author) commented Jan 14, 2025

spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.spark_catalog_ky.hive.metastore.uris=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083

When I add the above configuration and use Kyuubi to run a cross-domain query against Iceberg, the Kyuubi server throws the following exception:

2025-01-14 11:20:18.690 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-80 org.apache.kyuubi.credentials.HadoopCredentialsManager: Update session credentials epoch from -1 to 0
2025-01-14 11:20:18.697 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-80 org.apache.kyuubi.operation.ExecuteStatement: Processing hdfs's query[7a3dfe00-1a79-499e-80f8-64d5c673b6c7]: PENDING_STATE -> ERROR_STATE, time taken: 1.736824818697E9 seconds
2025-01-14 11:20:18.713 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-80 org.apache.kyuubi.operation.ExecuteStatement: Processing hdfs's query[7a3dfe00-1a79-499e-80f8-64d5c673b6c7]: ERROR_STATE -> CLOSED_STATE, time taken: 1.736824818713E9 seconds
2025-01-14 11:20:18.715 ERROR KyuubiTBinaryFrontendHandler-Pool: Thread-80 org.apache.kyuubi.server.KyuubiTBinaryFrontendService: Error executing statement:
org.apache.kyuubi.KyuubiSQLException: Error operating ExecuteStatement: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_ExecuteStatement(TCLIService.java:245)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:232)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$executeStatement$1(KyuubiSyncThriftClient.scala:255)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$withLockAcquiredAsyncRequest$2(KyuubiSyncThriftClient.scala:154)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:210)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 20 more

        at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:70)
        at org.apache.kyuubi.operation.KyuubiOperation$$anonfun$onError$1.$anonfun$applyOrElse$1(KyuubiOperation.scala:94)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.kyuubi.Utils$.withLockRequired(Utils.scala:425)
        at org.apache.kyuubi.operation.AbstractOperation.withLockRequired(AbstractOperation.scala:52)
        at org.apache.kyuubi.operation.KyuubiOperation$$anonfun$onError$1.applyOrElse(KyuubiOperation.scala:78)
        at org.apache.kyuubi.operation.KyuubiOperation$$anonfun$onError$1.applyOrElse(KyuubiOperation.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
        at org.apache.kyuubi.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:74)
        at org.apache.kyuubi.operation.ExecuteStatement.runInternal(ExecuteStatement.scala:171)
        at org.apache.kyuubi.operation.KyuubiOperation.$anonfun$run$1(KyuubiOperation.scala:107)
        at org.apache.kyuubi.session.KyuubiSession.handleSessionException(KyuubiSession.scala:49)
        at org.apache.kyuubi.operation.KyuubiOperation.run(KyuubiOperation.scala:107)
        at org.apache.kyuubi.session.AbstractSession.runOperation(AbstractSession.scala:101)
        at org.apache.kyuubi.session.KyuubiSessionImpl.runOperation(KyuubiSessionImpl.scala:232)
        at org.apache.kyuubi.session.AbstractSession.$anonfun$executeStatement$1(AbstractSession.scala:131)
        at org.apache.kyuubi.session.AbstractSession.withAcquireRelease(AbstractSession.scala:82)
        at org.apache.kyuubi.session.AbstractSession.executeStatement(AbstractSession.scala:128)
        at org.apache.kyuubi.session.KyuubiSessionImpl.super$executeStatement(KyuubiSessionImpl.scala:306)
        at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$executeStatement$1(KyuubiSessionImpl.scala:306)
        at org.apache.kyuubi.session.AbstractSession.withAcquireRelease(AbstractSession.scala:82)
        at org.apache.kyuubi.session.KyuubiSessionImpl.executeStatement(KyuubiSessionImpl.scala:297)
        at org.apache.kyuubi.service.AbstractBackendService.executeStatement(AbstractBackendService.scala:67)
        at org.apache.kyuubi.server.KyuubiServer$$anon$1.org$apache$kyuubi$server$BackendServiceMetric$$super$executeStatement(KyuubiServer.scala:171)
        at org.apache.kyuubi.server.BackendServiceMetric.$anonfun$executeStatement$1(BackendServiceMetric.scala:62)
        at org.apache.kyuubi.metrics.MetricsSystem$.timerTracing(MetricsSystem.scala:112)
        at org.apache.kyuubi.server.BackendServiceMetric.executeStatement(BackendServiceMetric.scala:62)
        at org.apache.kyuubi.server.BackendServiceMetric.executeStatement$(BackendServiceMetric.scala:55)
        at org.apache.kyuubi.server.KyuubiServer$$anon$1.executeStatement(KyuubiServer.scala:171)
        at org.apache.kyuubi.service.TFrontendService.ExecuteStatement(TFrontendService.scala:252)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.kyuubi.service.authentication.HadoopThriftAuthBridgeServer$TUGIAssumingProcessor.process(HadoopThriftAuthBridgeServer.scala:163)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_ExecuteStatement(TCLIService.java:245)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:232)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$executeStatement$1(KyuubiSyncThriftClient.scala:255)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$withLockAcquiredAsyncRequest$2(KyuubiSyncThriftClient.scala:154)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        ... 3 more
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:210)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 20 more
2025-01-14 11:20:25.359 INFO KyuubiSessionManager-timeout-checker: Thread-40 org.apache.kyuubi.session.KyuubiSessionManager: Checking sessions timeout, current count: 1
2025-01-14 11:20:26.400 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-80 org.apache.kyuubi.server.KyuubiTBinaryFrontendService: Received request of closing SessionHandle [09942b83-ff24-46e2-9ab6-837a85f4bdcf]
2025-01-14 11:20:26.400 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-80 org.apache.kyuubi.session.KyuubiSessionManager: hdfs's KyuubiSessionImpl with SessionHandle [09942b83-ff24-46e2-9ab6-837a85f4bdcf] is closed, current opening sessions 0
2025-01-14 11:20:26.401 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-80 org.apache.kyuubi.operation.LaunchEngine: Processing hdfs's query[54f6a863-7ac0-4033-82c0-3666d3c29402]: FINISHED_STATE -> CLOSED_STATE, time taken: 78.706 seconds
2025-01-14 11:20:26.404 WARN KyuubiTBinaryFrontendHandler-Pool: Thread-80 org.apache.thrift.transport.TIOStreamTransport: Error closing output stream.
java.net.SocketException: Socket closed
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
        at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
        at org.apache.thrift.transport.TSocket.close(TSocket.java:235)
        at org.apache.thrift.transport.TSaslTransport.close(TSaslTransport.java:402)
        at org.apache.thrift.transport.TSaslClientTransport.close(TSaslClientTransport.java:37)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$closeSession$5(KyuubiSyncThriftClient.scala:238)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$closeSession$5$adapted(KyuubiSyncThriftClient.scala:237)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.closeSession(KyuubiSyncThriftClient.scala:237)
        at org.apache.kyuubi.session.KyuubiSessionImpl.close(KyuubiSessionImpl.scala:273)
        at org.apache.kyuubi.session.SessionManager.closeSession(SessionManager.scala:133)
        at org.apache.kyuubi.session.KyuubiSessionManager.closeSession(KyuubiSessionManager.scala:127)
        at org.apache.kyuubi.service.AbstractBackendService.closeSession(AbstractBackendService.scala:50)
        at org.apache.kyuubi.server.KyuubiServer$$anon$1.org$apache$kyuubi$server$BackendServiceMetric$$super$closeSession(KyuubiServer.scala:171)
        at org.apache.kyuubi.server.BackendServiceMetric.$anonfun$closeSession$1(BackendServiceMetric.scala:43)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.kyuubi.metrics.MetricsSystem$.timerTracing(MetricsSystem.scala:112)
        at org.apache.kyuubi.server.BackendServiceMetric.closeSession(BackendServiceMetric.scala:43)
        at org.apache.kyuubi.server.BackendServiceMetric.closeSession$(BackendServiceMetric.scala:41)
        at org.apache.kyuubi.server.KyuubiServer$$anon$1.closeSession(KyuubiServer.scala:171)
        at org.apache.kyuubi.service.TFrontendService.CloseSession(TFrontendService.scala:209)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1517)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1502)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.kyuubi.service.authentication.HadoopThriftAuthBridgeServer$TUGIAssumingProcessor.process(HadoopThriftAuthBridgeServer.scala:163)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
2025-01-14 11:20:26.412 ERROR KyuubiTBinaryFrontendHandler-Pool: Thread-80 org.apache.kyuubi.server.KyuubiTBinaryFrontendService: Error closing session:
org.apache.kyuubi.KyuubiSQLException: Error while cleaning up the engine resources
        at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:70)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.closeSession(KyuubiSyncThriftClient.scala:223)
        at org.apache.kyuubi.session.KyuubiSessionImpl.close(KyuubiSessionImpl.scala:273)
        at org.apache.kyuubi.session.SessionManager.closeSession(SessionManager.scala:133)
        at org.apache.kyuubi.session.KyuubiSessionManager.closeSession(KyuubiSessionManager.scala:127)
        at org.apache.kyuubi.service.AbstractBackendService.closeSession(AbstractBackendService.scala:50)
        at org.apache.kyuubi.server.KyuubiServer$$anon$1.org$apache$kyuubi$server$BackendServiceMetric$$super$closeSession(KyuubiServer.scala:171)
        at org.apache.kyuubi.server.BackendServiceMetric.$anonfun$closeSession$1(BackendServiceMetric.scala:43)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.kyuubi.metrics.MetricsSystem$.timerTracing(MetricsSystem.scala:112)
        at org.apache.kyuubi.server.BackendServiceMetric.closeSession(BackendServiceMetric.scala:43)
        at org.apache.kyuubi.server.BackendServiceMetric.closeSession$(BackendServiceMetric.scala:41)
        at org.apache.kyuubi.server.KyuubiServer$$anon$1.closeSession(KyuubiServer.scala:171)
        at org.apache.kyuubi.service.TFrontendService.CloseSession(TFrontendService.scala:209)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1517)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1502)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.kyuubi.service.authentication.HadoopThriftAuthBridgeServer$TUGIAssumingProcessor.process(HadoopThriftAuthBridgeServer.scala:163)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed)
        at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
        at org.apache.thrift.transport.TSaslTransport.flush(TSaslTransport.java:501)
        at org.apache.thrift.transport.TSaslClientTransport.flush(TSaslClientTransport.java:37)
        at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73)
        at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.send_CloseSession(TCLIService.java:193)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseSession(TCLIService.java:185)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$closeSession$1(KyuubiSyncThriftClient.scala:218)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$withLockAcquiredAsyncRequest$2(KyuubiSyncThriftClient.scala:154)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        ... 3 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
        ... 14 more

zxl-333 (Author) commented Jan 14, 2025

Using the following configuration, I can read Iceberg data across domains, but I cannot read data from the local Iceberg table: the query executes normally but returns no rows.

spark-defaults.conf

spark.kerberos.access.hadoopFileSystems  hdfs://myns,hdfs://mynsbackup
spark.sql.catalog.hive_catalog     org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog.hive.metastore.uris     thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083


#local cluster metastore
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
#spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type=hive
spark.sql.catalog.spark_catalog.uri=thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083
#The configuration below works around the problem that the current Iceberg uri does not equal HiveConf.ConfVars.METASTOREURIS
spark.sql.catalog.spark_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.spark_catalog.hive.metastore.uris thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083

#another cluster metastore
spark.sql.catalog.spark_catalog_ky=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog_ky.type=hive
spark.sql.catalog.spark_catalog_ky.uri=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
spark.sql.catalog.spark_catalog_ky.hive.metastore.kerberos.principal=hive/[email protected]

kyuubi-defaults.conf

#kyuubi.credentials.hadoopfs.uris=hdfs://myns,hdfs://mynsbackup
hive.metastore.uris=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
#hive.metastore.kerberos.principal=hive/[email protected]

Logs from reading the local Iceberg table:

0: jdbc:hive2://bigdata-vm-1734358521-u7gjy:2> select * from test_iceberg;
2025-01-14 11:41:18.843 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-59 org.apache.kyuubi.credentials.HadoopCredentialsManager: Send new credentials with epoch 0 to SQL engine through session 6abfc827-9d53-418b-9f8e-134776c1b552
2025-01-14 11:41:18.879 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-59 org.apache.kyuubi.credentials.HadoopCredentialsManager: Update session credentials epoch from -1 to 0
2025-01-14 11:41:18.951 INFO KyuubiSessionManager-exec-pool: Thread-78 org.apache.kyuubi.operation.ExecuteStatement: Processing hdfs's query[e0c67acb-aacb-43f6-a6f9-e28ab8cfdfdb]: PENDING_STATE -> RUNNING_STATE, statement:
select * from test_iceberg
25/01/14 11:41:18 INFO ExecuteStatement: Processing hdfs's query[e0c67acb-aacb-43f6-a6f9-e28ab8cfdfdb]: PENDING_STATE -> RUNNING_STATE, statement:
select * from test_iceberg
25/01/14 11:41:18 INFO ExecuteStatement: 
           Spark application name: zxl_test
                 application ID: application_1734577912891_0159
                 application web UI: http://bigdata-1734358521-u7gjy:8088/proxy/application_1734577912891_0159,http://bigdata-1734358521-cpg0o:8088/proxy/application_1734577912891_0159
                 master: yarn
                 deploy mode: client
                 version: 3.3.3
           Start time: 2025-01-14T11:40:28.979
           User: hdfs
25/01/14 11:41:20 INFO ExecuteStatement: Execute in full collect mode
25/01/14 11:41:20 INFO V2ScanRelationPushDown: 
Output: k#12, v#13
         
25/01/14 11:41:20 INFO HiveInMemoryFileIndex: It took 71 ms to list leaf files for 1 paths.
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.privilege.synchronizer does not exist
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.io.file.read.all.columns does not exist
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.server2.webui.cors.allowed.headers does not exist
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.service.metrics.codahale.reporter.classes does not exist
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.io.file.readcolumn.ids does not exist
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.metastore.db.type does not exist
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.metastore.warehouse.external.dir does not exist
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.io.file.readcolumn.names does not exist
25/01/14 11:41:20 WARN HiveConf: HiveConf of name hive.server2.webui.enable.cors does not exist
25/01/14 11:41:20 INFO metastore: Trying to connect to metastore with URI thrift://bigdata-vm-1734358521-u7gjy:9083
25/01/14 11:41:21 INFO metastore: Opened a connection to metastore, current connections: 1
25/01/14 11:41:21 INFO metastore: Connected to metastore.
25/01/14 11:41:21 INFO BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://myns/warehouse/tablespace/managed/hive/test_iceberg/metadata/00001-5390c4c7-a3cd-439b-af2f-d53ab294d6cd.metadata.json
25/01/14 11:41:21 INFO BaseMetastoreCatalog: Table loaded by catalog: default_iceberg.default.test_iceberg
25/01/14 11:41:21 INFO HiveIcebergSerDe: Using schema from existing table {"type":"struct","schema-id":0,"fields":[{"id":1,"name":"k","required":false,"type":"string"},{"id":2,"name":"v","required":false,"type":"int"}]}
25/01/14 11:41:21 INFO BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://myns/warehouse/tablespace/managed/hive/test_iceberg/metadata/00001-5390c4c7-a3cd-439b-af2f-d53ab294d6cd.metadata.json
25/01/14 11:41:21 INFO BaseMetastoreCatalog: Table loaded by catalog: default_iceberg.default.test_iceberg
25/01/14 11:41:21 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 463.7 KiB, free 3.7 GiB)
25/01/14 11:41:21 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 47.7 KiB, free 3.7 GiB)
25/01/14 11:41:21 INFO SparkContext: Created broadcast 1 from broadcast at HiveScan.scala:73
25/01/14 11:41:21 WARN SQLConf: The SQL config 'spark.sql.adaptive.coalescePartitions.minPartitionNum' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.adaptive.coalescePartitions.minPartitionSize' instead.
25/01/14 11:41:21 WARN SQLConf: The SQL config 'spark.sql.adaptive.shuffle.targetPostShuffleInputSize' has been deprecated in Spark v3.0 and may be removed in the future. Use 'spark.sql.adaptive.advisoryPartitionSizeInBytes' instead of it.
25/01/14 11:41:21 INFO HiveInMemoryFileIndex: It took 1 ms to list leaf files for 1 paths.
25/01/14 11:41:22 INFO BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://myns/warehouse/tablespace/managed/hive/test_iceberg/metadata/00001-5390c4c7-a3cd-439b-af2f-d53ab294d6cd.metadata.json
25/01/14 11:41:22 INFO BaseMetastoreCatalog: Table loaded by catalog: default_iceberg.default.test_iceberg
25/01/14 11:41:22 INFO HiveIcebergSerDe: Using schema from existing table {"type":"struct","schema-id":0,"fields":[{"id":1,"name":"k","required":false,"type":"string"},{"id":2,"name":"v","required":false,"type":"int"}]}
25/01/14 11:41:22 INFO BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://myns/warehouse/tablespace/managed/hive/test_iceberg/metadata/00001-5390c4c7-a3cd-439b-af2f-d53ab294d6cd.metadata.json
25/01/14 11:41:22 INFO BaseMetastoreCatalog: Table loaded by catalog: default_iceberg.default.test_iceberg
25/01/14 11:41:22 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 463.7 KiB, free 3.7 GiB)
25/01/14 11:41:22 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 47.7 KiB, free 3.7 GiB)
25/01/14 11:41:22 INFO SparkContext: Created broadcast 2 from broadcast at HiveScan.scala:73
25/01/14 11:41:22 WARN SQLConf: The SQL config 'spark.sql.adaptive.coalescePartitions.minPartitionNum' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.adaptive.coalescePartitions.minPartitionSize' instead.
25/01/14 11:41:22 WARN SQLConf: The SQL config 'spark.sql.adaptive.shuffle.targetPostShuffleInputSize' has been deprecated in Spark v3.0 and may be removed in the future. Use 'spark.sql.adaptive.advisoryPartitionSizeInBytes' instead of it.
25/01/14 11:41:22 INFO CodeGenerator: Code generated in 24.087307 ms
25/01/14 11:41:22 INFO SparkContext: Starting job: collect at ExecuteStatement.scala:72
25/01/14 11:41:22 INFO DAGScheduler: Job 1 finished: collect at ExecuteStatement.scala:72, took 0.000679 s
25/01/14 11:41:22 INFO SQLOperationListener: Query [e0c67acb-aacb-43f6-a6f9-e28ab8cfdfdb]: Job 1 started with 0 stages, 1 active jobs running
25/01/14 11:41:22 INFO ExecuteStatement: Processing hdfs's query[e0c67acb-aacb-43f6-a6f9-e28ab8cfdfdb]: RUNNING_STATE -> FINISHED_STATE, time taken: 3.325 seconds
25/01/14 11:41:22 INFO SQLOperationListener: Query [e0c67acb-aacb-43f6-a6f9-e28ab8cfdfdb]: Job 1 succeeded, 0 active jobs running
2025-01-14 11:41:22.372 INFO KyuubiSessionManager-exec-pool: Thread-78 org.apache.kyuubi.operation.ExecuteStatement: Query[e0c67acb-aacb-43f6-a6f9-e28ab8cfdfdb] in FINISHED_STATE
2025-01-14 11:41:22.372 INFO KyuubiSessionManager-exec-pool: Thread-78 org.apache.kyuubi.operation.ExecuteStatement: Processing hdfs's query[e0c67acb-aacb-43f6-a6f9-e28ab8cfdfdb]: RUNNING_STATE -> FINISHED_STATE, time taken: 3.418 seconds
+----+----+
| k  | v  |
+----+----+
+----+----+

pan3793 (Member) commented Jan 14, 2025

I just noticed you are using client mode; the KSHC workaround only takes effect in cluster mode.

The key points:

  • hive.metastore.uris should point to your local HMS
  • spark.sql.catalog.hive_catalog.hive.metastore.uris should point to your secondary HMS
  • use cluster mode

Turn on Spark's debug logs, and monitor your spark-submit logs for the keyword:

Getting Hive delegation token for
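One way to surface that message, as a sketch assuming Spark 3.3's log4j2.properties format: the logger name below targets org.apache.spark.deploy.security, the package whose token providers log "Getting Hive delegation token for" at debug level, so only token-related debug output is enabled rather than all of Spark's.

# in conf/log4j2.properties on the spark-submit side (logger id is arbitrary)
logger.tokenProviders.name = org.apache.spark.deploy.security
logger.tokenProviders.level = debug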

zxl-333 (Author) commented Jan 15, 2025

When the local query returns no data, the Spark debug log displays the following; this may be the cause of the problem.

25/01/15 10:42:07 DEBUG IsolatedClientLoader: hive class: org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache$AddTask - jar:file:/data01/hadoop/yarn/local/usercache/hdfs/filecache/188/__spark_libs__6695113016244933679.zip/iceberg-spark-runtime-3.3_2.12-1.4.3.jar!/org/apache/iceberg/shaded/com/github/benmanes/caffeine/cache/BoundedLocalCache$AddTask.class
25/01/15 10:42:07 DEBUG IsolatedClientLoader: hive class: org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.UnsafeRefArrayAccess - jar:file:/data01/hadoop/yarn/local/usercache/hdfs/filecache/188/__spark_libs__6695113016244933679.zip/iceberg-spark-runtime-3.3_2.12-1.4.3.jar!/org/apache/iceberg/shaded/com/github/benmanes/caffeine/cache/UnsafeRefArrayAccess.class
25/01/15 10:42:07 DEBUG IsolatedClientLoader: hive class: org.apache.hadoop.hive.metastore.HiveMetaHook - jar:file:/data01/hadoop/yarn/local/usercache/hdfs/filecache/188/__spark_libs__6695113016244933679.zip/hive-metastore-2.3.9.jar!/org/apache/hadoop/hive/metastore/HiveMetaHook.class
25/01/15 10:42:07 DEBUG UserGroupInformation: PrivilegedAction [as: hdfs (auth:SIMPLE)][action: org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1@7e96b4db]
java.lang.Exception
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:245)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1740)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:97)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:60)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:72)
	at org.apache.iceberg.common.DynMethods$StaticMethod.invoke(DynMethods.java:185)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:63)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:124)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:111)
	at org.apache.iceberg.mr.hive.HiveIcebergSerDe.initialize(HiveIcebergSerDe.java:84)
	at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:54)
	at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:453)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:440)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
	at org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:641)
	at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:624)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree2$1(HiveClientImpl.scala:448)
	at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$convertHiveTableToCatalogTable(HiveClientImpl.scala:447)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:434)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:298)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:229)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:228)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:278)
	at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:432)
	at org.apache.spark.sql.hive.client.HiveClient.getTable(HiveClient.scala:95)
	at org.apache.spark.sql.hive.client.HiveClient.getTable$(HiveClient.scala:94)
	at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:92)
	at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:122)
	at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$getTable$1(HiveExternalCatalog.scala:729)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101)
	at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:729)
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:138)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:515)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
	at org.apache.kyuubi.spark.connector.hive.HiveTableCatalog.$anonfun$loadTable$1(HiveTableCatalog.scala:166)
	at org.apache.kyuubi.spark.connector.hive.HiveConnectorUtils$.withSQLConf(HiveConnectorUtils.scala:274)
	at org.apache.kyuubi.spark.connector.hive.HiveTableCatalog.loadTable(HiveTableCatalog.scala:166)
	at org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1197)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1196)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1188)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1059)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1023)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
	at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1023)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:982)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:231)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:227)
	at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:173)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:227)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:188)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:212)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:86)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:147)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
	at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:131)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
25/01/15 10:42:07 DEBUG IsolatedClientLoader: hive class: org.apache.iceberg.exceptions.NoSuchIcebergTableException - jar:file:/data01/hadoop/yarn/local/usercache/hdfs/filecache/188/__spark_libs__6695113016244933679.zip/iceberg-spark-runtime-3.3_2.12-1.4.3.jar!/org/apache/iceberg/exceptions/NoSuchIcebergTableException.class
25/01/15 10:42:07 DEBUG IsolatedClientLoader: hive class: org.apache.iceberg.TableMetadata - jar:file:/data01/hadoop/yarn/local/usercache/hdfs/filecache/188/__spark_libs__6695113016244933679.zip/iceberg-spark-runtime-3.3_2.12-1.4.3.jar!/org/apache/iceberg/TableMetadata.class
25/01/15 10:42:07 DEBUG IsolatedClientLoader: hive class: org.apache.iceberg.relocated.com.google.common.base.Objects - jar:file:/data01/hadoop/yarn/local/usercache/hdfs/filecache/188/__spark_libs__6695113016244933679.zip/iceberg-spark-runtime-3.3_2.12-1.4.3.jar!/org/apache/iceberg/relocated/com/google/common/base/Objects.class
25/01/15 10:42:07 DEBUG IsolatedClientLoader: hive class: org.apache.iceberg.relocated.com.google.common.base.ExtraObjectsMethodsForWeb - jar:file:/data01/hadoop/yarn/local/usercache/hdfs/filecache/188/__spark_libs__6695113016244933679.zip/iceberg-spark-runtime-3.3_2.12-1.4.3.jar!/org/apache/iceberg/relocated/com/google/common/base/ExtraObjectsMethodsForWeb.class
25/01/15 10:42:07 DEBUG IsolatedClientLoader: shared class: java.util.concurrent.atomic.AtomicReference

@wForget
Member

wForget commented Jan 15, 2025

You may also need to configure the same token signature for the Hive/Iceberg catalogs that use the secondary HMS, like:

spark.sql.catalog.hive_catalog.hive.metastore.token.signature=hms2

spark.sql.catalog.iceberg_catalog.hive.metastore.token.signature=hms2
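
To verify which delegation tokens the engine actually holds, and under which service/signature they are stored, here is a small sketch using the standard Hadoop UGI API (an illustration, not part of Kyuubi; run it in the Spark driver, e.g. from spark-shell):

import org.apache.hadoop.security.UserGroupInformation
import scala.collection.JavaConverters._

// Lists every token in the current user's credentials. An HMS delegation
// token has kind HIVE_DELEGATION_TOKEN; its service field is what
// hive.metastore.token.signature is matched against when the metastore
// client selects a token, so the two must agree.
UserGroupInformation.getCurrentUser
  .getCredentials
  .getAllTokens
  .asScala
  .foreach(t => println(s"kind=${t.getKind} service=${t.getService}"))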

@zxl-333
Author

zxl-333 commented Jan 15, 2025

You may also need to configure the same token signature for the Hive/Iceberg catalogs that use the secondary HMS, like:

spark.sql.catalog.hive_catalog.hive.metastore.token.signature=hms2

spark.sql.catalog.iceberg_catalog.hive.metastore.token.signature=hms2

I already configured it; it doesn't help.

@zxl-333
Author

zxl-333 commented Jan 15, 2025

When I use the following configuration, the local Iceberg table can be queried, and the remote Iceberg table's metadata is visible, but its data cannot be queried; the logs look normal:

-------------------begin-----
spark-defaults.conf
spark.kerberos.access.hadoopFileSystems hdfs://myns,hdfs://mynsbackup
spark.sql.catalog.hive_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog.hive.metastore.uris thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
spark.sql.catalog.hive_catalog.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083

#local cluster metastore
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
#spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type=hive
spark.sql.catalog.spark_catalog.uri=thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083

#another cluster metastore (remote metastore)
spark.sql.catalog.spark_catalog_ky=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog_ky.type=hive
spark.sql.catalog.spark_catalog_ky.uri=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083

spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.spark_catalog_ky.hive.metastore.uris=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
spark.sql.catalog.spark_catalog_ky.hive.metastore.kerberos.principal=hive/[email protected]
spark.sql.catalog.spark_catalog_ky.hive.metastore.sasl.enabled=true
spark.sql.catalog.spark_catalog_ky.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083

------------------end-----------------

I see the following information in Kyuubi's engine log:

25/01/15 19:03:49 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.privilege.synchronizer does not exist
25/01/15 19:03:49 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.server2.webui.cors.allowed.headers does not exist
25/01/15 19:03:49 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
25/01/15 19:03:49 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.service.metrics.codahale.reporter.classes does not exist
25/01/15 19:03:49 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.metastore.db.type does not exist
25/01/15 19:03:49 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.metastore.warehouse.external.dir does not exist
25/01/15 19:03:49 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.server2.webui.enable.cors does not exist
25/01/15 19:03:49 INFO org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider: Getting Hive delegation token for hdfs against hive/[email protected] at thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
25/01/15 19:03:49 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction [as: hive/bigdata-1734358521-cpg0o@MR.733E690FE0A842A2A587A467C9A50520.YUN.CN (auth:KERBEROS)][action: org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider$$anon$1@674da77b]
java.lang.Exception
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.doAsRealUser(KyuubiHiveConnectorDelegationTokenProvider.scala:188)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$7(KyuubiHiveConnectorDelegationTokenProvider.scala:153)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$7$adapted(KyuubiHiveConnectorDelegationTokenProvider.scala:136)
at scala.Option.foreach(Option.scala:407)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$6(KyuubiHiveConnectorDelegationTokenProvider.scala:136)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1484)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$5(KyuubiHiveConnectorDelegationTokenProvider.scala:136)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$5$adapted(KyuubiHiveConnectorDelegationTokenProvider.scala:133)
at scala.collection.immutable.Set$Set2.foreach(Set.scala:181)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.obtainDelegationTokens(KyuubiHiveConnectorDelegationTokenProvider.scala:133)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.$anonfun$obtainDelegationTokens$2(HadoopDelegationTokenManager.scala:164)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:214)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$obtainDelegationTokens(HadoopDelegationTokenManager.scala:162)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$2.run(HadoopDelegationTokenManager.scala:148)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$2.run(HadoopDelegationTokenManager.scala:146)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainDelegationTokens(HadoopDelegationTokenManager.scala:146)
at org.apache.spark.deploy.yarn.Client.setupSecurityToken(Client.scala:352)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:1140)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:220)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1327)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:984)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:175)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:173)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:173)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:214)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1072)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1081)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
25/01/15 19:03:49 INFO hive.metastore: Trying to connect to metastore with URI thrift://bigdata-1734405115-0xt70:9083
25/01/15 19:03:49 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction [as: hive/bigdata-1734358521-cpg0o@MR.733E690FE0A842A2A587A467C9A50520.YUN.CN (auth:KERBEROS)][action: org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Client$1@5934ca1e]
java.lang.Exception
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Client.createClientTransport(HadoopThriftAuthBridge.java:208)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:432)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:245)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$10(KyuubiHiveConnectorDelegationTokenProvider.scala:154)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider$$anon$1.run(KyuubiHiveConnectorDelegationTokenProvider.scala:189)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.doAsRealUser(KyuubiHiveConnectorDelegationTokenProvider.scala:188)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$7(KyuubiHiveConnectorDelegationTokenProvider.scala:153)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$7$adapted(KyuubiHiveConnectorDelegationTokenProvider.scala:136)
at scala.Option.foreach(Option.scala:407)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$6(KyuubiHiveConnectorDelegationTokenProvider.scala:136)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1484)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$5(KyuubiHiveConnectorDelegationTokenProvider.scala:136)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.$anonfun$obtainDelegationTokens$5$adapted(KyuubiHiveConnectorDelegationTokenProvider.scala:133)
at scala.collection.immutable.Set$Set2.foreach(Set.scala:181)
at org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.obtainDelegationTokens(KyuubiHiveConnectorDelegationTokenProvider.scala:133)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.$anonfun$obtainDelegationTokens$2(HadoopDelegationTokenManager.scala:164)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:214)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$obtainDelegationTokens(HadoopDelegationTokenManager.scala:162)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$2.run(HadoopDelegationTokenManager.scala:148)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$2.run(HadoopDelegationTokenManager.scala:146)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainDelegationTokens(HadoopDelegationTokenManager.scala:146)
at org.apache.spark.deploy.yarn.Client.setupSecurityToken(Client.scala:352)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:1140)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:220)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1327)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:984)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:175)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:173)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:173)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:214)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1072)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1081)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

@wForget
Member

wForget commented Jan 16, 2025

Did you configure the spark_catalog_ky catalog twice? You should remove spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog.

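spark-defaults.conf is parsed as a Java properties file, so when the same key appears twice only the last assignment survives. In the configuration above, the HiveTableCatalog line silently overrides the Iceberg SparkCatalog line, for example:

# both lines are read, but only the last value takes effect
spark.sql.catalog.spark_catalog_ky org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog_ky org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
# effective value: org.apache.kyuubi.spark.connector.hive.HiveTableCatalog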

@zxl-333
Author

zxl-333 commented Jan 16, 2025

Did you configure the spark_catalog_ky catalog twice? You should remove spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog.

When I remove spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog, a failed-to-connect-to-metastore exception is thrown:

First method -----------------------
spark-defaults.conf
#---------------begin--------------
spark.kerberos.access.hadoopFileSystems hdfs://myns,hdfs://mynsbackup
spark.sql.catalog.hive_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog.hive.metastore.uris thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
#spark.sql.catalog.hive_catalog.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083

#local cluster metastore
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
#spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type=hive
spark.sql.catalog.spark_catalog.uri=thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083
#The following is configured to work around the current Iceberg catalog URI not being equal to HiveConf.ConfVars.METASTOREURIS
#spark.sql.catalog.spark_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
#spark.sql.catalog.spark_catalog.hive.metastore.uris thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083

#an other cluster metastore
spark.sql.catalog.spark_catalog_ky=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog_ky.type=hive
spark.sql.catalog.spark_catalog_ky.uri=thrift://bigdata-1734405115-lhalh:9083

#spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
#spark.sql.catalog.spark_catalog_ky.hive.metastore.uris=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
#spark.sql.catalog.spark_catalog_ky.hive.metastore.kerberos.principal=hive/[email protected]
#spark.sql.catalog.spark_catalog_ky.hive.metastore.sasl.enabled=true
#spark.sql.catalog.spark_catalog_ky.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
#--------------end----------------

exception:
25/01/16 15:57:42 INFO metastore: Trying to connect to metastore with URI thrift://bigdata-1734405115-lhalh:9083
25/01/16 15:57:42 WARN metastore: Failed to connect to the MetaStore Server...
25/01/16 15:57:42 INFO metastore: Waiting 3 seconds before next connection attempt.
2025-01-16 15:57:45.449 INFO KyuubiSessionManager-exec-pool: Thread-93 org.apache.kyuubi.operation.ExecuteStatement: Query[f7e6ec20-2ec4-47dc-be3a-1989a3b46bfd] in RUNNING_STATE
25/01/16 15:57:45 INFO DAGScheduler: Asked to cancel job group f7e6ec20-2ec4-47dc-be3a-1989a3b46bfd
25/01/16 15:57:45 ERROR ExecuteStatement: Error operating ExecuteStatement: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84)
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34)
at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158)
at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
at org.apache.iceberg.CachingCatalog.loadTable(CachingCatalog.java:166)
at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:642)
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:160)
at org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1197)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1196)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1188)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1059)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1023)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:977)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1023)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:982)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:231)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:227)
at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:173)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:227)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:188)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:212)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:86)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:147)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:131)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1742)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:60)
at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:72)
at org.apache.iceberg.common.DynMethods$StaticMethod.invoke(DynMethods.java:185)
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:63)
... 91 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1740)
... 103 more

Second method --------------
When both catalog definitions are configured, connecting to the metastore works normally. However, when reading tables, the Kyuubi Spark HiveScan is always used, and the Iceberg scan cannot be used to query the data.
#---------------begin--------------
spark.kerberos.access.hadoopFileSystems hdfs://myns,hdfs://mynsbackup
spark.sql.catalog.hive_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog.hive.metastore.uris thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
#spark.sql.catalog.hive_catalog.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083

#local cluster metastore
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
#spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type=hive
spark.sql.catalog.spark_catalog.uri=thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083
#The following is configured to work around the current Iceberg catalog URI not being equal to HiveConf.ConfVars.METASTOREURIS
#spark.sql.catalog.spark_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
#spark.sql.catalog.spark_catalog.hive.metastore.uris thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083

#an other cluster metastore
spark.sql.catalog.spark_catalog_ky=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog_ky.type=hive
spark.sql.catalog.spark_catalog_ky.uri=thrift://bigdata-1734405115-lhalh:9083

spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
#spark.sql.catalog.spark_catalog_ky.hive.metastore.uris=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
#spark.sql.catalog.spark_catalog_ky.hive.metastore.kerberos.principal=hive/[email protected]
#spark.sql.catalog.spark_catalog_ky.hive.metastore.sasl.enabled=true
#spark.sql.catalog.spark_catalog_ky.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
#--------------end----------------

SQL: select count(*),k from spark_catalog_ky.default.test_iceberg_1_backup group by k
The SQL execution plan is as follows:

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[k#32], functions=[count(1)])
   +- Exchange hashpartitioning(k#32, 800), ENSURE_REQUIREMENTS, [plan_id=65]
      +- HashAggregate(keys=[k#32], functions=[partial_count(1)])
         +- Project [k#32]
            +- BatchScan[k#32] HiveScan DataFilters: [], Format: hive, Location: HiveCatalogFileIndex(1 paths)[hdfs://mynsbackup/warehouse/tablespace/managed/hive/test_iceberg..., PartitionFilters: [], ReadSchema: struct<k:string> RuntimeFilters: []

SQL: select count(*),k from spark_catalog.default.test_iceberg group by k;
For the local Iceberg table, the execution plan is as follows:

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[k#31], functions=[count(1)])
   +- HashAggregate(keys=[k#31], functions=[partial_count(1)])
      +- BatchScan[k#31] spark_catalog.default.test_iceberg (branch=null) [filters=, groupedBy=] RuntimeFilters: []
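
Putting the suggestions in this thread together, a hedged sketch of what the remote catalog section might look like: keep a single definition per catalog name so the Iceberg SparkCatalog stays effective, and keep a separately named KSHC catalog against the same HMS so its delegation-token provider still runs (the hms2 signature value is wForget's example, not a verified fix):

# remote Iceberg catalog: exactly one definition for this catalog name
spark.sql.catalog.spark_catalog_ky org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog_ky.type hive
spark.sql.catalog.spark_catalog_ky.uri thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
spark.sql.catalog.spark_catalog_ky.hive.metastore.token.signature hms2

# KSHC catalog against the same remote HMS, under a different name, so its
# token provider obtains a delegation token without shadowing the Iceberg catalog
spark.sql.catalog.hive_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog.hive.metastore.uris thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
spark.sql.catalog.hive_catalog.hive.metastore.token.signature hms2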
