
[Bug] Kyuubi Spark authorization plugin with Iceberg tables: Permission denied on Iceberg snapshot retrieval #5803

Open
elisabetao opened this issue Dec 1, 2023 · 4 comments
Labels
kind:bug This is clearly a bug priority:major

Comments


elisabetao commented Dec 1, 2023

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

When using Ranger Hive policies as the source for the Kyuubi Spark authorization plugin with Iceberg tables, we get "Permission denied" when retrieving data for a specific Iceberg snapshot ID, for example:

select * from iceberg.test.customers.snapshot_id_7801393477815178085

Although the corresponding account has select and read rights on the test database in Ranger, we get the following error:
An error was encountered:

An error occurred while calling o165.toJavaRDD.
: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [svc_df_big-st] does not have [select] privilege on [test.customers/snapshot_id_7801393477815178085/id]
	at org.apache.kyuubi.plugin.spark.authz.ranger.SparkRangerAdminPlugin$.verify(SparkRangerAdminPlugin.scala:172)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5(RuleAuthorization.scala:93)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5$adapted(RuleAuthorization.scala:92)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.org$apache$kyuubi$plugin$spark$authz$ranger$RuleAuthorization$$checkPrivileges(RuleAuthorization.scala:92)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:37)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:125)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:121)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:117)
	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:135)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:153)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:150)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:172)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:171)
	at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3247)
	at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3245)
	at org.apache.spark.sql.Dataset.toJavaRDD(Dataset.scala:3257)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Traceback (most recent call last):
  File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/pyspark.zip/pyspark/sql/dataframe.py", line 117, in toJSON
    return RDD(rdd.toJavaRDD(), self._sc, UTF8Deserializer(use_unicode))
  File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1322, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o165.toJavaRDD.
: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [svc_df_big-st] does not have [select] privilege on [test.customers/snapshot_id_7801393477815178085/id]

However, if the test account is granted Hive read access to all databases, there is no permission issue; wildcard (*) database read access should not normally be necessary for this query to be allowed. Is there a bug in the Kyuubi Spark authorization plugin causing this?
The patch at https://github.com/apache/kyuubi/pull/3931/files does not appear to cover this scenario; a minimal repro sketch follows.
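
A minimal PySpark sketch of the behavior described above, assuming the catalog, table, and snapshot ID from this report (they are placeholders, not a runnable fixture):

```python
# Sketch of the reported behavior; names are taken from this report.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Succeeds: the Ranger policy grants select on test.customers.
spark.sql("SELECT * FROM iceberg.test.customers").show()

# Fails with AccessControlException: the authz plugin resolves the
# snapshot read as a distinct object, test.customers/snapshot_id_.../id,
# which no policy matches unless read access is granted on all databases.
spark.sql(
    "SELECT * FROM iceberg.test.customers.snapshot_id_7801393477815178085"
).show()
```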

Thanks

Affects Version(s)

1.8.0

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

We are using the Kyuubi Spark Authorization Plugin with Spark 3.2 and Iceberg 1.0.0.1.3.1, as described here: https://kyuubi.readthedocs.io/en/master/security/authorization/spark/install.html
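
For reference, a sketch of how the plugin and the Iceberg catalog are wired into a session per the linked guide (the extension class names are from the Kyuubi and Iceberg docs; the catalog name and Hive-backed catalog type are assumptions):

```python
# Sketch of session wiring per the linked install guide. The plugin jars
# and Ranger client configs (e.g. ranger-spark-security.xml) are assumed
# to already be on the driver classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Iceberg SQL extensions plus the Kyuubi Ranger authz extension.
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
        "org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension",
    )
    # An "iceberg" catalog backed by the Hive Metastore (assumed setup).
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hive")
    .getOrCreate()
)
```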

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
elisabetao added the kind:bug and priority:major labels on Dec 1, 2023
yaooqinn (Member) commented Dec 5, 2023

Can you provide the plan details?

elisabetao (Author) commented

Hello,
Please let me know if more details are needed. This is also after applying patch #5248 (724ae93):

== Physical Plan ==
*(1) Project [id#32, name#33, age#34, address#35, cloth#36]
+- BatchScan[id#32, name#33, age#34, address#35, cloth#36] iceberg.gns_test.customers [filters=] RuntimeFilters: []

The patch appears to alleviate the access issue for iceberg.test.customers.snapshot_id_X, but introduces another issue: metadata such as snapshots and history becomes freely accessible without any Ranger security checks.
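
For illustration, queries of this shape (metadata table names from the Iceberg docs; the unprivileged session is the hypothetical scenario) now return results with no Ranger check at all:

```python
# Sketch: after the patch, Iceberg metadata tables appear to bypass
# Ranger entirely; a user with no select grant can still read them.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Both queries succeed even without any select grant in Ranger.
for metadata_table in ("snapshots", "history"):
    spark.sql(f"SELECT * FROM iceberg.test.customers.{metadata_table}").show()
```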

Thanks a lot

yaooqinn (Member) commented Dec 7, 2023

Thanks @elisabetao, we need the full plan.

pravin1406 commented Feb 13, 2025

@yaooqinn

I'm having a similar issue: we are not able to access Iceberg metadata tables. I have attached the table plan with and without the authz plugin enabled. We have table-level permission and don't expect to have to grant metadata-table-level permissions separately. Is there work in progress to cover this case?

Error occurred during query planning:
Permission denied: user [dmu_mesh_qa1] does not have [select] privilege on [mesh_qa1_mart.Testicec1/files/content,mesh_qa1_mart.Testicec1/files/file_path,mesh_qa1_mart.Testicec1/files/file_format,mesh_qa1_mart.Testicec1/files/spec_id,mesh_qa1_mart.Testicec1/files/record_count,mesh_qa1_mart.Testicec1/files/file_size_in_bytes,mesh_qa1_mart.Testicec1/files/column_sizes,mesh_qa1_mart.Testicec1/files/value_counts,mesh_qa1_mart.Testicec1/files/null_value_counts,mesh_qa1_mart.Testicec1/files/nan_value_counts,mesh_qa1_mart.Testicec1/files/lower_bounds,mesh_qa1_mart.Testicec1/files/upper_bounds,mesh_qa1_mart.Testicec1/files/key_metadata,mesh_qa1_mart.Testicec1/files/split_offsets,mesh_qa1_mart.Testicec1/files/equality_ids,mesh_qa1_mart.Testicec1/files/sort_order_id,mesh_qa1_mart.Testicec1/files/readable_metrics]

== Parsed Logical Plan ==
'GlobalLimit 10
+- 'LocalLimit 10
+- 'Project [*]
+- 'UnresolvedRelation [mesh_qa1_mart, Testicec1, files], [], false

== Analyzed Logical Plan ==
content: int, file_path: string, file_format: string, spec_id: int, record_count: bigint, file_size_in_bytes: bigint, column_sizes: map<int,bigint>, value_counts: map<int,bigint>, null_value_counts: map<int,bigint>, nan_value_counts: map<int,bigint>, lower_bounds: map<int,binary>, upper_bounds: map<int,binary>, key_metadata: binary, split_offsets: array<bigint>, equality_ids: array<int>, sort_order_id: int
GlobalLimit 10
+- LocalLimit 10
+- Project [content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168]
+- SubqueryAlias spark_catalog.mesh_qa1_mart.Testicec1.files
+- RelationV2[content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168] spark_catalog.mesh_qa1_mart.Testicec1.files

== Optimized Logical Plan ==
GlobalLimit 10
+- LocalLimit 10
+- RelationV2[content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168] spark_catalog.mesh_qa1_mart.Testicec1.files

== Physical Plan ==
CollectLimit 10
+- *(1) Project [content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168]
+- BatchScan[content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168] spark_catalog.mesh_qa1_mart.Testicec1.files [filters=] RuntimeFilters: []
