
[Bug] Kyuubi Spark authorization plugin with Iceberg tables: Permission denied on Iceberg snapshot retrieval #5803

Open
elisabetao opened this issue Dec 1, 2023 · 4 comments
Labels
kind:bug This is clearly a bug priority:major

Comments


elisabetao commented Dec 1, 2023

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

When using Ranger Hive policies as the source for the Kyuubi Spark authorization plugin with Iceberg tables, we get "Permission denied" when retrieving data for a specific Iceberg snapshot ID, for example:

select * from iceberg.test.customers.snapshot_id_7801393477815178085

Although the corresponding account has select and read rights on the test database in Ranger, we get the following error:
An error was encountered:

An error occurred while calling o165.toJavaRDD.
: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [svc_df_big-st] does not have [select] privilege on [test.customers/snapshot_id_7801393477815178085/id]
	at org.apache.kyuubi.plugin.spark.authz.ranger.SparkRangerAdminPlugin$.verify(SparkRangerAdminPlugin.scala:172)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5(RuleAuthorization.scala:93)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5$adapted(RuleAuthorization.scala:92)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.org$apache$kyuubi$plugin$spark$authz$ranger$RuleAuthorization$$checkPrivileges(RuleAuthorization.scala:92)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:37)
	at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:125)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:121)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:117)
	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:135)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:153)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:150)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:172)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:171)
	at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3247)
	at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3245)
	at org.apache.spark.sql.Dataset.toJavaRDD(Dataset.scala:3257)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Traceback (most recent call last):
  File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/pyspark.zip/pyspark/sql/dataframe.py", line 117, in toJSON
    return RDD(rdd.toJavaRDD(), self._sc, UTF8Deserializer(use_unicode))
  File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1322, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o165.toJavaRDD.
: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [svc_df_big-st] does not have [select] privilege on [test.customers/snapshot_id_7801393477815178085/id]

However, if the test account is granted Hive read access to all databases, there is no permission issue; wildcard (*) database read access should not normally be necessary for this query to be allowed. Is there a bug in the Kyuubi Spark authorization plugin causing this?
The patch at https://github.com/apache/kyuubi/pull/3931/files does not appear to cover this scenario; a minimal repro sketch follows.
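
A minimal PySpark sketch of the behavior described above, assuming the catalog, table, and snapshot ID from this report (they are placeholders, not a runnable fixture):

```python
# Sketch of the reported behavior; names are taken from this report.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Succeeds: the Ranger policy grants select on test.customers.
spark.sql("SELECT * FROM iceberg.test.customers").show()

# Fails with AccessControlException: the authz plugin resolves the
# snapshot read as a distinct object, test.customers/snapshot_id_.../id,
# which no policy matches unless read access is granted on all databases.
spark.sql(
    "SELECT * FROM iceberg.test.customers.snapshot_id_7801393477815178085"
).show()
```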

Thanks

Affects Version(s)

1.8.0

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

We are using the Kyuubi Spark Authorization Plugin with Spark 3.2 and Iceberg 1.0.0.1.3.1, as described here: https://kyuubi.readthedocs.io/en/master/security/authorization/spark/install.html
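
For reference, a sketch of how the plugin and the Iceberg catalog are wired into a session per the linked guide (the extension class names are from the Kyuubi and Iceberg docs; the catalog name and Hive-backed catalog type are assumptions):

```python
# Sketch of session wiring per the linked install guide. The plugin jars
# and Ranger client configs (e.g. ranger-spark-security.xml) are assumed
# to already be on the driver classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Iceberg SQL extensions plus the Kyuubi Ranger authz extension.
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
        "org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension",
    )
    # An "iceberg" catalog backed by the Hive Metastore (assumed setup).
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hive")
    .getOrCreate()
)
```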

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
elisabetao added the kind:bug and priority:major labels on Dec 1, 2023
yaooqinn (Member) commented Dec 5, 2023

Can you provide the plan details?

elisabetao (Author) commented

Hello,
Please let me know if more details are needed. This is also after applying patch #5248 (724ae93):

== Physical Plan ==
*(1) Project [id#32, name#33, age#34, address#35, cloth#36]
+- BatchScan[id#32, name#33, age#34, address#35, cloth#36] iceberg.gns_test.customers [filters=] RuntimeFilters: []

The patch appears to alleviate the access issue for iceberg.test.customers.snapshot_id_X, but introduces another issue: metadata such as snapshots and history becomes freely accessible without any Ranger security checks.
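
For illustration, queries of this shape (metadata table names from the Iceberg docs; the unprivileged session is the hypothetical scenario) now return results with no Ranger check at all:

```python
# Sketch: after the patch, Iceberg metadata tables appear to bypass
# Ranger entirely; a user with no select grant can still read them.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Both queries succeed even without any select grant in Ranger.
for metadata_table in ("snapshots", "history"):
    spark.sql(f"SELECT * FROM iceberg.test.customers.{metadata_table}").show()
```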

Thanks a lot

yaooqinn (Member) commented Dec 7, 2023

Thanks @elisabetao, we need the full plan.

pravin1406 commented Feb 13, 2025

@yaooqinn

I'm having a similar issue: we are not able to access Iceberg metadata tables. I have attached the table plan with and without the authz plugin enabled. We have table-level permission and don't expect to have to grant metadata-table-level permissions separately. Is there work in progress to cover this case?

Error occurred during query planning:
Permission denied: user [dmu_mesh_qa1] does not have [select] privilege on [mesh_qa1_mart.Testicec1/files/content,mesh_qa1_mart.Testicec1/files/file_path,mesh_qa1_mart.Testicec1/files/file_format,mesh_qa1_mart.Testicec1/files/spec_id,mesh_qa1_mart.Testicec1/files/record_count,mesh_qa1_mart.Testicec1/files/file_size_in_bytes,mesh_qa1_mart.Testicec1/files/column_sizes,mesh_qa1_mart.Testicec1/files/value_counts,mesh_qa1_mart.Testicec1/files/null_value_counts,mesh_qa1_mart.Testicec1/files/nan_value_counts,mesh_qa1_mart.Testicec1/files/lower_bounds,mesh_qa1_mart.Testicec1/files/upper_bounds,mesh_qa1_mart.Testicec1/files/key_metadata,mesh_qa1_mart.Testicec1/files/split_offsets,mesh_qa1_mart.Testicec1/files/equality_ids,mesh_qa1_mart.Testicec1/files/sort_order_id,mesh_qa1_mart.Testicec1/files/readable_metrics]

== Parsed Logical Plan ==
'GlobalLimit 10
+- 'LocalLimit 10
+- 'Project [*]
+- 'UnresolvedRelation [mesh_qa1_mart, Testicec1, files], [], false

== Analyzed Logical Plan ==
content: int, file_path: string, file_format: string, spec_id: int, record_count: bigint, file_size_in_bytes: bigint, column_sizes: map<int,bigint>, value_counts: map<int,bigint>, null_value_counts: map<int,bigint>, nan_value_counts: map<int,bigint>, lower_bounds: map<int,binary>, upper_bounds: map<int,binary>, key_metadata: binary, split_offsets: array<bigint>, equality_ids: array<int>, sort_order_id: int
GlobalLimit 10
+- LocalLimit 10
+- Project [content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168]
+- SubqueryAlias spark_catalog.mesh_qa1_mart.Testicec1.files
+- RelationV2[content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168] spark_catalog.mesh_qa1_mart.Testicec1.files

== Optimized Logical Plan ==
GlobalLimit 10
+- LocalLimit 10
+- RelationV2[content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168] spark_catalog.mesh_qa1_mart.Testicec1.files

== Physical Plan ==
CollectLimit 10
+- *(1) Project [content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168]
+- BatchScan[content#153, file_path#154, file_format#155, spec_id#156, record_count#157L, file_size_in_bytes#158L, column_sizes#159, value_counts#160, null_value_counts#161, nan_value_counts#162, lower_bounds#163, upper_bounds#164, key_metadata#165, split_offsets#166, equality_ids#167, sort_order_id#168] spark_catalog.mesh_qa1_mart.Testicec1.files [filters=] RuntimeFilters: []
