org.apache.atlas.AtlasServiceException "errorCode":"ATLAS-404-00-00A" #6307

Closed
2018yinjian opened this issue Apr 15, 2024 · 6 comments
Labels
kind:bug, priority:major

Comments

2018yinjian commented Apr 15, 2024

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

WARN AtlasLineageDispatcher: =========processEntity=========:AtlasEntity{AtlasStruct{typeName='spark_process', attributes=[executionId:5, qualifiedName:application_1678250195185_77558, name:Spark Job application_1678250195185_77558, currUser:liuqin, details:== Parsed Logical Plan ==
InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
+- Project [a#1, (b#2 + c#3) AS d#0]
   +- SubqueryAlias spark_catalog.default.test_table0
      +- HiveTableRelation [`default`.`test_table0`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#1, b#2, c#3], Partition Cols: []]

== Analyzed Logical Plan ==
InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
+- Project [a#1, (b#2 + c#3) AS d#0]
   +- SubqueryAlias spark_catalog.default.test_table0
      +- HiveTableRelation [`default`.`test_table0`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#1, b#2, c#3], Partition Cols: []]

== Optimized Logical Plan ==
InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
+- Project [a#1, (b#2 + c#3) AS d#0]
   +- HiveTableRelation [`default`.`test_table0`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#1, b#2, c#3], Partition Cols: []]

== Physical Plan ==
Execute InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
+- *(1) Project [a#1, (b#2 + c#3) AS d#0]
   +- Scan hive default.test_table0 [a#1, b#2, c#3], HiveTableRelation [`default`.`test_table0`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#1, b#2, c#3], Partition Cols: []]
, sparkPlanDescription:Execute InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
+- Project [a#1, (b#2 + c#3) AS d#0]
   +- Scan hive default.test_table0 [a#1, b#2, c#3], HiveTableRelation [`default`.`test_table0`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#1, b#2, c#3], Partition Cols: []]
]}guid='-20194587027679703', homeId='null', isProxy='false', isIncomplete=false, provenanceType=0, status=null, createdBy='null', updatedBy='null', createTime=null, updateTime=null, version=0, relationshipAttributes=[outputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_table', uniqueAttributes={qualifiedName:default.test_table1@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='process_dataset_outputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}], inputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_table', uniqueAttributes={qualifiedName:default.test_table0@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='dataset_process_inputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}]], classifications=[], meanings=[], customAttributes=[], businessAttributes=[], labels=[], pendingTasks=[]}
24/04/15 10:09:40 WARN AtlasLineageDispatcher: =========columnLineageEntities=========
24/04/15 10:09:40 WARN AtlasLineageDispatcher: =========columnLineageEntities=========List(AtlasEntity{AtlasStruct{typeName='spark_column_lineage', attributes=[qualifiedName:application_1678250195185_77558:default.test_table1.a@primary, name:application_1678250195185_77558:default.test_table1.a@primary]}guid='-20194587027679704', homeId='null', isProxy='false', isIncomplete=false, provenanceType=0, status=null, createdBy='null', updatedBy='null', createTime=null, updateTime=null, version=0, relationshipAttributes=[outputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table1.a@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='process_dataset_outputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}], process:AtlasRelatedObjectId{AtlasObjectId{guid='-20194587027679703', typeName='spark_process', uniqueAttributes={qualifiedName:application_1678250195185_77558}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='spark_process_column_lineages', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}, inputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table0.a@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='dataset_process_inputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}]], classifications=[], meanings=[], customAttributes=[], businessAttributes=[], labels=[], pendingTasks=[]}, AtlasEntity{AtlasStruct{typeName='spark_column_lineage', attributes=[qualifiedName:application_1678250195185_77558:default.test_table1.d@primary, name:application_1678250195185_77558:default.test_table1.d@primary]}guid='-20194587027679705', homeId='null', isProxy='false', isIncomplete=false, provenanceType=0, status=null, createdBy='null', updatedBy='null', createTime=null, updateTime=null, version=0, relationshipAttributes=[outputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table1.d@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='process_dataset_outputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}], process:AtlasRelatedObjectId{AtlasObjectId{guid='-20194587027679703', typeName='spark_process', uniqueAttributes={qualifiedName:application_1678250195185_77558}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='spark_process_column_lineages', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}, inputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table0.b@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='dataset_process_inputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}, AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table0.c@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='dataset_process_inputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}]], classifications=[], meanings=[], customAttributes=[], businessAttributes=[], labels=[], pendingTasks=[]})
24/04/15 10:09:40 WARN AtlasLineageDispatcher: Send lineage to atlas failed.
org.apache.atlas.AtlasServiceException: Metadata service API org.apache.atlas.AtlasClientV2$API_V2@70cf2fd6 failed with status 404 (Not Found) Response Body ({"errorCode":"ATLAS-404-00-00A","errorMessage":"Referenced entity AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table0.a@primary}} is not found"})
        at org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:427) ~[atlas-client-common-2.3.0.jar:2.3.0]
        at org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:352) ~[atlas-client-common-2.3.0.jar:2.3.0]
        at org.apache.atlas.AtlasBaseClient.callAPI(AtlasBaseClient.java:228) ~[atlas-client-common-2.3.0.jar:2.3.0]
        at org.apache.atlas.AtlasClientV2.createEntities(AtlasClientV2.java:436) ~[atlas-client-v2-2.3.0.jar:2.3.0]
        at org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasRestClient.send(AtlasClient.scala:51) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasLineageDispatcher.$anonfun$send$2(AtlasLineageDispatcher.scala:42) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasLineageDispatcher.$anonfun$send$2$adapted(AtlasLineageDispatcher.scala:30) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at scala.Option.foreach(Option.scala:407) ~[scala-library-2.12.15.jar:?]
        at org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasLineageDispatcher.send(AtlasLineageDispatcher.scala:30) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener.$anonfun$onSuccess$1(SparkOperationLineageQueryExecutionListener.scala:35) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener.$anonfun$onSuccess$1$adapted(SparkOperationLineageQueryExecutionListener.scala:35) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?]
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.15.jar:?]
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.15.jar:?]
        at org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener.onSuccess(SparkOperationLineageQueryExecutionListener.scala:35) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:165) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:135) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.sql.util.ExecutionListenerBus.postToAll(QueryExecutionListener.scala:135) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.sql.util.ExecutionListenerBus.onOtherEvent(QueryExecutionListener.scala:147) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) ~[scala-library-2.12.15.jar:?]
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) ~[scala-library-2.12.15.jar:?]
        at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.1.jar:3.3.1]
24/04/15 10:09:40 WARN AtlasLineageDispatcher: =========processEntity=========:AtlasEntity{AtlasStruct{typeName='spark_process', attributes=[executionId:4, qualifiedName:application_1678250195185_77558, name:Spark Job application_1678250195185_77558, currUser:liuqin, details:== Parsed Logical Plan ==
'InsertIntoStatement 'UnresolvedRelation [test_table1], [], false, false, false
+- 'Project ['a, ('b + 'c) AS d#0]
   +- 'UnresolvedRelation [test_table0], [], false

== Analyzed Logical Plan ==
InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
+- Project [a#1, (b#2 + c#3) AS d#0]
   +- SubqueryAlias spark_catalog.default.test_table0
      +- HiveTableRelation [`default`.`test_table0`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#1, b#2, c#3], Partition Cols: []]

== Optimized Logical Plan ==
CommandResult Execute InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
   +- InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
      +- Project [a#1, (b#2 + c#3) AS d#0]
         +- SubqueryAlias spark_catalog.default.test_table0
            +- HiveTableRelation [`default`.`test_table0`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#1, b#2, c#3], Partition Cols: []]

== Physical Plan ==
CommandResult <empty>
   +- Execute InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
      +- *(1) Project [a#1, (b#2 + c#3) AS d#0]
         +- Scan hive default.test_table0 [a#1, b#2, c#3], HiveTableRelation [`default`.`test_table0`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#1, b#2, c#3], Partition Cols: []]
, sparkPlanDescription:CommandResult <empty>
   +- Execute InsertIntoHiveTable `default`.`test_table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [a, d]
      +- *(1) Project [a#1, (b#2 + c#3) AS d#0]
         +- Scan hive default.test_table0 [a#1, b#2, c#3], HiveTableRelation [`default`.`test_table0`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#1, b#2, c#3], Partition Cols: []]
]}guid='-20194587027679706', homeId='null', isProxy='false', isIncomplete=false, provenanceType=0, status=null, createdBy='null', updatedBy='null', createTime=null, updateTime=null, version=0, relationshipAttributes=[outputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_table', uniqueAttributes={qualifiedName:default.test_table1@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='process_dataset_outputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}], inputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_table', uniqueAttributes={qualifiedName:default.test_table0@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='dataset_process_inputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}]], classifications=[], meanings=[], customAttributes=[], businessAttributes=[], labels=[], pendingTasks=[]}
24/04/15 10:09:40 WARN AtlasLineageDispatcher: =========columnLineageEntities=========
24/04/15 10:09:40 WARN AtlasLineageDispatcher: =========columnLineageEntities=========List(AtlasEntity{AtlasStruct{typeName='spark_column_lineage', attributes=[qualifiedName:application_1678250195185_77558:default.test_table1.a@primary, name:application_1678250195185_77558:default.test_table1.a@primary]}guid='-20194587027679707', homeId='null', isProxy='false', isIncomplete=false, provenanceType=0, status=null, createdBy='null', updatedBy='null', createTime=null, updateTime=null, version=0, relationshipAttributes=[outputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table1.a@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='process_dataset_outputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}], process:AtlasRelatedObjectId{AtlasObjectId{guid='-20194587027679706', typeName='spark_process', uniqueAttributes={qualifiedName:application_1678250195185_77558}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='spark_process_column_lineages', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}, inputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table0.a@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='dataset_process_inputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}]], classifications=[], meanings=[], customAttributes=[], businessAttributes=[], labels=[], pendingTasks=[]}, AtlasEntity{AtlasStruct{typeName='spark_column_lineage', attributes=[qualifiedName:application_1678250195185_77558:default.test_table1.d@primary, name:application_1678250195185_77558:default.test_table1.d@primary]}guid='-20194587027679708', homeId='null', isProxy='false', isIncomplete=false, provenanceType=0, status=null, createdBy='null', updatedBy='null', createTime=null, updateTime=null, version=0, relationshipAttributes=[outputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table1.d@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='process_dataset_outputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}], process:AtlasRelatedObjectId{AtlasObjectId{guid='-20194587027679706', typeName='spark_process', uniqueAttributes={qualifiedName:application_1678250195185_77558}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='spark_process_column_lineages', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}, inputs:[AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table0.b@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='dataset_process_inputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}, AtlasRelatedObjectId{AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table0.c@primary}}entityStatus='null', displayText='null', qualifiedName='null', relationshipType='dataset_process_inputs', relationshipGuid='null', relationshipStatus='null', relationshipAttributes=null}]], classifications=[], meanings=[], customAttributes=[], businessAttributes=[], labels=[], pendingTasks=[]})
24/04/15 10:09:41 WARN AtlasLineageDispatcher: Send lineage to atlas failed.
org.apache.atlas.AtlasServiceException: Metadata service API org.apache.atlas.AtlasClientV2$API_V2@70cf2fd6 failed with status 404 (Not Found) Response Body ({"errorCode":"ATLAS-404-00-00A","errorMessage":"Referenced entity AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table0.a@primary}} is not found"})
        at org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:427) ~[atlas-client-common-2.3.0.jar:2.3.0]
        at org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:352) ~[atlas-client-common-2.3.0.jar:2.3.0]
        at org.apache.atlas.AtlasBaseClient.callAPI(AtlasBaseClient.java:228) ~[atlas-client-common-2.3.0.jar:2.3.0]
        at org.apache.atlas.AtlasClientV2.createEntities(AtlasClientV2.java:436) ~[atlas-client-v2-2.3.0.jar:2.3.0]
        at org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasRestClient.send(AtlasClient.scala:51) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasLineageDispatcher.$anonfun$send$2(AtlasLineageDispatcher.scala:42) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasLineageDispatcher.$anonfun$send$2$adapted(AtlasLineageDispatcher.scala:30) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at scala.Option.foreach(Option.scala:407) ~[scala-library-2.12.15.jar:?]
        at org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasLineageDispatcher.send(AtlasLineageDispatcher.scala:30) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener.$anonfun$onSuccess$1(SparkOperationLineageQueryExecutionListener.scala:35) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener.$anonfun$onSuccess$1$adapted(SparkOperationLineageQueryExecutionListener.scala:35) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?]
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.15.jar:?]
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.15.jar:?]
        at org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener.onSuccess(SparkOperationLineageQueryExecutionListener.scala:35) ~[kyuubi-spark-lineage_2.12-1.8.1.jar:1.8.1]
        at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:165) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:135) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.sql.util.ExecutionListenerBus.postToAll(QueryExecutionListener.scala:135) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.sql.util.ExecutionListenerBus.onOtherEvent(QueryExecutionListener.scala:147) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) ~[scala-library-2.12.15.jar:?]
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) ~[scala-library-2.12.15.jar:?]
        at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446) ~[spark-core_2.12-3.3.1.jar:3.3.1]
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.1.jar:3.3.1]

Affects Version(s)

kyuubi-spark-lineage

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

kyuubi-spark-lineage:

spark-shell --driver-memory 1g --executor-memory 2g --num-executors 2 --executor-cores 1 \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.sql.catalog.v2_catalog=org.apache.spark.sql.connector.catalog.InMemoryTableCatalog \
  --conf spark.sql.queryExecutionListeners=org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener \
  --conf spark.kyuubi.plugin.lineage.dispatchers=ATLAS \
  --conf spark.kyuubi.plugin.lineage.skip.parsing.permanent.view.enabled=true

spark.sql("create table test_table0(a string, b int, c int)")
spark.sql("create table test_table1(a string, d int)")
spark.sql("insert into test_table1 select a, b + c as d from test_table0").collect()	

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
2018yinjian added the kind:bug and priority:major labels on Apr 15, 2024

Hello @2018yinjian,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.

wForget (Member) commented Apr 15, 2024

Referenced entity AtlasObjectId{guid='null', typeName='hive_column', uniqueAttributes={qualifiedName:default.test_table0.a@primary}} is not found

Kyuubi AtlasLineageDispatcher only creates the spark_process entity. The premise is that the table entities (hive_table/hive_column) have already been ingested into Atlas by some other means (e.g. the Atlas Hive bridge).
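
As a quick sanity check, something like the following can be pasted into the same spark-shell to ask Atlas whether the referenced column entity exists. This is a minimal sketch, not part of the dispatcher: the Atlas URL, credentials, and qualifiedName below are placeholders taken from this report, and it relies on the atlas-client-v2 jar already being on the classpath.

import scala.collection.JavaConverters._
import org.apache.atlas.{AtlasClientV2, AtlasServiceException}

// Minimal check: ask Atlas for the hive_column entity that the failed
// lineage request referenced. Endpoint and credentials are placeholders.
val atlas = new AtlasClientV2(Array("http://atlas-host:21000"), Array("admin", "admin"))
val qualifiedName = "default.test_table0.a@primary" // taken from the 404 error message
try {
  val found = atlas.getEntityByAttribute("hive_column", Map("qualifiedName" -> qualifiedName).asJava)
  println(s"Entity exists, guid=${found.getEntity.getGuid}")
} catch {
  case e: AtlasServiceException =>
    // A 404 here confirms the table/column metadata was never ingested into
    // Atlas; import it first (e.g. via the Atlas Hive bridge), then rerun the job.
    println(s"Entity $qualifiedName not found: ${e.getMessage}")
}

If the lookup fails, import the Hive metadata into Atlas before running the Spark job; once the hive_table/hive_column entities exist, the spark_process lineage entity can reference them and the 404 goes away.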

pan3793 (Member) commented Apr 15, 2024

@wForget similar questions have been asked previously; can we update the docs or FAQ to clarify?

wForget (Member) commented Apr 15, 2024

@wForget similar questions have been asked previously; can we update the docs or FAQ to clarify?

I will submit a new feature for this soon.

wForget (Member) commented Apr 15, 2024

@wForget similar questions have been asked previously; can we update the docs or FAQ to clarify?

task: #6309

2018yinjian (Author) commented

Thank you very much. Will there be lineage support for Spark operations on Hudi tables in the future?
