Problematic frame: C [libtensorflow_framework.so.1+0x744da9] _GLOBAL__sub_I_loader.cc+0x99 #923
Comments
It seems like a global symbol table conflict with TensorFlow's protobuf usage. So if it's not one of the dependencies causing this conflict, my next best guess is... NOTE: I've seen this error on the TensorFlow GitHub and they mostly closed those issues as...
What's inside it? Do you have any C++ code being executed by Pipe? Any Java code?
It's not really a Spark NLP error, so no matter what version you try, as long as what you use has TensorFlow inside it will give you that conflict. If what's up there didn't work, the only way to narrow down the actual cause is to remove everything one by one until there is nothing but Spark NLP and the few lines of code that use the pre-trained pipeline. We don't have a way to reproduce this and we've never seen an error like it, so the only way is to create a simple package with only Spark NLP, a simple DataFrame without any external source, and try to run it without any other configs/dependencies except the spark-nlp jar. I do this with my clusters all the time. Then you can add your pieces back one by one and see when it crashes; that is the cause, and I might be able to help if spark-nlp is the cause.
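A minimal sketch of what such a stripped-down test might look like (the object name, app name, and example sentence are illustrative assumptions, not taken from this issue; only the pipeline name comes from the report):

```scala
// Minimal isolation test: only Spark NLP, an in-memory DataFrame, and the
// pre-trained pipeline. Everything except the pipeline name is illustrative.
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import org.apache.spark.sql.SparkSession

object MinimalSparkNlpTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-nlp-minimal-test")
      .getOrCreate()
    import spark.implicits._

    // Tiny hard-coded DataFrame, no external source involved.
    val data = Seq("Google has announced a new office in Milan.").toDF("text")

    // The NerDL stage inside this pipeline is what loads TensorFlow.
    val pipeline = new PretrainedPipeline("recognize_entities_dl", "en")
    pipeline.transform(data).show(false)

    spark.stop()
  }
}
```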
I have removed all the dependencies as you suggested, but the error is still raised (always during loading of the NerDLModel in stage 4). CMD:
Driver's stdout:
Executor's Logs:
OK, this is good, we have a clean way of testing now. The error is TensorFlow complaining about a conflict, which normally happens during NerDL since that is the annotator using TensorFlow. I see 2.5.0 in spark.jars, but I also see another jar with a different version coming from local. Could you please remove this config entirely from your SparkSession and startup? spark.repl.local.jars
spark.repl.local.jars is set by...
It seems like spark.jars only affects the executors.
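For reference, a SparkSession that sets only spark.jars might look like the sketch below; the fat-jar path is an assumed placeholder, and spark.repl.local.jars is deliberately left unset as suggested above:

```scala
import org.apache.spark.sql.SparkSession

// spark.jars points at the single 2.5.0 fat jar (path is a placeholder);
// neither spark.repl.local.jars nor spark.jars.packages is set here.
val spark = SparkSession.builder()
  .appName("tags_extraction")
  .master("mesos://zk://remote_ip:2181/mesos")
  .config("spark.jars", "/opt/spark-nlp-assembly-2.5.0.jar")
  .getOrCreate()
```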
You are using spark.jars with the fat jar, which comes with everything, so first of all there is no need for packages. Second, where is this packages config? I don't see it in what you pasted. And if there is a packages config somewhere, why is it downloading 2.4.5 instead of 2.5.0, which is the version of the fat jar being used?
Ok, I just saw it's in the spark-submit. That still doesn't answer why 2.5.0 ends up with a 2.4.5 jar in your logs. In your spark-submit please try to use --jars and point to the same fat jar there as well, remove packages, and see what happens.
I tried using the fat jar in spark-submit but it keeps giving me the same error.
This is good; then your code has an issue. Could you please also paste the code you are using and your sbt packaging? We keep coming back to what's in that jar, and we need to see what is in there and how things are being used. PS: You are using PretrainedPipeline with a...
Now that you have a clean environment and the spark-submit and SparkSession are synced without anything else in conflict, please try the correct code and also mention anything else you have in your code. (Is it possible to launch spark-shell with the same config to access your Spark cluster? It's an easier way to keep trying things and see the results with logs, as opposed to spark-submit.)
SBT:
Is it possible to launch spark-shell with the same config to access your Spark cluster? Anyway, I have tried with the correct code but it keeps giving me the same error.
OK, what if we use spark.jars.packages in spark-submit (--packages) and in the SparkSession? Second question: in your build.sbt I don't see any provided scope or an assembly strategy like merge, so your sbt package is a fat jar that includes Apache Spark? It won't use the Apache Spark provided by your cluster, am I correct? An example of how I package my code to be executed on an Apache Spark cluster that already provides Apache Spark, so I don't have to include it: https://github.com/multivacplatform/multivac-pubmed/blob/master/build.sbt
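For illustration only (not this project's actual build file), a build.sbt along those lines could look like the following, assuming the sbt-assembly plugin is declared in project/plugins.sbt and that the versions shown are placeholders:

```scala
name := "sparkscala"
version := "0.1"
scalaVersion := "2.11.12"

// Spark itself is "provided": the cluster supplies it, so it is excluded from the assembled jar.
libraryDependencies ++= Seq(
  "org.apache.spark"     %% "spark-core"  % "2.4.5" % "provided",
  "org.apache.spark"     %% "spark-sql"   % "2.4.5" % "provided",
  "org.apache.spark"     %% "spark-mllib" % "2.4.5" % "provided",
  "com.johnsnowlabs.nlp" %% "spark-nlp"   % "2.5.0"
)

// Merge strategy for sbt-assembly so duplicate META-INF entries don't break the build.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```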
Now I'll try that, but I don't think it depends on this. The library is imported correctly; it's during the loading of the stages that it goes wrong.

// Imports needed for the snippet below.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val token = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val normalizer = new Normalizer()
  .setInputCols("token")
  .setOutputCol("normal")

val finisher = new Finisher()
  .setInputCols("normal")

val pipeline = new Pipeline().setStages(Array(document, token, normalizer, finisher))

pipeline.fit(YOUR_DATAFRAME).transform(YOUR_DATAFRAME).show()
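If it helps, YOUR_DATAFRAME can be a tiny in-memory DataFrame; the only requirement implied by the snippet above is a "text" column matching setInputCol("text"). This sketch assumes a SparkSession named spark is in scope, and the sample sentence is made up:

```scala
import spark.implicits._

// Stand-in for YOUR_DATAFRAME: one row, one "text" column.
val YOUR_DATAFRAME = Seq("Peter lives in Berlin and works at ACME.").toDF("text")
```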
Also, please share how you package your last jar.
The last thing is to build and use this repo as a test: https://github.com/maziyarpanahi/spark-nlp-starter
Ok, the code works.

Also, please share how you package your last jar, sparkscala_2.11-0.1.jar, which doesn't include the other dependencies. The error clearly says it cannot find something that should be present unless it was excluded somewhere.
I have tried to use the code in the repo, but I get this error:
The code in the repo is a solid and simple example of how you can use spark-nlp in your app and package it for a cluster/external Apache Spark. If it fails, then it's about how you package it or how you are using it in your setup.
Any success in resolving this issue? I have the same problem.
I also have the same problem.
Same problem here. Is there any solution?
Many things can cause this error, and this issue has no single solution. That being said, your issue might show the same error but may not be related.
Description
I have to run a Spark job, which uses the recognize_entities_dl pretrained pipeline, on a Mesos (dockerized) cluster. The command is as follows:
/opt/spark/spark-2.4.5-bin-hadoop2.7/bin/spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.0,com.couchbase.client:spark-connector_2.11:2.3.0 --master mesos://zk://remote_ip:2181/mesos --deploy-mode client --class tags_extraction.tags_extraction_eng /opt/sparkscala_2.11-0.1.jar
This is the code:
While the pipeline is being downloaded, this error is raised when loading stage 4:
Expected Behavior
Download the pretrained pipeline with
val pipeline = new PretrainedPipeline("recognize_entities_dl", "en")
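Once downloaded, applying it would presumably look like the following; df stands in for the actual input DataFrame (it must contain a "text" column), and the "entities" output column is what this pipeline's NER converter stage typically produces:

```scala
// Hypothetical usage after a successful download; `df` is a placeholder DataFrame.
val result = pipeline.transform(df)
result.select("entities.result").show(false)
```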
Current Behavior
Driver's stdout:
Executor's Logs:
Your Environment
Docker environment:
Versions:
JRE version: OpenJDK Runtime Environment (8.0_252-b09) (build 1.8.0_252-8u252-b09-1~16.04-b09)
Java VM: OpenJDK 64-Bit Server VM (25.252-b09 mixed mode linux-amd64 compressed oops)