spark.comet.memory.overhead.min not respected when submitting jobs with Comet with Spark on Kubernetes #605
Comments
That's true, thanks for raising this. Spark Kubernetes pods respect Spark params but have no knowledge of Comet params for now. Once the pod's memory is allocated, it's not possible to change it. Since Comet has no control over Spark params, I'm leaning toward including |
I am inclined to include it in the Spark overhead; however, I don't understand why it's not taken into consideration here. That line of code should provide the new value of the memory overhead, which is then passed to Spark and the RM to create the pod template that gets applied to create the pod. |
I think what might happen is that the executor pod has already started by the time Comet tries to tweak the memory, and once the pod is up, it's not possible to change the allocated memory size. What Comet does is correct, but I have a feeling it should be done earlier. |
If |
We need to document the I did set the |
I'll run some tests on this today |
How is the documentation for the configuration generated? The only mention of the plugin is in the code base itself. I can open a PR to add the conf if it's done in the documentation itself and not auto-generated. |
Depends on #643 |
Depends on #689 |
Describe the bug
Currently, when submitting a job on Kubernetes, the total memory of the driver or executor is the sum of the memory defined in the Spark configuration and the overhead (spark.{executor|driver}.memoryOverhead + spark.{executor|driver}.memory). This does not include the default values defined by spark.comet.memory.overhead.factor.
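
To make the gap concrete, here is a rough sketch of the arithmetic described above. It is not the Comet or Spark implementation. The default values (0.10 and 384 MiB for Spark, 0.2 and 384 MiB for Comet) and the `max(factor * memory, minimum)` rule are assumptions for illustration, and the function names are hypothetical.

```python
# All defaults below are assumptions for illustration, not taken from this issue.
SPARK_OVERHEAD_FACTOR = 0.10    # assumed spark.executor.memoryOverheadFactor
SPARK_OVERHEAD_MIN_MIB = 384    # assumed Spark minimum overhead
COMET_OVERHEAD_FACTOR = 0.2     # assumed spark.comet.memory.overhead.factor
COMET_OVERHEAD_MIN_MIB = 384    # assumed spark.comet.memory.overhead.min


def overhead_mib(memory_mib, factor, minimum_mib):
    """Overhead computed as max(factor * memory, minimum), in MiB."""
    return max(int(memory_mib * factor), minimum_mib)


def pod_request_mib(memory_mib):
    """Pod size as described in the report: memory + Spark overhead only."""
    spark = overhead_mib(memory_mib, SPARK_OVERHEAD_FACTOR, SPARK_OVERHEAD_MIN_MIB)
    return memory_mib + spark


def pod_request_with_comet_mib(memory_mib):
    """Pod size if the Comet overhead were also counted."""
    comet = overhead_mib(memory_mib, COMET_OVERHEAD_FACTOR, COMET_OVERHEAD_MIN_MIB)
    return pod_request_mib(memory_mib) + comet


if __name__ == "__main__":
    for mem in (1024, 4096):
        missing = pod_request_with_comet_mib(mem) - pod_request_mib(mem)
        print(f"memory={mem} MiB: pod is short by {missing} MiB")
```

Under these assumptions, a possible workaround in the spirit of the comments would be to set spark.executor.memoryOverhead explicitly to the combined Spark and Comet overhead at submit time, so that the pod is sized correctly before it starts.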
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response