control-service: Classify OOM as User Errors #479
LGTM. Let's merge this. But we need to make sure we can correctly report to the user what is going on (see my other comment). We can handle this in a separate PR.
...rol_service/src/main/java/com/vmware/taurus/service/execution/JobExecutionResultManager.java
Are you sure that the
95bf58e to ec81690
...cts/pipelines_control_service/src/main/java/com/vmware/taurus/service/KubernetesService.java
ec81690 to 64935b2
64935b2 to c54d9ed
c54d9ed to 5535add
5535add to e8891c2
Currently, if a data job execution exceeds its allowed memory quota, Kubernetes immediately kills the execution's pod and restarts it, which usually results in a similar failure. This causes a Platform Error to be raised, whereas it should be classified as a User Error.

This change re-classifies such errors as User Errors.

Testing Done: Added unit test.

Signed-off-by: Andon Andonov <[email protected]>
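For readers who want a concrete picture of the re-classification described above, here is a minimal sketch. It is not the code from this PR: the class, enum, and method names are hypothetical, and it assumes the service can read the container's termination reason, which Kubernetes sets to `OOMKilled` when a container is killed for exceeding its memory limit. Treating every other failure as a Platform Error is a simplification made only for this illustration.

```java
import java.util.Optional;

/**
 * Illustrative sketch only. Every name in this class is hypothetical and is
 * not taken from JobExecutionResultManager or KubernetesService.
 */
final class OomClassificationSketch {

    /** The two error classes mentioned in the change description. */
    enum ErrorClass { USER_ERROR, PLATFORM_ERROR }

    /** Reason Kubernetes reports for a container killed for exceeding its memory limit. */
    static final String OOM_KILLED = "OOMKilled";

    /**
     * Classifies a failed execution from the container's termination reason.
     * A missing reason, or any reason other than OOMKilled, is treated as a
     * Platform Error here. That default is an assumption of this sketch.
     */
    static ErrorClass classify(Optional<String> terminationReason) {
        return terminationReason
                .filter(OOM_KILLED::equals)
                .map(reason -> ErrorClass.USER_ERROR)
                .orElse(ErrorClass.PLATFORM_ERROR);
    }
}
```

With this sketch, `classify(Optional.of("OOMKilled"))` yields `USER_ERROR`, while `classify(Optional.of("Error"))` and `classify(Optional.empty())` both yield `PLATFORM_ERROR`.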
e8891c2 to 4c1359d