wsagent abruptly stops requiring workspace restart #11932
Comments
@mjshashank I wonder if it's related to ingresses. The client periodically checks whether the ws-agent is alive; if it is unreachable, the server then tries to reach it. I'd take a look at the workspace ingresses when the issue occurs again.
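For illustration, the two-step check described above (a client probe first, then a server-side probe as a fallback) could be sketched as follows. This is a minimal sketch, not Che's actual implementation: the function name `agent_status`, the probe callables, and the returned labels are all hypothetical, and the reading that "client fails, server succeeds" points at the ingress path is an assumption.

```python
from typing import Callable


def agent_status(client_probe: Callable[[], bool],
                 server_probe: Callable[[], bool]) -> str:
    """Classify ws-agent reachability from two independent probes.

    Both probes are hypothetical callables that return True when the
    agent answered. They stand in for whatever checks are really used.
    """
    if client_probe():
        return "alive"
    # The client could not reach the agent, so fall back to the server side.
    if server_probe():
        # Assumption: only the client-facing route (e.g. an ingress) is broken.
        return "client-path-broken"
    return "agent-down"
```

Distinguishing the last two outcomes in the logs would help tell a routing problem apart from an agent that is actually gone.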
@eivantsov Thank you for your reply. I looked into this further and realised that workspace pods are being rescheduled as part of Kubernetes autoscaling, and the new pod that comes up has neither the bootstrapper binary nor the command from the Che server to execute it. Hence the agents are all down and the workspace is unusable. Is this a known issue? If so, is there an expected way to handle this at scale so that the necessary startup steps (like the bootstrapper) are performed to keep the workspace usable? Note: This seems to have become an issue since workspaces were converted to deployments, since cluster autoscalers don't reschedule individual pods not backed by deployments.
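One way to confirm the rescheduling theory is to record each workspace's pod UID at start and compare it with the UID seen later; a changed UID means the pod was replaced. The sketch below assumes the two mappings have already been collected by some other means (for example from the Kubernetes API); the function name and the data shape are hypothetical and not part of Che.

```python
from typing import Dict, List


def rescheduled_workspaces(recorded: Dict[str, str],
                           current: Dict[str, str]) -> List[str]:
    """Return IDs of workspaces whose pod UID changed since it was recorded.

    `recorded` maps workspace ID -> pod UID captured at workspace start.
    `current` maps workspace ID -> pod UID observed now.
    Workspaces missing from `recorded` are ignored.
    """
    return sorted(
        ws for ws, uid in current.items()
        if ws in recorded and recorded[ws] != uid
    )
```

Correlating the returned IDs with the workspaces that prompted for a restart would show whether rescheduling accounts for all of the failures.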
Can you try a fresh Che version?
@skabashnyuk Will do and get back to you. Has this issue been addressed?
@skabashnyuk I do not think it has been addressed, at least with Che 6. If you try Che 6 without ws-agent and other agents, the issue is gone, since provisioning happens in the entrypoint and a pod restart therefore does not disrupt the workspace.
@eivantsov Got it. But ws-agent is necessary for a usable IDE, right? Is it just the bootstrapping process that needs to be performed on the restarted pod? If that is the case, would a hacky hotfix to detect the same and perform it through something like a
@mjshashank Yes, it's a must for Che 6. In Che 7, all tooling is launched as sidecars (containers in one pod), so there are no execs and no interference with the runtime. Everything happens in the containers' entrypoints.
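The idea of provisioning from the entrypoint, so that a replacement pod bootstraps itself without an exec from the server, can be sketched as below. This is an illustration under stated assumptions, not how Che 6 or Che 7 implements it: `ensure_bootstrapped`, the marker file, and the `bootstrap` callable are hypothetical names.

```python
import os
from typing import Callable


def ensure_bootstrapped(marker_path: str,
                        bootstrap: Callable[[], None]) -> bool:
    """Run `bootstrap` unless a marker file shows it already ran.

    Returns True when bootstrapping was performed. Assumes the marker
    lives on the container's own (non-persistent) filesystem, so a
    freshly scheduled pod starts without it and bootstraps again.
    """
    if os.path.exists(marker_path):
        return False
    bootstrap()
    # Write the marker only after bootstrap finished without raising.
    with open(marker_path, "w") as fh:
        fh.write("done\n")
    return True
```

Called at the start of a container entrypoint, a check like this makes startup idempotent: a restarted process skips the work, while a rescheduled pod repeats it.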
I experienced an unreachable ws-agent as well. Che then prompted for a restart and proceeded as expected. I'm on Ubuntu 18.04.1 LTS; it's my own server at home, nothing special. Additional update: my Docker installation is affected by docker/for-linux#476 (comment), and I applied the solution found later in the thread.
@mjshashank @gnoejuan I suggest setting up centralized log collection for workspaces. That way, we will have some data about what happened.
Description
I have Che running on Kubernetes with 100+ workspaces. Lately I have noticed that for a few workspaces the ws-agent becomes unreachable, leading to a notification prompting the user to restart their workspace.
This is a significant number of workspaces (10% of running workspaces daily) and restarts are cumbersome for our use-case.
We have ruled out resource constraints: peak memory, CPU, and disk usage are all below 50%.
Looking through the bootstrapper/catalina logs hasn't yielded anything so far.
The issue is sporadic and seems to occur randomly.
Are there any leads regarding a possible root cause for this issue? Any help is much appreciated.
Env
Che: 6.9.0
GKE Kubernetes (1.9)
Ingress controller: Traefik:1.6.6