wsagent abruptly stops requiring workspace restart #11932

Closed
mjshashank opened this issue Nov 14, 2018 · 9 comments
Labels
kind/question Questions that haven't been identified as being feature requests or bugs.

Comments

@mjshashank
Contributor

Description

I have Che running on Kubernetes with 100+ workspaces. Lately I have noticed that for some workspaces the ws-agent becomes unreachable, leading to a notification that prompts the user to restart the workspace.

This affects a significant number of workspaces (about 10% of running workspaces daily), and restarts are cumbersome for our use case.

We have ruled out resource constraints: peak memory, CPU, and disk usage are all below 50%.

Looking through the bootstrapper/catalina logs hasn't yielded anything so far.

The issue is sporadic and seems to occur randomly.

Are there any leads regarding a possible root cause for this issue? Any help is much appreciated.

Env:
Che: 6.9.0
Kubernetes: GKE (1.9)
Ingress controller: Traefik 1.6.6

@ghost

ghost commented Nov 15, 2018

@mjshashank I wonder if it's related to ingresses. The client runs periodic checks to see whether the ws-agent is alive; if it's unreachable, the server then tries to reach it. I'd take a look at the workspace ingresses when the issue occurs again.
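A minimal sketch of what to capture when it recurs, assuming the workspace objects live in the `che` namespace (adjust the namespace and ingress name to your install):

```
# List the ingresses for the workspace namespace
kubectl get ingress -n che

# Inspect the rules/backends of the ingress routing to the ws-agent
kubectl describe ingress <workspace-ingress-name> -n che

# Check that the backing service still has endpoints
kubectl get endpoints -n che
```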

@ghost added the kind/question label Nov 15, 2018
@mjshashank
Contributor Author

mjshashank commented Nov 16, 2018

@eivantsov Thank you for your reply. I looked into this further and realised that workspace pods are being rescheduled as part of Kubernetes autoscaling, and the new pod that comes up does not have the bootstrapper binary or the command from the Che server to execute it. Hence the agents are all down and the workspace is unusable.

Is this a known issue? If so, is there a recommended way to handle this at scale so that the necessary startup steps (like running the bootstrapper) are performed and the workspace stays usable?

Note: This seems to have become an issue after workspaces were converted to Deployments, because cluster autoscalers don't reschedule individual pods that aren't backed by a Deployment.
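One way to confirm this pattern when it happens (a generic sketch, again assuming the `che` namespace, not anything Che-specific):

```
# A pod AGE much younger than the workspace start time suggests a reschedule
kubectl get pods -n che -o wide

# Look for autoscaler eviction/scale-down events around the time the agent died
kubectl get events -n che --sort-by=.metadata.creationTimestamp | grep -iE 'evict|scale|schedul'
```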

@skabashnyuk
Contributor

Can you try a fresh Che version?

@mjshashank
Contributor Author

@skabashnyuk Will do and get back to you. Has this issue been addressed already?

@ghost

ghost commented Nov 19, 2018

@skabashnyuk I do not think it has been addressed, at least with Che 6.

If you run Che 6 without the ws-agent and other agents, the issue goes away, since provisioning happens in the entrypoint; thus a pod restart does not disrupt the workspace.

@mjshashank
Contributor Author

@eivantsov Got it. But the ws-agent is necessary for a usable IDE, right?

Is it just the bootstrapping process that needs to be performed on the restarted pod? If so, would a hacky hotfix that detects the restart and re-runs bootstrapping through something like kubectl exec help us keep the workspace usable?
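For illustration only, a rough sketch of what such a hotfix could look like. The namespace, the `che.workspace_id` label, and the bootstrapper command are all placeholders here; the real command is whatever the Che server injects at workspace start, and the check assumes `pgrep` exists in the container:

```
#!/bin/sh
# Placeholders: NAMESPACE, WS_ID, the pod label, and BOOTSTRAPPER_CMD.
NAMESPACE=che
WS_ID=workspace0000000000000000
BOOTSTRAPPER_CMD='/path/to/bootstrapper ...'   # whatever Che injected at start

POD=$(kubectl get pods -n "$NAMESPACE" -l che.workspace_id="$WS_ID" \
      -o jsonpath='{.items[0].metadata.name}')

# If no bootstrapper process is running in the pod, run it again.
if ! kubectl exec -n "$NAMESPACE" "$POD" -- pgrep -f bootstrapper >/dev/null 2>&1; then
  kubectl exec -n "$NAMESPACE" "$POD" -- sh -c "$BOOTSTRAPPER_CMD"
fi
```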

@ghost

ghost commented Nov 19, 2018

@mjshashank yes, it's a must for Che 6. In Che 7, all tooling is launched as sidecars (containers in one pod), and there are no execs or interference with the runtime. Everything happens in container entrypoints.
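To make the sidecar model concrete, a hand-written illustration of the shape of such a pod; the names and images below are invented for this example and are not Che's actual generated spec:

```
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: workspace-example
spec:
  containers:
  - name: dev                      # the user's dev container
    image: example/dev-image
    volumeMounts:
    - {name: projects, mountPath: /projects}
  - name: tooling                  # tooling sidecar; starts via its own entrypoint
    image: example/tooling-sidecar
    volumeMounts:
    - {name: projects, mountPath: /projects}
  volumes:
  - name: projects
    emptyDir: {}
EOF
```

The point for this thread: if the pod is rescheduled, each container's entrypoint brings its tooling back up on its own, with no exec from the server required.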

@gnoejuan

gnoejuan commented Nov 29, 2018

I'll update this comment around 8 a.m. Central Time tomorrow morning with more information. My local time is currently midnight, but I'm at work.

I experienced an unreachable ws-agent as well. Che then prompted for a restart and proceeded as expected.

Ubuntu 18.04.1 LTS
Che 6.14.2 (pretty sure; will update)
Docker version 18.09.0, build 4d60db4

It's my own server at home, nothing special.

Additional update:

My Docker installation is affected by docker/for-linux#476 (comment), and I applied the solution found later in that thread:

```
remove the -H fd:// from the ExecStart (if you don't have other -H options set)
change to -H unix:// instead
```
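Concretely, the resulting unit line looks something like this (the stock dockerd ExecStart varies by version, and the unit path shown is the common Ubuntu location, so treat both as assumptions):

```
# /lib/systemd/system/docker.service (or a systemd drop-in override)
# before: ExecStart=/usr/bin/dockerd -H fd://
ExecStart=/usr/bin/dockerd -H unix://
```

After editing the unit, reload and restart: `sudo systemctl daemon-reload && sudo systemctl restart docker`.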

@skabashnyuk
Contributor

@mjshashank @gnoejuan I can suggest setting up centralized log collection for workspaces. That way, we would have some data about what happened.
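Until a full log stack (e.g. a fluentd/EFK DaemonSet) is in place, even a crude snapshot helps. A sketch, assuming workspace pods carry a `che.workspace_id` label and live in the `che` namespace (both assumptions; adjust to your install):

```
# Snapshot logs from every workspace pod to files (crude but better than nothing)
mkdir -p /tmp/che-logs
kubectl get pods -n che -l che.workspace_id -o name |
  while read -r pod; do
    kubectl logs -n che "$pod" --all-containers=true > "/tmp/che-logs/${pod#pod/}.log"
  done
```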

@gorkem closed this as completed Aug 24, 2019