Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GUI freezes causes timeouts to be incorrectly reported #6660

Closed
2 tasks
xvcgreg opened this issue May 12, 2023 · 9 comments
Closed
2 tasks

GUI freezes causes timeouts to be incorrectly reported #6660

xvcgreg opened this issue May 12, 2023 · 9 comments
Assignees

Comments

@xvcgreg
Copy link

xvcgreg commented May 12, 2023

Discord username

No response

What type of issue is this?

Permanent – Occurring repeatably

Is this issue blocking you from using Enso?

  • Yes, I can't use Enso because of this issue.

Is this a regression?

  • Yes, previous version of Enso did not have this issue.

What issue are you facing?

When I try to exit node creation (Component browser open and exit tab->esc) it leaves blank node for a while then either writes "nothing" inside the node or removes it from the canvas.

2305121355_shareX.mp4
2305121357_shareX.mp4

Expected behaviour

closing CB without node creation is instant and leaves no traces (nodes behind)

How we can reproduce it?

  1. open larger project
    Unnamed_5 (2).zip

  2. drag connection from an output port

  3. cancel node creation

Screenshots or screencasts

No response

Enso Version

2023.5.12 nightly

Browser or standalone distribution

Standalone distribution

Browser Version or standalone distribution

standalone

Operating System

Windows

Operating System Version

Win11pro 22H2 22621.1555

Hardware you are using

12th Gen Intel(R) Core(TM) i9-12900HK / RTX3060 Laptop / Nvidia Drivers 531.68

@hubertp
Copy link
Collaborator

hubertp commented May 15, 2023

I can confirm the timeouts, will try to find the source of them.

@jdunkerley jdunkerley moved this from ❓New to 📤 Backlog in Issues Board May 16, 2023
@Akirathan
Copy link
Member

I cannot reproduce it on my machine on the Enso_201 project. But this really seems like an issue with GUI, even though there are some timeouts of messages to/from LS.

@Frizi
Copy link
Contributor

Frizi commented May 18, 2023

I did reproduce it, but I needed to enable CPU throttling in the browser. Then I didn't even need to open the component browser, and I have got a timeout. This time I was running the project manager with verbose logs enabled, so I was able to pinpoint the issue.

Here is an interesting case:
image

The IDE has reported a timeout of a getComponentGroups request. We can see the request websocket frame being sent at 13:35:19.292 (local time - UTC+2). Then the response is received at 13:35:58.214, which is about 39 seconds later. Hence the timeout. But interestingly, the language server logs tell a different story:

image
Here we can see that the request was received by the language server at 11:35:19.293 (UTC), one milliseconds later than seen in the browser, which is about right. But the processing did finish and the response was sent at 11:35:19.364, just 71ms later.

This means that the browser wasn't actually able to process the response for a good while, and our in-app timeout logic was triggering before yielding any CPU to the browser event loop.

This explains why we can see that the timeouts are usually hit when the application is completely frozen for a while, and also makes it a purely GUI-side issue, mostly related to performance but also could be treated as a buggy timeout logic.

@Frizi Frizi changed the title Language Server timing out - big lag on empty node removal after canceling creation with ESC GUI freezes causes timeouts to be incorrectly reported May 18, 2023
@Akirathan
Copy link
Member

Akirathan commented May 19, 2023

LS says it sent the response at 11:35:19.364, and in the Network web console tab, we can see it was received at 11:35:58.214 - 39 seconds later. Based on a quick chat with @4e6, this is very suspicious. The timestamps of the received messages in the web console should correspond to the time when the browser received them, not to the time when the IDE processed them. Assuming there, is nothing wrong with the network, since this all happens on localhost, how is this possible?

The only other explanation is that the timestamp reported in the LS logs is not precise, and the actual response was sent later than logged The portion of the response logic that logs the timestamp of the message being sent is in https://github.com/enso-org/enso/blob/develop/lib/scala/json-rpc-server/src/main/scala/org/enso/jsonrpc/JsonRpcServer.scala#L84.

GitHub
Hybrid visual and textual functional programming. Contribute to enso-org/enso development by creating an account on GitHub.

@farmaazon
Copy link
Contributor

The timestamps of the received messages in the web console should correspond to the time when the browser received them, not to the time when the IDE processed them

Are you sure about it? Given that web is an extremely single-threaded environment, I just assumed the browser does not handle incoming messages until it gets back the steering.

But if there is a guarantee (or you just tested that), then indeed looks suspicious.

@farmaazon
Copy link
Contributor

I would start with making sure IDE, having both message received and timeout fired, prefers the former.

@Akirathan
Copy link
Member

Are you sure about it? Given that web is an extremely single-threaded environment, I just assumed the browser does not handle incoming messages until it gets back the steering.

I am not sure about that. So, the most probable scenario is that IDE does process the message very slowly because it is blocked by other stuff.

@Frizi
Copy link
Contributor

Frizi commented May 24, 2023

Are you sure about it? Given that web is an extremely single-threaded environment, I just assumed the browser does not handle incoming messages until it gets back the steering.

Yep, this is what happens. Everything that has to touch the JS environment (e.g. async http responses or ws frames) is handled through the JS event reactor as an independent task. The requests though are send immediately, as the action of actually sending it doesn't have any observable effect on the JS side. The scheduled reactor tasks can only execute one at a time, so if we keep processing one for >10s, the timeout can fire before the response processing task is executed.

I would start with making sure IDE, having both message received and timeout fired, prefers the former.

I believe that this is the general solution, but specifics can be a bit tricky to get right. Currently, the response and timeout are handled like this:

let millis = self.timeout().as_millis();
future::select(ret, sleep(self.timeout()).boxed_local()).map(move |either| match either {
future::Either::Left((x, _)) => x,
future::Either::Right((_, _)) => Err(RpcError::TimeoutError { millis }),
})

The timeout is created with a sleep method, which is our wrapper around gloo_timers::future::TimeoutFuture, which itself registers a callback using javascripts setTimeout. Notably, it also spawns a new JS event reactor task. So the issue is that the timeout task is still sometimes processed earlier than the websocket frame handling task. Unfortunately, the ordering between those two is actually non-deterministic. The timers are a special case within the browser event loop, because those are processed outside of the usual FIFO queue. Instead, those are scheduled to try to match given timing guarantees as closely as possible, using different scheduling strategies. They will often inject themselves between other events that were scheduler earlier. A peek into some possibilities can be found here:
https://source.chromium.org/chromium/chromium/src/+/refs/heads/main:base/task/delay_policy.h;l=13;drc=fbfe4e0aa1584885048b3e3c75415a68702d7cc1;bpv=0;bpt=1

So, we have to do something more involved than a single timer. We could try waiting an extra render frame after the setTimeout timer has executed, and only then actually handle the timeout as an error. That would allow the browser some more time to empty its event queue. Even better, we could actually register an "idle callback" after the timeout. That should give us pretty strong intication that the real event processing work has already been done.

@jdunkerley
Copy link
Member

Closing as doesn't seem to be reproducible in newest develop.

@jdunkerley jdunkerley closed this as not planned Won't fix, can't repro, duplicate, stale Aug 31, 2023
@github-project-automation github-project-automation bot moved this from 📤 Backlog to 🟢 Accepted in Issues Board Aug 31, 2023
@jdunkerley jdunkerley moved this from 🟢 Accepted to 🗄️ Archived in Issues Board Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

6 participants