poor batch delay spirals out of control #1911

Closed
totaam opened this issue Jul 7, 2018 · 16 comments

totaam commented Jul 7, 2018

Issue migrated from trac ticket # 1911

component: core | priority: minor | resolution: needinfo | keywords: latency ubuntu

2018-07-07 22:21:48: tc424 created the issue


Opening ticket after this discussion on IRC.

Using 2.3.2 on Ubuntu 16.04 and 18.04, over wifi, 1Gb/s ethernet and local connections, I see performance that starts out fairly poor and then rapidly spirals out of control.

I have attached the bug report tool output from the 18.04 machine, used in local mode (mmap enabled).

totaam commented Jul 7, 2018

2018-07-07 22:25:02: tc424 uploaded file xpra-ether-1804-1404.txt (137.7 KiB)

stats log over ethernet to 14.04 server

totaam commented Jul 7, 2018

2018-07-07 22:29:49: tc424 uploaded file xpra-bug.zip (45.7 KiB)

Bug report tool output

totaam commented Jul 7, 2018

2018-07-07 22:38:59: tc424 uploaded file xpra-info.txt (132.2 KiB)

xpra info output for local connection on 18.04

totaam commented Jul 7, 2018

2018-07-07 22:53:02: tc424 uploaded file xpra-1804-local.txt (151.3 KiB)

stats log for local connection on 18.04

totaam commented Jul 7, 2018

2018-07-07 23:16:54: tc424 commented


Please read "16.04" where I've written "14.04" above - heat has turned my brain to jelly :(

totaam commented Jul 8, 2018

2018-07-08 05:45:57: antoine commented


The problem comes from the congestion detection code, but I would have expected you to see notifications alerting you to the problem (as per #1855).
You can disable the detection with: XPRA_BANDWIDTH_DETECTION=0 (server side) and the problems will go away.
But it would be better if you could post the -d bandwidth output: that would let us figure out why the code thinks that your bandwidth is limited and fix the root cause instead.
Wifi and LAN should give different results. You may want to try the beta builds here: [https://xpra.org/beta/] to take advantage of the new network adapter detection code, which tolerates latency jitter on wireless links.
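
For scripted test runs, the two modes discussed here (detection left on to capture the -d bandwidth output, or switched off as a workaround) could be wrapped roughly as follows. This is only a sketch: the helper name `xpra_debug_launch` is made up for illustration, it assumes that leaving the variable unset keeps the default behaviour, and the debug categories and `--start` command are simply the ones used for the logs attached to this ticket.

```python
import os


def xpra_debug_launch(detection=True, base_env=None):
    """Return (env, argv) for a debug server start; nothing is executed here.

    Hypothetical helper, not part of xpra.
    """
    env = dict(os.environ if base_env is None else base_env)
    if not detection:
        # Server-side switch mentioned in this ticket: "0" disables detection.
        env["XPRA_BANDWIDTH_DETECTION"] = "0"
    # Debug categories and start command taken from the logs in this ticket.
    argv = ["xpra", "start", "-d", "stats,bandwidth", "--start=xfce4-terminal"]
    return env, argv


if __name__ == "__main__":
    env, argv = xpra_debug_launch(detection=False)
    print("XPRA_BANDWIDTH_DETECTION=" + env["XPRA_BANDWIDTH_DETECTION"], " ".join(argv))
    # To really start the server: subprocess.run(argv, env=env, check=True)
```

Returning the environment and argument list instead of spawning the process keeps the sketch safe to run on a machine without xpra installed.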

totaam commented Jul 8, 2018

2018-07-08 10:10:07: tc424 uploaded file xpra-local-congestion.txt (195.3 KiB)

log for local connection with congestion logging on

totaam commented Jul 8, 2018

2018-07-08 11:01:21: tc424 uploaded file xpra-no-bw-det.txt (59.4 KiB)

log of XPRA_BANDWIDTH_DETECTION=0 xpra start -d stats,bandwidth --start=xfce4-terminal

totaam commented Jul 8, 2018

2018-07-08 11:15:55: antoine commented


The batch delay increased rapidly because of congestion events and bandwidth limits:

2018-07-07 21:38:09,208 update_batch_delay: bandwidth-limit              : 6.77,45.88  {'used': 32245920, 'budget': 5242880}
2018-07-07 21:38:09,209 update_batch_delay: congestion                   : 1.87,8.74  {}

And sometimes also because of client-latency:

2018-07-07 21:38:23,080 update_batch_delay: client-latency               : 2.86,0.69  {'target': 8, 'weight_multiplier': 503, 'smoothing': 'sqrt', 'aim': 800, 'aimed_avg': 8178, 'div': 1000, 'avg': 233, 'recent': 446}

And to a lesser extent the client decode speed:

2018-07-07 21:38:27,286 update_batch_delay: client-decode-speed          : 2.15,4.59  {'avg': 131, 'recent': 449}

This all points towards a network / CPU performance bottleneck on the client.
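
To sift through a long -d stats log for the same kind of lines, something like the parser below may help. It is a rough sketch rather than anything shipped with xpra: it assumes the two numbers after the factor name are the factor and its weight (which is how they are read above), and that the trailing dictionary is always a plain Python literal. The names `FACTOR_RE`, `parse_factor` and `budget_overshoot` are invented for this example.

```python
import ast
import re

# Factor name, two comma-separated floats, then a Python-style dict literal.
FACTOR_RE = re.compile(
    r"update_batch_delay:\s+(?P<name>[\w-]+)\s*:\s*"
    r"(?P<factor>[\d.]+),(?P<weight>[\d.]+)\s+(?P<info>\{.*\})\s*$"
)


def parse_factor(line):
    """Parse one update_batch_delay log line, or return None if it does not match."""
    m = FACTOR_RE.search(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "factor": float(m.group("factor")),  # assumed meaning: > 1 raises the delay
        "weight": float(m.group("weight")),  # assumed meaning: influence of the factor
        "info": ast.literal_eval(m.group("info")),
    }


def budget_overshoot(parsed):
    """Ratio of used bandwidth to budget, for bandwidth-limit lines only."""
    info = parsed["info"]
    if "used" not in info or not info.get("budget"):
        return None
    return info["used"] / info["budget"]


if __name__ == "__main__":
    line = ("2018-07-07 21:38:09,208 update_batch_delay: bandwidth-limit"
            "              : 6.77,45.88  {'used': 32245920, 'budget': 5242880}")
    parsed = parse_factor(line)
    print(parsed["name"], parsed["factor"], parsed["weight"], budget_overshoot(parsed))
```

On the bandwidth-limit line quoted above, the used value is a little over six times the budget, which is consistent with that factor carrying by far the largest weight.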


You can now turn off bandwidth detection more easily, see #1912.

With this turned off, it should now be impossible for update_batch_delay: congestion to raise the batch delay, since the congestion value should always be zero.
(It is calculated from the congestion_send_speed list, which is only updated in record_congestion_event, and this method is bypassed when bandwidth detection is turned off.)
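
The gating described above can be pictured with a toy model. This is not the xpra code: only the names `congestion_send_speed` and `record_congestion_event` come from this ticket, while the class, the `on_late_ack` trigger and the formula in `congestion_value` are placeholders invented to show why the value stays at zero.

```python
class CongestionTracker:
    """Toy model of the gating only; not xpra's actual implementation."""

    def __init__(self, bandwidth_detection=True):
        self.bandwidth_detection = bandwidth_detection
        self.congestion_send_speed = []  # (timestamp, send speed) samples

    def record_congestion_event(self, now, send_speed):
        # The only place where the list is ever appended to.
        self.congestion_send_speed.append((now, send_speed))

    def on_late_ack(self, now, send_speed):
        # Hypothetical trigger: with detection off, recording is bypassed.
        if not self.bandwidth_detection:
            return
        self.record_congestion_event(now, send_speed)

    def congestion_value(self, now, window=10.0):
        # Placeholder formula: share of a 10-event quota seen in the window.
        recent = [t for t, _ in self.congestion_send_speed if now - t <= window]
        return min(1.0, len(recent) / 10.0)


if __name__ == "__main__":
    for enabled in (True, False):
        tracker = CongestionTracker(bandwidth_detection=enabled)
        for i in range(5):
            tracker.on_late_ack(now=float(i), send_speed=1000000)
        print(enabled, tracker.congestion_value(now=5.0))
```

Whatever the real formula is, an empty list gives it nothing to work with, so the congestion factor has no input when detection is off.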

This won't fix the massive jitter you are seeing though:

  • this is very wrong (about 1.3 seconds to get the paint packet ack):
2018-07-08 10:59:12,969 record_latency: took 1294.6 ms round trip, 1294.5 for echo,   14.0 for decoding of      240 pixels,       59 bytes sent over the network in 1280.4 ms, 1280.3 ms for echo
  • whereas later it goes fine for similar (tiny) packets:
2018-07-08 10:59:30,733 record_latency: took   14.3 ms round trip,   14.3 for echo,    1.0 for decoding of    44888 pixels,       60 bytes sent over the network in   12.8 ms,   12.7 ms for echo
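
To find every such outlier in a log rather than spotting them by eye, the record_latency lines can be parsed along these lines. This is a sketch under assumptions: the regular expression is written against the two lines quoted above only, and the thresholds in `is_suspicious` (1024 bytes, 100 ms) are arbitrary illustrative values, not anything xpra uses.

```python
import re

# Written against the two sample lines above; other layouts may not match.
LATENCY_RE = re.compile(
    r"record_latency: took\s+(?P<round_trip>[\d.]+) ms round trip,"
    r".*?decoding of\s+(?P<pixels>\d+) pixels,"
    r"\s+(?P<bytes>\d+) bytes sent over the network in\s+(?P<network>[\d.]+) ms"
)


def parse_latency(line):
    """Extract the timings from one record_latency line, or return None."""
    m = LATENCY_RE.search(line)
    if m is None:
        return None
    return {
        "round_trip_ms": float(m.group("round_trip")),
        "pixels": int(m.group("pixels")),
        "bytes": int(m.group("bytes")),
        "network_ms": float(m.group("network")),
    }


def is_suspicious(sample, max_bytes=1024, max_ms=100.0):
    """Flag tiny packets that still took a long time on the network.

    Both thresholds are illustrative guesses.
    """
    return sample["bytes"] <= max_bytes and sample["network_ms"] > max_ms


if __name__ == "__main__":
    import sys
    for line in sys.stdin:
        sample = parse_latency(line)
        if sample is not None and is_suspicious(sample):
            print(line.rstrip())
```

Fed the log on standard input, it prints only the lines where a small packet spent a long time in transit, which is the pattern shown in the first bullet.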

totaam commented Jul 8, 2018

2018-07-08 22:39:01: tc424 commented


Just updating to note that it looks like some / all of this problem is to do with having the "Session Info" window open.

totaam commented Jul 9, 2018

2018-07-09 16:00:33: antoine commented


> Just updating to note that it looks like some / all of this problem is to do with having the "Session Info" window open.

We had similar problems in the past where "xpra info" (which collects the same information as the session info dialog) would slow down the system too much. Since this is used for diagnostics, trying to identify a problem would actually create new ones and hide the real cause.

The session info dialog requests updated information from the server every second.
Some of the data collected needs access to the UI and does so from the UI thread.

This regression may have been caused by the more modular code refactoring of #1761. If that's the case, the problem should mostly go away if you downgrade to 2.1 or even 1.0.

totaam commented Sep 25, 2018

2018-09-25 07:00:59: antoine changed priority from major to minor

totaam commented Sep 25, 2018

2018-09-25 07:00:59: antoine commented


Lowering priority: the workaround is to close the session info window.

totaam commented Dec 12, 2019

2019-12-12 08:51:31: antoine commented


Does it still spiral out of control?

There are many mitigations in newer versions.

totaam commented Nov 4, 2020

2020-11-04 11:06:11: antoine changed status from new to closed

totaam commented Nov 4, 2020

2020-11-04 11:06:11: antoine set resolution to needinfo
