split header and data functions #244

Closed · wants to merge 5 commits

Conversation

@totaam (Contributor) commented Aug 6, 2016

Main benefits:

  • the functions can be re-used and tested more easily
  • it is possible to send the data without first concatenating the header and data, which can make quite a bit of difference for performance. Note that this depends on a number of factors: payload size, memory pressure, network speed, whether Nagle is enabled, etc.
    The threshold is set arbitrarily at 4KB (see the sketch below); you could turn it into a constant, and if it is set high enough you would not see any difference from the current version of the code, except that it will perform better with large packets.
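For illustration only, here is a rough sketch of the idea; the function name, the sendall calls and the threshold constant below are assumptions, not the actual websockify code:

# Hypothetical sketch of the split-send idea; not websockify's actual API.
JOIN_THRESHOLD = 4096  # below this, concatenating header and data is cheaper

def send_frame(sock, header, payload):
    if len(payload) <= JOIN_THRESHOLD:
        # small payload: one memory copy, one send call
        sock.sendall(header + payload)
    else:
        # large payload: skip the concatenation and send the parts separately
        sock.sendall(header)
        sock.sendall(payload)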


self.send_parts.append(encbuf)
opcode = 2-int(self.base64) #base64: opcode=1, binary: opcode=2
encbufs = self.send_hybi(buf, opcode, base64=self.base64, record=self.rec)

Member (review comment):
You dropped the recording functionality. Also, the conditional opcode calculation, while more concise, obfuscates the actual logic; the original is preferable to me.
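For reference, the explicit form being asked for would look roughly like this (a sketch of the suggestion, not the exact code that was later committed):

# opcode 1 = text frame (base64-encoded payload), opcode 2 = binary frame
if self.base64:
    opcode = 1
else:
    opcode = 2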

Member (review comment):
Oh, nevermind about the record functionality, I'm just blind I guess.

@kanaka (Member) commented Aug 10, 2016

I'm okay with this change (apart from my inline comment). Have you tested with python 2.4 and python 3.X?

@kanaka (Member) commented Aug 10, 2016

@totaam out of curiosity, how has the html5 interface in xpra been working out? I assume this was a performance enhancement discovered as part of using websockify in xpra?

@totaam (Contributor, Author) commented Aug 11, 2016

Have you tested with python 2.4 and python 3.X?

Can't remember about 2.4, will re-check and get back to you.

apart from my inline comment

The opcode thing, sure. Back to a simple "if-else" instead?

how has the html5 interface in xpra been working out?

Great. In order to support some of the best features of xpra (sound forwarding, video encodings, etc), we've had to make a lot of drastic changes to the html5 code recently.
Many of those changes are due to land for the next release... overdue by 2 months already... Real soon now I am told.

The network performance issues we found turned out to be in the xpra html5 network code, but looking into websockify was useful anyway. I was going to quantify the benefits of this patch, but moved on to other things.

@kanaka (Member) commented Aug 11, 2016

Yeah, simple if-else is preferable to me.

It's been a couple years since I've played with xpra, I'll have to try it again soon.

@totaam (Contributor, Author) commented Aug 12, 2016

Done.
Also changed the header and data merge code back to using "+" as this works on all versions of Python.
Tested on CentOS 5 (Python 2.4) and Fedora 24 (Python 2.7.12 and 3.5.1).


self.send_parts.append(encbuf)
opcode = 2
encbufs = self.send_hybi(buf, opcode, base64=self.base64, record=self.rec)
Member (review comment):
send_hybi doesn't look like it returns anything. Also, doesn't this make the while self.send_parts loop below redundant?

@DirectXMan12 (Member) commented
LGTM modulo the comment above. Will merge once that's addressed.

@samhed (Member) commented Aug 23, 2016

but note that this depends on a number of factors: payload size, memory pressure, network speed, whether Nagle is enabled, etc.

Nagle was disabled for websockify in f23780e; how does this affect things here?

@totaam (Contributor, Author) commented Aug 24, 2016

@DirectXMan12 right you are, the send_parts code wasn't used at all either.
The commit above gets rid of it. (removes 17 lines of unused code)

@samhed this shouldn't make much of a difference even with Nagle disabled: the 4K packet join threshold is high enough that you won't be sending just the packet header alone. Even then, it may or may not cost an extra TCP packet for payload sizes above 4K (depending on the OS network stack, send queue levels, etc.).
If you're concerned about the network traffic vs CPU trade-off of this change, we can bump the threshold to 32KB or higher, at which point the potential network cost increase becomes completely negligible, whereas the extra memory copy becomes an unnecessary burden on the CPU.

What some transports do is disable Nagle when sending multiple chunks and re-enable it immediately after. (I don't think it is worth the hassle - could be worth a try.)
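As a rough sketch of that "toggle Nagle around a multi-chunk send" idea (purely illustrative, not part of this patch; the helper name and the lack of error handling are assumptions):

import socket

def send_chunks(sock, chunks):
    # Turn Nagle off while pushing several chunks, then restore it so
    # later small writes can still be coalesced by the kernel.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        for chunk in chunks:
            sock.sendall(chunk)
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)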

if record:
    record.write("%s,\n" % repr("{%s{" % tdelta + encbufs[1]))
if len(buf)<=4096:
    self.request.send(header+buf)
Member (review comment):
I didn't catch this earlier, but it looks like we've lost detection of partial sends here, which could be bad.

Author (review comment):
How so?
I don't see it in the quoted lines.

Member (review comment):
because socket.send isn't guaranteed to send all the requested data, and it returns how much it actually sent. Previously, the related methods here would return the result of self.request.send, and we would check that the entire buffer was sent; if not, we'd remember the rest and send it later. This patch removes that functionality entirely (which can be seen in the fact that we no longer do anything with the result of self.request.send).
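A minimal sketch of the kind of partial-send bookkeeping being described (illustrative names, not the original websockify code):

def send_buffered(sock, pending):
    # pending: list of byte strings still waiting to go out.
    # socket.send may write only part of a buffer, so keep whatever is left
    # and return False so the caller retries when the socket is writable again.
    while pending:
        buf = pending[0]
        sent = sock.send(buf)
        if sent < len(buf):
            pending[0] = buf[sent:]
            return False
        pending.pop(0)
    return True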

@CendioOssman (Member) commented
I'm a bit sceptical about this. Can you share some tests that show the performance issue/improvement?
