Crash when using Firefox with WebSockets #2349
Comments
The libasan trace talks of a heap buffer overflow in |
@danflu is a size of 127393 bytes expected from your client? I also noticed that you are using the http/2 transport for websocket;
switching to http might be worth a try. |
I will check what is this huge payload... |
Hi @atoppi, you are right! |
Tbh this allocation / reallocation and overflow seems to be coming from outside lws. You can usually get better insight by running under valgrind; it can often pinpoint where things first went off the rails, long before the crash. Generally, if user code does the buffer allocation and that breaks, you may see bad dereferences inside lws if it hands the broken pointers or lengths to lws_write, but that doesn't indicate the problem comes from there. Just a guess: is the Janus realloc logic taking care of the LWS_PRE over-alloc behind the pointer passed to lws_write()? |
The buffer is at least |
You must 'hide' LWS_PRE allocated bytes behind the pointer passed to lws_write()... they can be uninitialized but must be writeable. So for a usable buffer p of length x, The reason is lws can then prepend ws or h2 framing headers without any memcpy. |
... the docs on lws_write() should make this very clear
https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-write.h#n125 |
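The LWS_PRE rule described above can be illustrated with a minimal sketch. It is not Janus or lws code: `SKETCH_LWS_PRE`, `padded_buf`, `padded_buf_set()` and `padded_buf_free()` are hypothetical names, and the value 16 stands in for the real `LWS_PRE` from `<libwebsockets.h>` only so the example compiles without lws. The point is that the same over-allocation has to be applied on every realloc, and that the pointer handed to free() is the base pointer, not the payload pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the real LWS_PRE (assumption: 16, the value quoted in this
 * thread). Real code would include <libwebsockets.h> and use LWS_PRE. */
#define SKETCH_LWS_PRE 16

/* Hypothetical helper type, not part of lws or Janus. */
typedef struct {
    unsigned char *base;    /* what realloc() returned; the only pointer to free() */
    unsigned char *payload; /* base + SKETCH_LWS_PRE; the pointer for lws_write() */
    size_t len;             /* usable payload length */
} padded_buf;

/* (Re)allocate so that SKETCH_LWS_PRE writable bytes always sit behind
 * the payload pointer, then copy the payload in. Returns 0 on success. */
int padded_buf_set(padded_buf *b, const void *data, size_t len)
{
    unsigned char *p = realloc(b->base, SKETCH_LWS_PRE + len);
    if (!p)
        return -1; /* the old buffer is still valid and still owned by b */
    b->base = p;
    b->payload = p + SKETCH_LWS_PRE;
    if (len)
        memcpy(b->payload, data, len);
    b->len = len;
    return 0;
}

void padded_buf_free(padded_buf *b)
{
    free(b->base); /* never free(b->payload) */
    b->base = b->payload = NULL;
    b->len = 0;
}
```

With real lws, the call would then look like `lws_write(wsi, b.payload, b.len, flags)`, with the padding bytes left uninitialized.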
@lws-team thanks for the clarification! I thought we were doing the right thing already, because as Tristan said we allocate (and likewise realloc) this number of bytes:
and then pass
Do I need fences in the code to behave differently, e.g., for lws 3 or 4, or can we safely assume all lws versions will use that properly? |
Ok, looks like I really failed to keep up to date 🙂 But this means that, despite the legacy syntax (that I should fix), I'm apparently already doing the right thing: we do allocate the buffer as
and I'm passing the right buffer to |
What was said was just
Six years ago or so. The combination of a reported "buffer overflow" and a blowup in free, if something was reallocated there, makes it sound like all may not be well in that path... it could be some other problem. If lws is still suspected, try it against lws master and see whether the problem persists. |
Got it, thanks! Yes, you were absolutely right in pointing out we were using very old semantics, I'll fix it right away. I just wanted to make sure I wasn't doing something wrong in the send code despite that: the realloc code follows the same logic as the alloc code, so I think it should be fine, but I'll double check. I think the cause of the issue here was the incredibly long buffer that was being passed to the transport, but we're waiting for more details on that. If this was the issue, I should be able to replicate it locally to investigate. |
@lws-team @danflu I have managed to reproduce the issue by enabling HTTP2 in lws configuration. Steps:
// Helper method to create random identifiers (e.g., transaction)
Janus.randomString = function(len) {
var charSet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
var randomString = '';
for (var i = 0; i < 130000; i++) {
var randomPoz = Math.floor(Math.random() * charSet.length);
randomString += charSet.substring(randomPoz,randomPoz+1);
}
return randomString;
};
Janus will crash; here is the stack trace with libasan: https://pastebin.com/2yTK4Zc9 One of the following will be a workaround:
@lws-team should you need any further debugging just ask |
Forgot to mention I'm on |
Crash with |
Can you try a quick hack on your user code to allocate and point at (LWS_PRE + 3) instead of LWS_PRE? I am not sure if I understand what asan is telling me but it sounds like it's complaining we are using 3 bytes behind. In the mode you describe actually lws is doing RFC8441 encapsulation of ws inside h2 frames... the layout is
by the time it is sent, where Normally the ws header is not huge but actually, it uses variable length integer coding for the bulk payload length... if it faces a monster payload it may overflow LWS_PRE with additional bytes used to encode the length... |
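The header arithmetic behind this hypothesis can be sketched as follows. This is an illustration, not lws code: the function names are hypothetical, and it assumes the prefix written behind the payload is a 9-byte h2 frame header (RFC 7540) followed by an RFC 6455 ws frame header, whose length field grows by 2 bytes for payloads above 125 bytes and by 8 bytes for payloads above 65535 bytes.

```c
#include <stddef.h>

/* HTTP/2 frame header length, per RFC 7540 section 4.1. */
#define H2_FRAME_HEADER_LEN 9

/* Length of an RFC 6455 frame header for a given payload length.
 * masked is nonzero for client-to-server frames, which carry a
 * 4-byte masking key; server-to-client frames are unmasked. */
size_t ws_header_len(unsigned long long payload_len, int masked)
{
    size_t n = 2;                 /* FIN/opcode byte + 7-bit length byte */
    if (payload_len > 65535ULL)
        n += 8;                   /* 64-bit extended payload length */
    else if (payload_len > 125ULL)
        n += 2;                   /* 16-bit extended payload length */
    if (masked)
        n += 4;                   /* masking key */
    return n;
}

/* Bytes needed in front of the payload, under the layout assumed above. */
size_t ws_over_h2_prefix_len(unsigned long long payload_len, int masked)
{
    return H2_FRAME_HEADER_LEN + ws_header_len(payload_len, masked);
}
```

Under these assumptions a 130000-byte server-to-client payload needs 9 + 10 = 19 bytes of prefix, which is 3 more than a 16-byte LWS_PRE, while a 3KB fragment needs only 13.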
I've replaced Here's the log from |
During the h2 negotiation Firefox leaves the h2 max frame size at the default of 16384... it's trying to shove 130K down there in one frame, and the client should hang up on it for that, which seems to be what happens. So I think we understand what is going on:
1) 16 bytes of LWS_PRE isn't enough if you have h2 + RFC8441 plus a huge length,
2) but an RFC8441 huge length won't fly anyway in an h2 encapsulation framing that is itself limited to 16KB.
In other words, we can't support these huge ws fragments directly. I'll try to add something to complain and reject it, but there isn't a "fix" at lws level.
What should happen is, if you want to forward a huge ws message, break it up into ws fragments of ideally ~2 x mtu (eg, 3KB) using the ws fragmentation arrangements, and send those with one writeable / lws_write() each: https://tools.ietf.org/html/rfc6455#section-5.4 All ws clients understand the native ws fragmentation, and you can produce endless messages that way without needing to know the whole message length when you get started.
In lws_write() there are flags to express the different fragmentation states, and in recent lws there is a helper: https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-write.h#n221-246 You can either use that to compute the flags or refer to what it does to pick your own flags directly. Then it'll all hang together keeping LWS_PRE at 16, ws-over-h2 and huge ws messages (made up of not-huge ws fragments). |
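The fragmentation scheme suggested above can be sketched as a small planner that decides, for each writeable callback, which slice of the message to send and whether it is the first and/or last ws fragment. This is a sketch, not lws or Janus code: `ws_fragment` and `next_fragment()` are hypothetical names, and the 3KB fragment size is just the "~2 x mtu" example from the comment.

```c
#include <stddef.h>

/* Hypothetical description of one ws fragment of a larger message. */
typedef struct {
    size_t offset;  /* where this fragment starts in the message */
    size_t len;     /* how many payload bytes it carries */
    int is_start;   /* 1 if this is the first fragment of the message */
    int is_end;     /* 1 if this is the final fragment (FIN) */
} ws_fragment;

/* Describe the fragment that begins at offset in a message of total
 * bytes, with fragments capped at max_frag bytes. Returns 1 and fills
 * *out, or 0 when nothing is left to send (or max_frag is 0). */
int next_fragment(size_t total, size_t offset, size_t max_frag,
                  ws_fragment *out)
{
    size_t remaining;

    if (max_frag == 0 || offset >= total)
        return 0;
    remaining = total - offset;
    out->offset = offset;
    out->len = remaining < max_frag ? remaining : max_frag;
    out->is_start = (offset == 0);
    out->is_end = (out->len == remaining);
    return 1;
}
```

In a writeable callback, `is_start` and `is_end` would then be turned into lws_write() flags, either by hand or with the helper linked above (which I believe is `lws_write_ws_flags()`; check it against your lws version), and the stored offset advanced by `len` after each write. A 130000-byte message split this way at 3072 bytes yields 43 fragments, the last one carrying 976 bytes.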
Hi @lws-team, thanks a lot for your clarifications! |
@lws-team thanks for the detailed reply. However, I don't understand why this happens only in WSS mode. |
You can only negotiate h2 by alpn, over tls... otherwise you can never encounter this encapsulation. |
@lws-team ok thanks again for clarifying and suggesting a solution. |
@lws-team I'm looking into how to start using the fragmentation stuff, and it does seem relatively easy indeed. There's one thing that I'm a bit confused about though. When sending a packet, I can set in the flags that I'm sending the last fragment, e.g., if I'm sending all the data that's left: there's no guarantee all the data will actually be sent, though, as for instance |
For a few years now, lws has handled partial writes transparently; it will only return a number less than you asked for if the connection is dead. It will malloc a buffer for any remainder, and suppress WRITEABLE on the connection while sending it in the background. When it has flushed the remainder, it will allow WRITEABLE to be seen by the connection again. It's not efficient to copy stuff around into buffers like that, but it will handle it. That's why there is the recommendation for 2 x mtu: this is the size that send() or write() will almost always accept on linux. You don't have to worry about the flags in this partial case. The flags are not per tcp fragment (which you can't control) but per ws fragment, each of which can extend over a large number of tcp fragments. So if lws sends the first part of a ws fragment with FIN on it, the peer won't process it as the end of the ws message until it has received the expected whole length of the fragment. When it sees it has that, and the fragment originally came with a FIN, a browser will typically then present the whole ws message (all of the fragments it had been saving up) at once to the client. |
Got it, thanks! Then I'll reuse the existing buffering approach we have, to send fragments instead of what it does right now. I'll try to come up with a PR people can test soon. |
Hello Janus Team,
I'm getting random server crashes when using Firefox 80.0.1 (64-bit) + Secure WebSockets as transport.
Using janus v0.10.5 (b26cbb1)
Pastebin link with janus log (level 7) + libasan output
https://pastebin.com/0NdfQ1J8
Pastebin link with core stack trace:
https://pastebin.com/FWw8gJLE
It appears to be related to a buffer overflow when reallocating the ws_client buffer.
The crash consistently happens in the same place, in janus_websockets.c at line 904:
g_free(ws_client->buffer);
Please let me know if there is anything else I can do to help debug this issue.
Thanks a lot for your attention,
Daniel