rcmgr: connection-level memory base limits are too small for performant QUIC transfers #1706
Comments
Looks like autoscaling is doing it wrong; at the very least it should allow the full 16M per conn, but with the sum of conns averaging at 1M. The other half is returning memory from the transport, or oversubscribing with some kill semantics. The attack you describe is so primitive that there are many ways to counter it; a simple timeout will do. What does Google do in their servers? I would be very surprised if you could just OOM them by opening 1M conns.
There are many variations of this attack. Essentially, you're in slow-loris territory here, which is notoriously difficult to defend against in the general case. Google doesn't care about DoS attacks; they'll just load-balance to more machines. (Google doesn't even care about DoS bugs in their code that let you take out one of their servers by sending 100 kB of data. I reported such vulnerabilities to them multiple times, and they didn't even consider me for their bug bounty program. I won't do that again.)
That would mean starting with a very small receive buffer, which would be bad for performance.
You misunderstand: we can set the connection limit to 16M, but cap the total aggregate at less than conns x 16M (say, 1M per conn on average). To make things maximally effective, we need ways to return memory.
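A minimal sketch of what that configuration could look like with go-libp2p-resource-manager's ScalingLimitConfig (field and method names as in the repo at the time; the 256 MiB aggregate is an illustrative number, not a recommendation):

```go
package main

import (
	rcmgr "github.com/libp2p/go-libp2p-resource-manager"
)

func main() {
	limits := rcmgr.DefaultLimits // copy of the default scaling limits

	// Allow each individual connection to reserve the full 16 MiB...
	limits.ConnBaseLimit.Memory = 16 << 20

	// ...but cap the system-wide aggregate so that connections can
	// only *average* about 1 MiB each (illustrative number).
	limits.SystemBaseLimit.Memory = 256 << 20

	rm, err := rcmgr.NewResourceManager(rcmgr.NewFixedLimiter(limits.AutoScale()))
	if err != nil {
		panic(err)
	}
	defer rm.Close()
}
```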
How would that prevent the QUIC transport / the muxer from consuming all the memory in the system scope?
Actually, that's the peer scope limit; conn memory is not relevant here.
It can only consume up to 16M per peer. What is important here is finding ways to return memory / shrink windows and buffers.
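For what it's worth, the scope interface already has the vocabulary for returning memory; what's missing is transports calling it when they shrink windows or buffers. A sketch against network.ResourceScope, with hypothetical growWindow/shrinkWindow helpers (quic-go exposes no such hooks today):

```go
package flowcontrol

import (
	"github.com/libp2p/go-libp2p-core/network"
)

// growWindow tries to enlarge a receive window by delta bytes, but only
// if the connection scope lets us reserve the memory first.
// Hypothetical helper: quic-go exposes no such hook today.
func growWindow(scope network.ResourceScope, window *int64, delta int) error {
	if err := scope.ReserveMemory(delta, network.ReservationPriorityLow); err != nil {
		// rcmgr said no: keep the current window instead of failing the conn.
		return err
	}
	*window += int64(delta)
	return nil
}

// shrinkWindow hands memory back to the resource manager once the
// window is no longer needed.
func shrinkWindow(scope network.ResourceScope, window *int64, delta int) {
	*window -= int64(delta)
	scope.ReleaseMemory(delta)
}
```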
Our connection-level memory base limit is 1 MB: https://github.com/libp2p/go-libp2p-resource-manager/blob/9bb1bbdc782e0f7c43768e1946e659432edbc6b0/limit_defaults.go#L507-L513
quic-go starts with a stream flow control window of 512 kB and a connection flow control window of 768 kB: https://github.com/lucas-clemente/quic-go/blob/8c0c481da1644f9934df399c50649d65967d7f22/internal/protocol/params.go#L24-L28
This means that with the base limit, we will never be allowed to increase the flow control windows of a QUIC connection beyond their initial values. Throughput is then limited by flow control.
How bad is this?
I ran some benchmarks between two servers (US East to Europe) about 80 ms RTT apart. With TCP, I was able to achieve transfer speeds of roughly 33 MB/s, while QUIC only achieved 6 MB/s.
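A back-of-the-envelope check that flow control explains the gap: throughput can't exceed one connection window per round trip, so with the initial 768 kB window and an 80 ms RTT the ceiling is about 9.8 MB/s, consistent with the measured 6 MB/s.

```go
package main

import "fmt"

func main() {
	const (
		rtt        = 0.080      // seconds; US East <-> Europe
		connWindow = 768 * 1024 // bytes; quic-go's initial connection window
		tcpRate    = 33e6       // bytes/s; throughput measured with TCP
	)
	// Flow control allows at most one full window per round trip.
	fmt.Printf("QUIC throughput ceiling: %.1f MB/s\n", connWindow/rtt/1e6) // ~9.8
	// Window required to keep up with TCP on this path (the BDP).
	fmt.Printf("window needed for TCP parity: %.1f MB\n", tcpRate*rtt/1e6) // ~2.6
}
```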
The situation gets better on larger servers, as limit autoscaling will allocate more memory to each connection. I don't think QUIC connections being significantly slower than TCP connections is tenable on any size of machine, though.
Problem Analysis
The problem here is the following: quic-go can never reserve more than the 1 MB connection-level base limit, so its flow control windows are stuck at their initial values (512 kB per stream, 768 kB per connection). Flow control then caps throughput at roughly one connection window per RTT, which on a high-BDP path is far below what TCP achieves on the same link.
Possible Solution
I have the feeling that this brings us back to #1708: Memory management is really only useful for muxers on the libp2p layer.
If we limit memory accounting to muxers (the only real consumers of memory using the rcmgr) and introduce a dedicated muxer memory pool, this would give us 128 MB for QUIC connections. Each connection would start with a 768 kB connection flow control limit, consuming 96 MB of memory in the worst case. With the remaining 32 MB, some QUIC connections (the ones currently being used for high-BDP transfers) would be able to increase their flow control windows; the arithmetic is spelled out below.
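Spelling out that arithmetic (the 128-connection count is what the 96 MB worst case implies; purely illustrative):

```go
package main

import "fmt"

func main() {
	const (
		pool       = 128 << 20 // dedicated muxer memory pool: 128 MiB
		initWindow = 768 << 10 // initial connection flow-control window: 768 kiB
		conns      = 128       // implied by 96 MiB / 768 kiB per connection
	)
	worst := conns * initWindow
	fmt.Printf("worst case at initial windows: %d MiB\n", worst>>20)        // 96
	fmt.Printf("headroom for growing windows:  %d MiB\n", (pool-worst)>>20) // 32
}
```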
We might also be able to teach quic-go to decrease flow control windows once utilization of the connection goes down, so that a long-lived connection that transferred a lot of data an hour ago doesn't keep occupying a lot of memory.
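Nothing like this exists in quic-go today, but a hypothetical shape for the mechanism: track how much of the window the peer actually used per epoch, and when utilization stays low, halve the window and hand the freed memory back to the resource manager. All names here are invented for illustration.

```go
package flowcontrol

// windowTuner is a hypothetical sketch of utilization-based window
// shrinking; nothing like it exists in quic-go today.
type windowTuner struct {
	window    int64 // current receive window in bytes
	minWindow int64 // never shrink below the initial window
	consumed  int64 // bytes the peer consumed since the last epoch
}

// onEpoch runs periodically (say, every few RTTs). If the peer used
// less than a quarter of the window, halve the window and give the
// freed memory back (e.g. via ResourceScope.ReleaseMemory).
func (t *windowTuner) onEpoch(release func(bytes int64)) {
	if t.consumed < t.window/4 && t.window/2 >= t.minWindow {
		release(t.window / 2)
		t.window /= 2
	}
	t.consumed = 0
}
```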
cc @vyzo @MarcoPolo @Stebalien