Infinite retransmission on the first failed packet #240

RuiCunhaM · 2021-11-04T16:04:49Z

Hello!
While running some tests I stumbled on a somewhat odd behavior, and I can't tell whether it's a bug or intended behavior.

I conducted the test in the following network topology:

On host2 I have a simple message server and on host1 a message client. After establishing a connection from host1 -> host2, everything works as expected: host1 announces both available interfaces, etc.

The interesting part happens during the communication. Messages sent from host1 to host2 travel through r1, over the initial subflow. I then introduce 100% packet loss on the BLUE link. As expected, the first message does not reach the server; r1 actually returns an ICMP message saying that the host is unreachable. Eventually the original message is sent over the upper subflow, and the server receives and acknowledges it. The problem starts here: even though the client has received the ACK from the server through the second subflow, it keeps re-transmitting over and over the message in the first subflow, it never stops.

Another interesting thing is that this only happens with the first message/packet that fails. If I send more messages from this point (still with 100% loss on the blue link), they go through the second subflow with no problem at all (just some extra delay), while the re-transmission attempts continue ONLY for the first failed message.

To be clear, this behavior stops if I turn the interface off and on; it is only observed when causing failures on the link.

So my question is whether this is a bug, or intended behavior meant to "probe" the link and test whether it is back online.
I'm sorry if this is something trivial that I'm missing, but I would expect the re-transmission to stop after the ACK.

mjmartineau · 2021-11-04T17:29:55Z

@RuiCunhaM, it sounds like you are seeing TCP-level retransmission on the blue link, and this is intended behavior. In short, an MPTCP-level ACK of your sent data on the upper subflow does not affect TCP-level retransmissions on the lower subflow.

To understand this, it helps to think of the upper and lower subflows as completely separate TCP connections. When your blue link goes to 100% packet loss, the lower subflow does not get TCP ACKs, so it keeps resending.

The MPTCP layer gets a timeout, and tries to resend. It sees that the lower subflow is stalled and sends on the other one. The upper subflow's TCP connection sees the TCP ACK (so does not retransmit), and also passes the MPTCP ACK up to the MPTCP socket so transmission can continue. New data gets sent on the upper subflow because the MPTCP layer can see that the lower subflow is stalled.

Meanwhile, the lower subflow still knows it sent out a TCP packet that has not been ACKed, and it cannot move ahead until its data stream is acknowledged. It keeps retransmitting. If the blue link is restored, the peer will receive the data and send a TCP ACK, use the MPTCP sequence number to determine it is seeing duplicate data, and discard the data.

The lower subflow can only be closed, or continue retransmitting. Once the data has been sent, it can't be skipped without corrupting the TCP stream.
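The explanation above can be illustrated with a toy model. This is only a sketch, not how the kernel is implemented: the `Subflow` and `Connection` classes, their fields, and the "skip a subflow that still has unacknowledged data" rule are all hypothetical simplifications chosen for illustration. It tracks a per-subflow list of data that has not been TCP-ACKed separately from the set of data acknowledged at the MPTCP level.

```python
from dataclasses import dataclass, field


@dataclass
class Subflow:
    """Toy stand-in for one TCP subflow (hypothetical, not a kernel structure)."""
    name: str
    link_up: bool = True
    unacked: list = field(default_factory=list)  # DSNs sent but not TCP-ACKed

    def transmit(self, dsn=None):
        """Queue new data (if any), then (re)send everything outstanding.

        Returns the DSNs that reached the peer and were TCP-ACKed.
        """
        if dsn is not None:
            self.unacked.append(dsn)
        if not self.link_up:
            return []  # no TCP ACK: the data stays queued for retransmission
        delivered, self.unacked = self.unacked, []
        return delivered


class Connection:
    """Toy MPTCP connection tracking data-level ACKs across subflows."""

    def __init__(self, subflows):
        self.subflows = subflows
        self.data_acked = set()  # DSNs acknowledged at the MPTCP level
        self.duplicates = []     # DSNs the peer saw twice and discarded

    def peer_receive(self, dsns):
        for dsn in dsns:
            if dsn in self.data_acked:
                self.duplicates.append(dsn)  # already delivered on another subflow
            else:
                self.data_acked.add(dsn)

    def send(self, dsn):
        """Send on the first usable subflow, falling back to the next one."""
        for sf in self.subflows:
            if sf.unacked:
                continue  # subflow looks stalled, so pick another one
            self.peer_receive(sf.transmit(dsn))
            if dsn in self.data_acked:
                return sf.name
        return None


# Scenario from this issue: 100% loss on the lower (blue) link.
lower = Subflow("lower", link_up=False)
upper = Subflow("upper")
conn = Connection([lower, upper])

first = conn.send(1)   # times out on lower, gets re-sent on upper
second = conn.send(2)  # lower is stalled, so it goes straight to upper
stuck = list(lower.unacked)  # what lower keeps retransmitting

lower.link_up = True                  # the blue link is restored
conn.peer_receive(lower.transmit())   # the peer discards the duplicate
```

In this model only the first message stays queued on the lower subflow, which matches the observation that later messages are not retransmitted there. Once the link is restored, the queued data is finally TCP-ACKed and the peer drops it as a duplicate.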

matttbe · 2021-11-05T09:50:16Z

Thanks Mat for the good description!

> it keeps re-transmitting over and over the message in the first subflow, it never stops.

Like any TCP connection in this situation, it should stop at some point:

Either because it has reached the retransmission limit: see the tcp_retries2 sysctl
or because it has reached a timeout: Netfilter (e.g. conntrack) or userspace configuration.
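As a rough illustration of the first case, the sketch below estimates how long a stalled subflow keeps retransmitting for a given `tcp_retries2` value. The helper names are hypothetical, and the calculation assumes a pure exponential backoff starting at a 200 ms minimum RTO and capped at a 120 s maximum RTO (the usual Linux defaults). The real effective timeout depends on the measured RTT, so treat the result as a lower-bound estimate rather than an exact figure.

```python
from pathlib import Path

# Assumed Linux defaults; the RTO actually used depends on the measured RTT.
RTO_MIN_S = 0.2
RTO_MAX_S = 120.0


def hypothetical_timeout(retries, rto_min=RTO_MIN_S, rto_max=RTO_MAX_S):
    """Sum the backed-off RTOs for the initial send plus `retries` retransmissions."""
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += min(rto, rto_max)  # the RTO doubles until it hits the cap
        rto *= 2
    return total


def read_tcp_retries2(default=15):
    """Read the sysctl from procfs, falling back to `default` if unavailable."""
    try:
        return int(Path("/proc/sys/net/ipv4/tcp_retries2").read_text().strip())
    except (OSError, ValueError):
        return default


if __name__ == "__main__":
    retries = read_tcp_retries2()
    print(f"tcp_retries2={retries}: about {hypothetical_timeout(retries):.1f} s")
```

With the default of 15, this model gives about 924.6 seconds, which is the hypothetical timeout quoted in the kernel's sysctl documentation. That is over 15 minutes, so a short test can easily make the retransmissions look endless.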

Does this answer your question? Can we close this ticket?

RuiCunhaM · 2021-11-05T10:09:36Z

> In short, a MPTCP-level ACK of your sent data on the upper subflow does not affect TCP-level retransmissions on the lower subflow.

I see... My mistake was assuming that an MPTCP-level ACK could also affect the TCP level. But obviously this is probably not possible in the upstream implementation since MPTCP cannot interfere with the TCP-level.
Anyway, thank you very much for the explanation.

> Does this answer your question? Can we close this ticket?

Yes it does, thank you.

matttbe · 2021-11-05T10:49:29Z

> But obviously this is probably not possible in the upstream implementation since MPTCP cannot interfere with the TCP-level.

The main reason is that MPTCP is an extension to TCP and we cannot "break" TCP: intermediate hosts will see each subflow as an independent TCP connection. This means that if a packet is lost, it needs to be retransmitted; otherwise, when the link comes back up, there will be a "hole" in that subflow's TCP stream, and a "smart" intermediate host could close the connection.

matttbe added the question label Nov 5, 2021

RuiCunhaM closed this as completed Nov 5, 2021
