Infinite retransmission on the first failed packet #240

Closed
RuiCunhaM opened this issue Nov 4, 2021 · 4 comments

@RuiCunhaM

Hello!
While running some tests I stumbled on some odd behavior, and I can't tell whether it's a bug or intended behavior.

I conducted the test in the following network topology:

[Image: topo (network topology diagram)]

On host2 I have a simple message server and on host1 a message client. After establishing a connection from host1 -> host2, everything works as expected: host1 announces both available interfaces, and so on.
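For readers reproducing this setup, a client or server can ask the kernel for an MPTCP socket explicitly. The following is a minimal sketch, assuming Python 3.10+ on Linux with MPTCP enabled; the helper name `open_stream_socket` is hypothetical, and this is not the client used in the test above.

```python
import socket

# socket.IPPROTO_MPTCP is available in Python 3.10+ on Linux.
# 262 is the Linux protocol number, used here as a fallback for older Pythons.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Return (sock, used_mptcp): an MPTCP socket if the kernel offers one, else plain TCP."""
    try:
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP), True
    except OSError:
        # Kernel without MPTCP support, or MPTCP disabled: fall back to regular TCP.
        return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP), False


if __name__ == "__main__":
    sock, used_mptcp = open_stream_socket()
    print("MPTCP socket" if used_mptcp else "plain TCP socket")
    sock.close()
```

The fallback keeps the same program usable on hosts where MPTCP is unavailable, which helps when comparing behavior with and without multiple subflows.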

The interesting part happens during the communication. Messages sent from host1 to host2 travel through r1, the initial subflow. I then introduce 100% packet loss on the BLUE link. As expected, the first message does not reach the server; r1 actually returns an ICMP message saying that the host is unreachable. Eventually the original message is sent over the upper subflow, and the server receives and acknowledges it. The problem starts here: even though the client has received the ACK from the server through the second subflow, it keeps re-transmitting the message over and over in the first subflow, and it never stops.

Another interesting thing is that this only happens with the first message/packet that fails. If I send more messages from this point on (still with 100% loss on the blue link), they go through the second subflow with no problem at all (just some extra delay), while the re-transmission attempts continue ONLY for the first failed message.

To be clear, this behavior stops if I turn the interface off and on; it is only observed when causing failures in the link.

So my question is whether this is a bug, or intended behavior meant to "probe" the link and test whether it is back online.
I'm sorry if this is something trivial that I'm missing, but I would expect the re-transmission to stop after the ACK.

@mjmartineau
Member

mjmartineau commented Nov 4, 2021

@RuiCunhaM, it sounds like you are seeing TCP-level retransmission on the blue link, and this is intended behavior. In short, an MPTCP-level ACK of your sent data on the upper subflow does not affect TCP-level retransmissions on the lower subflow.

To understand this, it helps to think of the upper and lower subflows as completely separate TCP connections. When your blue link goes to 100% packet loss, the lower subflow does not get TCP ACKs, so it keeps resending.

The MPTCP layer gets a timeout, and tries to resend. It sees that the lower subflow is stalled and sends on the other one. The upper subflow's TCP connection sees the TCP ACK (so does not retransmit), and also passes the MPTCP ACK up to the MPTCP socket so transmission can continue. New data gets sent on the upper subflow because the MPTCP layer can see that the lower subflow is stalled.

Meanwhile, the lower subflow still knows it sent out a TCP packet that has not been ACKed, and it cannot move ahead until its data stream is acknowledged. It keeps retransmitting. If the blue link is restored, the peer will receive the data and send a TCP ACK, use the MPTCP sequence number to determine it is seeing duplicate data, and discard the data.

The lower subflow can only be closed, or continue retransmitting. Once the data has been sent, it can't be skipped without corrupting the TCP stream.
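The separation between the two ACK levels can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the kernel's implementation: the class names `Subflow` and `MptcpConnection` are hypothetical, each segment carries one message, a "stalled" subflow is simply one with unacknowledged data, and the peer's duplicate discard is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Subflow:
    """One TCP subflow with its own, independent sequence space."""
    name: str
    link_up: bool = True
    next_seq: int = 0                            # next subflow-level (TCP) sequence number
    unacked: dict = field(default_factory=dict)  # TCP seq -> MPTCP data seq awaiting a TCP ACK

    def send(self, data_seq):
        """Send one segment; return True if it was delivered and TCP-ACKed."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = data_seq
        return self._transmit(seq)

    def retransmit_pending(self):
        """TCP-level retransmission; return the TCP seqs that are still unacked."""
        return [seq for seq in sorted(self.unacked) if not self._transmit(seq)]

    def _transmit(self, seq):
        if not self.link_up:
            return False        # segment lost, so no TCP ACK comes back
        del self.unacked[seq]   # TCP ACK received on this subflow
        return True


@dataclass
class MptcpConnection:
    subflows: list
    data_acked: set = field(default_factory=set)  # MPTCP data-level ACKs

    def send(self, data_seq):
        """Send on the first subflow that is not stalled; return its name."""
        for sf in self.subflows:
            if sf.unacked:      # stalled: skip it for new data
                continue
            if sf.send(data_seq):
                self.data_acked.add(data_seq)
                return sf.name
        raise ConnectionError("no working subflow")


if __name__ == "__main__":
    lower, upper = Subflow("lower", link_up=False), Subflow("upper")
    conn = MptcpConnection([lower, upper])
    print(conn.send(1), conn.data_acked, lower.unacked)
    print(conn.send(2), lower.retransmit_pending())
```

In this model the data-level ACK for message 1 arrives through the upper subflow, yet the lower subflow still holds TCP sequence 0 as unacknowledged and keeps retransmitting it until its link returns or the subflow is closed.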

@matttbe
Member

matttbe commented Nov 5, 2021

Thanks Mat for the good description!

it keeps re-transmitting over and over the message in the first subflow, it never stops.

Like any TCP connection in this situation, it should stop at some point:

  • either because it has reached the retransmission limit: see the tcp_retries2 sysctl
  • or because it has reached a timeout: Netfilter (e.g. ConnTrack) or userspace configuration.
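To get a feel for how long the retransmission limit takes to trigger, the sketch below reads tcp_retries2 and estimates the total time before the stalled subflow gives up. It assumes the usual Linux bounds of 200 ms minimum and 120 s maximum RTO and pure exponential backoff; those constants and the function names are assumptions, not taken from this thread, and the real timeout depends on the measured RTT.

```python
from pathlib import Path

# Assumed Linux defaults (not from the thread): 200 ms minimum RTO, 120 s maximum RTO.
RTO_MIN_S = 0.2
RTO_MAX_S = 120.0
SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_retries2")


def estimate_timeout(retries, rto_min=RTO_MIN_S, rto_max=RTO_MAX_S):
    """Sum the waits after the first transmission and each of `retries` retransmissions.

    The RTO doubles after every attempt and is capped at rto_max.
    """
    return sum(min(rto_min * 2 ** attempt, rto_max) for attempt in range(retries + 1))


def read_tcp_retries2(path=SYSCTL_PATH):
    """Return the current tcp_retries2 value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    retries = read_tcp_retries2()
    if retries is None:
        retries = 15  # assumed default when the sysctl is not readable
    print(f"tcp_retries2={retries}: roughly {estimate_timeout(retries):.1f} s before giving up")
```

With these assumptions a value of 15 gives roughly 924.6 seconds, so the retransmissions can look endless during a short test even though they are bounded.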

Does this answer your question? Can we close this ticket?

@RuiCunhaM
Author

RuiCunhaM commented Nov 5, 2021

In short, an MPTCP-level ACK of your sent data on the upper subflow does not affect TCP-level retransmissions on the lower subflow.

I see... My mistake was assuming that an MPTCP-level ACK could also affect the TCP-level. But obviously this is probably not possible in the upstream implementation since MPTCP cannot interfere with the TCP-level.
Anyway, thank you very much for the explanation.

Does this answer your question? Can we close this ticket?

Yes it does, thank you.

@matttbe
Member

matttbe commented Nov 5, 2021

But obviously this is probably not possible in the upstream implementation since MPTCP cannot interfere with the TCP-level.

The main reason is that MPTCP is an extension to TCP and we cannot "break" TCP: intermediate hosts will see each subflow as an independent TCP connection. It means that if a packet is lost, it needs to be retransmitted; otherwise, when the link is up again, you will have a "hole" in the subflow's sequence space, and an intermediate "smart" host could close the connection.
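The "hole" can be pictured from the point of view of an on-path observer that only sees one subflow's TCP segments. The sketch below is illustrative only: `find_holes` is a hypothetical helper, segments are given as (sequence number, length) pairs, and sequence-number wraparound is ignored.

```python
def find_holes(segments, initial_seq=0):
    """Return the (start, end) byte ranges missing from a subflow's TCP stream.

    `segments` is an iterable of (seq, length) pairs as seen on the wire;
    `end` is exclusive. Duplicates and retransmissions are tolerated.
    """
    holes = []
    expected = initial_seq
    for seq, length in sorted(segments):
        if seq > expected:
            holes.append((expected, seq))  # bytes the observer never saw
        expected = max(expected, seq + length)
    return holes


if __name__ == "__main__":
    # The lost segment (bytes 100..199) was skipped instead of retransmitted.
    print(find_holes([(0, 100), (200, 100)]))
    # The lost segment was retransmitted once the link came back.
    print(find_holes([(0, 100), (200, 100), (100, 100)]))
```

If the sender skipped the lost segment, an observer doing this kind of bookkeeping would see a gap that never closes, which is why the subflow has to keep retransmitting or be closed.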

3 participants