Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix QUIC buffer warning for DSN. #2195

Merged
merged 3 commits into from
Nov 6, 2023
Merged

Fix QUIC buffer warning for DSN. #2195

merged 3 commits into from
Nov 6, 2023

Conversation

shamil-gadelshin
Copy link
Contributor

This PR addresses the warning from quinn library used in libp2p::quic protocol that appears in DSN apps.

2023-11-02T14:51:57.888170Z  WARN quinn_udp: sendmsg error: Os { code: 10040, kind: Uncategorized, message: "A message sent on a datagram socket was larger than the internal message buffer or some other network limit, or the buffer used to receive a datagram into was smaller than the datagram itself." }, Transmit: { destination: 199.175.98.73:30533, src_ip: None, enc: None, len: 1326, segment_size: None }

The message consistently appears on Windows and is the result of the "path MTU discovery" process in this environment: quic-go/quic-go#3276

The PR introduces a fork for the libp2p framework from v.0.52.3. We'll likely push this fix upstream in the near future. The fix adds a configuration option for libp2p::quic that propagates to the underlying quinn library, it disables it for Windows and leaves the default configuration for other OS.

The PR also contain fixes for failed test after changing the swarm-test dependency: introducing different delay mechanism for reserved-peers protocol closer to connected-peers protocol.

The alternative implementation might introduce a CLI argument for enabling/disabling "path MTU discovery".

Code contributor checklist:

@shamil-gadelshin shamil-gadelshin added the networking Subspace networking (DSN) label Nov 3, 2023
@shamil-gadelshin shamil-gadelshin self-assigned this Nov 3, 2023
Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see Golang libp2p maintainers smply don't care about Windows too much and no one volunteered to fix it, but we do want ro fiz it. Otherwise we may have connectivity issues on Windows with VPN and such. Please create upstream Rust libp2p issue about this.

Also please explain the fix for reserved peers, I don't think I understand where the issue was.

@shamil-gadelshin
Copy link
Contributor Author

shamil-gadelshin commented Nov 3, 2023

Please create upstream Rust libp2p issue about this.

There is an issue related to the problem: libp2p/rust-libp2p#4686
I will address it when I have time after the current fixes.

Also please explain the fix for reserved peers, I don't think I understand where the issue was.

As you remember, libp2p2 didn't use wakers in all the protocols but they commented that it was an error and they would disable such behavior. I'm not sure what changed but it seemed that without wakers one of the tests became flaky after I switched to our branch and the original v.0.52.3 didn't show such problems (I double-checked).

@nazar-pc
Copy link
Member

nazar-pc commented Nov 3, 2023

As you remember, libp2p2 didn't use wakers in all the protocols but they commented that it was an error and they would disable such behavior. I'm not sure what changed but it seemed that without wakers one of the tests became flaky after I switched to our branch and the original v.0.52.3 didn't show such problems (I double-checked).

I thought it might have been an alternative fix for #2074. Why timeout changes then?

@shamil-gadelshin
Copy link
Contributor Author

I thought it might have been an alternative fix for #2074. Why timeout changes then?

Different protocol (it's reserved-peers instead of connected peers), however, I have Windows machine fully set now, and I'm going to try to reproduce it.

Timeout changed because it stopped working with 1 sec likely because of the new algorithm we should have set higher value anyway because of the default delay interval.

@nazar-pc
Copy link
Member

nazar-pc commented Nov 5, 2023

Timeout changed because it stopped working with 1 sec likely because of the new algorithm we should have set higher value anyway because of the default delay interval.

Doesn't sound very reliable and feels like it could break again, but if it works it is fine I guess

@shamil-gadelshin shamil-gadelshin added this pull request to the merge queue Nov 6, 2023
Merged via the queue into main with commit 0309f7b Nov 6, 2023
@shamil-gadelshin shamil-gadelshin deleted the quic-buffer-fix branch November 6, 2023 06:47
mergify bot pushed a commit to libp2p/rust-libp2p that referenced this pull request Nov 12, 2023
Path MTU discovery seems to be broken in QUIC right now on Windows and we had to disable it in Subspace.

Related: autonomys/subspace#2195.

Pull-Request: #4823.

Co-authored-by: Shamil Gadelshin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
networking Subspace networking (DSN)
Projects
Development

Successfully merging this pull request may close these issues.

2 participants