feat(mdns): allow setting an initial delay before sending queries to discover peers. #3323
I'd appreciate if we don't add another configuration option for this. I consider #3319 to be a bug that we should fix.
This PR only adds a configuration that in its default value does not fix the bug.
Have you measured what such an initial delay would have to be set to in order to be effective? I'd be happy to merge a PR that adds a fixed 500ms delay, like identify does, as long as we don't expose it as a config option.
On our test platform, 500ms does the job (this is the value we use), but this depends heavily on how your Docker containers are connected and initialized (in our case we manage the network connections between our containers by hand, so it takes some time for them to become ready). Identify has 500ms as its default value and a method on Config to configure the initial_delay. I can modify the PR to hardcode the 500ms and remove the configuration part, or I can use 500ms as the default value (to fix the issue in some cases) and keep the configuration option available for other cases where 500ms is not enough. Which do you prefer? |
It would be good to get @mxinden's opinion. My personal preference would be to not expose another configuration option. I also consider the initial delay in identify a hack if it was added for the same reason. In an ideal world, I think we should fix the actual bug, which IMO is that we unconditionally wait for the delay and don't react to missing or failed responses. |
Sure, let's wait for mxinden's opinion. |
Did you get any update from @mxinden? |
I vaguely remember that we briefly discussed this and I think we are on the same page. Our components should do the right thing in their default configuration, ideally in every situation, and not just when they are configured in a certain way. Not increasing the public API but making the implementation more resilient to a corner case like this is a low-friction change that I am happy to merge without further input, as it doesn't have any downsides. Let me know if you want to work on this! |
To summarize, are you fine with
If so, I can rework the PR and submit an updated version. |
Uhm no, that is not quite what I meant. I don't think an initial delay is the appropriate solution. It doesn't address the actual problem: not having received a response to a request that we sent out. mDNS is multicast and based on UDP, though, so a response is not guaranteed. Additionally, we could simply be in a network where there are no other mDNS devices. Thus I think an appropriate solution would be: instead of sending out requests at a fixed interval, build a state machine that:
Essentially, what this does is search for devices at an increased rate until we either find some or we consider there to be no devices. This should be a lot more resilient to any form of problems in the network. Let me know what you think of this solution. |
Agreed.
That sounds reasonable to me. Maybe also worth checking out whether go-libp2p has a solution for this issue. https://github.com/libp2p/go-libp2p/blob/master/p2p/discovery/mdns/mdns.go |
The Go implementation uses the zeroconf library, which does this for the mDNS client part:
Do you want us to use the same values for the Rust state machine? |
OK, we (Stormshield) will work on this new solution. You can drop this PR; we will submit a new one with the updated code. It may take some time, as we have other work to do internally. |
This pull request has merge conflicts. Could you please resolve them @stormshield-damiend? 🙏 |
Obsoleted by #3975. |
Peer discovery with mDNS can be very slow, particularly if the first mDNS query is lost. This patch resolves it by adjusting the timer with an adaptive initial interval. We start with a very short timer (500 ms). If a peer is discovered before the end of the timer, then the timer is reset to the normal query interval value (300s), otherwise the timer's value is multiplied by 2 until it reaches the normal query interval value. Related: #3323. Resolves: #3319. Pull-Request: #3975.
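The adaptive interval described in this commit message can be sketched roughly as follows. This is a minimal illustration only, not the code merged in #3975: `ProbeTimer`, its methods, and the two constants are hypothetical names chosen for this example, and the values 500 ms and 300 s are taken from the description above.

```rust
use std::time::Duration;

/// Short first interval from the description above (hypothetical constant name).
pub const INITIAL_INTERVAL: Duration = Duration::from_millis(500);
/// Normal query interval from the description above (hypothetical constant name).
pub const QUERY_INTERVAL: Duration = Duration::from_secs(300);

/// Hypothetical helper that tracks how long to wait before the next mDNS query.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProbeTimer {
    current: Duration,
    query_interval: Duration,
}

impl ProbeTimer {
    /// Start with the short interval, never exceeding the normal one.
    pub fn new(initial: Duration, query_interval: Duration) -> Self {
        Self {
            current: initial.min(query_interval),
            query_interval,
        }
    }

    /// Interval to wait before sending the next query.
    pub fn current(&self) -> Duration {
        self.current
    }

    /// The timer fired and no peer was discovered:
    /// double the interval, capped at the normal query interval.
    pub fn on_timeout(&mut self) {
        self.current = self.current.saturating_mul(2).min(self.query_interval);
    }

    /// A peer was discovered before the timer fired:
    /// fall back to the normal query interval.
    pub fn on_peer_discovered(&mut self) {
        self.current = self.query_interval;
    }
}
```

With these values the wait grows as 0.5 s, 1 s, 2 s, and so on, and is clamped to 300 s once doubling would exceed it, so a lost first query costs about half a second rather than a full query interval.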
Description
feat(mdns): allow setting an initial delay before sending queries to discover peers (#3319)
Notes
Similarly to the identify behaviour, add an initial delay before the first mDNS query is sent on an interface.
In a Docker environment, the first mDNS query can be lost if the interface is not ready yet. As the default query interval is rather large (5 minutes), an initial delay before the first mDNS query is sent would speed up peer discovery.
Links to any relevant issues
Fixes #3319
Open Questions
I have kept the original behaviour: the initial delay defaults to 0. Perhaps this could be set to another default value, for example 500ms as is done for identify.