aggressive disconnects on holesky #6842
Comments
Possibly another heuristic matters here: what if you have few peers and the peer you're evaluating is useful in discovery? I say that because for every bootnode there are seemingly 10 Nodecrawlers, which may or may not be useful in discovery. So having only a low water mark might fill those slots up with crawlers?
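To make the suggested heuristic concrete, here is a minimal sketch in Python (not Besu code). It assumes a hypothetical `PeerView` record with `is_bootnode` and `useful_in_discovery` flags and a hypothetical `low_water_mark` setting. None of these names come from Besu itself.

```python
from dataclasses import dataclass


@dataclass
class PeerView:
    """Hypothetical summary of what we know about a peer."""
    is_bootnode: bool
    useful_in_discovery: bool  # assumption: the peer recently helped us find other nodes


def should_disconnect(peer: PeerView, peer_count: int,
                      low_water_mark: int, over_threshold: bool) -> bool:
    """Decide whether to drop a peer that may have tripped a reputation threshold."""
    if not over_threshold:
        # The peer has not misbehaved enough to be considered at all.
        return False
    if peer_count <= low_water_mark and (peer.is_bootnode or peer.useful_in_discovery):
        # Few peers left and this one still helps discovery: keep it for now.
        return False
    return True
```

With this shape, a crawler that contributes nothing to discovery is still dropped at low peer counts, while a bootnode is kept until the peer count recovers.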
🤔 The timeout might also be hit faster than expected. When syncing we're basically waiting on a good Geth node. Geth because... In the context of #6805, trying to resync with
Using Fast sync will drop all Geth connections:
Using Snap sync will drop Nethermind:
Note the timestamps here. Edit: also seen a few peers dropped because they were still syncing.
I wonder if the culprit is:
I wonder if it's even reasonable to expect this account range to be served within 5 seconds? Perhaps if we make these ranges shorter we can both resolve the timeouts and throttle these requests more effectively by waiting on outstanding requests?
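As a rough illustration of "making the ranges shorter", the sketch below splits an inclusive range of 256-bit account hashes into contiguous sub-ranges. It is plain Python, not Besu code. The function name `split_range` is hypothetical, and how many parts would fit within the 5 second timeout is an open question.

```python
MAX_HASH = 2**256 - 1  # largest 256-bit account hash


def split_range(start: int, end: int, parts: int) -> list[tuple[int, int]]:
    """Split the inclusive range [start, end] into contiguous inclusive sub-ranges."""
    if parts < 1 or start > end:
        raise ValueError("need parts >= 1 and start <= end")
    total = end - start + 1
    parts = min(parts, total)          # never produce empty sub-ranges
    base, extra = divmod(total, parts)
    ranges = []
    lo = start
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges
```

Requesting the sub-ranges one at a time, and only issuing the next request once the previous one has returned, would give the throttling effect described above.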
It's probably not just ... Instead, I found there tends to be a disconnect message beforehand with a different reason, after which these pending requests fail and are attributed as timeouts.
Possibly fixed by #6877
So far I'm seeing better success with https://github.com/hyperledger/besu/releases/tag/24.4.0-RC2 on holesky: it is not disconnecting the bootnodes on a new instance syncing from 0. I tried restarting it halfway through (see gap in bandwidth use) just to try to make peer counts drop, but they never dropped below 2.
reasons for disconnects (inbound and outbound) with holesky bootnodes (running #6609 built from source)
I don't think there's much we can do about the TCP_SUBSYSTEM error, but the incoming SUBPROTOCOL_TRIGGERED may be worth investigating.
Further analysis from 2 sets of 3 holesky nodes
and some relevant trace logging (from 6609-00 since I didn't have TRACE enabled on 6609-90 but I think it should be the same)
and then I started 6609-00, 6609-01, 6609-02
since they both had zero peers, on 6609-00 and 6609-02 I forcibly added (using admin_addPeer) the 2 holesky bootnodes as peers. So now besu will try to reconnect even if we drop them - and we still are dropping them. But now the reason is useless responses. 6609-00
6609-90 also has 0 peers, but I will leave it going overnight to see if it discovers some.
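For anyone wanting to repeat the `admin_addPeer` step, here is a small Python sketch that only builds the JSON-RPC request body. The enode URL is a placeholder, not a real holesky bootnode, and the helper name `add_peer_request` is made up for this example. It assumes the ADMIN API is enabled on the node's JSON-RPC endpoint.

```python
import json


def add_peer_request(enode_url: str, request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 body for an admin_addPeer call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "admin_addPeer",
        "params": [enode_url],  # single parameter: the enode URL of the peer to add
        "id": request_id,
    })


# Placeholder enode URL; substitute a real bootnode's id, host and port.
body = add_peer_request("enode://<node-id>@<host>:<port>")
```

The resulting string can be POSTed to the node's HTTP JSON-RPC endpoint with any HTTP client.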
This issue is stale because it has been open for 6 months with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
holesky has 2 bootnodes and besu is disconnecting them perhaps too aggressively
timeout threshold is 3
and useless response threshold is 5
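To show how the two thresholds interact, here is a simplified Python sketch of a per-peer counter. It is not Besu's actual reputation code: the class `PeerScore` is hypothetical, and it assumes a peer is dropped as soon as a counter reaches its threshold, with no time window and no counter resets.

```python
class PeerScore:
    """Hypothetical per-peer counters using the thresholds quoted in this issue."""

    TIMEOUT_THRESHOLD = 3
    USELESS_RESPONSE_THRESHOLD = 5

    def __init__(self) -> None:
        self.timeouts = 0
        self.useless_responses = 0

    def record_timeout(self) -> bool:
        """Count a request timeout; return True if the peer should be disconnected."""
        self.timeouts += 1
        return self.timeouts >= self.TIMEOUT_THRESHOLD

    def record_useless_response(self) -> bool:
        """Count a useless response; return True if the peer should be disconnected."""
        self.useless_responses += 1
        return self.useless_responses >= self.USELESS_RESPONSE_THRESHOLD
```

Under these assumptions, a bootnode that is slow on only three requests during sync is already dropped, which is the behaviour this issue describes as possibly too aggressive when there are only 2 bootnodes.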