[server]: Start Peers Asynchronously #1658
Running this on the faucet now!
c2b8d35 to 921112f
Added commit to bump peer write timeout to 50s, and rebased on master.
utACK
server.go
Outdated
@@ -1719,10 +1719,10 @@ func (s *server) findPeerByPubStr(pubStr string) (*peer, error) {
 // the cleanup routine to exit early.
 //
 // NOTE: This MUST be launched as a goroutine.
-func (s *server) peerTerminationWatcher(p *peer) {
+func (s *server) peerTerminationWatcher(p *peer, ready chan struct{}) {
ready should be explained in the godoc
fixed!
// Otherwise, signal to the peerTerminationWatcher that the peer startup
// was successful, and to begin watching the peer's wait group.
close(ready)
What's the reason for doing this here, and not as the last thing in this method?
It could be done later, since at this point even a Disconnect would unblock the select in WaitForDisconnect. I chose to put it here for clarity, and to stop selecting as soon as possible to avoid futex inflation.
90f5fb2 to b90365f
LGTM ⚡️
2965f67 to 329a69c
Can be rebased now that the two dependent PRs have been merged.
This commit adds synchronization logic to WaitForDisconnect, such that it can be spawned before Start has been executed by the server. Without modification, the current version will return immediately, since no goroutines will have been spawned. To solve this, we modify WaitForDisconnect to block until either 1) the peer is disconnected, or 2) the peer is successfully started, before watching the waitgroup. In the first case, the waitgroup will block until all (if any) spawned goroutines have exited. Otherwise, if the Start is successful, we can switch to watching the waitgroup, knowing that the waitgroup counter is positive.
This commit adds asynchronous starting of peers, in order to avoid potential DoS vectors. Currently, we block while holding the server's mutex as peers exchange Init messages and perform other setup. Thus, a remote peer that does not reply with an Init message will cause the server to block for 15s per attempt. We also modify the startup behavior to spawn peerTerminationWatchers before starting the peer itself, ensuring that a peer is properly cleaned up if the initialization fails. Currently, failing to start a peer does not execute the bulk of the teardown logic, since it is not spawned until after a successful Start occurs.
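A rough sketch of the startup ordering described above: the watcher is spawned before Start, Start runs without the server mutex held, and a failed Start still reaches the cleanup path. Everything beyond the names peerTerminationWatcher, WaitForDisconnect, Start, and Disconnect is an assumption for illustration, including peerConnected, peerInitializer, the failInit flag, the cleaned map, and the channel returned so the demo can wait for initialization.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// peer is a hypothetical stand-in; failInit simulates a remote node
// that never replies with an Init message.
type peer struct {
	id       string
	failInit bool
	quit     chan struct{}
	once     sync.Once
	wg       sync.WaitGroup
}

func newPeer(id string, failInit bool) *peer {
	return &peer{id: id, failInit: failInit, quit: make(chan struct{})}
}

func (p *peer) Start() error {
	if p.failInit {
		return errors.New("no init message received")
	}
	p.wg.Add(1)
	go func() { defer p.wg.Done(); <-p.quit }()
	return nil
}

func (p *peer) Disconnect() { p.once.Do(func() { close(p.quit) }) }

func (p *peer) WaitForDisconnect(ready chan struct{}) {
	select {
	case <-ready:
	case <-p.quit:
	}
	p.wg.Wait()
}

type server struct {
	mu      sync.Mutex
	peers   map[string]*peer
	cleaned map[string]bool // records which peers were torn down
	wg      sync.WaitGroup
}

func newServer() *server {
	return &server{peers: map[string]*peer{}, cleaned: map[string]bool{}}
}

// peerConnected hands the peer to a goroutine so the caller never
// blocks on Start. The returned channel exists only for the demo.
func (s *server) peerConnected(p *peer) <-chan struct{} {
	done := make(chan struct{})
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		defer close(done)
		s.peerInitializer(p)
	}()
	return done
}

func (s *server) peerInitializer(p *peer) {
	// Spawn the watcher first so cleanup runs even if Start fails.
	ready := make(chan struct{})
	s.wg.Add(1)
	go s.peerTerminationWatcher(p, ready)

	// Start runs without s.mu held.
	if err := p.Start(); err != nil {
		p.Disconnect()
		return
	}
	close(ready)

	s.mu.Lock()
	s.peers[p.id] = p
	s.mu.Unlock()
}

func (s *server) peerTerminationWatcher(p *peer, ready chan struct{}) {
	defer s.wg.Done()
	p.WaitForDisconnect(ready)

	s.mu.Lock()
	delete(s.peers, p.id)
	s.cleaned[p.id] = true
	s.mu.Unlock()
}

func (s *server) numPeers() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.peers)
}

func main() {
	s := newServer()
	<-s.peerConnected(newPeer("bad", true))
	good := newPeer("good", false)
	<-s.peerConnected(good)
	fmt.Println("registered peers:", s.numPeers())

	good.Disconnect()
	s.wg.Wait()
	fmt.Println("cleaned up bad:", s.cleaned["bad"], "good:", s.cleaned["good"])
}
```

In this sketch the mutex is taken only for the brief map updates, so a peer that stalls during Start delays its own goroutine rather than the whole server.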
Sometimes when performing an initial sync, the remote node isn't able to pull messages off the wire because of long-running tasks and saturated queues. With a shorter write timeout, we will give up trying to send messages and tear down the connection, even though the peer is still active.
329a69c to d4d9097
rebased and 💚
LGTM 🚜
// call to Start returns no error. Otherwise, if the peer fails to start,
// calling Disconnect will signal the quit channel and the method will not
// block, since no goroutines were spawned.
func (p *peer) WaitForDisconnect(ready chan struct{}) {
👍
This PR adds asynchronous starting of peers,
in order to avoid potential DoS vectors. Currently,
we block while holding the server's mutex as peers exchange
Init messages and perform other setup. Thus, a remote
peer that does not reply with an Init message will
cause the server to block for 15s per attempt.
We also modify the startup behavior to spawn
peerTerminationWatchers before starting the
peer itself, ensuring that a peer is properly
cleaned up if the initialization fails. Currently,
failing to start a peer does not execute the bulk
of the teardown logic, since it is not spawned
until after a successful Start occurs.
The final commit is purely a code move to place
the relevant methods in closer proximity, and
organize them roughly in the expected execution
order.
Prerequisites: