-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: refactor logic for identifying connections #898
Conversation
42ddaed
to
56e3fc2
Compare
0. NEVER call `peerstore.SetProtocols(p)` (clear the protocol set). Given the new identify events, if someone looked in the peerstore at the wrong time, they could decide that the peer no longer speaks some protocol. 1. Reliably wait for identify before trying to open a stream. The old logic was _really_ racy. 2. Avoids potentially calling identify on the same connection multiple times. 3. Calls identify as early as possible. Previously, we'd invoke identify on inbound connections using an event that was only invoked _after_ all `Connected` event handlers completed. Now we invoke identify from a `Connected` handler.
56e3fc2
to
3d676b6
Compare
@@ -605,20 +586,13 @@ func (h *BasicHost) dialPeer(ctx context.Context, p peer.ID) error { | |||
return err | |||
} | |||
|
|||
// Clear protocols on connecting to new peer to avoid issues caused | |||
// by misremembering protocols between reconnects | |||
h.Peerstore().SetProtocols(p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug:
This could run after identify has completed but before it has signaled that it has completed. Then below, we'd try to run identify, not run it (because it looks like it's running), and eventually return from this function with no protocols set.
All this would take is two calls to Connect at the same time which actually has a pretty high chance of happening in the new DHT code.
// identify the connection before returning. | ||
done := make(chan struct{}) | ||
go func() { | ||
h.ids.IdentifyConn(c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: This could end up running multiple times, forcing us to do a bunch of unnecessary work.
@@ -199,6 +199,12 @@ func TestHostProtoPreference(t *testing.T) { | |||
t.Fatal(err) | |||
} | |||
|
|||
// force the lazy negotiation to complete | |||
_, err = s.Write(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now necessary because, unlike before, we now wait for identify to complete. This shouldn't slow anything down as we'd have to wait one round trip either way. Really, it should be faster because identify will wait exactly one round-trip while a protocol negotiation could take multiple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the semantics of writing nil
on streams?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, nothing gets written over the wire, this just triggers everything down to that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this relies on the specific behaviour of the msmux lazy stream to not nop on zero length writes
|
||
// wait for reciever to see the conn. | ||
for i := 0; i < 10 && len(n3.Conns()) == 0; i++ { | ||
time.Sleep(time.Duration(10*i) * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timing changes
// IdentifyWait triggers an identify (if the connection has not already been | ||
// identified) and returns a channel that is closed when the identify protocol | ||
// completes. | ||
func (ids *IDService) IdentifyWait(c network.Conn) <-chan struct{} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this function is now a bit funky but this is the least-breaking way I could do it. Before, IdentifyWait
would never trigger an identify operation. Now it does to be consistent everywhere.
@@ -280,21 +314,14 @@ func (ids *IDService) broadcast(proto protocol.ID, payloadWriter func(s network. | |||
go func(p peer.ID, conns []network.Conn) { | |||
defer wg.Done() | |||
|
|||
// if we're in the process of identifying the connection, let's wait. | |||
// we don't use ids.IdentifyWait() to avoid unnecessary channel creation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer a problem. If the connection has not been identified, we'll identify it. Otherwise, the channel will already exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check how through existing test coverage is. hopefully enough to have some confidence things won't end up in odd states. I follow the logic itself, and it looks like an improvement.
Holding the extra mapping of all identified connections may eventually be something worth de-duplicating, since we can implicitly learn that if we have protocols stored for the peer as well, right?
@@ -199,6 +199,12 @@ func TestHostProtoPreference(t *testing.T) { | |||
t.Fatal(err) | |||
} | |||
|
|||
// force the lazy negotiation to complete | |||
_, err = s.Write(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the semantics of writing nil
on streams?
// identified) and returns a channel that is closed when the identify protocol | ||
// completes. | ||
func (ids *IDService) IdentifyWait(c network.Conn) <-chan struct{} { | ||
ids.connsMu.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason for two sets of locking here (when you could really just get away with one) is for perf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We call this for every stream. It may not be critical, but I'd rather not find out later and this was already a rw lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, LGTM. That must have been fun to debug
It's just an empty write. Really, it's a hack. The write way to do this is to either:
However, we're doing something funny here were we don't actually want to send/receive any data on the stream. |
Right now identify is a bit funny. It's per-connection but the state is per-peer. We'll have to reconcile that at some point. |
|
||
// respect don contexteone | ||
// TODO: Consider removing this? On one hand, it's nice because we can | ||
// assume that things like the agent version are usually set when this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien Didn't we discuss that it's nice to block on Identify
for outgoing connections so we can then directly check the peerstore for supported remote protocols, etc. ?
Why this TODO ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did say that. I left this TODO because we may want to consider not doing that.
The DHT now assumes that the protocols will be set in the peerstore once the identify event has fired. However, if we ended up identifying the peer multiple times (multiple calls to Connect at the same time, multiple connections, etc.), we'd end up unsetting the protocols temporarily.
This change:
Chang 2 should not cause any performance degradation:
(so it's actually faster when we're trying to open a new stream with multiple possible protocol choices)