Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: refactor logic for identifying connections #898

Merged
merged 3 commits into from
Apr 25, 2020

Conversation

Stebalien
Copy link
Member

@Stebalien Stebalien commented Apr 25, 2020

The DHT now assumes that the protocols will be set in the peerstore once the identify event has fired. However, if we ended up identifying the peer multiple times (multiple calls to Connect at the same time, multiple connections, etc.), we'd end up unsetting the protocols temporarily.

This change:

  1. Avoids ever unsetting the protocols, ever.
  2. Avoids running the identify protocol on the same connection multiple times, ever.
  3. Always waits for identify to complete (or fail) before using a connection.

Chang 2 should not cause any performance degradation:

  1. Before this change, if the connection had not yet been identified, we'd wait at least one round-trip per new stream to negotiate the protocol.
  2. After the change, we now wait exactly one round trip to complete the identify handshake.

(so it's actually faster when we're trying to open a new stream with multiple possible protocol choices)

@Stebalien Stebalien force-pushed the fix/set-protocols-race branch 2 times, most recently from 42ddaed to 56e3fc2 Compare April 25, 2020 01:51
Stebalien and others added 2 commits April 24, 2020 19:05
0. NEVER call `peerstore.SetProtocols(p)` (clear the protocol set). Given the
   new identify events, if someone looked in the peerstore at the wrong time, they
   could decide that the peer no longer speaks some protocol.
1. Reliably wait for identify before trying to open a stream. The old logic was
   _really_ racy.
2. Avoids potentially calling identify on the same connection multiple times.
3. Calls identify as early as possible. Previously, we'd invoke identify on
   inbound connections using an event that was only invoked _after_ all `Connected`
   event handlers completed. Now we invoke identify from a `Connected` handler.
@Stebalien Stebalien force-pushed the fix/set-protocols-race branch from 56e3fc2 to 3d676b6 Compare April 25, 2020 02:05
@@ -605,20 +586,13 @@ func (h *BasicHost) dialPeer(ctx context.Context, p peer.ID) error {
return err
}

// Clear protocols on connecting to new peer to avoid issues caused
// by misremembering protocols between reconnects
h.Peerstore().SetProtocols(p)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug:

This could run after identify has completed but before it has signaled that it has completed. Then below, we'd try to run identify, not run it (because it looks like it's running), and eventually return from this function with no protocols set.

All this would take is two calls to Connect at the same time which actually has a pretty high chance of happening in the new DHT code.

// identify the connection before returning.
done := make(chan struct{})
go func() {
h.ids.IdentifyConn(c)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: This could end up running multiple times, forcing us to do a bunch of unnecessary work.

@@ -199,6 +199,12 @@ func TestHostProtoPreference(t *testing.T) {
t.Fatal(err)
}

// force the lazy negotiation to complete
_, err = s.Write(nil)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now necessary because, unlike before, we now wait for identify to complete. This shouldn't slow anything down as we'd have to wait one round trip either way. Really, it should be faster because identify will wait exactly one round-trip while a protocol negotiation could take multiple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the semantics of writing nil on streams?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, nothing gets written over the wire, this just triggers everything down to that point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this relies on the specific behaviour of the msmux lazy stream to not nop on zero length writes


// wait for reciever to see the conn.
for i := 0; i < 10 && len(n3.Conns()) == 0; i++ {
time.Sleep(time.Duration(10*i) * time.Millisecond)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timing changes

// IdentifyWait triggers an identify (if the connection has not already been
// identified) and returns a channel that is closed when the identify protocol
// completes.
func (ids *IDService) IdentifyWait(c network.Conn) <-chan struct{} {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function is now a bit funky but this is the least-breaking way I could do it. Before, IdentifyWait would never trigger an identify operation. Now it does to be consistent everywhere.

@@ -280,21 +314,14 @@ func (ids *IDService) broadcast(proto protocol.ID, payloadWriter func(s network.
go func(p peer.ID, conns []network.Conn) {
defer wg.Done()

// if we're in the process of identifying the connection, let's wait.
// we don't use ids.IdentifyWait() to avoid unnecessary channel creation.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer a problem. If the connection has not been identified, we'll identify it. Otherwise, the channel will already exist.

Copy link
Contributor

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check how through existing test coverage is. hopefully enough to have some confidence things won't end up in odd states. I follow the logic itself, and it looks like an improvement.

Holding the extra mapping of all identified connections may eventually be something worth de-duplicating, since we can implicitly learn that if we have protocols stored for the peer as well, right?

@@ -199,6 +199,12 @@ func TestHostProtoPreference(t *testing.T) {
t.Fatal(err)
}

// force the lazy negotiation to complete
_, err = s.Write(nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the semantics of writing nil on streams?

// identified) and returns a channel that is closed when the identify protocol
// completes.
func (ids *IDService) IdentifyWait(c network.Conn) <-chan struct{} {
ids.connsMu.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason for two sets of locking here (when you could really just get away with one) is for perf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We call this for every stream. It may not be critical, but I'd rather not find out later and this was already a rw lock.

Copy link
Contributor

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, LGTM. That must have been fun to debug

@Stebalien
Copy link
Member Author

what are the semantics of writing nil on streams?

It's just an empty write. Really, it's a hack. The write way to do this is to either:

  1. Write a message.
  2. Read a message.

However, we're doing something funny here were we don't actually want to send/receive any data on the stream.

@Stebalien
Copy link
Member Author

Holding the extra mapping of all identified connections may eventually be something worth de-duplicating, since we can implicitly learn that if we have protocols stored for the peer as well, right?

Right now identify is a bit funny. It's per-connection but the state is per-peer. We'll have to reconcile that at some point.

@Stebalien Stebalien merged commit af58b80 into master Apr 25, 2020

// respect don contexteone
// TODO: Consider removing this? On one hand, it's nice because we can
// assume that things like the agent version are usually set when this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien Didn't we discuss that it's nice to block on Identify for outgoing connections so we can then directly check the peerstore for supported remote protocols, etc. ?

Why this TODO ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did say that. I left this TODO because we may want to consider not doing that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants