fix: refactor logic for identifying connections #898

Stebalien · 2020-04-25T01:46:32Z

The DHT now assumes that the protocols will be set in the peerstore once the identify event has fired. However, if we ended up identifying the peer multiple times (multiple calls to Connect at the same time, multiple connections, etc.), we'd end up unsetting the protocols temporarily.

This change:

Avoids ever unsetting the protocols, ever.
Avoids running the identify protocol on the same connection multiple times, ever.
Always waits for identify to complete (or fail) before using a connection.

Change 2 should not cause any performance degradation:

Before this change, if the connection had not yet been identified, we'd wait at least one round-trip per new stream to negotiate the protocol.
After the change, we now wait exactly one round trip to complete the identify handshake.

(so it's actually faster when we're trying to open a new stream with multiple possible protocol choices)

0. NEVER call `peerstore.SetProtocols(p)` (clear the protocol set). Given the new identify events, if someone looked in the peerstore at the wrong time, they could decide that the peer no longer speaks some protocol.
1. Reliably wait for identify before trying to open a stream. The old logic was _really_ racy.
2. Avoids potentially calling identify on the same connection multiple times.
3. Calls identify as early as possible. Previously, we'd invoke identify on inbound connections using an event that was only invoked _after_ all `Connected` event handlers completed. Now we invoke identify from a `Connected` handler.

Stebalien · 2020-04-25T01:56:20Z

p2p/host/basic/basic_host.go

@@ -605,20 +586,13 @@ func (h *BasicHost) dialPeer(ctx context.Context, p peer.ID) error {
 		return err
 	}

-	// Clear protocols on connecting to new peer to avoid issues caused
-	// by misremembering protocols between reconnects
-	h.Peerstore().SetProtocols(p)


Bug:

This could run after identify has completed but before it has signaled that it has completed. Then below, we'd try to run identify, not run it (because it looks like it's running), and eventually return from this function with no protocols set.

All this would take is two calls to Connect at the same time, which actually has a pretty high chance of happening in the new DHT code.
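
To make the interleaving concrete, here is a toy, single-threaded replay of it. Everything in it is an assumption for illustration: `protoStore`, `secondConnect`, the peer name, and the protocol ID are hypothetical, and the real peerstore is not involved.

```go
package main

import "fmt"

// protoStore is a toy stand-in for a per-peer protocol list (hypothetical).
type protoStore map[string][]string

// setProtocols replaces the stored list. Called with no protocols it clears
// the list, which is how the removed dialPeer line used it.
func (s protoStore) setProtocols(peer string, protos ...string) {
	s[peer] = append([]string(nil), protos...)
}

// supports reports whether proto is currently recorded for peer.
func (s protoStore) supports(peer, proto string) bool {
	for _, p := range s[peer] {
		if p == proto {
			return true
		}
	}
	return false
}

// secondConnect replays the second Connect call from the comment above.
// Identify has already finished for the connection, so it is not run again;
// the only effect left is the optional clear.
func secondConnect(s protoStore, peer string, clearFirst bool) {
	if clearFirst {
		s.setProtocols(peer)
	}
}

func main() {
	const peer, proto = "peerA", "/example/1.0.0"
	for _, clearFirst := range []bool{true, false} {
		s := protoStore{}
		s.setProtocols(peer, proto) // first Connect: identify stores the protocols
		secondConnect(s, peer, clearFirst)
		fmt.Printf("clearFirst=%v supports=%v\n", clearFirst, s.supports(peer, proto))
	}
}
```

With the clear in place, the second caller returns with an empty protocol set; without it, the protocols recorded by the first identify survive.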

Stebalien · 2020-04-25T01:59:25Z

p2p/host/basic/basic_host.go

-	// identify the connection before returning.
-	done := make(chan struct{})
-	go func() {
-		h.ids.IdentifyConn(c)


Bug: This could end up running multiple times, forcing us to do a bunch of unnecessary work.

Stebalien · 2020-04-25T02:00:31Z

p2p/host/basic/basic_host_test.go

@@ -199,6 +199,12 @@ func TestHostProtoPreference(t *testing.T) {
 		t.Fatal(err)
 	}

+	// force the lazy negotiation to complete
+	_, err = s.Write(nil)


This is now necessary because, unlike before, we now wait for identify to complete. This shouldn't slow anything down as we'd have to wait one round trip either way. Really, it should be faster because identify will wait exactly one round-trip while a protocol negotiation could take multiple.

what are the semantics of writing nil on streams?

IIUC, nothing gets written over the wire; this just triggers everything down to that point.

This relies on the specific behaviour of the msmux lazy stream: it must not treat zero-length writes as a no-op.

Stebalien · 2020-04-25T02:01:13Z

p2p/net/mock/mock_test.go

+
+	// wait for receiver to see the conn.
+	for i := 0; i < 10 && len(n3.Conns()) == 0; i++ {
+		time.Sleep(time.Duration(10*i) * time.Millisecond)


timing changes

Stebalien · 2020-04-25T02:07:29Z

p2p/protocol/identify/id.go

+// IdentifyWait triggers an identify (if the connection has not already been
+// identified) and returns a channel that is closed when the identify protocol
+// completes.
+func (ids *IDService) IdentifyWait(c network.Conn) <-chan struct{} {


The name of this function is now a bit funky but this is the least-breaking way I could do it. Before, IdentifyWait would never trigger an identify operation. Now it does to be consistent everywhere.

Stebalien · 2020-04-25T02:09:08Z

p2p/protocol/identify/id.go

@@ -280,21 +314,14 @@ func (ids *IDService) broadcast(proto protocol.ID, payloadWriter func(s network.
 		go func(p peer.ID, conns []network.Conn) {
 			defer wg.Done()

-			// if we're in the process of identifying the connection, let's wait.
-			// we don't use ids.IdentifyWait() to avoid unnecessary channel creation.


This is no longer a problem. If the connection has not been identified, we'll identify it. Otherwise, the channel will already exist.

willscott

I didn't check how thorough existing test coverage is. Hopefully enough to have some confidence things won't end up in odd states. I follow the logic itself, and it looks like an improvement.

Holding the extra mapping of all identified connections may eventually be something worth de-duplicating, since we can implicitly learn the same thing from whether we have protocols stored for the peer, right?

willscott · 2020-04-25T02:40:30Z

p2p/host/basic/basic_host_test.go

@@ -199,6 +199,12 @@ func TestHostProtoPreference(t *testing.T) {
 		t.Fatal(err)
 	}

+	// force the lazy negotiation to complete
+	_, err = s.Write(nil)


what are the semantics of writing nil on streams?

whyrusleeping · 2020-04-25T02:59:46Z

p2p/protocol/identify/id.go

+// identified) and returns a channel that is closed when the identify protocol
+// completes.
+func (ids *IDService) IdentifyWait(c network.Conn) <-chan struct{} {
+	ids.connsMu.RLock()


the reason for two sets of locking here (when you could really just get away with one) is for perf?

Yes. We call this for every stream. It may not be critical, but I'd rather not find out later and this was already a rw lock.

whyrusleeping

Alright, LGTM. That must have been fun to debug

Stebalien · 2020-04-25T03:07:44Z

what are the semantics of writing nil on streams?

It's just an empty write. Really, it's a hack. The right way to do this is to either:

Write a message.
Read a message.

However, we're doing something funny here where we don't actually want to send/receive any data on the stream.

Stebalien · 2020-04-25T03:08:19Z

Holding the extra mapping of all identified connections may eventually be something worth de-duplicating, since we can implicitly learn that if we have protocols stored for the peer as well, right?

Right now identify is a bit funny. It's per-connection but the state is per-peer. We'll have to reconcile that at some point.

aarshkshah1992 · 2020-04-27T04:46:24Z

p2p/host/basic/basic_host.go

-
-	// respect don contexteone
+	// TODO: Consider removing this? On one hand, it's nice because we can
+	// assume that things like the agent version are usually set when this


@Stebalien Didn't we discuss that it's nice to block on Identify for outgoing connections so we can then directly check the peerstore for supported remote protocols, etc.?

Why this TODO?

I did say that. I left this TODO because we may want to consider not doing that.

Stebalien force-pushed the fix/set-protocols-race branch 2 times, most recently from 42ddaed to 56e3fc2 on April 25, 2020 01:51

Stebalien and others added 2 commits April 24, 2020 19:05

skip test local addr filtering

3d676b6

Stebalien force-pushed the fix/set-protocols-race branch from 56e3fc2 to 3d676b6 on April 25, 2020 02:05

Stebalien mentioned this pull request Apr 25, 2020

fix: re-validate peers whenever their state changes libp2p/go-libp2p-kad-dht#607

Merged

test: fix a flaky test

d3b7b3b

willscott approved these changes Apr 25, 2020

whyrusleeping reviewed Apr 25, 2020

whyrusleeping approved these changes Apr 25, 2020

Stebalien merged commit af58b80 into master Apr 25, 2020

aarshkshah1992 reviewed Apr 27, 2020
