call Stream.Reset instead of Stream.Close #76

Stebalien · 2019-05-08T22:21:29Z

May fix ipfs/kubo#6237

Basically,

1. We hang while closing a stream (because `Close` waits). Also: "Closing a stream should cancel pending writes" (go-mplex#9).
2. This blocks the connection manager because it assumes that close _doesn't_ wait.

This may also fix a stream leak.
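
A toy model of the problem described above, as a minimal sketch. `fakeStream` is hypothetical and is not the libp2p stream type: its `Close` blocks until a simulated acknowledgement arrives, which is only a stand-in for whatever the real `Close` waits on, while `Reset` tears the stream down locally and returns at once.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errReset = errors.New("stream reset")

// fakeStream is a hypothetical stand-in for a multiplexed stream.
type fakeStream struct {
	acked chan struct{} // would be closed when the simulated peer acknowledges
	reset chan struct{} // closed by Reset
	once  sync.Once
}

func newFakeStream() *fakeStream {
	return &fakeStream{acked: make(chan struct{}), reset: make(chan struct{})}
}

// Close blocks until the peer acknowledges or the stream is reset.
func (s *fakeStream) Close() error {
	select {
	case <-s.acked:
		return nil
	case <-s.reset:
		return errReset
	}
}

// Reset never waits on the peer and is safe to call more than once.
func (s *fakeStream) Reset() error {
	s.once.Do(func() { close(s.reset) })
	return nil
}

func main() {
	s := newFakeStream()
	done := make(chan error, 1)
	go func() { done <- s.Close() }() // the peer never answers in this model

	select {
	case err := <-done:
		fmt.Println("Close returned:", err)
	case <-time.After(50 * time.Millisecond):
		fmt.Println("Close is still blocked")
	}

	s.Reset() // returns immediately and unblocks the pending Close
	fmt.Println("Close after Reset:", <-done)
}
```

A caller that closes streams in a loop and assumes `Close` is cheap would stall on the first unresponsive peer in this model; calling `Reset` avoids the wait at the cost of discarding anything not yet delivered.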

vyzo · 2019-05-08T22:26:00Z

tests fail...

Stebalien · 2019-05-08T22:26:13Z

Yeah, we expect a nice EOF.

vyzo · 2019-05-08T22:26:36Z

we need the Shutdown call already.

Stebalien · 2019-05-08T22:28:20Z

We need Close to mean "close both ways" (with a sane default timeout), Reset to mean "throw everything away", and then yeah, CloseWrite and CloseRead.
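
The semantics proposed here could be sketched as an interface. The method names come from the comment above; the interface name `ProposedStream`, the signatures, and the `toyStream` type are assumptions made for illustration and are not the libp2p API. The toy only tracks state and counts bytes, and it leaves out the "sane default timeout" that a real `Close` would need.

```go
package main

import (
	"errors"
	"fmt"
)

// ProposedStream is a hypothetical interface for the four operations.
type ProposedStream interface {
	CloseWrite() error // flush and signal EOF to the peer; reads stay open
	CloseRead() error  // stop reading; writes stay open
	Close() error      // close both directions
	Reset() error      // abort both directions and throw buffered data away
}

var ErrReset = errors.New("stream reset")

// toyStream tracks state only; it moves no bytes over any network.
type toyStream struct {
	readClosed, writeClosed, reset bool
	pending, flushed, discarded    int // byte counters
}

var _ ProposedStream = (*toyStream)(nil)

func (s *toyStream) Write(p []byte) (int, error) {
	if s.reset {
		return 0, ErrReset
	}
	if s.writeClosed {
		return 0, errors.New("write on closed stream")
	}
	s.pending += len(p)
	return len(p), nil
}

func (s *toyStream) CloseWrite() error {
	if s.reset {
		return ErrReset
	}
	s.flushed += s.pending // pending data is delivered before EOF
	s.pending = 0
	s.writeClosed = true
	return nil
}

func (s *toyStream) CloseRead() error {
	if s.reset {
		return ErrReset
	}
	s.readClosed = true
	return nil
}

func (s *toyStream) Close() error {
	if err := s.CloseWrite(); err != nil {
		return err
	}
	return s.CloseRead()
}

func (s *toyStream) Reset() error {
	s.discarded += s.pending // pending data is dropped, not delivered
	s.pending = 0
	s.reset, s.readClosed, s.writeClosed = true, true, true
	return nil
}

func main() {
	a := &toyStream{}
	a.Write([]byte("hello"))
	a.Close()
	fmt.Println("after Close: flushed", a.flushed, "discarded", a.discarded)

	b := &toyStream{}
	b.Write([]byte("hello"))
	b.Reset()
	fmt.Println("after Reset: flushed", b.flushed, "discarded", b.discarded)
}
```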

vyzo · 2019-05-09T08:10:26Z

The test failure seems insurmountable, we'll need to change the stream API for this to work :(

Stebalien · 2019-05-09T08:41:21Z

Not necessarily. The test failure is due to the assumption that conn.Close() flushes. However, conn.Close() actually resets in our TCP transport as well (for efficiency) so this isn't really an issue.

Stebalien · 2019-05-09T08:42:02Z

But I agree this sucks.

hsanjuan · 2019-05-09T09:23:34Z

conn.go

@@ -28,29 +29,53 @@ func (n *NetAddr) String() string {
 	return fmt.Sprintf("relay[%s-%s]", n.Remote, n.Relay)
 }

+func (c *Conn) Close() error {
+	return c.stream.Reset()


Maybe this is relevant here. This is how I ended up doing it in gostream:

https://github.com/hsanjuan/go-libp2p-gostream/blob/master/conn.go#L38

    // Close closes the connection.
    // Any blocked Read or Write operations will be unblocked and return errors.
    func (c *conn) Close() error {
    	if err := c.s.Close(); err != nil {
    		c.s.Reset()
    		return err
    	}
    	go pnet.AwaitEOF(c.s)
    	return nil
    }

If I remember correctly, bluntly resetting libp2p streams on Close() caused errors on the other side in situations where closing the stream was supposed to be a clean operation.

Stebalien · 2019-05-09

Yeah, but we need to be careful:

1. Concurrent reads aren't safe. We need to interrupt any concurrent reads (somehow) before we can do this. We should be able to do this by setting a read deadline but... we'll have to make sure to fix any/all of our multiplexers/transports (deadlines currently don't interrupt).
2. We probably want to set some timeouts/deadlines.
3. We should probably call Close() in the background as well. That will mimic a normal close(tcpFileDescriptor) call (which will let the kernel flush in the background).

IMO, the real issue here is that TCPConn.Close is actually a very fancy best-effort function that does a bunch of dirty work in the background. Ideally, even if we make Close close both directions, we wouldn't be that sloppy.

Stebalien · 2019-05-21T05:06:52Z

@vyzo, this is getting really critical and both the TCP and the QUIC transports reset the underlying connection on close without flushing anything (I specifically modified the TCP transport to do this to avoid unnecessary work). This bug likely just crashed our gateways and has been causing problems for a ton of people.

The only other solution I can think of is to:

1. Close the stream. We'll have to fix any bugs we have where closing streams doesn't necessarily interrupt any in-progress writes (an issue for both yamux and mplex).
2. Set a read deadline on the stream. whyrusleeping/yamux#28 ("Make deadlines interrupt and coalesce writes") fixes read deadlines for yamux but they're still broken in mplex.
3. Wait for the current reader to exit (read lock).
4. Read on the stream waiting for an EOF or a timeout.
5. On timeout, reset.

Do all this in a goroutine so we don't stall anything.

But that requires fixing multiple annoying bugs. I've started on this path (with the yamux fix) but it feels like a waste given that we don't actually make any guarantees about Close flushing.

vyzo · 2019-05-21T08:42:26Z

Sigh... If it has been identified as the critical bug, then we got to merge it.

vyzo

let's fix the test and merge it as it is a critical bug.

Stebalien · 2019-05-21T16:17:14Z

> Sigh... If it has been identified as the critical bug, then we got to merge it.

Well, Close is stalling. I'll try shipping a patch to some of our partners to make sure this is the bug.

vyzo · 2019-05-22T13:28:40Z

This needs a rebase for the go mod debacle (go.sum conflicts)

vyzo · 2019-05-22T13:31:30Z

rebased and resolved the conflict; should be ready for merge.

vyzo · 2019-05-22T13:57:37Z

we also need the companion in #77, so that the connection manager doesn't kill the underlying hop relay connections when they have stop streams.

raulk

Initial feedback; still reviewing.

conn.go

raulk

Previous status should've been "request changes".

fixed bug identified, the wrapping is not an issue.

Tag the hop relay when creating stop streams

ghost assigned Stebalien May 8, 2019

ghost added the status/in-progress label May 8, 2019

vyzo self-requested a review May 8, 2019 22:24

hsanjuan reviewed May 9, 2019

Stebalien mentioned this pull request May 21, 2019

Too many open files (regression?) ipfs/kubo#6237

Closed

vyzo reviewed May 21, 2019

Stebalien mentioned this pull request May 21, 2019

dep: update go-libp2p ipfs/kubo#6361

Merged

vyzo approved these changes May 21, 2019

Stebalien added 2 commits May 22, 2019 16:30

call Stream.Reset instead of Stream.Close

e972c1f

May fix ipfs/kubo#6237 Basically, 1. We hang while closing a stream (because `Close` waits). 2. This blocks the connection manager because it assumes that close _doesn't_ wait. This may also fix a stream leak.

fix tests for reset on close

4a96ae7

vyzo force-pushed the fix/stream-close branch from 46b4e95 to 4a96ae7 Compare May 22, 2019 13:30

vyzo mentioned this pull request May 22, 2019

Tag the hop relay when creating stop streams #77

Merged

vyzo mentioned this pull request May 22, 2019

Production go-ipfs going bananas open-services/open-registry#39

Open

raulk reviewed May 22, 2019

vyzo reviewed May 22, 2019

raulk previously requested changes May 22, 2019

fix bug in conn.SetWriteDeadline

1804298

tag hop relays for active stop connections

2ec9f71

vyzo and others added 3 commits May 22, 2019 19:55

add comments for tagging logic

623ae60

store the host instead of the Relay instance in Conn

5495b30

Merge pull request #77 from libp2p/feat/stop-tags

776794b

Tag the hop relay when creating stop streams

Stebalien merged commit 1f1395c into master May 22, 2019

Stebalien deleted the fix/stream-close branch May 22, 2019 17:43

vyzo mentioned this pull request Oct 23, 2019

Interop tests for circuit reconnect fail #82

Open
